Data URI Compression - Real-World Test Results

Overview

Test Data: 4,916 production agent metadata records from the 8004scan database

Test Summary

Dataset Distribution

| Size Range | Count | Percentage | Average Size |
|---|---|---|---|
| < 1KB | 2,872 | 58.4% | 0.51 KB |
| 1-2KB | 2,015 | 41.0% | 1.19 KB |
| 2-3KB | 13 | 0.3% | 2.26 KB |
| 3-5KB | 15 | 0.3% | 3.76 KB |
| > 5KB | 1 | 0.02% | 5.76 KB |

Key Insight: 99.4% of real agents use metadata < 2KB.
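The 99.4% figure follows directly from the bucket counts in the table. As a quick check, the short sketch below recomputes the shares; the dictionary keys are only labels chosen for illustration.

```python
# Bucket counts copied from the dataset distribution table.
counts = {"< 1KB": 2872, "1-2KB": 2015, "2-3KB": 13, "3-5KB": 15, "> 5KB": 1}

total = sum(counts.values())  # 4,916 records in total
shares = {bucket: 100 * n / total for bucket, n in counts.items()}

# Share of agents whose metadata is below 2KB.
under_2kb = shares["< 1KB"] + shares["1-2KB"]
print(f"{under_2kb:.1f}% of {total} agents are under 2KB")
```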

Small Metadata (< 2KB) - 99.4% of Agents

Test: 30 samples, average 0.73 KB (751 bytes), ~33,000 gas uncompressed

| Algorithm | Level | Compression Ratio | Gas Saved | Compression Speed | Decompression Speed |
|---|---|---|---|---|---|
| Brotli | 11 | 47.7% | ~6,180 gas (17.4%) | 2.62ms | 0.02ms |
| Brotli | 9 | 41.2% | ~5,327 gas (15.0%) | 1.43ms | 0.02ms |
| Zstd | 22 | 34.9% | ~4,570 gas (12.8%) | 0.15ms | 0.01ms |
| Zstd | 15 | 34.8% | ~4,561 gas (12.8%) | 0.11ms | 0.01ms |
| Zstd | 9 | 34.4% | ~4,533 gas (12.7%) | 0.06ms | 0.01ms |
| Gzip | 9 | 34.7% | ~4,613 gas (12.9%) | 0.05ms | 0.02ms |
| Gzip | 6 | 34.7% | ~4,613 gas (12.9%) | 0.05ms | 0.03ms |
| LZ4 | 12 | 14.5% | ~2,110 gas (5.8%) | 0.04ms | 0.00ms |
| LZ4 | 9 | 14.5% | ~2,110 gas (5.8%) | 0.03ms | 0.00ms |

Recommendations for Small Metadata

| Priority | Algorithm | Reason |
|---|---|---|
| 1st Choice | Zstd-15 | Best balance: 34.8% ratio, 0.11ms speed, excellent cross-platform support |
| 2nd Choice | Brotli-11 | Highest ratio (47.7%) but slower (2.62ms), good for static content |
| Speed Priority | LZ4-9 | Fastest (0.03ms) but lowest ratio (14.5%), only if speed is critical |

Medium Metadata (2-5KB) - 0.6% of Agents

Test: 2 samples, average 3.11 KB (3,182 bytes), ~71,920 gas uncompressed

| Algorithm | Level | Compression Ratio | Gas Saved | Compression Speed | Decompression Speed |
|---|---|---|---|---|---|
| Brotli | 11 | 66.8% | ~35,288 gas (46.9%) | 6.91ms | 0.04ms |
| Brotli | 9 | 62.3% | ~33,144 gas (43.8%) | 3.75ms | 0.03ms |
| Zstd | 22 | 59.3% | ~31,672 gas (41.8%) | 1.23ms | 0.02ms |
| Zstd | 15 | 59.2% | ~31,576 gas (41.6%) | 0.68ms | 0.02ms |
| Zstd | 9 | 58.9% | ~31,472 gas (41.5%) | 0.20ms | 0.03ms |
| Gzip | 9 | 59.5% | ~31,704 gas (41.9%) | 0.10ms | 0.04ms |
| Gzip | 6 | 59.5% | ~31,704 gas (41.9%) | 0.12ms | 0.07ms |
| LZ4 | 12 | 45.2% | ~24,920 gas (32.2%) | 0.13ms | 0.01ms |
| LZ4 | 9 | 45.2% | ~24,912 gas (32.1%) | 0.07ms | 0.01ms |

Recommendations for Medium Metadata

| Priority | Algorithm | Reason |
|---|---|---|
| 1st Choice | Zstd-15 | Excellent ratio (59.2%), fast (0.68ms), production-ready |
| 2nd Choice | Brotli-11 | Best ratio (66.8%) but slower (6.91ms), worth it for rare large metadata |
| Speed Priority | Gzip-9 | Very fast (0.10ms), good ratio (59.5%), best compatibility |

Key Findings

1. Real Compression Ratios Lower Than Expected

Previous Estimates: 60-70% compression ratio
Actual Results:

  • Small metadata (<2KB): 35-48% compression ratio
  • Medium metadata (2-5KB): 59-67% compression ratio

Reason: Real agent metadata is already fairly compact with minimal repetition.

2. Gas Savings Still Worthwhile

Despite lower compression ratios, gas savings remain valuable:

  • Small metadata (99.4% of agents): Save 4,000-6,000 gas per registration
  • Medium metadata (0.6% of agents): Save 31,000-35,000 gas per registration

For platforms with 1,000+ agents, cumulative savings add up: at roughly 4,500 gas saved per small-metadata registration, 1,000 registrations save about 4.5 million gas.

3. Zstd-15 is the Clear Winner

Why Zstd-15:

  • ✅ Excellent compression ratio (35-59%)
  • ✅ Fast speed (0.11-0.68ms)
  • ✅ Cross-platform support (Python, Node.js, Rust, Go)
  • ✅ Production-proven in many systems (Facebook, Linux kernel)

Brotli-11 Alternative:

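  • Better compression (48-67%) but roughly 10-24x slower to compress than Zstd-15 (6.91ms vs 0.68ms on medium metadata, 2.62ms vs 0.11ms on small)
  • Good for static content, pre-computed compression
  • Less uniform cross-platform support (native in browsers, third-party libraries elsewhere)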

LZ4 Results:

  • Lowest compression ratio (14-45%)
  • Marginal speed advantage (0.03ms vs 0.11ms for Zstd-15)
  • Speed difference negligible for typical use cases

Conclusion: Zstd-15's superior compression ratio outweighs LZ4's minimal speed advantage.

Production Recommendations

Default Algorithm

Recommended: Zstd level 15

```python
# Backend (Python); requires the `zstandard` package
import zstandard as zstd

# json_bytes: the UTF-8 encoded metadata JSON
compressor = zstd.ZstdCompressor(level=15)
compressed = compressor.compress(json_bytes)
```

```typescript
// Frontend (TypeScript)
// Note: Use gzip instead of zstd for browser compatibility
import { gzipSync } from "fflate";

// json_bytes: Uint8Array holding the UTF-8 encoded metadata JSON
const compressed = gzipSync(json_bytes, { level: 9 });
```

When to Use Compression

| Metadata Size | Recommendation | Gas Saved |
|---|---|---|
| < 500 bytes | ❌ Don't compress | Minimal savings, overhead not worth it |
| 500-2000 bytes | ⚖️ Optional | ~2,000-6,000 gas |
| 2-5KB | ✅ Recommended | ~31,000-35,000 gas |
| > 5KB | ✅✅ Strongly recommended | 35,000+ gas |
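The size thresholds above can be turned into a small helper that a registration flow might call before deciding whether to compress. This is only a sketch: the function name and return labels are hypothetical, and since the table leaves the range between 2,000 bytes and 2KB unspecified, the code assumes everything above 2,000 bytes and up to 5KB counts as "recommended".

```python
def compression_advice(size_bytes: int) -> str:
    """Map a metadata size to the recommendation tiers from the table."""
    if size_bytes < 500:
        return "skip"                  # savings too small to justify the overhead
    if size_bytes <= 2000:
        return "optional"              # roughly 2,000-6,000 gas saved
    if size_bytes <= 5 * 1024:
        return "recommended"           # roughly 31,000-35,000 gas saved
    return "strongly recommended"      # 35,000+ gas saved


# The two average sizes reported in the test results.
for size in (751, 3182):
    print(size, compression_advice(size))
```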

Implementation Checklist

  • [x] Parser supports enc=zstd parameter in Data URI
  • [x] Zip bomb protection (100KB decompression limit)
  • [x] Algorithm whitelist (zstd, gzip, br, lz4 only)
  • [x] Async decompression in Celery workers
  • [ ] Frontend compression UI with gas savings preview
  • [ ] Analytics dashboard tracking compression adoption
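The first three checklist items (the `enc=` parameter, the decompression limit, and the algorithm whitelist) can be sketched together as follows. The exact Data URI layout is not specified here, so the example assumes a hypothetical form `data:application/json;enc=gzip;base64,<payload>`; all function names are illustrative, and only gzip is wired up because it is available in the Python standard library.

```python
import base64
import gzip
import zlib
from typing import Optional, Tuple

ALLOWED_ENCODINGS = {"zstd", "gzip", "br", "lz4"}  # whitelist from the checklist
MAX_DECOMPRESSED_BYTES = 100 * 1024                 # 100KB zip bomb limit


def parse_data_uri(uri: str) -> Tuple[Optional[str], bytes]:
    """Split an assumed 'data:<mime>;enc=<alg>;base64,<payload>' URI."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, sep, body = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data URI has no payload")
    params = header.split(";")
    if "base64" not in params:
        raise ValueError("this sketch only handles base64 payloads")
    encoding = None
    for param in params[1:]:
        if param.startswith("enc="):
            encoding = param[len("enc="):]
    return encoding, base64.b64decode(body)


def gunzip_bounded(data: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    """Decompress gzip data, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # 16 = gzip framing
    out = decompressor.decompress(data, limit + 1)  # never inflate past limit + 1
    if len(out) > limit:
        raise ValueError("decompressed payload exceeds limit")
    if not decompressor.eof:
        raise ValueError("truncated gzip stream")
    return out


def decode_metadata(uri: str) -> bytes:
    encoding, raw = parse_data_uri(uri)
    if encoding is None:
        return raw  # no enc= parameter: payload is uncompressed
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if encoding == "gzip":
        return gunzip_bounded(raw)
    # zstd, br, and lz4 need third-party libraries and are left out of this sketch.
    raise NotImplementedError(f"{encoding} is not wired up in this sketch")


payload = b'{"name": "demo-agent"}'
uri = "data:application/json;enc=gzip;base64," + base64.b64encode(
    gzip.compress(payload)
).decode("ascii")
print(decode_metadata(uri))
```

Capping the output through `max_length` means an oversized payload is rejected after inflating at most one byte past the limit, rather than after the whole stream has been expanded in memory.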

Test Methodology

Data Source

  • Database: 8004scan production PostgreSQL
  • Table: agents.metadata_json
  • Total Records: 4,916 agents
  • Chains: Ethereum Sepolia + Base Sepolia

Test Process

  1. Fetch real metadata from database
  2. Test each algorithm at multiple compression levels
  3. Measure compression ratio, gas savings, speed
  4. Verify decompression correctness
  5. Aggregate statistics
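A minimal sketch of steps 2 to 5 might look like the following. It uses only the standard-library `gzip` module, so Zstd, Brotli, and LZ4 are not covered, and the sample payloads are hypothetical stand-ins for metadata fetched from the database. It also assumes that "compression ratio" means the fraction of bytes removed, which matches how the result tables rank the algorithms.

```python
import gzip
import json
import time

GAS_PER_BYTE = 16  # calldata cost per byte, as in the gas formula


def benchmark_gzip(payload: bytes, level: int) -> dict:
    """Compress one payload, verify the round trip, and report the metrics."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    compress_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    decompress_ms = (time.perf_counter() - start) * 1000

    # Step 4: decompression must reproduce the input exactly.
    if restored != payload:
        raise ValueError("round-trip mismatch")

    saved_bytes = len(payload) - len(compressed)
    return {
        "ratio": saved_bytes / len(payload),      # fraction of bytes removed
        "gas_saved": saved_bytes * GAS_PER_BYTE,
        "compress_ms": compress_ms,
        "decompress_ms": decompress_ms,
    }


def aggregate(results: list) -> dict:
    """Step 5: average each metric over all samples."""
    keys = ("ratio", "gas_saved", "compress_ms", "decompress_ms")
    return {key: sum(r[key] for r in results) / len(results) for key in keys}


# Hypothetical stand-ins for real agent metadata (step 1).
samples = [
    json.dumps({
        "name": f"agent-{i}",
        "endpoints": [
            {"type": "a2a", "url": f"https://example.com/agents/{i}/v{v}"}
            for v in range(8)
        ],
    }).encode("utf-8")
    for i in range(3)
]

for level in (6, 9):
    summary = aggregate([benchmark_gzip(s, level) for s in samples])
    print(level, round(summary["ratio"], 3), round(summary["gas_saved"]))
```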

Gas Calculation Formula

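```text
Gas = (data_size_bytes × 16) + 21,000
```

  • 16 gas/byte: EVM calldata cost for a non-zero byte (zero bytes cost 4 gas; the formula counts every byte at the non-zero rate)
  • 21,000 gas: Base transaction cost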
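As a worked example, the formula can be applied to the two average payload sizes from the test results. The helper names below are hypothetical; the constants come straight from the formula.

```python
CALLDATA_GAS_PER_BYTE = 16  # per-byte calldata cost used by the formula
BASE_TX_GAS = 21_000        # base transaction cost


def estimate_gas(data_size_bytes: int) -> int:
    """Gas = (data_size_bytes x 16) + 21,000."""
    return data_size_bytes * CALLDATA_GAS_PER_BYTE + BASE_TX_GAS


def estimate_gas_saved(original_bytes: int, compressed_bytes: int) -> int:
    """Gas difference between sending the original and the compressed payload."""
    return estimate_gas(original_bytes) - estimate_gas(compressed_bytes)


small = estimate_gas(751)    # 751 * 16 + 21,000 = 33,016
medium = estimate_gas(3182)  # 3,182 * 16 + 21,000 = 71,912
print(small, medium)
```

Both values are in line with the ~33,000 and ~71,920 gas quoted for the uncompressed small and medium samples. Because the base cost cancels out, the saving depends only on the number of bytes removed.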

Conclusion

TLDR:

  • Real compression ratios (35-67%) are lower than theoretical estimates (60-70%)
  • Gas savings (4,000-35,000 per agent) are still worthwhile for production
  • Recommended: Zstd-15 for best balance of compression and speed
  • Alternative: Brotli-11 for maximum compression (rare large metadata)
  • 99.4% of agents use small metadata (<2KB) and save ~4,500 gas each

Next Steps:

  1. Update documentation with real test data ✅
  2. Set default compression to Zstd-15 in backend
  3. Add frontend compression UI with gas preview
  4. Track compression adoption metrics