Data URI Compression - Real-World Test Results

Overview

Test Data: 4,916 production agent metadata records from the 8004scan database

Test Summary

Dataset Distribution

Size Range	Count	Percentage	Average Size
< 1KB	2,872	58.4%	0.51 KB
1-2KB	2,015	41.0%	1.19 KB
2-3KB	13	0.3%	2.26 KB
3-5KB	15	0.3%	3.76 KB
> 5KB	1	0.02%	5.76 KB

Key Insight: 99.4% of real agents use metadata < 2KB.
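The 99.4% figure follows directly from the bucket counts in the table. A minimal sketch of that arithmetic, using only the counts listed above (the variable names are illustrative):

```python
# Bucket counts copied from the Dataset Distribution table.
counts = {
    "< 1KB": 2872,
    "1-2KB": 2015,
    "2-3KB": 13,
    "3-5KB": 15,
    "> 5KB": 1,
}

total = sum(counts.values())  # all agents in the dataset

# Share of each bucket as a percentage of all agents.
shares = {bucket: 100 * n / total for bucket, n in counts.items()}

# Agents below 2KB are the first two buckets combined.
under_2kb = shares["< 1KB"] + shares["1-2KB"]

for bucket, pct in shares.items():
    print(f"{bucket:>6}: {pct:.2f}%")
print(f"under 2KB: {under_2kb:.1f}%")
```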

Small Metadata (< 2KB) - 99.4% of Agents

Test: 30 samples, average 0.73 KB (751 bytes), ~33,000 gas uncompressed

Algorithm	Level	Compression Ratio	Gas Saved	Compression Speed	Decompression Speed
Brotli	11	47.7%	~6,180 gas (17.4%)	2.62ms	0.02ms
Brotli	9	41.2%	~5,327 gas (15.0%)	1.43ms	0.02ms
Zstd	22	34.9%	~4,570 gas (12.8%)	0.15ms	0.01ms
Zstd	15	34.8%	~4,561 gas (12.8%)	0.11ms	0.01ms
Zstd	9	34.4%	~4,533 gas (12.7%)	0.06ms	0.01ms
Gzip	9	34.7%	~4,613 gas (12.9%)	0.05ms	0.02ms
Gzip	6	34.7%	~4,613 gas (12.9%)	0.05ms	0.03ms
LZ4	12	14.5%	~2,110 gas (5.8%)	0.04ms	0.00ms
LZ4	9	14.5%	~2,110 gas (5.8%)	0.03ms	0.00ms

Recommendations for Small Metadata

Priority	Algorithm	Reason
1st Choice	Zstd-15	Best balance: 34.8% ratio, 0.11ms speed, excellent cross-platform support
2nd Choice	Brotli-11	Highest ratio (47.7%) but slower (2.62ms), good for static content
Speed Priority	LZ4-9	Fastest (0.03ms) but lowest ratio (14.5%), only if speed critical

Medium Metadata (2-5KB) - 0.6% of Agents

Test: 2 samples, average 3.11 KB (3,182 bytes), ~71,920 gas uncompressed

Algorithm	Level	Compression Ratio	Gas Saved	Compression Speed	Decompression Speed
Brotli	11	66.8%	~35,288 gas (46.9%)	6.91ms	0.04ms
Brotli	9	62.3%	~33,144 gas (43.8%)	3.75ms	0.03ms
Zstd	22	59.3%	~31,672 gas (41.8%)	1.23ms	0.02ms
Zstd	15	59.2%	~31,576 gas (41.6%)	0.68ms	0.02ms
Zstd	9	58.9%	~31,472 gas (41.5%)	0.20ms	0.03ms
Gzip	9	59.5%	~31,704 gas (41.9%)	0.10ms	0.04ms
Gzip	6	59.5%	~31,704 gas (41.9%)	0.12ms	0.07ms
LZ4	12	45.2%	~24,920 gas (32.2%)	0.13ms	0.01ms
LZ4	9	45.2%	~24,912 gas (32.1%)	0.07ms	0.01ms

Recommendations for Medium Metadata

Priority	Algorithm	Reason
1st Choice	Zstd-15	Excellent ratio (59.2%), fast (0.68ms), production-ready
2nd Choice	Brotli-11	Best ratio (66.8%) but slower (6.91ms), worth it for rare large metadata
Speed Priority	Gzip-9	Very fast (0.10ms), good ratio (59.5%), best compatibility

Key Findings

1. Real Compression Ratios Lower Than Expected

Previous Estimates: 60-70% compression ratio
Actual Results:

Small metadata (<2KB): 35-48% compression ratio
Medium metadata (2-5KB): 59-67% compression ratio

Reason: Real agent metadata is already fairly compact with minimal repetition.

2. Gas Savings Still Worthwhile

Despite lower compression ratios, gas savings remain valuable:

Small metadata (99.4% of agents): Save 4,000-6,000 gas per registration with Zstd, Gzip, or Brotli
Medium metadata (0.6% of agents): Save 31,000-35,000 gas per registration with Zstd, Gzip, or Brotli

For platforms with 1,000+ agents, cumulative savings add up: 1,000 small-metadata registrations at ~4,500 gas each is roughly 4.5 million gas.
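As a rough illustration of the cumulative effect, the sketch below combines the Zstd-15 savings from the two result tables with the 99.4% / 0.6% split from the dataset. The function name and the choice of a 1,000-agent platform are assumptions for the example, not measured figures.

```python
# Average gas saved per registration with Zstd-15, from the result tables.
ZSTD15_SAVING_SMALL = 4_561    # small metadata (< 2KB)
ZSTD15_SAVING_MEDIUM = 31_576  # medium metadata (2-5KB)


def cumulative_gas_saved(small_agents: int, medium_agents: int) -> int:
    """Total gas saved if every agent registers with Zstd-15 compression."""
    return (small_agents * ZSTD15_SAVING_SMALL
            + medium_agents * ZSTD15_SAVING_MEDIUM)


# Hypothetical platform of 1,000 agents: 994 small and 6 medium,
# mirroring the 99.4% / 0.6% split seen in the dataset.
total_saved = cumulative_gas_saved(small_agents=994, medium_agents=6)
print(f"{total_saved:,} gas saved across 1,000 registrations")
```

Under these assumptions the total comes to about 4.7 million gas, most of it from the many small registrations rather than the few medium ones.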

3. Zstd-15 is the Clear Winner

Why Zstd-15:

✅ Excellent compression ratio (35-59%)
✅ Fast speed (0.11-0.68ms)
✅ Cross-platform support (Python, Node.js, Rust, Go)
✅ Production-proven in many systems (Facebook, Linux kernel)

Brotli-11 Alternative:

Better compression (48-67%) but roughly 10-24x slower than Zstd-15 (2.62ms vs 0.11ms on small metadata, 6.91ms vs 0.68ms on medium)
Good for static content, pre-computed compression
Less consistent cross-platform support (native in browsers, usually a separate library elsewhere)

4. LZ4 Not Recommended

LZ4 Results:

Lowest compression ratio (14-45%)
Marginal speed advantage (0.03ms vs 0.11ms for Zstd-15)
Speed difference negligible for typical use cases

Conclusion: Zstd-15's superior compression ratio outweighs LZ4's minimal speed advantage.

Production Recommendations

Default Algorithm

Recommended: Zstd level 15

python

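# Backend (Python)
import json
import zstandard as zstd

# Placeholder payload; in production this is the agent's metadata JSON.
json_bytes = json.dumps({"name": "example-agent"}).encode("utf-8")
compressor = zstd.ZstdCompressor(level=15)
compressed = compressor.compress(json_bytes)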

typescript

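// Frontend (TypeScript)
// Note: Use gzip instead of zstd for browser compatibility
import { gzipSync, strToU8 } from "fflate";

// Placeholder payload; in production this is the agent's metadata JSON.
const jsonBytes = strToU8(JSON.stringify({ name: "example-agent" }));
// gzipSync returns the compressed bytes directly (fflate's `compress` is callback-based).
const compressed = gzipSync(jsonBytes, { level: 9 });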

When to Use Compression

Metadata Size	Recommendation	Gas Saved
< 500 bytes	❌ Don't compress	Minimal savings, overhead not worth it
500-2000 bytes	⚖️ Optional	~2,000-6,000 gas
2-5KB	✅ Recommended	~31,000-35,000 gas
> 5KB	✅✅ Strongly recommended	35,000+ gas
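The thresholds in this table can be expressed as a small helper. This is a sketch only: the function name and return labels are made up for illustration, and it assumes the boundaries fall at 500 bytes, 2,000 bytes, and 5KB (5,120 bytes, since the size figures in this document use 1KB = 1,024 bytes).

```python
def compression_advice(size_bytes: int) -> str:
    """Map an uncompressed metadata size to the table's recommendation."""
    if size_bytes < 500:
        return "skip"                  # minimal savings, overhead not worth it
    if size_bytes <= 2000:
        return "optional"              # ~2,000-6,000 gas
    if size_bytes <= 5 * 1024:
        return "recommended"           # ~31,000-35,000 gas
    return "strongly recommended"      # 35,000+ gas


# The average small and medium samples from the test results.
print(compression_advice(751))   # average small sample
print(compression_advice(3182))  # average medium sample
```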

Implementation Checklist

[x] Parser supports enc=zstd parameter in Data URI
[x] Zip bomb protection (100KB decompression limit)
[x] Algorithm whitelist (zstd, gzip, br, lz4 only)
[x] Async decompression in Celery workers
[ ] Frontend compression UI with gas savings preview
[ ] Analytics dashboard tracking compression adoption
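The first three checklist items (the enc parameter, the 100KB limit, and the algorithm whitelist) can be sketched together as follows. This is not the production parser: the URI layout `data:<mime>;enc=<algo>;base64,<payload>` and all function names are assumptions, and only gzip is wired up because it ships with Python's standard library.

```python
import base64
import gzip
import zlib

# Values taken from the checklist above.
ALLOWED_ENCODINGS = {"zstd", "gzip", "br", "lz4"}
MAX_DECOMPRESSED_BYTES = 100 * 1024  # 100KB zip bomb limit


def gunzip_limited(payload: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    """Inflate gzip data, refusing to produce more than `limit` bytes."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decoder = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Request at most limit + 1 bytes so an oversized stream is detected
    # without materialising its full output.
    out = decoder.decompress(payload, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed metadata exceeds the size limit")
    if not decoder.eof:
        raise ValueError("truncated gzip stream")
    return out


# Other whitelisted codecs would register their own size-limited decoders here.
DECODERS = {"gzip": gunzip_limited}


def decode_data_uri(uri: str) -> bytes:
    """Decode an assumed `data:<mime>;enc=<algo>;base64,<payload>` URI."""
    header, sep, body = uri.partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("not a data URI")
    params = header[len("data:"):].split(";")[1:]  # drop the MIME type
    if "base64" not in params:
        raise ValueError("this sketch only handles base64 payloads")
    raw = base64.b64decode(body)

    enc = next((p[len("enc="):] for p in params if p.startswith("enc=")), None)
    if enc is None:
        return raw  # no enc parameter: payload is uncompressed
    if enc not in ALLOWED_ENCODINGS:
        raise ValueError(f"encoding {enc!r} is not whitelisted")
    if enc not in DECODERS:
        raise NotImplementedError(f"no decoder registered for {enc!r}")
    return DECODERS[enc](raw)


# Round trip with a placeholder metadata payload.
payload = b'{"name": "example-agent"}'
uri = ("data:application/json;enc=gzip;base64,"
       + base64.b64encode(gzip.compress(payload)).decode("ascii"))
print(decode_data_uri(uri))
```

Checking the whitelist before looking up a decoder keeps the two concerns separate: an unknown algorithm is rejected outright, while a whitelisted one without a registered decoder fails with a distinct error.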

Test Methodology

Data Source

Database: 8004scan production PostgreSQL
Table: agents.metadata_json
Total Records: 4,916 agents
Chains: Ethereum Sepolia + Base Sepolia

Test Process

Fetch real metadata from database
Test each algorithm at multiple compression levels
Measure compression ratio, gas savings, speed
Verify decompression correctness
Aggregate statistics
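The process above can be sketched as a small benchmark loop. This is an illustrative harness, not the script used for these results: the sample records are invented, only gzip is exercised because it is in the standard library, and it treats the "compression ratio" in the tables as size reduction (1 - compressed / original), which is an assumption.

```python
import gzip
import json
import time


def benchmark(samples, compress, decompress, repeats=20):
    """Return average size reduction (%) and compression time (ms)."""
    reductions, seconds = [], []
    for raw in samples:
        start = time.perf_counter()
        for _ in range(repeats):
            packed = compress(raw)
        seconds.append((time.perf_counter() - start) / repeats)

        # Verify decompression correctness before trusting the numbers.
        if decompress(packed) != raw:
            raise RuntimeError("round trip mismatch")
        reductions.append(1 - len(packed) / len(raw))

    # Aggregate per-sample measurements into averages.
    return {
        "size_reduction_pct": 100 * sum(reductions) / len(samples),
        "compress_ms": 1000 * sum(seconds) / len(samples),
    }


# Hypothetical metadata records standing in for rows fetched from the database.
samples = [
    json.dumps({
        "name": f"agent-{i}",
        "description": "demo agent " * 20,
        "endpoints": [{"type": "a2a", "url": f"https://example.com/{i}"}] * 3,
    }).encode("utf-8")
    for i in range(5)
]

# Test the algorithm at multiple compression levels.
for level in (6, 9):
    result = benchmark(
        samples,
        compress=lambda data, lvl=level: gzip.compress(data, compresslevel=lvl),
        decompress=gzip.decompress,
    )
    print(f"gzip-{level}: {result}")
```

Other codecs can be measured by passing their compress and decompress callables in place of the gzip ones.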

Gas Calculation Formula

text

Gas = (data_size_bytes × 16) + 21,000

16 gas/byte: EVM calldata cost for a non-zero byte (zero bytes cost 4 gas, so the formula is an upper bound)
21,000 gas: Base transaction cost
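A minimal sketch of the formula in code, checked against the average sample sizes reported earlier. The function and constant names are illustrative; like the formula itself, it charges every byte at the 16 gas rate.

```python
CALLDATA_GAS_PER_BYTE = 16  # EVM calldata cost used by the formula
BASE_TX_GAS = 21_000        # base transaction cost


def estimate_gas(size_bytes: int) -> int:
    """Gas = (data_size_bytes x 16) + 21,000."""
    return size_bytes * CALLDATA_GAS_PER_BYTE + BASE_TX_GAS


def gas_saved(original_bytes: int, compressed_bytes: int) -> int:
    """Gas difference between sending the original and compressed payload."""
    return estimate_gas(original_bytes) - estimate_gas(compressed_bytes)


print(estimate_gas(751))    # average small sample  -> 33016
print(estimate_gas(3182))   # average medium sample -> 71912
```

Because the 21,000 base cost is paid either way, only the per-byte term shrinks with compression, which is why the percentage of gas saved is lower than the compression ratio.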

Conclusion

TLDR:

Real compression ratios (35-67%) are lower than theoretical estimates (60-70%)
Gas savings (4,000-35,000 per agent) are still worthwhile for production
Recommended: Zstd-15 for best balance of compression and speed
Alternative: Brotli-11 for maximum compression (rare large metadata)
99.4% of agents use small metadata (<2KB) and save ~4,500 gas each

Next Steps:

Update documentation with real test data ✅
Set default compression to Zstd-15 in backend
Add frontend compression UI with gas preview
Track compression adoption metrics
