Description
Practical guide for parsing ERC-8004 agent metadata
Audience: Backend developers, indexer authors, explorer builders
Overview
This guide provides practical instructions for implementing a robust ERC-8004 agent metadata parser that handles real-world data patterns.
What You'll Learn:
- ✅ Parse 7 URI formats (IPFS, HTTP, Data URI variants)
- ✅ Handle malformed metadata gracefully
- ✅ Extract structured data (endpoints, OASF skills, wallets)
- ✅ Validate against ERC-8004 standard
- ✅ Generate helpful warnings for developers
Parser Architecture
High-Level Flow
Input: agentURI string
↓
1. URI Format Detection
├─ Data URI (base64) → decode base64 → parse JSON
├─ Data URI (plain) → URL decode → parse JSON
├─ IPFS URI → fetch from gateway → parse JSON
├─ HTTP(S) URL → fetch → parse JSON
└─ Plain JSON → parse directly
↓
2. JSON Parsing & Validation
├─ Check required fields (type, name, description, image)
├─ Validate endpoints array structure
└─ Collect warnings for non-critical issues
↓
3. Structured Data Extraction
├─ Extract endpoint URLs by type (MCP, A2A, OASF, wallet)
├─ Parse OASF skills and domains
└─ Extract metadata for storage
↓
Output: { metadata, warnings[], status }
5-Level Validation System
Level 1: Syntax → Can we parse the URI and JSON?
Level 2: Schema → Does it have required fields?
Level 3: Endpoint → Are endpoints valid and reachable?
Level 4: Semantic → Do values make sense (CAIP formats, versions)?
Level 5: Status → Is the agent active and operational?
URI Format Parsers
1. Data URI (Base64)
Format: data:application/json;base64,<BASE64>
Logic:
IF uri starts with "data:application/json;base64," THEN
encoded ← extract data (remove "data:application/json;base64," prefix)
// Edge case: ChaosChain pattern (claimed base64 but plain JSON)
IF encoded starts with "{" THEN
ADD_WARNING("base64_uri_with_plain_json")
RETURN parse JSON from encoded
END IF
decoded ← base64 decode(encoded)
RETURN parse JSON from decoded
END IF
Production Stats (8004scan, Dec 2025):
- 18% of agents use base64 data URIs
- ~5% have the ChaosChain pattern (claimed base64 but plain JSON)
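A minimal Python sketch of the base64 branch, including the ChaosChain fallback. The function name and the warnings-list argument are illustrative assumptions, not a published 8004scan API; undecodable payloads are reported with the Level 1 code invalid_base64.

```python
import base64
import binascii
import json
from typing import Any

BASE64_PREFIX = "data:application/json;base64,"


def parse_base64_data_uri(uri: str, warnings: list[str]) -> Any:
    """Decode a base64 JSON data URI (hypothetical helper name)."""
    if not uri.startswith(BASE64_PREFIX):
        raise ValueError("not a base64 JSON data URI")
    encoded = uri[len(BASE64_PREFIX):]
    # ChaosChain pattern: the URI claims base64 but carries plain JSON.
    if encoded.startswith("{"):
        warnings.append("base64_uri_with_plain_json")
        return json.loads(encoded)
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        decoded = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid_base64") from exc
    # json.loads accepts bytes and detects the UTF-8/16/32 encoding itself.
    return json.loads(decoded)
```

Raising ValueError instead of returning NULL lets a caller map all failures to error codes in one place; json.JSONDecodeError is itself a ValueError subclass.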
2. Data URI (Plain)
Format: data:application/json,<JSON>
Logic:
IF uri starts with "data:application/json," THEN
data ← extract data (remove "data:application/json," prefix)
// Some URIs are URL-encoded
IF data contains "%" THEN
data ← URL decode(data)
ADD_WARNING("url_encoded_json")
END IF
RETURN parse JSON from data
END IF
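The plain variant in the same hypothetical Python style, using urllib.parse.unquote from the standard library for the URL-decoding step:

```python
import json
from typing import Any
from urllib.parse import unquote

PLAIN_PREFIX = "data:application/json,"


def parse_plain_data_uri(uri: str, warnings: list[str]) -> Any:
    """Parse a non-base64 JSON data URI (hypothetical helper name)."""
    if not uri.startswith(PLAIN_PREFIX):
        raise ValueError("not a plain JSON data URI")
    data = uri[len(PLAIN_PREFIX):]
    # Mirrors the pseudocode: any "%" is taken as a sign of URL encoding.
    if "%" in data:
        data = unquote(data)
        warnings.append("url_encoded_json")
    return json.loads(data)
```

Like the pseudocode, this also flags un-encoded JSON that merely contains a literal "%" (for example "100% uptime"). unquote usually leaves such text intact, so parsing still succeeds, but the warning is a false positive; trying json.loads on the raw payload first is one way to avoid it.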
3. IPFS URIs
Format: ipfs://<CID> or ipfs://<CID>/path
Logic:
CONSTANTS:
GATEWAYS ← [
"https://ipfs.io/ipfs/{}",
"https://cloudflare-ipfs.com/ipfs/{}",
"https://gateway.pinata.cloud/ipfs/{}"
]
ALGORITHM: FetchIPFS(uri)
INPUT: uri - IPFS URI string
OUTPUT: JSON object or NULL
BEGIN
cid ← extract CID from uri (remove "ipfs://" prefix)
FOR EACH gateway IN GATEWAYS DO
url ← format gateway with cid
TRY
response ← HTTP GET(url, timeout: 5 seconds)
IF response is successful THEN
RETURN parse JSON from response
END IF
CATCH network error
CONTINUE to next gateway
END TRY
END FOR
ADD_ERROR("ipfs_fetch_failed")
RETURN NULL
END
Production Stats:
- 51% of agents use IPFS URIs
- Gateway fallback is essential (success rate: ~95%)
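A Python sketch of the gateway fallback using only urllib.request, with the gateway list copied from the pseudocode. The fetch parameter is a hypothetical seam so the loop can be exercised without network access; unlike the pseudocode, this version also moves on to the next gateway when one returns a body that is not valid JSON.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

GATEWAYS = [
    "https://ipfs.io/ipfs/{}",
    "https://cloudflare-ipfs.com/ipfs/{}",
    "https://gateway.pinata.cloud/ipfs/{}",
]


def http_get_bytes(url: str, timeout: float = 5.0) -> bytes:
    """Default fetcher: GET the URL and return the raw body (raises on failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_ipfs(
    uri: str,
    errors: list[str],
    fetch: Callable[[str], bytes] = http_get_bytes,
    gateways: list[str] = GATEWAYS,
) -> Optional[Any]:
    cid_path = uri[len("ipfs://"):]  # "<CID>" or "<CID>/path"
    for template in gateways:
        try:
            return json.loads(fetch(template.format(cid_path)))
        except (OSError, ValueError):
            # OSError covers HTTP errors, timeouts and DNS failures;
            # ValueError covers a gateway that answered with non-JSON.
            continue
    errors.append("ipfs_fetch_failed")
    return None
```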
4. HTTP/HTTPS URLs
Logic:
ALGORITHM: FetchHTTP(uri)
INPUT: uri - HTTP(S) URL string
OUTPUT: JSON object or NULL
BEGIN
IF uri starts with "http://" THEN
ADD_WARNING("http_not_https") // Security warning
END IF
TRY
response ← HTTP GET(uri, timeout: 5 seconds)
IF response is successful THEN
RETURN parse JSON from response
ELSE
ADD_ERROR("http_status_" + response.status_code)
RETURN NULL
END IF
CATCH timeout error
ADD_ERROR("http_timeout")
RETURN NULL
CATCH network error
ADD_ERROR("http_error")
RETURN NULL
END TRY
END
Production Stats:
- 22% of agents use HTTP(S) URLs
- Timeout issues: ~8% of HTTP URIs
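The same algorithm as a standard-library Python sketch (5-second default timeout, as in the pseudocode). The fetch parameter is again a hypothetical seam for testing. urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which carries the status code; a body that is not JSON is reported with the Level 1 code invalid_json, a case the pseudocode leaves implicit.

```python
import json
import socket
import urllib.error
import urllib.request
from typing import Any, Callable, Optional

# socket.timeout is an alias of TimeoutError on Python 3.10+; listing both keeps older versions working.
TIMEOUTS = (TimeoutError, socket.timeout)


def http_get_bytes(url: str, timeout: float = 5.0) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_http(
    uri: str,
    warnings: list[str],
    errors: list[str],
    fetch: Callable[[str], bytes] = http_get_bytes,
) -> Optional[Any]:
    if uri.startswith("http://"):
        warnings.append("http_not_https")  # plaintext transport can be tampered with
    try:
        body = fetch(uri)
    except urllib.error.HTTPError as exc:  # must precede URLError (it is a subclass)
        errors.append(f"http_status_{exc.code}")
        return None
    except TIMEOUTS:
        errors.append("http_timeout")
        return None
    except urllib.error.URLError as exc:  # DNS/connect failures; may wrap a timeout
        errors.append("http_timeout" if isinstance(exc.reason, TIMEOUTS) else "http_error")
        return None
    except OSError:
        errors.append("http_error")
        return None
    try:
        return json.loads(body)
    except ValueError:
        errors.append("invalid_json")
        return None
```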
5. Plain JSON Fallback
Logic:
IF uri starts with "{" THEN
ADD_WARNING("plain_json_without_uri_scheme")
RETURN parse JSON from uri
END IF
Schema Validation
Required Fields
CONSTANTS:
REQUIRED ← {"type", "name", "description", "image"}
ALGORITHM: ValidateRequired(metadata)
INPUT: metadata - JSON object
OUTPUT: warnings list
BEGIN
warnings ← empty list
missing ← REQUIRED fields not present in metadata
FOR EACH field IN missing DO
ADD warnings("missing_required_" + field)
END FOR
IF metadata.type ≠ "https://eips.ethereum.org/EIPS/eip-8004#registration-v1" THEN
ADD warnings("invalid_type_field")
END IF
RETURN warnings
END
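The same check as a direct Python transcription of the pseudocode (the field order is fixed so warning lists come out deterministic):

```python
from typing import Any

REGISTRATION_TYPE = "https://eips.ethereum.org/EIPS/eip-8004#registration-v1"
REQUIRED_FIELDS = ("type", "name", "description", "image")


def validate_required(metadata: dict[str, Any]) -> list[str]:
    warnings = [f"missing_required_{field}" for field in REQUIRED_FIELDS if field not in metadata]
    # As in the pseudocode, a missing "type" also fails this comparison,
    # so it yields both missing_required_type and invalid_type_field.
    if metadata.get("type") != REGISTRATION_TYPE:
        warnings.append("invalid_type_field")
    return warnings
```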
Endpoints Validation
ALGORITHM: ValidateEndpoints(metadata)
INPUT: metadata - JSON object
OUTPUT: warnings list
BEGIN
warnings ← empty list
// Support both "services" (new) and "endpoints" (legacy)
services ← get "services" from metadata OR get "endpoints" from metadata
// Emit info if using legacy field name
IF metadata contains "endpoints" AND metadata does not contain "services" THEN
ADD warnings("WA031: using legacy 'endpoints' field")
END IF
// Common typo: "endpoint" (singular)
IF metadata contains "endpoint" AND services is null THEN
ADD warnings("typo_endpoint_singular")
RETURN warnings
END IF
IF services is null THEN
ADD warnings("missing_services_or_endpoints")
ELSE IF services is not an array THEN
ADD warnings("services_not_array")
ELSE IF services is empty THEN
ADD warnings("empty_services") // Agent not reachable
ELSE
FOR EACH service IN services DO
warnings ← warnings + ValidateEndpoint(service)
END FOR
END IF
RETURN warnings
END
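A Python sketch of the structural checks. validate_one is a hypothetical hook standing in for the per-type checks of the next subsection; the services-then-endpoints fallback treats only an absent "services" key as a reason to look at the legacy field.

```python
from typing import Any, Callable


def validate_services(
    metadata: dict[str, Any],
    validate_one: Callable[[Any], list[str]] = lambda service: [],
) -> list[str]:
    """Structural checks on the services/endpoints array."""
    warnings: list[str] = []
    # Prefer "services"; fall back to the legacy "endpoints" field.
    services = metadata.get("services")
    if services is None:
        services = metadata.get("endpoints")
    if "endpoints" in metadata and "services" not in metadata:
        warnings.append("WA031: using legacy 'endpoints' field")
    if "endpoint" in metadata and services is None:
        warnings.append("typo_endpoint_singular")
        return warnings
    if services is None:
        warnings.append("missing_services_or_endpoints")
    elif not isinstance(services, list):
        warnings.append("services_not_array")
    elif not services:
        warnings.append("empty_services")  # valid, but the agent is unreachable
    else:
        for service in services:
            warnings.extend(validate_one(service))
    return warnings
```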
Endpoint-Specific Validation
ALGORITHM: ValidateEndpoint(endpoint)
INPUT: endpoint - JSON object
OUTPUT: warnings list
BEGIN
warnings ← empty list
name ← endpoint.name
CASE name OF
"MCP":
IF endpoint.endpoint is empty THEN
ADD warnings("mcp_missing_endpoint")
END IF
IF endpoint.version ≠ "2025-06-18" THEN
ADD info("mcp_nonstandard_version") // SHOULD, not MUST per spec
END IF
"A2A":
IF endpoint.endpoint is empty THEN
ADD warnings("a2a_missing_endpoint")
END IF
IF endpoint.version NOT IN ["0.3.0", "0.30"] THEN
ADD info("a2a_nonstandard_version") // SHOULD, not MUST per spec
END IF
IF endpoint.endpoint does not contain ".well-known/agent-card.json" THEN
ADD info("a2a_missing_well_known") // Recommended path, not required
END IF
"OASF":
IF endpoint.skills is empty AND endpoint.domains is empty THEN
ADD warnings("oasf_empty")
END IF
warnings ← warnings + ValidateOASFSkills(endpoint.skills)
warnings ← warnings + ValidateOASFDomains(endpoint.domains)
"agentWallet":
IF NOT IsValidCAIP10(endpoint.endpoint) THEN
ADD warnings("wallet_invalid_format")
END IF
END CASE
RETURN warnings
END
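A Python sketch of the per-type checks that returns warnings and infos separately, with severities taken from the Level 3 tables below (so version findings are informational only). OASF slug validation (ValidateOASFSkills/ValidateOASFDomains) is omitted because the taxonomy is not reproduced here, and the wallet check uses only the generic CAIP-10 shape; both simplifications are assumptions of this sketch.

```python
import re
from typing import Any

MCP_VERSIONS = ("2025-06-18",)
A2A_VERSIONS = ("0.3.0", "0.30")
# Generic CAIP-10 shape: namespace:reference:account_address
CAIP10 = re.compile(r"[-a-z0-9]{3,8}:[-_a-zA-Z0-9]{1,32}:[-.%a-zA-Z0-9]{1,128}")


def validate_service(service: Any) -> tuple[list[str], list[str]]:
    """Return (warnings, infos) for a single services[] entry."""
    warnings: list[str] = []
    infos: list[str] = []
    if not isinstance(service, dict):
        return warnings, infos
    name = service.get("name")
    url = service.get("endpoint")
    url = url if isinstance(url, str) else ""
    version = service.get("version")
    if name in ("MCP", "A2A"):
        prefix = name.lower()
        if not url:
            warnings.append(f"{prefix}_missing_endpoint")
        expected = MCP_VERSIONS if name == "MCP" else A2A_VERSIONS
        # "version" is a SHOULD, so version findings are informational.
        if version is None:
            infos.append(f"{prefix}_missing_version")
        elif version not in expected:
            infos.append(f"{prefix}_nonstandard_version")
        if name == "A2A" and ".well-known/agent-card.json" not in url:
            infos.append("a2a_missing_well_known")  # recommended path only
    elif name == "OASF":
        if not service.get("skills") and not service.get("domains"):
            warnings.append("oasf_empty")
    elif name == "agentWallet":
        if CAIP10.fullmatch(url) is None:
            warnings.append("wallet_invalid_format")
    return warnings, infos
```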
CAIP Format Validation
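ALGORITHM: IsValidCAIP2(value)
INPUT: value - string to validate
OUTPUT: boolean (true if valid, false otherwise)
BEGIN
// CAIP-2 chain ID. Format: namespace:reference (e.g. "eip155:1")
pattern ← "^[-a-z0-9]{3,8}:[-_a-zA-Z0-9]{1,32}$"
RETURN value matches pattern
END
ALGORITHM: IsValidCAIP10(value)
INPUT: value - string to validate
OUTPUT: boolean
BEGIN
// CAIP-10 account ID. Format: namespace:reference:accountAddress
// (a CAIP-2 chain ID followed by an account address)
pattern ← "^[-a-z0-9]{3,8}:[-_a-zA-Z0-9]{1,32}:[-.%a-zA-Z0-9]{1,128}$"
// EVM chains (eip155): decimal chain ID and a 0x-prefixed 20-byte hex address
IF value starts with "eip155:" THEN
pattern ← "^eip155:[0-9]+:0x[a-fA-F0-9]{40}$"
END IF
RETURN value matches pattern
END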
Examples:
✓ "eip155:1:0x8004a6090Cd10A7288092483047B097295Fb8847"
✗ "eip155:0x8004..." (missing chainId)
✗ "0x8004..." (missing namespace)
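A Python sketch of both checks, distinguishing a CAIP-2 chain ID ("eip155:1") from a CAIP-10 account ID ("eip155:1:0x..."). The extra eip155 rule (decimal chain ID, 40 hex digits) is an EVM-specific tightening chosen for this sketch. EIP-55 checksum verification (wallet_invalid_checksum) is left out because it needs Keccak-256, which Python's hashlib does not provide (hashlib.sha3_256 is the standardized SHA-3, a different function).

```python
import re

# CAIP-2 chain ID: namespace ":" reference
CAIP2 = re.compile(r"[-a-z0-9]{3,8}:[-_a-zA-Z0-9]{1,32}")
# CAIP-10 account ID: chain_id ":" account_address
CAIP10 = re.compile(r"[-a-z0-9]{3,8}:[-_a-zA-Z0-9]{1,32}:[-.%a-zA-Z0-9]{1,128}")
EVM_ADDRESS = re.compile(r"0x[0-9a-fA-F]{40}")


def is_valid_caip2(value: str) -> bool:
    return CAIP2.fullmatch(value) is not None


def is_valid_caip10(value: str) -> bool:
    if CAIP10.fullmatch(value) is None:
        return False
    # None of the three segments may contain ":", so split yields exactly 3 parts.
    namespace, reference, address = value.split(":")
    if namespace == "eip155":
        return reference.isdigit() and EVM_ADDRESS.fullmatch(address) is not None
    return True
```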
Data Extraction
Structured Data Extraction
ALGORITHM: ExtractAgentData(metadata)
INPUT: metadata - validated JSON object
OUTPUT: structured data record
BEGIN
data ← new record
// Basic fields
data.name ← metadata.name
data.description ← metadata.description
data.image_url ← metadata.image
data.active ← metadata.active OR false (if not present)
data.x402_support ← metadata.x402Support OR false (if not present)
data.updated_at ← metadata.updatedAt
// Services: prefer "services", fall back to legacy "endpoints"
services ← metadata.services OR metadata.endpoints OR empty list
// FindEndpoint returns NULL when no entry has that name; reading a field of NULL yields NULL
data.mcp_server ← FindEndpoint(services, "MCP").endpoint
data.a2a_endpoint ← FindEndpoint(services, "A2A").endpoint
data.agent_wallet ← FindEndpoint(services, "agentWallet").endpoint
data.ens ← FindEndpoint(services, "ENS").endpoint
data.did ← FindEndpoint(services, "DID").endpoint
// OASF data
oasfEndpoint ← FindEndpoint(services, "OASF")
data.oasf_skills ← oasfEndpoint.skills OR empty list
data.oasf_domains ← oasfEndpoint.domains OR empty list
// Full metadata for backup
data.metadata_json ← metadata
RETURN data
END
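A Python sketch of the extraction step that reads "services" with a fallback to the legacy "endpoints" field. One deliberate variation: active and x402_support count as true only for a real JSON boolean true, since non-boolean values are flagged separately at Level 5. The output keys mirror the pseudocode record; the helper names are hypothetical.

```python
from typing import Any, Optional


def find_service(services: list[Any], name: str) -> Optional[dict[str, Any]]:
    """First services[] entry with the given name, or None."""
    for service in services:
        if isinstance(service, dict) and service.get("name") == name:
            return service
    return None


def extract_agent_data(metadata: dict[str, Any]) -> dict[str, Any]:
    # Prefer "services"; fall back to the legacy "endpoints" field.
    services = metadata.get("services")
    if not isinstance(services, list):
        services = metadata.get("endpoints")
    if not isinstance(services, list):
        services = []

    def endpoint_of(name: str) -> Optional[Any]:
        service = find_service(services, name)
        return service.get("endpoint") if service is not None else None

    oasf = find_service(services, "OASF") or {}
    return {
        "name": metadata.get("name"),
        "description": metadata.get("description"),
        "image_url": metadata.get("image"),
        # Strict: only a JSON true counts; anything else is stored as False.
        "active": metadata.get("active") is True,
        "x402_support": metadata.get("x402Support") is True,
        "updated_at": metadata.get("updatedAt"),
        "mcp_server": endpoint_of("MCP"),
        "a2a_endpoint": endpoint_of("A2A"),
        "agent_wallet": endpoint_of("agentWallet"),
        "ens": endpoint_of("ENS"),
        "did": endpoint_of("DID"),
        "oasf_skills": oasf.get("skills") or [],
        "oasf_domains": oasf.get("domains") or [],
        "metadata_json": metadata,  # full document kept as a backup
    }
```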
Complete Parser Implementation
Core Parser Algorithm
ALGORITHM: ParseAgentMetadata(uri)
INPUT: uri - string containing agent metadata URI
OUTPUT: (metadata, warnings[], status) or NULL
BEGIN
errors ← empty list
warnings ← empty list
// Step 1: Parse URI to JSON (ADD_ERROR / ADD_WARNING calls inside ParseURI append to errors / warnings)
metadata ← ParseURI(uri)
IF metadata = NULL THEN
RETURN NULL
END IF
// Step 2: Validate schema
warnings ← warnings + ValidateRequired(metadata)
warnings ← warnings + ValidateEndpoints(metadata)
// Step 3: Determine status
IF errors is not empty THEN
status ← "error"
ELSE IF warnings is not empty THEN
status ← "warning"
ELSE
status ← "success"
END IF
RETURN (metadata, warnings, status)
END
ALGORITHM: ParseURI(uri)
INPUT: uri - string
OUTPUT: JSON object or NULL
BEGIN
// Try formats in order of cost: local decoding before network fetches
// Data URIs - decoded locally, fastest to parse
IF uri starts with "data:" THEN
metadata ← TryDataURI(uri)
IF metadata ≠ NULL THEN
RETURN metadata
END IF
END IF
// IPFS (51% of agents) - requires network fetch
IF uri starts with "ipfs://" THEN
metadata ← FetchIPFS(uri)
IF metadata ≠ NULL THEN
RETURN metadata
END IF
END IF
// HTTP(S) (22% of agents) - requires network fetch
IF uri starts with "http" THEN
metadata ← FetchHTTP(uri)
IF metadata ≠ NULL THEN
RETURN metadata
END IF
END IF
// Plain JSON fallback (edge case)
IF uri starts with "{" THEN
ADD_WARNING("plain_json_without_uri_scheme")
RETURN ParseJSON(uri)
END IF
ADD_ERROR("unsupported_uri_format")
RETURN NULL
END
ALGORITHM: ValidateRequired(metadata)
INPUT: metadata - JSON object
OUTPUT: warnings list
BEGIN
warnings ← empty list
required ← {"type", "name", "description", "image"}
FOR EACH field IN required DO
IF field not in metadata THEN
ADD warnings("missing_required_" + field)
END IF
END FOR
IF metadata.type ≠ "https://eips.ethereum.org/EIPS/eip-8004#registration-v1" THEN
ADD warnings("invalid_type_field")
END IF
RETURN warnings
END
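An end-to-end Python sketch of ParseURI plus ParseAgentMetadata, under stated assumptions: network fetchers are injected as keyword arguments (fetch_ipfs, fetch_http) with an offline placeholder by default, and the function always returns (metadata, warnings, errors, status) instead of NULL, so the caller can see why a parse failed. Only the required-field check is included; the services, semantic and status checks would slot in after it.

```python
import base64
import binascii
import json
from typing import Any, Callable, Optional
from urllib.parse import unquote

REGISTRATION_TYPE = "https://eips.ethereum.org/EIPS/eip-8004#registration-v1"
Fetcher = Callable[[str, list[str], list[str]], Optional[Any]]


def offline_fetcher(uri: str, warnings: list[str], errors: list[str]) -> Optional[Any]:
    """Placeholder: pass real IPFS/HTTP fetchers to parse_agent_metadata()."""
    errors.append("ipfs_fetch_failed" if uri.startswith("ipfs://") else "http_error")
    return None


def parse_uri(uri: str, warnings: list[str], errors: list[str],
              fetch_ipfs: Fetcher = offline_fetcher,
              fetch_http: Fetcher = offline_fetcher) -> Optional[Any]:
    uri = uri.strip()
    if not uri:
        errors.append("empty_uri")
        return None
    try:
        # Cheapest first: data URIs and inline JSON need no network round trip.
        if uri.startswith("data:application/json;base64,"):
            payload = uri[len("data:application/json;base64,"):]
            if payload.startswith("{"):
                warnings.append("base64_uri_with_plain_json")
                return json.loads(payload)
            return json.loads(base64.b64decode(payload, validate=True))
        if uri.startswith("data:application/json,"):
            payload = uri[len("data:application/json,"):]
            if "%" in payload:
                payload = unquote(payload)
                warnings.append("url_encoded_json")
            return json.loads(payload)
        if uri.startswith("{"):
            warnings.append("plain_json_without_uri_scheme")
            return json.loads(uri)
        if uri.startswith("ipfs://"):
            return fetch_ipfs(uri, warnings, errors)
        if uri.startswith(("http://", "https://")):
            return fetch_http(uri, warnings, errors)
    except binascii.Error:  # must precede ValueError (it is a subclass)
        errors.append("invalid_base64")
        return None
    except ValueError:  # includes json.JSONDecodeError
        errors.append("invalid_json")
        return None
    errors.append("unsupported_uri_format")
    return None


def parse_agent_metadata(uri: str, **fetchers: Fetcher):
    """Return (metadata, warnings, errors, status); metadata is None on failure."""
    warnings: list[str] = []
    errors: list[str] = []
    metadata = parse_uri(uri, warnings, errors, **fetchers)
    if not isinstance(metadata, dict):
        if not errors:  # e.g. valid JSON whose root is a list or null
            errors.append("invalid_root_type")
        return None, warnings, errors, "error"
    for field in ("type", "name", "description", "image"):
        if field not in metadata:
            warnings.append(f"missing_required_{field}")
    if metadata.get("type") != REGISTRATION_TYPE:
        warnings.append("invalid_type_field")
    status = "error" if errors else "warning" if warnings else "success"
    return metadata, warnings, errors, status
```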
Usage Pattern
MAIN Program:
// Parse agent metadata
uri ← GetAgentURI(agentId)
result ← ParseAgentMetadata(uri)
IF result ≠ NULL THEN
(metadata, warnings, status) ← result
// Extract structured data
agentData ← ExtractAgentData(metadata)
agentData.parse_status ← status
agentData.parse_warnings ← warnings
// Store in database
StoreAgent(agentData)
OUTPUT "Success: " + status
ELSE
OUTPUT "Parse failed"
END IF
END
Performance Optimization
Caching Strategy
1. Data URI Caching (in-memory LRU)
- Data URIs are immutable → safe to cache forever
- Use LRU cache with 1000 entry limit
- Cache hit rate: ~40% in production
2. IPFS Caching (Redis, 1 hour TTL)
- Cache key: sha256(ipfs_uri)
- Reduces gateway load by 85%
- TTL: 3600s (content behind a CID is immutable; the TTL mainly bounds cache size)
3. HTTP Caching (Redis, 5 minute TTL)
- Cache key: sha256(http_url)
- Respects Cache-Control headers
- TTL: 300s (content may update)
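A single-process Python sketch of the three caches. functools.lru_cache covers the data URI case; the TTLCache class is a hypothetical in-memory stand-in for the Redis caches (same sha256 keys and TTLs), which a real deployment would replace with a Redis client. Cache-Control handling and eviction of expired entries are not shown.

```python
import base64
import hashlib
import time
from functools import lru_cache
from typing import Any, Callable, Optional
from urllib.parse import unquote


@lru_cache(maxsize=1000)  # data URIs are immutable, so entries never go stale
def cached_data_uri_json(uri: str) -> str:
    """Return the JSON text inside a data URI.

    Caching a str (immutable) rather than a parsed dict means callers can
    json.loads() the result without mutating a shared cache entry.
    """
    header, _, payload = uri.partition(",")
    if header.endswith(";base64") and not payload.startswith("{"):
        return base64.b64decode(payload, validate=True).decode("utf-8")
    return unquote(payload) if "%" in payload else payload


class TTLCache:
    """In-process stand-in for a Redis cache with per-entry TTL (hypothetical)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    @staticmethod
    def key(uri: str) -> str:
        return hashlib.sha256(uri.encode("utf-8")).hexdigest()

    def get_or_fetch(self, uri: str, fetch: Callable[[str], Optional[Any]]) -> Optional[Any]:
        k = self.key(uri)
        now = self.clock()
        entry = self._entries.get(k)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = fetch(uri)
        if value is not None:  # failed fetches are not cached, so they are retried
            self._entries[k] = (now + self.ttl, value)
        return value


ipfs_cache = TTLCache(3600)  # 1 hour
http_cache = TTLCache(300)   # 5 minutes
```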
Batch Processing
ALGORITHM: ParseAgentBatch(uris)
INPUT: uris - list of URI strings
OUTPUT: list of parse results
BEGIN
results ← list of length(uris)
// Process URIs in parallel; each result is stored at its input position
FOR EACH (i, uri) IN uris (in parallel) DO
results[i] ← ParseAgentMetadata(uri)
END FOR
RETURN results
END
Note: Typical throughput: 100-200 agents/second (with caching)
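A Python sketch of the batch step using concurrent.futures.ThreadPoolExecutor. Threads suit this workload because it is dominated by waiting on gateways and HTTP servers; Executor.map returns results in input order. The max_workers default of 16 is an arbitrary assumption, and parse stands for whatever single-URI parser is in use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Optional


def parse_agent_batch(
    uris: Iterable[str],
    parse: Callable[[str], Any],
    max_workers: int = 16,
) -> list[Optional[Any]]:
    """Parse URIs concurrently; results are returned in input order."""

    def safe_parse(uri: str) -> Optional[Any]:
        try:
            return parse(uri)
        except Exception:
            # One malformed URI or crashed fetch must not abort the whole
            # batch; report it as a failed parse (None), like NULL above.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_parse, uris))
```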
Common Issues & Solutions
Issue 1: Empty Services Array
Pattern: {"services": []} (or legacy {"endpoints": []})
Impact: Agent not reachable, but metadata is valid
Solution: Warn but don't fail validation
IF services is empty THEN
ADD_WARNING("empty_services")
// Still store agent, may be pre-registration
END IF
Production Stats: ~15% of agents have empty endpoints
Issue 2: Null agentId in Registrations
Pattern: {"registrations": [{"agentId": null}]}
Context: Common during registration flow (agentId assigned after transaction)
Solution: Accept and log as info (not warning)
IF registration.agentId is null THEN
ADD_INFO("registration_null_agent_id")
// Expected for first-time deployments; will be updated after transaction confirms
END IF
Rationale: This is expected behavior, not a problem. First-time deployments don't know their tokenId until the on-chain transaction confirms.
Production Stats: ~30% of initial registrations have null agentId
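A Python sketch of the registrations checks from the Level 2 tables, with null agentId reported as info. It assumes each entry carries an agentRegistry CAIP-10 string next to agentId, and it compares IDs only for entries that point at the registry being indexed, since the same agent can hold different IDs on different registries. The registry and onchain_agent_id parameters are hypothetical inputs from the indexer.

```python
from typing import Any, Optional


def validate_registrations(
    metadata: dict[str, Any],
    registry: Optional[str] = None,
    onchain_agent_id: Optional[int] = None,
) -> tuple[list[str], list[str]]:
    """Return (warnings, infos) for the registrations array."""
    warnings: list[str] = []
    infos: list[str] = []
    if "registration" in metadata and "registrations" not in metadata:
        warnings.append("typo_registration_singular")
    regs = metadata.get("registrations")
    if regs is None:
        infos.append("missing_registrations")
        return warnings, infos
    if not isinstance(regs, list):
        return warnings, infos  # shape problems are out of scope for this sketch
    if not regs:
        infos.append("empty_registrations")
    for reg in regs:
        if not isinstance(reg, dict):
            continue
        agent_id = reg.get("agentId")
        if agent_id is None:
            # Expected before the registration transaction confirms.
            infos.append("registration_null_agent_id")
        elif (registry is not None and reg.get("agentRegistry") == registry
              and str(agent_id) != str(onchain_agent_id)):
            warnings.append("registration_agent_id_mismatch")
    return warnings, infos
```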
Issue 3: Version Number Variations
Pattern: A2A version "0.30" instead of "0.3.0"
Solution: Accept both formats
CONSTANTS:
VALID_A2A_VERSIONS ← ["0.3.0", "0.30"]
IF endpoint.version IN VALID_A2A_VERSIONS THEN
IF endpoint.version = "0.30" THEN
ADD_WARNING("a2a_version_typo") // Inform but accept
END IF
END IF
Validation Error Codes
8004scan-specific: The following error codes are used by 8004scan's validation system. Other implementations MAY use different codes.
Severity Levels
8004scan uses three severity levels following RFC 2119/8174:
| Level | Description | Action |
|---|---|---|
| Error | Critical issues preventing proper functioning | Parse fails |
| Warning | Functional issues but not critical | Parse succeeds |
| Info | Recommendations, expected states, best practices | Advisory only |
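A small Python sketch of how the three levels can roll up into the parse status used by ParseAgentMetadata: any error fails the parse, any warning yields "warning", and info findings never change the status. The Severity and Finding types are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


@dataclass(frozen=True)
class Finding:
    code: str
    severity: Severity


def parse_status(findings: list[Finding]) -> str:
    severities = {f.severity for f in findings}
    if Severity.ERROR in severities:
        return "error"
    if Severity.WARNING in severities:
        return "warning"
    return "success"  # info-only findings are advisory
```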
Level 1: Syntax Validation
Errors (Parse Fails):
| Issue | Error Code |
|---|---|
| Empty URI | empty_uri |
| Invalid JSON syntax | invalid_json |
| Invalid base64 | invalid_base64 |
| Non-dict root | invalid_root_type |
Warnings (Parse Succeeds):
| Issue | Warning Code |
|---|---|
| Plain JSON without scheme | plain_json_without_uri_scheme |
| base64 URI with plain JSON | base64_uri_with_plain_json |
| URL-encoded JSON | url_encoded_json |
Level 2: Schema Validation
Critical Warnings (Likely to cause display issues):
| Issue | Warning Code | Severity |
|---|---|---|
| Missing type | missing_required_type | High |
| Invalid type value | invalid_type_field | High |
| Missing name | missing_required_name | High |
| Missing description | missing_required_description | High |
| Missing image | missing_recommended_image | Medium |
| Empty endpoints | empty_endpoints | Medium |
Standard Warnings:
| Issue | Warning Code |
|---|---|
| "endpoint" instead of "endpoints" | typo_endpoint_singular |
| "registration" instead of "registrations" | typo_registration_singular |
| registrations[].agentId mismatch | registration_agent_id_mismatch |
| Invalid CAIP-2 format | invalid_caip2_format |
| Invalid CAIP-10 format | invalid_caip10_format |
Info Messages (Expected states, recommendations):
| Issue | Info Code |
|---|---|
| registrations[].agentId is null | registration_null_agent_id |
| Missing registrations array | missing_registrations |
| Empty registrations array | empty_registrations |
Level 3: Endpoint Validation
MCP Endpoint:
| Issue | Code | Severity |
|---|---|---|
| Missing endpoint | mcp_missing_endpoint | Warning |
| Missing version | mcp_missing_version | Info |
| Non-standard version | mcp_nonstandard_version | Info |
Note: Per ERC-8004 spec: "The version field in endpoints is a SHOULD, not a MUST."
A2A Endpoint:
| Issue | Code | Severity |
|---|---|---|
| Missing endpoint | a2a_missing_endpoint | Warning |
| Missing version | a2a_missing_version | Info |
| Non-standard version | a2a_nonstandard_version | Info |
| Missing .well-known path | a2a_missing_well_known | Info |
Note: The A2A path ".well-known/agent-card.json" is shown as an example in ERC-8004, not a requirement. Other paths are valid.
OASF Endpoint:
| Issue | Warning Code |
|---|---|
| Missing both skills and domains | oasf_empty |
| Invalid skill slug | oasf_invalid_skill |
| Invalid domain slug | oasf_invalid_domain |
| Non-standard version | oasf_nonstandard_version |
agentWallet Endpoint:
| Issue | Warning Code |
|---|---|
| Invalid CAIP-10 format | wallet_invalid_format |
| Invalid checksum | wallet_invalid_checksum |
Level 4: Semantic Validation
Trust Models:
| Issue | Warning Code |
|---|---|
| Unknown trust model | unknown_trust_model |
| Empty supportedTrust | empty_supported_trust |
Known trust models: "reputation", "crypto-economic", "tee-attestation", "social-graph"
agentHash Verification (8004scan Extension):
| Issue | Warning Code | Severity |
|---|---|---|
| Hash mismatch (onchain vs computed) | agent_hash_mismatch | Warning |
| HTTP/HTTPS URI without agentHash | agent_uri_not_hashed | Info |
Note: agentHash verification applies only to HTTP/HTTPS URIs; IPFS and Arweave URIs are already content-addressed.
Level 5: Status Fields
| Issue | Warning Code |
|---|---|
| active: false | agent_not_active |
| Invalid boolean values | invalid_boolean_active |
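A Python sketch of the Level 4 trust-model checks and the Level 5 status checks, using the known trust models listed above. Emitting one unknown_trust_model per unrecognized entry, and skipping a supportedTrust value that is not an array, are choices made for this sketch.

```python
from typing import Any

# Trust model names as listed in the Level 4 section.
KNOWN_TRUST_MODELS = ("reputation", "crypto-economic", "tee-attestation", "social-graph")


def validate_semantics(metadata: dict[str, Any]) -> list[str]:
    """Level 4 trust-model checks and Level 5 status checks (sketch)."""
    warnings: list[str] = []
    trust = metadata.get("supportedTrust")
    if isinstance(trust, list):
        if not trust:
            warnings.append("empty_supported_trust")
        for model in trust:
            if model not in KNOWN_TRUST_MODELS:  # one warning per unknown entry
                warnings.append("unknown_trust_model")
    if "active" in metadata:
        active = metadata["active"]
        if not isinstance(active, bool):  # e.g. "true" or 1
            warnings.append("invalid_boolean_active")
        elif not active:
            warnings.append("agent_not_active")
    return warnings
```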
Production Statistics
Data Source: 8004scan Production (Base Sepolia), December 2025, 4,725 agents
URI Format Distribution
- IPFS: 51% (ipfs://CID)
- HTTP(S): 22% (https://...)
- Data URI (base64): 18% (data:application/json;base64,...)
- Other: 9% (plain JSON, Arweave, etc.)
Endpoint Adoption
- A2A: 85% of agents
- MCP: 75% of agents
- OASF: 60% of agents
- agentWallet: 45% of agents
Parse Success Rates
- Perfect (no warnings): 58.1%
- Success with warnings: 11.8%
- Total success: 69.9%
- Common issues: Empty endpoints (~15%), null agentId (~30%)
Top OASF Skills
- communication_skills/content_creation (18%)
- analytical_skills/data_analysis (15%)
- technical_skills/software_development (12%)
- problem_solving/strategic_thinking (10%)
- specialized_skills/blockchain_expertise (8%)
Resources
- Agent Metadata Standard - Full schema specification
- CAIP-2 Specification - Blockchain ID format
- CAIP-10 Specification - Account address format
- OASF Taxonomy - Skills and domains reference