~50,000 tokens
Just to parse raw HTML — most of it navigation, ads, boilerplate. The actual content is buried.
robots.txt taught crawlers where to go. sitemap.xml told search engines what exists. ARA tells AI agents what a site is, what it contains, and how to interact with it — in 300 tokens instead of 50,000.
```json
{
  "ara_version": "1.0",
  "identity": {
    "name": "TechShop",
    "type": "ecommerce",
    "languages": ["en", "fr"]
  },
  "resources": {
    "products": "/schemas/product",
    "orders": "/schemas/order"
  },
  "actions": "/.well-known/ara/actions.json",
  "protocols": ["REST", "MCP", "A2A"],
  "policies": { "rate_limit": "60/min", "auth": "bearer" }
}
```
Every AI agent visiting your site wastes thousands of tokens parsing HTML noise. There's no standard. Every agent guesses.
Lossy information retrieval from HTML parsing: agents miss facts, misunderstand structure, and hallucinate capabilities.
Screenshot analysis, DOM scraping, UI automation — all brittle. One site redesign breaks every agent integration.
Existing standards each solve a slice. ARA is the only one designed end-to-end for AI agent interaction.
| Capability | robots.txt | sitemap.xml | Schema.org | llms.txt | OpenAPI | ARA |
|---|---|---|---|---|---|---|
| Site discovery | — | Partial | — | Partial | — | ✓ Complete |
| Global overview | — | URLs only | — | Plain text | — | ✓ Structured |
| Data schemas | — | — | Fragmented | — | Yes | ✓ Semantic |
| Actions | — | — | Limited | — | Yes | ✓ Multi-protocol |
| Intent mapping | — | — | — | — | — | ✓ Native |
| MCP / A2A support | — | — | — | — | — | ✓ Native |
| LLM-optimized digest | — | — | — | Basic | — | ✓ Optimized |
| Agent policies | Basic | — | — | — | Partial | ✓ Complete |
llms.txt is a plain text file with links. Useful as a first step — but it gives AI agents no structure, no schemas, no actions, and no machine-readable policies.
| Feature | llms.txt | ARA |
|---|---|---|
| Format | Plain text / markdown links | Structured JSON |
| Site overview | Partial (manual) | Complete (structured) |
| Data schemas | — | JSON Schema + Schema.org |
| Available actions | — | Full query & mutation definitions |
| Protocol support | — | REST, MCP, A2A, GraphQL |
| Access policies | — | Rate limits, auth, data usage |
| Token cost for agents | ~800 tokens to parse | ~150 tokens (manifest only) |
| Machine-readable | Partially | Fully |
Run /ara migrate to convert your existing llms.txt into full ARA files automatically.
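The mapping is mechanical enough to sketch. Assuming a typical llms.txt (a markdown title plus link list), the conversion boils down to lifting each link into the manifest's resource map. The regex and output shape below are illustrative only, not the /ara migrate implementation:

```python
import json
import re

# A hypothetical llms.txt in the common title-plus-links shape.
LLMS_TXT = """# TechShop
- [Products](https://techshop.example/products): full catalog
- [Returns](https://techshop.example/returns): return policy
"""

def llms_to_manifest_skeleton(text: str) -> dict:
    """Turn llms.txt markdown links into a minimal ARA manifest skeleton."""
    title = re.search(r"^#\s+(.+)$", text, re.M)
    links = re.findall(r"\[([^\]]+)\]\(([^)]+)\)", text)
    return {
        "ara_version": "1.0",
        "identity": {"name": title.group(1) if title else "unknown"},
        "resources": {name.lower(): url for name, url in links},
    }

print(json.dumps(llms_to_manifest_skeleton(LLMS_TXT), indent=2))
```

The skeleton still needs schemas, actions, and policies filled in by hand (or by /ara transform), which is exactly the structure llms.txt never carried.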
One HTTP GET to manifest.json gives an AI agent complete understanding of your site — identity, structure, schemas, actions, and policies.
manifest.json
~150 tokens
Identity, content map, capabilities, protocols, policies. The single entry point an agent fetches first.
"identity": { "name": "TechShop", "type": "ecommerce" }
schemas/
~250 tokens
Semantic resource schemas with Schema.org annotations — agents understand your data types without inferring them.
"products": { "type": "catalog", "count": 2000 }
actions.json
~350 tokens
Agent actions with natural-language intent examples — agents know what they can do and how to call it.
"search_products": { "intent": "find products by..." }
digest.md
~300 tokens
LLM-optimized 200–400 token summary — AI search engines cite your facts, not their guesses.
```markdown
# TechShop
- 2,000 products across 14 categories
- Free EU shipping above €50
- 30-day returns, 2-yr warranty
```
GPTBot, ClaudeBot, PerplexityBot, Google-Extended and 10 other AI crawlers don't know ARA exists yet. We solved this with server-side content negotiation.
Content negotiation serves /.well-known/ara/digest.md to AI crawlers, and a Link: </.well-known/ara/manifest.json>; rel="ara-manifest" header goes on every response
<link rel="ara-manifest"> + <meta name="ara:manifest"> tags
potentialAction pointing to manifest (Schema.org-compatible)
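The server-side trick can be sketched in a few lines. The bot names and file paths come from this page; the middleware itself is illustrative (WSGI-style pseud
o-routing, not the /ara enforce output for any particular framework):

```python
# AI crawler user-agents named above (subset; real lists are longer).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")
ARA_LINK = '</.well-known/ara/manifest.json>; rel="ara-manifest"'

def route(path: str, user_agent: str) -> tuple[str, dict]:
    """Send known AI crawlers to the digest; advertise the manifest to all."""
    headers = {"Link": ARA_LINK}  # attached to every response
    if any(bot in user_agent for bot in AI_BOTS):
        return "/.well-known/ara/digest.md", headers
    return path, headers

print(route("/products", "Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

Human visitors keep getting the HTML page; only user-agents on the bot list are redirected to the token-cheap digest.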
Whether you use an AI editor or prefer the command line — ARA can be set up either way.
If you use Claude Code, Cursor, Opencode, or another AI-powered editor, dedicated ARA agents automate the entire setup in minutes — audit, generate, enforce, monitor.
/ara transform https://yoursite.com — generates all 4 ARA files
/ara enforce https://yoursite.com — injects middleware automatically
/ara audit https://yoursite.com — verifies your grade (A–F)
If you don't use an AI editor, you can still implement ARA manually. Use npx to generate a starting template, then review and complete the generated files by hand.
mkdir -p .well-known/ara — create the directory
npx ara-generate https://yoursite.com --output .well-known/ara/ — generates a template
npx ara-validate https://yoursite.com — check your score
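Before running the validator, a local sanity check is easy to script. The four file names come from the layers described above; the real A–F grading belongs to ara-validate and ara-audit, and this sketch only checks presence and JSON validity:

```python
import json
from pathlib import Path

REQUIRED = ["manifest.json", "actions.json", "digest.md"]  # plus schemas/

def check_ara_dir(root: Path) -> list[str]:
    """Return a list of problems in .well-known/ara/ (empty means it looks OK)."""
    problems = [f"missing {name}" for name in REQUIRED if not (root / name).exists()]
    if not (root / "schemas").is_dir():
        problems.append("missing schemas/ directory")
    for name in ("manifest.json", "actions.json"):
        f = root / name
        if f.exists():
            try:
                json.loads(f.read_text())
            except json.JSONDecodeError:
                problems.append(f"{name} is not valid JSON")
    return problems
```

Point it at .well-known/ara/ after npx ara-generate and fix anything it reports before asking for a grade.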
The fastest way to go ARA-ready — 4 Claude Code agents automate the full lifecycle (audit → generate → enforce → monitor) in minutes. Currently available for Claude Code only — Cursor and Opencode agents are in development.
Scores any site A–F across 13 criteria — finds missing layers, detects llms.txt, checks enforcement signals.
Generates all 4 ARA files from any URL or local codebase — manifest, schemas, actions, digest.md.
Injects content-negotiation middleware for 8+ frameworks — forces AI bots to read ARA.
Measures GEO impact — tracks citation rate and semantic accuracy across AI search engines.
claude mcp install ara — or copy agents from github.com/aka9871/ara-agents to your ~/.claude/agents/ directory
ARA is designed to be adopted progressively. Start with the manifest, add layers when you're ready.
manifest.json → schemas/ → actions.json
Three commands. No DNS changes. No deploys. Works with any stack.
Run the official ARA validator against any URL — instant grade with detailed breakdown.
Open validator →
Full v1.0 specification, JSON schemas, examples, and reference implementations.
Read on GitHub →