Agentic AI Optimization: Implementation Checklist


If AI crawlers can’t read your pages and browser agents can’t operate your UI, your site is invisible or broken in the workflows that matter next. But this isn’t just “AI SEO.” It’s the overlap of traditional SEO, static truth in the initial response, machine-readable content surfaces, and UX ergonomics for agents and operators using harnesses like pi, OpenCode, and Claude Code.

If you want the public skills that cover the adjacent discipline while you work, install them first:

npx skills add -y -g coreyhaines31/marketingskills --skill seo-audit
npx skills add -y -g sanity-io/agent-toolkit --skill seo-aeo-best-practices

If you want the joelclaw house skill that distills this into one opinionated playbook, install it from the joelclaw repo checkout:

npx skills add -y -g /Users/joel/Code/joelhooks/joelclaw --skill agent-discovery


1. Ship an explicit crawl policy

  1. Decide which classes of bots you want to allow:
    • Search/indexing: OAI-SearchBot, Googlebot, Bingbot, Claude-SearchBot, PerplexityBot
    • User-triggered fetchers: ChatGPT-User, Claude-User, Perplexity-User
    • Training: GPTBot, Google-Extended, ClaudeBot
  2. Allow search/indexing if you want to appear in AI answers.
  3. Allow or block training separately. OpenAI, Google, and Anthropic now expose separate controls.
  4. Add a sitemap.
  5. Use noindex on URLs that must never surface.
  6. Do not treat robots.txt as security. Anything private needs auth.
  7. Do not confuse llms.txt with discoverability. Google explicitly says you do not need new AI text files or special AI markup for AI Overviews or AI Mode. In joelclaw, we still publish llms.txt, but mostly as a low-traffic hint surface for operators and harnesses already at the site. The higher-value machine surface is explicit markdown twins plus sitemap.md, not llms.txt alone.

Example robots.txt that allows search and user-triggered retrieval, but blocks model-training crawlers:

User-agent: OAI-SearchBot
Allow: /
 
User-agent: ChatGPT-User
Allow: /
 
User-agent: GPTBot
Disallow: /
 
User-agent: Googlebot
Allow: /
 
User-agent: Google-Extended
Disallow: /
 
User-agent: Bingbot
Allow: /
 
User-agent: Claude-SearchBot
Allow: /
 
User-agent: Claude-User
Allow: /
 
User-agent: ClaudeBot
Disallow: /
 
User-agent: PerplexityBot
Allow: /
 
User-agent: Perplexity-User
Allow: /
 
Sitemap: https://example.com/sitemap.xml
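Before deploying a policy like this, it helps to sanity-check it in code. The sketch below is a deliberately simplified robots.txt evaluator (exact user-agent group matching, prefix rules, longest match wins), not a full Robots Exclusion Protocol implementation:

```typescript
// Simplified robots.txt check: exact user-agent groups, prefix rules,
// longest matching path wins. Not a full REP parser (no wildcards, no $).
type Rule = { allow: boolean; path: string };

function parseRobots(text: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let active: Rule[][] = [];
  let lastWasAgent = false;
  for (const raw of text.split("\n")) {
    const line = raw.split("#")[0].trim();
    const idx = line.indexOf(":");
    if (idx < 0) { lastWasAgent = false; continue; }
    const key = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share the rule group that follows.
      const rules = groups.get(value) ?? [];
      groups.set(value, rules);
      if (lastWasAgent) active.push(rules); else active = [rules];
      lastWasAgent = true;
    } else if ((key === "allow" || key === "disallow") && value) {
      lastWasAgent = false;
      for (const rules of active) rules.push({ allow: key === "allow", path: value });
    } else {
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(groups: Map<string, Rule[]>, agent: string, path: string): boolean {
  // Fall back to the wildcard group, then to allow-by-default.
  const rules = groups.get(agent) ?? groups.get("*") ?? [];
  let best: Rule | null = null;
  for (const r of rules) {
    if (path.startsWith(r.path) && (!best || r.path.length > best.path.length)) best = r;
  }
  return best ? best.allow : true;
}
```

Feed it the robots.txt above and confirm that GPTBot is denied while OAI-SearchBot is allowed before you ship.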

Example noindex for pages that must stay out of AI and search surfaces:

<meta name="robots" content="noindex">

Verification

  1. Fetch the file directly:

    curl -s https://example.com/robots.txt
  2. Smoke-test bot responses:

    curl -I -A 'OAI-SearchBot/1.3' https://example.com/
    curl -I -A 'Googlebot' https://example.com/
    curl -I -A 'Claude-SearchBot' https://example.com/
  3. In Google Search Console, confirm pages are indexable.

  4. In Bing Webmaster Tools, verify crawlability and review the AI Performance Report.

  5. If a page must not appear, confirm the rendered HTML contains noindex.

2. Put critical content in the initial HTML

  1. Server-render or pre-render every page you want cited: docs, tutorials, product pages, pricing, policy pages, FAQs.
  2. Make sure the initial response HTML contains the facts that matter: title, H1, summary, specs, price, availability, refund policy, contact info, comparison data.
  3. Do not ship a blank SPA shell and hope the crawler executes JavaScript.
  4. Do not put core facts only behind tabs, accordions, modals, or client-side fetches.
  5. Do not put core facts only in PDFs or images. Put them in HTML.
  6. Use real links with href. Do not make navigation depend on div + onClick.
  7. Prefer static rendering or a cached shell for any facts you want cited. Small dynamic holes are fine, but the canonical answer should already be in the first response.

Pattern:

<article>
  <header>
    <h1>API rate limits</h1>
    <p>Starter allows 100 requests per minute. Pro allows 1,000.</p>
  </header>
 
  <section aria-labelledby="limits-by-plan">
    <h2 id="limits-by-plan">Rate limits by plan</h2>
    <table>
      <thead>
        <tr><th>Plan</th><th>Requests per minute</th></tr>
      </thead>
      <tbody>
        <tr><td>Starter</td><td>100</td></tr>
        <tr><td>Pro</td><td>1000</td></tr>
      </tbody>
    </table>
  </section>
</article>

Verification

  1. Fetch the raw HTML and confirm the important facts are present:

    curl -sL https://example.com/api-rate-limits | rg -n 'API rate limits|100 requests per minute|<table>'
  2. Load the page in a text browser:

    lynx -dump https://example.com/api-rate-limits
  3. Disable JavaScript in the browser once. If the page loses the facts, fix the rendering strategy.

  4. If curl cannot see the content, many AI systems will not see it either.
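Those manual curl checks can be frozen into a regression test. A minimal sketch — the URL and fact strings are placeholders for your own pages:

```typescript
// Return the facts that are missing from a raw HTML response.
// Run this against the *initial* response body, not the DOM after hydration.
function missingFacts(html: string, facts: string[]): string[] {
  const haystack = html.toLowerCase();
  return facts.filter((fact) => !haystack.includes(fact.toLowerCase()));
}

// Example wiring (assumed URL and facts): fail CI if the server-rendered
// response no longer carries the canonical answer. fetch() does not execute
// JavaScript, which is exactly the view a non-rendering crawler gets.
async function checkPage(url: string, facts: string[]): Promise<void> {
  const res = await fetch(url);
  const missing = missingFacts(await res.text(), facts);
  if (missing.length > 0) {
    throw new Error(`Initial HTML is missing: ${missing.join(", ")}`);
  }
}
```

Run `checkPage("https://example.com/api-rate-limits", ["API rate limits", "100"])` in CI so a rendering-strategy regression fails loudly instead of silently dropping you from AI answers.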

3. Rewrite pages into extractable fragments

  1. Align <title>, meta description, and <h1> so they describe the same thing.
  2. Break each page into sections with descriptive h2 and h3 headings.
  3. Front-load the answer in the first sentence under each heading.
  4. Write sections so they still make sense when copied out of context.
  5. Use Q&A blocks, numbered steps, bullets, and tables for facts that should be cited.
  6. Replace vague adjectives with measurable claims.
  7. Put the canonical answer in one place. Duplication creates conflict.

Pattern:

<section aria-labelledby="refund-policy">
  <h2 id="refund-policy">What is your refund policy?</h2>
  <p>We issue full refunds within 30 days of purchase.</p>
  <ul>
    <li>Applies to monthly and annual plans.</li>
    <li>Email support@example.com with your order ID.</li>
    <li>Refunds return to the original payment method.</li>
  </ul>
</section>

Use this checklist when rewriting content:

  1. One heading = one idea.
  2. One paragraph = one claim.
  3. One list = one comparison or sequence.
  4. No decorative symbols.
  5. No marketing filler where a number or constraint should be.

Verification

  1. Copy one section into a plain text file. If it no longer makes sense, rewrite it.
  2. Check pages in a text browser. Lists and tables should still read cleanly.
  3. Prompt major AI tools with the exact question the page answers. If they miss the answer, make the section shorter, clearer, and more direct.

4. Add JSON-LD that matches the visible page

Use schema to label what the page is. Do not invent entities. Do not add fields that are not visible or true. There is no special AI-only schema to unlock this. Use existing Schema.org types correctly.

Also add visible trust signals on the page itself when they matter: clear bylines, updated dates, author credentials or context, and links to original sources or evidence. Schema can reinforce those signals. It cannot fake them.

| Page type | Schema types | Minimum fields |
| --- | --- | --- |
| Site-wide | Organization, WebSite | name, url, logo |
| Article/tutorial/doc page | Article or BlogPosting, WebPage, BreadcrumbList | headline, description, datePublished, dateModified, author, publisher, mainEntityOfPage |
| FAQ page | FAQPage | mainEntity[] with Question + acceptedAnswer |
| How-to page | HowTo | name, step[] |
| Product page | Product, nested Offer, optional AggregateRating and Review | name, description, image, sku, brand, offers.price, offers.priceCurrency, offers.availability, seller |
| Navigation | BreadcrumbList | itemListElement[] |

Article/tutorial example:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Agentic AI Optimization: Implementation Checklist",
  "description": "Implementation-first checklist for making a site crawlable, citable, and usable by AI agents.",
  "datePublished": "2026-03-09T00:00:00.000Z",
  "dateModified": "2026-03-09T00:00:00.000Z",
  "author": {
    "@type": "Person",
    "name": "Joel Hooks"
  },
  "publisher": {
    "@type": "Organization",
    "name": "joelclaw"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://joelclaw.com/aaio-implementation-checklist"
  }
}
</script>

FAQ example:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How do I block model training but allow AI search?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Allow search bots like OAI-SearchBot and Googlebot, but disallow GPTBot, Google-Extended, and ClaudeBot if that matches your policy."
      }
    }
  ]
}
</script>

Product example:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Acme Noise-Canceling Headphones",
  "description": "Wireless over-ear headphones with 40-hour battery life.",
  "image": ["https://example.com/images/headphones.jpg"],
  "sku": "ACME-NC-001",
  "brand": { "@type": "Brand", "name": "Acme" },
  "offers": {
    "@type": "Offer",
    "price": "199.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "seller": { "@type": "Organization", "name": "Acme" }
  }
}
</script>

Verification

  1. Confirm the JSON-LD exists in the HTML:

    curl -sL https://example.com/page | rg 'application/ld\+json'
  2. Run the page through Google's Rich Results Test and the Schema Markup Validator (validator.schema.org).

  3. Check that every schema value matches visible page content exactly.

  4. Re-run validation whenever price, availability, or copy changes.
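Step 3 can be partly automated. This sketch extracts JSON-LD payloads from raw HTML and offers a crude visible-text check for a schema value such as a price; real pages will need more careful text extraction than a tag-stripping regex:

```typescript
// Pull every application/ld+json payload out of raw HTML.
function extractJsonLd(html: string): unknown[] {
  const re = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/g;
  const blocks: unknown[] = [];
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Invalid JSON-LD is itself a finding; surface it in a real check.
    }
  }
  return blocks;
}

// Crude "visible page" check: drop scripts, strip tags, search the text.
function visibleTextIncludes(html: string, value: string): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/g, " ")
    .replace(/<[^>]+>/g, " ");
  return text.includes(value);
}
```

Wire the two together in CI: for every product page, extract `offers.price` from the JSON-LD and assert `visibleTextIncludes(html, price)`, so schema and visible copy cannot drift apart unnoticed.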

5. Build the accessibility tree on purpose

OpenAI says Atlas uses ARIA roles, labels, and states to interpret pages. Microsoft’s browser automation stack and many agent frameworks also lean on the accessibility tree. That means accessibility work is agent-interface work.

  1. Use native elements first: <button>, <a>, <input>, <select>, <table>, <form>.
  2. Label every form control.
  3. Add autocomplete values anywhere a browser or agent fills user data.
  4. Use landmarks: <header>, <nav>, <main>, <aside>, <footer>.
  5. Keep heading hierarchy logical.
  6. Use descriptive link text.
  7. Add ARIA only where native HTML is not enough.
  8. When state changes in the UI, expose the state.

Good patterns:

<button type="submit">Search flights</button>
 
<a href="/pricing">Pricing</a>
 
<label for="email">Work email</label>
<input id="email" name="email" type="email" autocomplete="email" required>
 
<nav aria-label="Main navigation">
  <ul>
    <li><a href="/docs">Docs</a></li>
    <li><a href="/pricing">Pricing</a></li>
  </ul>
</nav>
 
<button aria-expanded="false" aria-controls="filters-panel">Filters</button>
<div id="filters-panel" hidden>
  <!-- filter controls -->
</div>
 
<p role="status" aria-live="polite">3 results loaded</p>

Bad patterns:

<div class="button" onclick="checkout()">Buy now</div>
<input type="text" placeholder="Email">
<div onclick="location.href='/pricing'">See pricing</div>

Verification

  1. Run a screen reader through the critical flow: homepage, pricing, signup, checkout, contact.

  2. Inspect the accessibility tree in browser devtools.

  3. Use automated checks, but do not stop there:

    • Lighthouse accessibility audit
    • axe DevTools
  4. Add end-to-end tests that use accessible selectors instead of CSS selectors. For interactive flows, also run agent-browser or Playwright against the page so the same accessible affordances get exercised by an actual browser agent:

    await expect(page.getByRole('button', { name: 'Search flights' })).toBeVisible();
    await expect(page.getByLabel('Work email')).toBeVisible();
    await page.getByRole('link', { name: 'Pricing' }).click();
  5. If the test cannot find an element by role, label, or name, the agent will probably struggle too.

6. Expose machine interfaces and operator-friendly paths

This is where traditional SEO stops being enough. A page can be indexable and still suck for agents. If an operator using pi, OpenCode, or Claude Code has to scrape HTML, guess MIME types, or improvise the next step, the UX is still broken.

Fix the basics first.

Don’t start by bolting MCP onto a product that still has those problems. If your routes lie, your headers are wrong, and your JSON is just a blob, a new protocol won’t save you.

The boring baseline is what matters: one canonical source of content, truthful projections, explicit discovery surfaces, and machine interfaces that tell the harness what to do next. That’s the pattern in joelclaw.

One canonical resource, multiple truthful projections

A good machine interface lets the same resource show up in different formats without inventing different truths.

joelclaw takes the explicit-route path today: .md twins plus JSON APIs, all projected from one canonical resource in Convex. The rule is simple: HTML, markdown, and JSON should be projections of one resource, not three separately maintained documents.

That’s the core move:

  • Accept: text/html or /page → human page
  • /page.md → agent markdown
  • /api/... → structured JSON
  • one canonical source underneath all of it

If you need structured data, project JSON from the same source object instead of hand-maintaining parallel copies. If you need markdown, give it a first-class route with the right Content-Type.

Cheap discovery surfaces beat guesswork

Don’t make agents reverse-engineer your site. Give them obvious starting points.

joelclaw already does this in a very practical way:

  • robots.txt advertises both sitemap.xml and sitemap.md
  • sitemap.md lists human URLs, feeds, ADRs, and markdown twins
  • llms.txt points to the markdown sitemap, feed, and markdown access pattern
  • /api acts as a discovery route for the structured interfaces
  • HTML pages include a source-visible clawmail-agent-prompt marker telling agents which endpoints to hit and which Content-Type values to verify

This is not about ranking hacks. It’s about reducing token waste and route ambiguity.

Markdown routes should be first-class

Markdown shouldn’t be an export button buried in a UI. Give it a route.

joelclaw treats /slug.md as a first-class endpoint and rewrites it to the route handler that renders agent markdown:

// apps/web/proxy.ts
const mdMatch = pathname.match(/^\/([\w-]+)\.md$/);
if (mdMatch) {
  const url = request.nextUrl.clone();
  url.pathname = `/${mdMatch[1]}/md`;
  return NextResponse.rewrite(url);
}

The route itself returns real markdown with the right MIME type:

// apps/web/app/[slug]/md/route.ts
return new Response(preamble + header + cleaned, {
  headers: {
    "Content-Type": "text/markdown; charset=utf-8",
    "Cache-Control": "s-maxage=3600, stale-while-revalidate",
  },
});

joelclaw also makes discovery explicit. sitemap.md lists the machine-readable twins:

// apps/web/app/sitemap.md/route.ts
"## Agent Markdown Exports",
"",
"Append `.md` for agent markdown with preamble + implementation details:",
...posts.map((p) => `- [${p.title}](${SITE_URL}/${p.slug}.md)`),

And when markdown pages link internally, joelclaw rewrites those links to other .md endpoints so an agent can stay in the cheap path instead of bouncing back into HTML.

Use the three-surface pattern: /page for humans, /page.md for agents, and /api/content/page or equivalent JSON for structured clients. Same resource. Different projections. No drift.

On MIME types, be explicit:

  • HTML: text/html; charset=utf-8
  • Markdown: text/markdown; charset=utf-8
  • JSON: application/json; charset=utf-8
  • Text hint surfaces like llms.txt: text/plain; charset=utf-8

If a markdown endpoint returns text/html, treat that as a bug.

Operator UX: the harness needs a path

Don’t return raw JSON and make the agent guess what to do next. Return the result and the next move.

joelclaw’s CLI bakes this in as a hard contract:

// packages/cli/src/response.ts
export interface JoelclawEnvelope {
  readonly ok: boolean
  readonly command: string
  readonly result: unknown
  readonly error?: { message: string; code: string }
  readonly fix?: string
  readonly next_actions: readonly NextAction[]
}

That means joelclaw status doesn’t just answer the question. It also points to joelclaw runs and joelclaw run <run-id>. The interface is self-describing.
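A small constructor pair keeps that contract enforced everywhere. This is a sketch against the interface above; the `NextAction` shape is assumed to be `{ command, description }`:

```typescript
interface NextAction {
  readonly command: string;
  readonly description: string;
}

interface JoelclawEnvelope {
  readonly ok: boolean;
  readonly command: string;
  readonly result: unknown;
  readonly error?: { message: string; code: string };
  readonly fix?: string;
  readonly next_actions: readonly NextAction[];
}

// Success and failure both ship next_actions, so the harness
// always has a path forward regardless of outcome.
function ok(command: string, result: unknown, next: NextAction[]): JoelclawEnvelope {
  return { ok: true, command, result, next_actions: next };
}

function fail(
  command: string,
  message: string,
  code: string,
  fix: string,
  next: NextAction[],
): JoelclawEnvelope {
  return { ok: false, command, result: null, error: { message, code }, fix, next_actions: next };
}
```

If every command routes its output through `ok` or `fail`, "what do I do next?" is answered by construction rather than by convention.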

The same idea shows up on the web side. The root API discovery route returns commands an agent can run immediately:

// apps/web/app/api/route.ts
nextActions: [
  {
    command: `curl -sS "${origin}/api/search"`,
    description: "Search API discovery (sample queries, auth details)",
  },
  {
    command: `curl -sS "${origin}/api/docs"`,
    description: "Docs API discovery (books, PDFs)",
  },
]

If an agent hits your JSON and can’t tell what to do next, your interface is unfinished.

Also give the operator obvious affordances. joelclaw’s CopyAsPrompt button is a small example, but it’s the right instinct: make it easy to move a page into a harness without manual cleanup.

For coding surfaces: AGENTS.md first, skills second

The practical lesson here is simple: passive, always-on repo context in AGENTS.md usually beats optional context that a harness has to discover later. Skills still matter, but they work better as deeper, task-specific follow-ons than as the first breadcrumb — don’t hope the agent decides to load the right skill at the right moment.

For repos and coding surfaces, that means:

  • put the always-needed rules, paths, commands, and retrieval hints in AGENTS.md
  • use skills for deeper, task-specific workflows once the agent already knows where it is

That’s especially relevant for operators working through pi, OpenCode, and Claude Code.

A practical AGENTS.md snippet

This doesn’t need to be a novella. It needs to give the harness a truthful starting point. Something like this is enough:

## Agent retrieval hints
 
- Prefer retrieval-led reasoning over pretrained guesses for framework and product-specific work.
- Start with `/api`, `sitemap.md`, and any `/.md` content twins before scraping rendered HTML.
- Verify `Content-Type` before parsing: markdown must be `text/markdown`, JSON must be `application/json`, text hints must be `text/plain`.
- Treat HTML, markdown, and JSON as projections of the same resource; if they conflict, stop and find the canonical source.
 
## Operator path
 
- For machine-readable content, try `{page}.md` first.
- For structured discovery, use `/api`.
- For broad site discovery, use `/sitemap.md`.
- If the route falls back to HTML, retry with the right path or `Accept` header instead of guessing.

Tight, boring, and always present beats a clever skill the agent may never invoke.

Protocols to layer on after the basics

Once those basics are solid, then layer protocols on top.

  • MCP — expose a curated tool layer over canonical contracts
  • NLWeb — useful for public knowledge retrieval once the core routes already work
  • A2A — valuable when your action surface is stable enough to advertise capabilities well
  • AGENTS.md — still the cheapest high-leverage interface for coding agents

Don’t use MCP to compensate for a site that still has no markdown route, no Content-Type discipline, and no JSON affordances.

Verification

Don’t trust the implementation because the route file exists. Verify behavior.

# joelclaw: explicit markdown twin
curl -I https://joelclaw.com/sitemap.md
curl -I https://joelclaw.com/aaio-implementation-checklist.md
# expect: Content-Type: text/markdown; charset=utf-8
 
# joelclaw: API discovery advertises next actions
curl -sS https://joelclaw.com/api | jq '.nextActions'
 
# joelclaw: text hint surface is plain text
curl -I https://joelclaw.com/llms.txt
# expect: Content-Type: text/plain; charset=utf-8
 
# joelclaw CLI: command output is an agent envelope
joelclaw status | jq '.next_actions'

This is already baked into the stack:

  • joelclaw documents the expected Content-Type checks in docs/web.md and in the source marker injected by apps/web/components/clawmail-source-comment.tsx.

That’s what “expose machine interfaces” means in practice.

Not a standards slide.

A site an agent can actually use.

7. If you sell products, make checkout agent-ready

This is optional for non-commerce sites. It is not optional if you expect AI agents to research, compare, and buy on behalf of users.

  1. Clean your product catalog:
    • precise titles
    • real descriptions
    • current stock
    • current prices
    • stable SKUs
    • descriptive image alt text
  2. Add Product + Offer schema to every product page.
  3. Put price, availability, shipping constraints, and return policy in visible HTML.
  4. If you are on Shopify, check whether Agentic Storefronts are already available for your store.
  5. If you are on Stripe, evaluate the Agentic Commerce Protocol and Agentic Commerce Suite.
  6. If you implement ACP directly, you need these endpoints:
    • Create Checkout
    • Update Checkout
    • Complete Checkout
    • Cancel Checkout
  7. If you implement UCP, publish /.well-known/ucp and keep the capability declaration accurate.
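The four ACP endpoints map onto a simple checkout lifecycle. The sketch below is an illustrative in-memory state machine only — it is not the actual ACP wire format, and the field names are assumptions; consult the protocol spec for required payloads, signatures, and payment token handling:

```typescript
type CheckoutStatus = "open" | "completed" | "canceled";

interface Checkout {
  id: string;
  status: CheckoutStatus;
  items: { sku: string; quantity: number }[];
}

const checkouts = new Map<string, Checkout>();
let nextId = 0;

// Create Checkout: open a session an agent can mutate.
function createCheckout(items: Checkout["items"]): Checkout {
  const checkout: Checkout = { id: `chk_${++nextId}`, status: "open", items };
  checkouts.set(checkout.id, checkout);
  return checkout;
}

function requireOpen(id: string): Checkout {
  const checkout = checkouts.get(id);
  if (!checkout || checkout.status !== "open") throw new Error("checkout is not open");
  return checkout;
}

// Update Checkout: only open sessions may change.
function updateCheckout(id: string, items: Checkout["items"]): Checkout {
  const checkout = requireOpen(id);
  checkout.items = items;
  return checkout;
}

// Complete Checkout: in real ACP, the payment handoff happens here.
function completeCheckout(id: string): Checkout {
  const checkout = requireOpen(id);
  checkout.status = "completed";
  return checkout;
}

// Cancel Checkout: terminal, like completion.
function cancelCheckout(id: string): Checkout {
  const checkout = requireOpen(id);
  checkout.status = "canceled";
  return checkout;
}
```

The point of the sketch is the invariant: completed and canceled are terminal states, and every endpoint rejects operations on a session that is no longer open.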

Verification

  1. Ask ChatGPT, Google AI Mode, Copilot, Claude, and Perplexity for the exact product category you sell.
  2. Confirm your products appear with correct price and availability.
  3. Fetch the product page HTML and verify the price and stock are in the initial response.
  4. Validate product JSON-LD after every catalog update.

8. Instrument measurement and regression checks

  1. Track AI referrals separately from classic search and direct traffic.
  2. Capture utm_source=chatgpt.com explicitly in analytics.
  3. Segment traffic from Google Search, Bing/Copilot, Perplexity, and Claude where referrer data exists.
  4. Log bot hits by user-agent.
  5. Track citation presence and community mentions for the concepts you want to own.
  6. Review important pages on a refresh cadence. Thirty, ninety, and one-eighty days is fine.
  7. Add regression tests for the pages and flows that matter most.

Useful checks:

# AI crawler traffic in server logs
rg 'OAI-SearchBot|Googlebot|Bingbot|Claude-SearchBot|PerplexityBot' /var/log/nginx/access.log
 
# Make sure important pages still expose schema
curl -sL https://example.com/pricing | rg 'application/ld\+json'
 
# Quick text-only smoke test
lynx -dump https://example.com/pricing
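The log grep above can graduate into a classifier that buckets hits by bot class. The substrings follow the bot lists in section 1; new bots will need new entries, and real user-agent strings vary, so treat this as a starting point:

```typescript
type BotClass = "search" | "user-fetch" | "training" | "other";

// Ordered token list: more specific tokens come before looser ones
// (e.g. ChatGPT-User before GPTBot) so classification is unambiguous.
const BOT_CLASSES: [string, BotClass][] = [
  ["OAI-SearchBot", "search"],
  ["ChatGPT-User", "user-fetch"],
  ["GPTBot", "training"],
  ["Googlebot", "search"],
  ["Google-Extended", "training"],
  ["Bingbot", "search"],
  ["Claude-SearchBot", "search"],
  ["Claude-User", "user-fetch"],
  ["ClaudeBot", "training"],
  ["PerplexityBot", "search"],
  ["Perplexity-User", "user-fetch"],
];

function classifyUserAgent(userAgent: string): BotClass {
  for (const [token, botClass] of BOT_CLASSES) {
    if (userAgent.includes(token)) return botClass;
  }
  return "other";
}
```

Run it over the user-agent column of your access logs to split crawl volume into search, user-triggered, and training traffic — which is exactly the split your robots.txt policy is supposed to control.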

Track these KPIs:

  1. AI referral sessions
  2. AI-assisted conversion rate
  3. Citation presence for core queries
  4. Community mentions and repeated phrasing in the wild
  5. Task completion rate for agent-driven flows
  6. Crawl success and error rate by bot

Definition of done

Your site is in decent AAIO shape when all of this is true:

  1. Search bots you want are allowed.
  2. Training bots you do not want are blocked.
  3. Critical facts are present in raw HTML.
  4. Pages use descriptive headings, direct answers, lists, and tables.
  5. JSON-LD exists and validates.
  6. Buttons, links, forms, and state changes are exposed in the accessibility tree.
  7. Markdown, text, or JSON discovery surfaces exist for the same canonical resources.
  8. Coding surfaces expose persistent operator context through AGENTS.md, not just optional skill triggers.
  9. Product pages expose machine-readable price and availability if you sell things.
  10. Analytics and logs tell you which bots and AI surfaces are actually sending traffic.

That’s the work. The rest is iteration.