This entire blog post started out as a response to somebody on Reddit. Like any well-intentioned, concise response, it started to spiral, and I ended up writing a thesis about Sanity AEO/SEO. This is that guide, and this is how you can provide the best possible AEO/SEO for your websites with Sanity.
This post is the opposite of unopinionated. It's the AEO/SEO default we ship on every Sanity project at Roboto, distilled into one place. AEO (answer engine optimisation) and SEO live on the same plumbing, so we treat them as one job: same fields, same JSON-LD, same sitemap, with a couple of extra agent-facing surfaces layered on top. If you want the canonical reference, go and look at turbo-start-sanity on GitHub. Three files do most of the work: packages/sanity/src/query.ts, apps/studio/utils/seo-fields.ts, and apps/web/src/lib/seo.ts. Copy from those and you'll have a stronger AEO/SEO baseline than most headless sites.
What follows is the why behind those files, plus the bits that go beyond what fits in a starter repo: content negotiation for agents, llms.txt and sitemap.md, accessibility as a ranking and citation signal, and the feedback loop that stops the whole thing rotting six months after launch. This is the successor to our 2023 post on SEO tips for Sanity and Next.js and complements our wider takes on headless CMS SEO and why AEO is mostly SEO with content negotiation.
Why Sanity AEO/SEO feels like a mystery
Sanity will let you model anything. A blog, a 50,000-SKU e-commerce catalogue, a directory of train times for a single terminal at Paddington. The flexibility is brilliant, and it's also why every new team building on Sanity asks the same question on day one: where do I put the meta title — and how do I make sure ChatGPT and Perplexity actually cite this thing?
There's a Sanity Learn article on SEO, and a cute guy that wrote it, that covers most information at a basic level. It's worth a read if you've got the time. Most teams don't. What follows is the shorter, more opinionated version. The one we hand to clients and developers who want to ship the AEO/SEO bit and get on with the actual product.
From this point onwards, just trust me bro, because we've built too many of these.
The baseline: title and description on every document
Every document type (page, blog, case study, service, author, whatever) gets two fields at the top: title and description. These are the editorial defaults. They double as more than just SEO:
- The blog title is the H1 on the post and the heading on the listing card
- The blog description is the meta description fallback and the card blurb on the index page
- The page title and description follow the same pattern across the page builder
You can layer validation on top of these: minimum length, maximum character count, the little counter underneath the input. We tend not to. Editors find prescriptive character counts annoying, and Google's truncation behaviour shifts often enough that hard limits feel out of date within a year. The fields exist; the editor's responsible for using them sensibly.
The SEO tab: meta overrides with fallbacks
Beneath the baseline, every document gets an SEO tab. Inside it:
- seoTitle: overrides the document's title in the <title> tag
- seoDescription: overrides the description in the meta description
- seoImage: overrides the social/Open Graph image
- seoNoIndex: boolean, adds the noindex meta tag
- seoHideFromLists: boolean, excludes the document from internal listings
The crucial bit: the SEO tab fields are overrides, not requirements. If seoTitle is empty, the renderer falls back to the document's title. If seoDescription is empty, it falls back to description. Editors get a consistent surface across every document type, and they only fill in the SEO tab when they need to differentiate the metadata from the visible content.
That's it. Spread ...seoFields into any document schema and you've got a consistent SEO surface across the whole studio.
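As a rough illustration of that shape, here's a minimal sketch of a shared fields array. It uses plain schema objects and a local FieldDef type so it stands alone; the "seo" group name, the field titles, and the example blog document are assumptions, not the contents of seo-fields.ts.

```typescript
// Minimal local type so the sketch stands alone. A real studio would lean on
// Sanity's own schema typings rather than this hand-rolled shape.
type FieldDef = {
  name: string;
  title: string;
  type: "string" | "text" | "image" | "boolean";
  group?: string;
  initialValue?: boolean;
};

// Hypothetical shared SEO fields. None are required: an empty value means
// "fall back to the document's own title or description".
const seoFields: FieldDef[] = [
  { name: "seoTitle", title: "SEO title override", type: "string", group: "seo" },
  { name: "seoDescription", title: "SEO description override", type: "text", group: "seo" },
  { name: "seoImage", title: "Social image override", type: "image", group: "seo" },
  { name: "seoNoIndex", title: "Hide from search engines", type: "boolean", group: "seo", initialValue: false },
  { name: "seoHideFromLists", title: "Hide from listings", type: "boolean", group: "seo", initialValue: false },
];

// Baseline editorial fields that sit at the top of every document.
const baselineFields: FieldDef[] = [
  { name: "title", title: "Title", type: "string" },
  { name: "description", title: "Description", type: "text" },
];

// Example document schema: baseline first, then the shared SEO surface.
const blog = {
  name: "blog",
  title: "Blog",
  type: "document",
  groups: [{ name: "seo", title: "SEO" }],
  fields: [...baselineFields, ...seoFields],
};
```

Every other document type repeats the last step: same baseline, same spread, so the SEO tab looks identical wherever an editor lands.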
Open Graph (and per-network if you really need it)
Open Graph follows the same fallback pattern. ogTitle, ogDescription, ogImage. If they're not set, use the SEO fields. If those aren't set, use the baseline title and description.
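The fallback chain is easier to see in code. This is a minimal sketch under the field names used above; resolveMeta and firstFilled are hypothetical helpers, not functions from the starter repo.

```typescript
type SeoDoc = {
  title: string;
  description: string;
  seoTitle?: string | null;
  seoDescription?: string | null;
  ogTitle?: string | null;
  ogDescription?: string | null;
};

// Return the first candidate that is a non-empty string after trimming.
function firstFilled(...candidates: Array<string | null | undefined>): string {
  for (const candidate of candidates) {
    if (candidate && candidate.trim() !== "") return candidate.trim();
  }
  return "";
}

// Open Graph falls back to the SEO fields, which fall back to the baseline.
function resolveMeta(doc: SeoDoc) {
  const title = firstFilled(doc.seoTitle, doc.title);
  const description = firstFilled(doc.seoDescription, doc.description);
  return {
    title,
    description,
    openGraph: {
      title: firstFilled(doc.ogTitle, title),
      description: firstFilled(doc.ogDescription, description),
    },
  };
}
```

Whitespace-only overrides count as empty here, which is a design choice: an editor who clears a field and leaves a stray space still gets the fallback.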
If you've got a specific need to differentiate per network (a punchier headline for Twitter, a longer description for LinkedIn), split it into twitterTitle, linkedinTitle, and so on. Most projects don't need it. Default to one Open Graph block that covers every network, and break it out only when a client specifically asks.
JSON-LD: set and forget
If there's one thing I can't recommend enough, it's this: set and forget as much as you can. As soon as you put JSON-LD into a Sanity Studio, you'd better hope to god you put a validator in there, because I guarantee it's going to be broken within a week flat.
The mistake we see most often: teams add structured data as a separate editor surface. A textarea for JSON-LD, or a schema picker, or a dedicated tab in the studio. Editors are now responsible for filling in a BlogPosting schema with author, date, image, headline, duplicating what they've already put into the document. It drifts within a sprint.
The Roboto default: derive JSON-LD from existing fields, at render time, in code. Editors enter the data once, in the natural place. The schema generator pulls from those fields and emits the structured data.
The whole thing is one helper function. Parity with what's rendered, zero editor work, no drift when content changes. And it's not difficult to scaffold these days. Point any half-decent AI assistant at your document schema and it'll write the generator in one prompt.
The other side of set-and-forget: pull through everything you've already modelled. If you've got author documents with bios and social profiles, push them into the Person schema. If you've got categories, emit BlogPosting.about. Don't make editors think about it. They've already given you the data.
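To make "derive it in code" concrete, here's a minimal sketch of such a generator. The BlogDoc shape, the /blog/ and /author/ URL patterns, and both function names are assumptions for illustration; the real helper may look quite different.

```typescript
type Author = { name: string; slug: string; sameAs?: string[] };
type BlogDoc = {
  title: string;
  description: string;
  slug: string;
  publishedAt?: string;
  _createdAt: string;
  _updatedAt: string;
  imageUrl?: string;
  author?: Author;
  categories?: string[];
};

// Build BlogPosting structured data purely from fields editors already fill in.
function blogPostingJsonLd(doc: BlogDoc, siteUrl: string): Record<string, unknown> {
  const url = `${siteUrl}/blog/${doc.slug}`; // assumed URL pattern
  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: doc.title,
    description: doc.description,
    url,
    datePublished: doc.publishedAt ?? doc._createdAt,
    dateModified: doc._updatedAt,
  };
  if (doc.imageUrl) jsonLd.image = doc.imageUrl;
  if (doc.author) {
    jsonLd.author = {
      "@type": "Person",
      name: doc.author.name,
      url: `${siteUrl}/author/${doc.author.slug}`, // assumed URL pattern
      sameAs: doc.author.sameAs ?? [],
    };
  }
  if (doc.categories && doc.categories.length > 0) {
    jsonLd.about = doc.categories.map((name) => ({ "@type": "Thing", name }));
  }
  return jsonLd;
}

// Serialise for a script tag, escaping "<" so content cannot close the tag early.
function toJsonLdScript(data: Record<string, unknown>): string {
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

Because the generator only reads document fields, changing a title or an author bio updates the structured data on the next render with no editor action.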
seoNoIndex and seoHideFromLists: why we keep both
These two fields look similar. They do different things.
- seoNoIndex adds <meta name="robots" content="noindex">. On its own the flag only changes the meta tag: the page is still reachable and still linked from internal navigation, but search engines are told not to index it.
- seoHideFromLists is consumed by your GROQ queries. The page is excluded from the blog index, category pages, related-posts blocks, and anything else that lists documents.
Common scenarios where you need both:
- PPC landing page: seoNoIndex: true so Google doesn't rank it organically, seoHideFromLists: true so it doesn't appear on the main blog index. Only people clicking the ad ever see it.
- Archived blog post: seoHideFromLists: true so it's quietly demoted off the listings, but seoNoIndex: false so existing backlinks still hit a valid indexed page.
- Author-only draft: both true while the post is being reviewed, both flipped when it's ready to ship.
The GROQ pattern looks like this:
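A minimal sketch rather than a copy of query.ts: the "blog" type name, the projection, and the publishedAt ordering are assumptions. The filter is written so that a missing flag is treated the same as false.

```typescript
// GROQ sketch. The seoHideFromLists line is the only SEO-specific part;
// the "blog" type, the projection, and publishedAt are assumed names.
const blogIndexQuery = `
  *[
    _type == "blog"
    && defined(slug.current)
    && (!defined(seoHideFromLists) || seoHideFromLists == false)
  ] | order(publishedAt desc) {
    _id,
    title,
    description,
    "slug": slug.current,
    publishedAt
  }
`;
```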
One line. Editors get a toggle, GROQ does the filtering, the renderer respects the meta. Nothing fancy.
Sitemaps: cheap wins
The sitemap is the boring infrastructure that most teams set up once and never touch again. A few rules worth getting right:
- Use _updatedAt as <lastmod>, not _createdAt or the publish date. Sanity gives you _updatedAt for free on every document. Pipe it straight through. This is the freshness signal Googlebot actually cares about in 2026.
- Generate the sitemap from the same GROQ query that powers the site, so there's only one source of truth. No drift between what's indexable and what's visible.
- Respect seoNoIndex and seoHideFromLists. Filter them out of the sitemap too. A no-indexed page in your sitemap is a mixed signal.
- Optional belt-and-braces: add a sitemapPriority field on the document, default 0.7, bump to 0.9 for cornerstone content. Modern Google largely ignores <priority> and <changefreq> (they've said so publicly), but it costs nothing to ship and gives editors a lever if they want one.
The honest take: <lastmod> is the only sitemap field that genuinely moves the needle today. Get it right and the rest is hygiene.
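Those rules fit in one small function. This sketch assumes the GROQ query already returns a path for each document alongside _updatedAt and the two flags; buildSitemapXml is a hypothetical name, and in a Next.js app you would more likely return the equivalent entries from the framework's sitemap route than build XML by hand.

```typescript
type SitemapDoc = {
  path: string; // e.g. "/blog/my-post", assumed to be computed by the query
  _updatedAt: string; // ISO timestamp straight from Sanity
  seoNoIndex?: boolean;
  seoHideFromLists?: boolean;
};

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Drop flagged documents, then emit <loc> and <lastmod> for everything else.
function buildSitemapXml(docs: SitemapDoc[], siteUrl: string): string {
  const urls = docs
    .filter((doc) => !doc.seoNoIndex && !doc.seoHideFromLists)
    .map(
      (doc) =>
        `  <url>\n    <loc>${escapeXml(siteUrl + doc.path)}</loc>\n    <lastmod>${doc._updatedAt}</lastmod>\n  </url>`
    );
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">`,
    ...urls,
    `</urlset>`,
  ].join("\n");
}
```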
Last updated and last published: freshness signals for AEO
Show both dates on every blog post, visibly, at the top:
Published: 12 March 2024
Updated: 18 May 2026
Sanity gives you both for free: _createdAt (or your custom publishedAt) and _updatedAt. Pull them into the byline, render them in the layout, and wire them into the JSON-LD datePublished and dateModified properties. One source of truth, multiple surfaces, no editor work to keep them in sync.
Why it matters more than it used to:
- Traditional SEO. Google's freshness algorithm rewards recently-updated content for query-deserves-freshness topics. Visible dates reinforce what's in the schema.
- AEO. ChatGPT, Perplexity, Claude, and Google's AI Overviews all weight recency heavily when picking sources to cite. If two posts answer the same question and yours is the one that says "updated last month", yours wins the citation.
- User trust. A 2019 post with no update date reads as stale. A 2019 post updated last month reads as maintained.
The discipline that goes with this: actually update old posts. Refresh the stats, swap stale screenshots, fix broken links, then let _updatedAt do its job. Don't fake the date. LLMs are getting better at detecting stale content dressed up as fresh, and Google's always been able to.
details and summary: free AEO/SEO and accessibility wins
The native HTML disclosure widget is one of the most underused tools in the SEO toolkit. No JavaScript, no library, works in every browser back to 2020.
Why it matters for SEO: content inside <details> is fully crawlable and indexable, even when collapsed. Google reads it. So you can hide long FAQ answers, technical specifications, or "read more" sections without losing keyword surface area.
Why it matters for UX: shorter perceived page length, lower bounce on long-form content, and it's a real focusable, keyboard-accessible element out of the box. No aria-expanded plumbing required.
Where to use it in a Sanity-driven site:
- FAQ blocks. Pair with FAQ JSON-LD and you get rich results and a clean UI.
- Long product spec tables
- "What we did" sections on case studies
- Footnotes and references on blog posts
The trap: don't hide your H1, primary value proposition, or first paragraph inside a <details>. Google indexes the content, but ranks visible-by-default content higher. Use disclosures for supporting material, not core material.
Implementation note: in your Portable Text serializer or MDX components, expose it as a "collapsible section" block. Editors get a button in the studio; developers get semantic HTML in the output.
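As a sketch of what that serializer could emit, assume a hypothetical "collapsible" block with a plain-text summary and body. A real Portable Text setup would render nested rich text rather than a single paragraph, so treat this as the shape of the output, not the implementation.

```typescript
// Hypothetical custom block type; the field names are assumptions.
type CollapsibleBlock = {
  _type: "collapsible";
  summary: string;
  body: string;
  open?: boolean;
};

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emit the native disclosure element: crawlable when collapsed, no JavaScript.
function renderCollapsible(block: CollapsibleBlock): string {
  const openAttr = block.open ? " open" : "";
  return (
    `<details${openAttr}>` +
    `<summary>${escapeHtml(block.summary)}</summary>` +
    `<p>${escapeHtml(block.body)}</p>` +
    `</details>`
  );
}
```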
Content negotiation: serve markdown to agents by default
Vercel published a pattern recently that's worth shipping on every Sanity build: when an agent sends Accept: text/markdown, text/html, */*, return clean markdown instead of the HTML page. Same URL, dramatically smaller payload. Their example: 500KB of HTML compressed to 3KB of markdown. A 99% reduction. We've written about why this matters in more depth in AEO is just SEO with content negotiation.
Why this matters for AEO right now: agents burn tokens on HTML noise. Nav, footers, scripts, classNames, hydration payloads. Hand them markdown and they ingest the actual content. More content per request, cleaner citations, better odds of being picked as a source.
How it works in Next.js:
- A rewrite rule in next.config.ts detects the markdown Accept header and routes the request to a dedicated markdown endpoint
- The route handler returns the body with Content-Type: text/markdown
- Both endpoints pull from the same Sanity GROQ query, so there's no drift between what HTML readers see and what agents consume
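The detection step boils down to reading the Accept header. The sketch below is a framework-free version of that decision, assuming you only switch to markdown when the client lists text/markdown explicitly and ranks it at least as high as text/html; prefersMarkdown and markdownHeaders are hypothetical names, and the rewrite rule itself isn't shown.

```typescript
// Parse an Accept header into media type -> q weight (default weight is 1).
function parseAccept(header: string): Record<string, number> {
  const weights: Record<string, number> = {};
  for (const part of header.split(",")) {
    const [type, ...params] = part.trim().split(";");
    if (!type) continue;
    let q = 1;
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key === "q" && value !== undefined) {
        const parsed = parseFloat(value);
        if (!isNaN(parsed)) q = parsed;
      }
    }
    weights[type.trim().toLowerCase()] = q;
  }
  return weights;
}

// True only when markdown is requested explicitly and not ranked below HTML.
// A bare "*/*" is deliberately ignored so browsers keep getting HTML.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  const weights = parseAccept(accept);
  const markdown = weights["text/markdown"] ?? 0;
  const html = weights["text/html"] ?? 0;
  return markdown > 0 && markdown >= html;
}

// Headers for the markdown response. Vary tells caches the body depends on Accept.
const markdownHeaders = {
  "Content-Type": "text/markdown; charset=utf-8",
  Vary: "Accept",
};
```

A typical browser header never mentions text/markdown, so it falls through to the HTML page untouched.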
Two discovery aids to ship alongside it:
- A <link rel="alternate" type="text/markdown" href="/llms.txt" /> tag in your HTML head for agents that don't send the Accept header
- A /sitemap.md alongside /sitemap.xml, giving agents a structured markdown index they can navigate
Wire this into every project from day one. It's a one-time setup that future-proofs the site as agent traffic keeps growing. Cheap to add now, expensive to retrofit later when half your traffic is bots that can't see your content.
llms.txt and sitemap.md: yes, agents read them
There's a tedious crowd online insisting llms.txt is "made up" or that "no AI reads it." They're wrong.
- What llms.txt is. A markdown file at the root of your site (/llms.txt) that gives agents a curated, structured index of your most important content. Titles, descriptions, links. Think of it as a hand-picked sitemap optimised for LLM consumption rather than crawler discovery.
- What sitemap.md is. The same idea but mirroring your full sitemap.xml. A markdown version of every indexable URL with titles and hierarchy. Agents traverse it instead of parsing XML.
- Who reads them. Anthropic's web fetcher consumes them. Perplexity's crawler does. Agents built on the Vercel AI SDK do. ChatGPT's browse tool uses them when you point it at a URL. Every coding agent (Cursor, Claude Code, Codex) pulls them when you point it at docs. The pattern is spreading fast. Stripe, Anthropic, Vercel, Cloudflare, and Sanity itself all ship them.
The cost to ship is one route handler and a GROQ query. The cost of not shipping is invisibility in agent-mediated search at exactly the moment that traffic source is growing fastest. Hell, if it turns out I'm totally wrong, I can delete this section and pretend I never said it.
How to wire it up in Sanity:
- /llms.txt: hand-pick your top docs, pages, and guides. Group by section. Render from a dedicated GROQ query so editors can flag content for inclusion with a featuredForLLM boolean.
- /sitemap.md: derive from the same query as sitemap.xml. Same filters, different output format.
- Both are static markdown responses. Cheap to serve, cacheable forever, easy to regenerate.
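Here's a minimal sketch of the rendering half, assuming the query returns a title, description, path, and section for each document plus the featuredForLLM flag. The layout (H1, blockquote summary, H2 sections, link lists) follows the common llms.txt convention; buildLlmsTxt and the section field are hypothetical.

```typescript
type LlmsDoc = {
  title: string;
  description: string;
  path: string;
  section: string; // assumed grouping field, e.g. "Guides"
  featuredForLLM?: boolean;
};

// Render only flagged documents, grouped by section in first-seen order.
function buildLlmsTxt(siteName: string, summary: string, docs: LlmsDoc[], siteUrl: string): string {
  const sections: Record<string, LlmsDoc[]> = {};
  for (const doc of docs) {
    if (!doc.featuredForLLM) continue;
    (sections[doc.section] ??= []).push(doc);
  }

  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`];
  for (const section of Object.keys(sections)) {
    lines.push("", `## ${section}`, "");
    for (const item of sections[section] ?? []) {
      lines.push(`- [${item.title}](${siteUrl}${item.path}): ${item.description}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

The sitemap.md variant is the same function with the featuredForLLM check swapped for the sitemap filters.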
Don't overthink it. A 100-line file beats a 0-line file every time. Ship something, iterate.
Image SEO from the Sanity pipeline
Sanity's image handling is one of the best parts of the stack, and most teams don't use it properly.
Alt text lives on the asset, not the usage. Install sanity-plugin-media. It gives you a proper asset library inside the studio where editors can set alt text, title, tags, and credits on the image itself. Set it once, every usage across every document inherits it. No more wondering whether the editor remembered to write alt text on the card variant. It's hoisted onto the asset.
The compounding win: when you swap a hero image six months later, the new asset already has alt text from the upload. Editors fill it in once, at the source of truth.
A few more rules:
- Require alt text at upload. Wire it as a required field in the plugin config so editors can't physically upload without it.
- Filename hygiene at upload. hero-blog-seo-best-practices.png, not IMG_3421.png. Google reads filenames as a weak ranking signal. LLMs absolutely do.
The banger combo: Sanity's image pipeline plus next/image. This is the move:
- Sanity's CDN handles transforms. ?w=800&fm=webp&q=80 gives you any size, any format, on demand.
- next/image consumes those URLs and emits a proper srcset for every breakpoint.
- You get responsive images, automatic AVIF/WebP, lazy loading, blur placeholders (from Sanity's LQIP), and zero CLS, all from a single <SanityImage> component.
Compared to hand-rolling <picture> elements with <source> tags for every breakpoint? Not even close. Write the component once, ship it everywhere, the pipeline does the rest.
- priority on the LCP image (hero, above the fold). Lazy on everything else. Next handles both with a single prop.
- Explicit width and height. Sanity gives you the asset dimensions for free via asset->metadata.dimensions. Pipe them straight into next/image to lock the aspect ratio and prevent CLS.
- Preload the LCP image in <head> when it's an above-the-fold hero. Next does this automatically when you set priority.
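Two small pieces carry most of that. The sketch below assumes you hand next/image a custom loader that appends Sanity's transform parameters, and that you size the image from the dimensions metadata; sanityLoader, sizeFor, the default quality of 80, and fit=max are illustrative choices rather than the contents of the <SanityImage> component.

```typescript
// Same argument shape next/image passes to a custom loader function.
type LoaderArgs = { src: string; width: number; quality?: number };

// Append Sanity CDN transform params to the raw asset URL.
function sanityLoader({ src, width, quality }: LoaderArgs): string {
  return `${src}?w=${width}&q=${quality ?? 80}&fm=webp&fit=max`;
}

// Shape of asset->metadata.dimensions in a Sanity image asset.
type SanityDimensions = { width: number; height: number; aspectRatio: number };

// Derive explicit width and height so the browser can reserve space (no CLS).
function sizeFor(dimensions: SanityDimensions, displayWidth: number) {
  const width = Math.min(displayWidth, dimensions.width); // never upscale
  return { width, height: Math.round(width / dimensions.aspectRatio) };
}
```

With the loader in place, next/image calls it once per breakpoint width, which is where the srcset comes from.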
Accessibility is AEO/SEO
This deserves its own section because most teams treat it as an afterthought. It isn't.
Google has been candid that accessibility signals feed Core Web Vitals and ranking. Screen-reader-friendly markup is also LLM-friendly markup, since agents parse semantic HTML the same way assistive tech does. So when you optimise for accessibility, you're optimising for traditional SEO and AEO at the same time.
The non-negotiables on every Sanity build:
- Required alt text at the asset level (covered above). Make it physically impossible to upload without one.
- Heading hierarchy is sacrosanct. One H1 per page (the document title). H2 for sections. No skipping levels, no decorative H1s in the page builder. Validate it in Portable Text serializers: strip or downgrade any H1s authored inside rich text.
- Semantic landmarks. <main>, <nav>, <article>, <aside>, <footer>. Not <div>s with role attributes.
- Focus states: visible, high-contrast. Tailwind's default focus-visible: ring is fine. Don't disable it for aesthetics.
- Colour contrast: 4.5:1 for body text, 3:1 for large text. Build it into the design tokens.
- Skip links at the top of every layout. Keyboard users and screen readers both benefit.
- Form labels: every input wired to a <label>. Placeholder text is not a label.
- prefers-reduced-motion: respect it in your Motion for React components. Wrap big animated sections in a conditional that disables motion when the OS preference is set.
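The heading rule is the easiest of these to automate. This is a minimal sketch that works on Portable Text block styles before rendering, assuming standard "h1" to "h6" style names; downgradeH1 and findSkippedLevels are hypothetical helpers.

```typescript
type PortableTextBlock = { _type: string; style?: string; [key: string]: unknown };

// Rich text never owns the page H1, so any authored h1 becomes an h2.
function downgradeH1(blocks: PortableTextBlock[]): PortableTextBlock[] {
  return blocks.map((block) =>
    block._type === "block" && block.style === "h1" ? { ...block, style: "h2" } : block
  );
}

// Return the indexes of headings that jump more than one level deeper
// than the heading before them (the page title counts as level 1).
function findSkippedLevels(blocks: PortableTextBlock[]): number[] {
  const skipped: number[] = [];
  let previous = 1;
  blocks.forEach((block, index) => {
    if (block._type !== "block" || typeof block.style !== "string") return;
    const match = /^h([1-6])$/.exec(block.style);
    if (!match) return;
    const level = Number(match[1]);
    if (level > previous + 1) skipped.push(index);
    previous = level;
  });
  return skipped;
}
```

Run the first in the serializer and the second as a studio validation or CI check, so editors see the problem instead of the site silently rewriting their structure.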
The payoff: Lighthouse accessibility scores correlate with rankings, AEO tools weight clean semantic structure when extracting answers for citations, and you stop excluding users who can't navigate poorly-built sites. Every box gets ticked at once.
Internal linking discipline
Internal links are the cheapest ranking lever you have. The Roboto default:
- Custom link annotations in Portable Text. Internal links are references, not strings. When a slug changes, the link follows. No 404 link rot, ever.
- Programmatic related-posts from shared categories or tags. Don't ask editors to hand-pick three related links on every post; derive them from the data they've already entered.
- Breadcrumbs derived from URL structure, with BreadcrumbList JSON-LD attached.
- Three internal links per blog post is a sensible baseline. Editors should be linking inside prose, not just on related-content cards. If a post has zero internal links in the body, something's wrong with either the post or the rest of the site.
Canonical URLs
- Self-referencing canonical on every page by default
- An seoCanonical override field for syndicated content (cross-posted to Medium? Point canonical back to your domain)
- Strip query params from canonicals: UTM tags should not dilute your canonical signal
This is a ten-line helper in apps/web/src/lib/seo.ts. Most projects never need to touch the override; it's there for the day a client cross-posts a thought leadership piece.
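For a feel of how small it is, here's a sketch of such a helper. The function name and the no-trailing-slash policy are assumptions; the override and the query-param stripping follow the three rules above.

```typescript
// Resolve the canonical URL for a page.
// 1. An editor-supplied seoCanonical wins outright.
// 2. Otherwise self-reference, with query string and hash stripped.
function canonicalUrl(siteUrl: string, path: string, seoCanonical?: string | null): string {
  if (seoCanonical && seoCanonical.trim() !== "") return seoCanonical.trim();
  const base = siteUrl.replace(/\/+$/, "");
  // Assumed policy: no trailing slash, except on the site root.
  const cleanPath = (path.split(/[?#]/)[0] ?? "").replace(/\/+$/, "");
  return `${base}${cleanPath.startsWith("/") ? "" : "/"}${cleanPath}`;
}
```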
Redirects as a Sanity document type
When editors change a slug, never leave the old URL dead. The Roboto stack handles this with a redirect document type. Old slug, new slug, automatic.
- The redirect schema is surfaced in the studio sidebar, so editors own it without dev intervention
- The list of redirects is queried at build time and piped into next.config.ts (or middleware for high-volume sites)
- For projects with hundreds of historical redirects, batch them. Don't expand the middleware on every request.
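The build-time step is a mapping from redirect documents to the source/destination/permanent objects that Next.js redirects expect. A minimal sketch, assuming hypothetical from, to, and permanent fields on the redirect document and a default of permanent redirects:

```typescript
// Hypothetical shape of the redirect document in Sanity.
type RedirectDoc = { from: string; to: string; permanent?: boolean };
// Shape of one entry in the Next.js redirects array.
type NextRedirect = { source: string; destination: string; permanent: boolean };

// Ensure a leading slash and drop trailing slashes; leave absolute URLs alone.
function normalisePath(value: string): string {
  const trimmed = value.trim();
  if (/^https?:\/\//.test(trimmed)) return trimmed;
  const withSlash = trimmed.startsWith("/") ? trimmed : `/${trimmed}`;
  return withSlash.length > 1 ? withSlash.replace(/\/+$/, "") : withSlash;
}

// Convert editor-managed redirects, skipping self-redirects and duplicate sources.
function toNextRedirects(docs: RedirectDoc[]): NextRedirect[] {
  const seen = new Set<string>();
  const redirects: NextRedirect[] = [];
  for (const doc of docs) {
    const source = normalisePath(doc.from);
    const destination = normalisePath(doc.to);
    if (source === destination || seen.has(source)) continue;
    seen.add(source);
    redirects.push({ source, destination, permanent: doc.permanent ?? true });
  }
  return redirects;
}
```

The first entry for a given source wins, so a stale duplicate further down the list can't silently override the one editors see at the top.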
The wider point: SEO equity is fragile. Years of backlinks evaporate the moment a URL 404s. A redirect document type costs about an hour to build and protects the site indefinitely.
Author entities and E-E-A-T
Every author gets a proper Author document type with:
- A bio (used in the byline and on the author archive page)
- A photo (used in the byline and on the author card)
- Social profiles (LinkedIn, Twitter, GitHub, personal site)
- Position and credentials
Why bother:
- Author bylines render a Person JSON-LD that Google reads for expertise signals. E-E-A-T is real, and identified authors carry more weight than anonymous content.
- Author archive pages (/author/jono-alford) collect every post, building topical authority and giving Google a clear signal about who writes about what.
- LLMs cite authors when they can identify them. Anonymous content gets cited less, especially for opinion-led topics where attribution matters.
This is the same set-and-forget principle from JSON-LD. Editors fill in the author document once. Every post by that author inherits the bio, the photo, the schema, the social links.
Schema beyond BlogPosting
BlogPosting is the obvious one. Don't stop there. On every Sanity build we tend to wire up:
- Organization once, in the root layout
- BreadcrumbList on every nested page
- FAQPage when a block has FAQs (this post has one, look at the source)
- HowTo for tutorials with clear steps
- Product or Service on service pages
- Person for author bylines
All derived from existing Sanity content. Editors never touch schema markup. It's just generated.
Robots and the boring infrastructure
The unglamorous stuff that bites you when it breaks:
- robots.txt generated from Sanity site config, not hardcoded. That way you can toggle staging vs prod indexing without a deploy: flip a boolean in the studio, redeploy if needed, done.
- 404 pages with real internal links (popular posts, search, primary nav). Not a dead end. Genuinely useful 404 pages reduce bounce and recover SEO equity from broken inbound links.
- Never serve 200 OK on "not found" content. Use Next's notFound() properly. Soft 404s are one of the worst ranking signals you can send and Google is increasingly aggressive about flagging them.
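The robots.txt toggle is a few lines once the boolean exists. This sketch assumes a site config document with hypothetical allowIndexing and disallowPaths fields; the names are illustrative, not from the starter repo.

```typescript
// Hypothetical site settings fetched from Sanity.
type SiteConfig = { siteUrl: string; allowIndexing: boolean; disallowPaths?: string[] };

// Staging (allowIndexing: false) blocks everything; production allows
// crawling, lists any extra disallowed paths, and points at the sitemap.
function buildRobotsTxt(config: SiteConfig): string {
  if (!config.allowIndexing) {
    return ["User-agent: *", "Disallow: /", ""].join("\n");
  }
  const lines = ["User-agent: *", "Allow: /"];
  for (const path of config.disallowPaths ?? []) {
    lines.push(`Disallow: ${path}`);
  }
  lines.push("", `Sitemap: ${config.siteUrl.replace(/\/+$/, "")}/sitemap.xml`, "");
  return lines.join("\n");
}
```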
Open Graph image fallback generator
Editors are inconsistent about uploading social images. The fix: give them a default the moment the document is created, and generate one at render time when the slot is empty.
Default at document creation. Sanity's initialValue on the schema lets you pre-populate the ogImage field with a templated asset when a new document is created. Editors open a fresh blog post and there's already a usable OG image sitting in the slot. Most won't touch it; the few who care will override it. Either way you've removed the empty state.
Render-time fallback. Use next/og to auto-generate a branded OG image from the document title when ogImage is still empty (or when you want a per-post variant). Brand-consistent template, always fresh, zero editor burden. Cache it aggressively — the URL is deterministic from the slug, so it never needs to regenerate after the first request.
The same set-and-forget principle, applied to social previews. Editors who care can override; editors who don't get something usable by default, both at creation time and at render time.
AEO/SEO sanitisation: catch the drift before it eats you
Here's what happens on every codebase, including ones built by people who know better. You ship fast. You vibe-code a few sections. You let a junior developer wire up a new page builder block. You copy-paste a layout. Six months later your site is quietly haemorrhaging SEO equity. Heading hierarchy is broken on three templates. Half your pages have duplicate H1s. The new "featured grid" component renders product titles as <div> because someone forgot. Meta descriptions are getting truncated. Four hundred pages share the same og:image.
You won't catch this by eyeballing the site. You need a feedback loop.
Run a crawl weekly. Screaming Frog or Sitebulb against prod. Export the issues, triage the top ten. Every Roboto client site has one of these running, and it surfaces things humans miss every single time: orphan pages, broken internal links, missing alt text, oversized HTML, redirect chains, duplicate titles. The licence pays for itself in one engagement.
Wire it into Linear. When the crawl finds issues, dump them into tickets with an seo-debt label. Treat them like bugs, not "nice to have."
CI-level guards for the cheap ones:
- Lint rule: no <h1> inside Portable Text serializers (only the page title gets H1)
- Test: every page returns a <title>, <meta name="description">, <link rel="canonical">, and an og:image
- Test: no console.log in production, no stray noindex in prod headers (you'd be amazed how often this ships)
- Test: sitemap.xml returns 200 and contains every published document
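The head checks can start as a pure function you point at rendered HTML in a test runner. This is a regex-based sketch that assumes the attribute order shown in the patterns (name or property before content, rel before href) and only checks the robots meta tag, not response headers; a sturdier version would use a real HTML parser. Both function names are hypothetical.

```typescript
// Return the names of any SEO-critical head tags missing from the HTML.
function missingHeadTags(html: string): string[] {
  const checks: Array<[string, RegExp]> = [
    ["title", /<title[^>]*>\s*[^<\s][^<]*<\/title>/i],
    ["meta description", /<meta\s[^>]*name=["']description["'][^>]*content=["'][^"']+["']/i],
    ["canonical", /<link\s[^>]*rel=["']canonical["'][^>]*href=["'][^"']+["']/i],
    ["og:image", /<meta\s[^>]*property=["']og:image["'][^>]*content=["'][^"']+["']/i],
  ];
  return checks.filter(([, pattern]) => !pattern.test(html)).map(([name]) => name);
}

// Detect a robots meta tag carrying noindex, for pages that should be indexable.
function hasStrayNoindex(html: string): boolean {
  return /<meta\s[^>]*name=["']robots["'][^>]*content=["'][^"']*noindex/i.test(html);
}
```

Fail the test when missingHeadTags returns anything, and when hasStrayNoindex is true for a page whose seoNoIndex is not set.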
Lighthouse CI on every PR. Fail the build if accessibility drops below 95 or SEO drops below 100. Don't let regressions land. Catch them at the PR, not after deploy.
Vibe coding is fine. Vibe coding without guardrails is what kills you.
Shipping fast is genuinely good. Half the SEO patterns in this post exist because we vibed our way through a problem on a client project and codified what worked. The fix isn't to slow down, it's to set up validation that runs at the speed you're shipping.
The pattern we run on most projects now:
- Wire up an agent that audits your site continuously. Vercel AI SDK makes this trivially cheap to build — a scheduled function, a couple of tool calls (fetch the page, parse the head, check the schema), and a structured output that flags regressions. We've got one running against robotostudio.com on a cron, and it catches things humans miss between deploys.
- Validation at the schema level. If a page builder block requires a heading, a Zod schema or a Sanity validation rule should reject it without one. Don't let bad data ship and then fix it downstream.
- Validation at the PR level. Lighthouse CI, a Screaming Frog programmatic crawl, or an AI SDK script that diffs the rendered HTML against last week's. Anything that fails the build before merge.
- Validation at the runtime level. Log SEO-critical fields to PostHog or Sentry. If a page renders without a <title> or with a duplicate H1, you want to know about it the moment it happens, not next quarter.
The honest take: every shortcut you take to ship (the inline component, the hardcoded string, the "I'll fix the schema later") compounds. SEO debt looks invisible until you check Search Console and realise you've been bleeding impressions for two months. AI-driven validation is the cheapest way to keep moving fast without paying for it later.
What "good" looks like: zero critical Screaming Frog issues, Lighthouse SEO 100 on every template, an AI audit agent that catches drift before you do. If your manual audit surfaces surprises, your validation is broken. Fix the validation first, not just the surprises.
Where to copy this from
The whole pattern is open source. Go to turbo-start-sanity on GitHub. Open these three files:
- packages/sanity/src/query.ts — the GROQ queries with SEO filtering baked in
- apps/studio/utils/seo-fields.ts — the reusable SEO fields object
- apps/web/src/lib/seo.ts — the metadata generator and JSON-LD helpers
Copy them, adapt them, ship them. That's the entire AEO/SEO baseline for a Sanity project. Everything else in this post (content negotiation, llms.txt, accessibility, the sanitisation loop) is incremental. The three files above are the foundation.
If you're starting a new Sanity build, start there. If you've inherited an existing build and the AEO/SEO feels patchy, those three files are your refactor target.




