Next.js AEO/GEO/SEO whatever you want to call it: guide


AEO, GEO, or just SEO: the playbook is the same. Metadata, JSON-LD, sitemaps, and content negotiation, implemented on a production Next.js site.


God I'm tired. If I hear one more "SEO is dead, AEO is the future", I'm going to lose my mind. I want you to know, going in, I'm only going to tell you the things that have moved the needle for us. Not some elaborate ruse to sell you a course. But before we start...

What should we call it

  • AEO (answer engine optimization). Getting picked as the source when ChatGPT, Claude, or Perplexity answers a question. The term grew out of the SEO industry and shows up most in developer circles.
  • GEO (generative engine optimization). Same goal, different lineage. The name comes from a 2023 academic paper and got adopted by marketing teams. If your client reads marketing blogs, they'll say GEO.
  • LLMO, AI SEO, LLM SEO. Also in circulation. Also the same thing.

Dom Sipowicz, a forward deployed engineer at Vercel, has been keeping score since 2025:

That being said, I do like his definitions here, and I'll be using these from here on out.

You'll notice, even he finds it difficult to standardise = and -, so god knows how we'll handle these acronyms.

The tactics underneath are identical. Structured content, accurate metadata, clean markup, content agents can quote. If a consultant tells you GEO needs a fundamentally different strategy from AEO, they're selling you the same audit twice. We run a generative engine optimisation service and it's definitely not to capitalise on an emerging keyword.

Anyway, I digress.

What's actually new

Agents are a new class of visitor. They:

  • Sometimes can't run JavaScript reliably
  • Choke on ads, navigation, footers, and cookie banners
  • Have small context windows, so wasted tokens cost real money
  • Prefer structured text they can quote verbatim

[Figure: Merj slide titled 'Examples of where agents fail': drag and drop, multi-step or multi-page forms, overlapping layers, layout shifts, HTML canvas, and UI virtualisation. From Merj's talk 'From the lab: how AI agents interact with websites' at the recent Compound event.]

It's the best one-screen summary of the problem I've seen. Merj's lab testing found agents fail on exactly the patterns modern frontends love: drag and drop, multi-step forms, overlapping layers, layout shifts, canvas rendering, and UI virtualisation. In every case the content exists, the agent just can't reach it. If your pricing lives in a canvas-rendered table or your nav is virtualised, you're invisible to agents no matter how good the words are.

One thing we've observed though: they absolutely love markdown if they accept it. So much so that we had a 10k uptick in visits from GPTBot in the space of 5 mins. I'm not joking. Hold that thought, because what a spike like that does to your bill gets its own section further down.

The SEO half (nothing changed)

Before any AEO plumbing, do the SEO that's been working since the Panda update. Quick summary, since you've probably read this before:

  • Titles target the actual query, sentence case, under 60 characters
  • Meta descriptions are written for humans and include the keyword
  • Schema markup where it earns rich results: BlogPosting, FAQPage, Service
  • Internal links from blog posts to relevant service pages (like this one), not just topic clusters
  • Real backlinks from people who chose to cite you, not directories
  • Content that says something specific, not "the ultimate guide to [topic]"

Three observations from doing this for a few years:

  1. Your title tag is the single biggest lever. Position 8 with a 0.2% click-through is a title problem, however good the content underneath it is. Fix titles before you write a word of new content.
  2. Internal linking is undervalued and free. We added contextual CTAs from every blog post to the most relevant service page. Took an afternoon. The blog cluster now feeds traffic into services rather than dead-ending at "related posts".
  3. Most "AEO checklists" are SEO checklists with the word "answer" added. If you skip the SEO basics, no amount of llms.txt will save you.

The Next.js implementation of the boring half

Three patterns cover most of it on an App Router site.

Metadata with fallbacks, not requirements. Every page derives its meta from the data layer, with optional overrides that fall back:

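import type { Metadata } from "next";

// In Next.js 15 and later, `params` is a Promise and has to be awaited.
type Props = { params: Promise<{ slug: string }> };

export async function generateMetadata({ params }: Props): Promise<Metadata> {
  const { slug } = await params;
  const post = await getPost(slug); // your data-layer lookup
  return {
    // Optional SEO overrides fall back to the post's own fields.
    title: post.seoTitle ?? post.title,
    description: post.seoDescription ?? post.description,
    alternates: { canonical: `https://robotostudio.com/blog/${post.slug}` },
  };
}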

JSON-LD derived at render time, never hand-authored. The most common mistake we see is structured data as a separate editing surface. It drifts within a sprint. Generate it from fields you already have:

const jsonLd = {
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  headline: post.title,
  datePublished: post.publishedAt,
  dateModified: post.updatedAt,
  author: { "@type": "Person", name: post.author.name, url: post.author.url },
};

One helper function, rendered in a <script type="application/ld+json"> tag. Answer engines weight structured data heavily when picking citation sources. Show both dates visibly on the page too: between two posts answering the same question, the one updated last month usually wins the citation.
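Here's a minimal sketch of what that helper could look like. The `Post` shape and the names `buildBlogPostingJsonLd` and `serializeJsonLd` are assumptions for illustration, not part of Next.js. It builds the object from fields the page already has, leaves out anything missing, and escapes `<` so a title containing `</script>` can't close the tag early.

```typescript
// Hypothetical post shape; adjust to whatever your data layer returns.
interface Post {
  title: string;
  publishedAt: string; // ISO 8601
  updatedAt?: string; // ISO 8601, optional
  author: { name: string; url?: string };
}

// Build a BlogPosting object from fields the page already has.
// Missing optional fields are left out rather than invented.
function buildBlogPostingJsonLd(post: Post): Record<string, unknown> {
  const author: Record<string, unknown> = {
    "@type": "Person",
    name: post.author.name,
  };
  if (post.author.url) author["url"] = post.author.url;

  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    datePublished: post.publishedAt,
    author,
  };
  if (post.updatedAt) jsonLd["dateModified"] = post.updatedAt;
  return jsonLd;
}

// Serialize for embedding in a <script type="application/ld+json"> tag.
// Escaping "<" stops a value containing "</script>" from ending the tag.
function serializeJsonLd(jsonLd: Record<string, unknown>): string {
  return JSON.stringify(jsonLd).replace(/</g, "\\u003c");
}
```

In a React component the serialized string would typically go into the script tag via `dangerouslySetInnerHTML={{ __html: serializeJsonLd(jsonLd) }}`.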

A sitemap where lastmod is real. Google has said publicly that it largely ignores priority and changefreq. The freshness signal is lastmod (lastModified in Next.js's sitemap API), so generate it from the same data that powers the pages:

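// app/sitemap.ts
import type { MetadataRoute } from "next";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getAllPosts(); // same data layer that renders the pages
  return posts.map((p) => ({
    url: `https://robotostudio.com/blog/${p.slug}`,
    lastModified: p.updatedAt, // the real edit date, not the build time
  }));
}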
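One way to keep lastmod honest when some records have a missing or malformed date is to omit the field rather than default to "now". A rough sketch, assuming a hypothetical `PostSummary` shape and helper name `toSitemapEntries` (neither comes from Next.js):

```typescript
// Hypothetical shapes for illustration.
interface PostSummary {
  slug: string;
  updatedAt?: string | null;
}

interface SitemapEntry {
  url: string;
  lastModified?: Date;
}

function toSitemapEntries(posts: PostSummary[], baseUrl: string): SitemapEntry[] {
  return posts.map((p) => {
    const entry: SitemapEntry = { url: `${baseUrl}/blog/${p.slug}` };
    const parsed = p.updatedAt ? new Date(p.updatedAt) : null;
    // Only emit lastModified when the stored date parses.
    // Falling back to the current time would fake freshness.
    if (parsed !== null && !isNaN(parsed.getTime())) {
      entry.lastModified = parsed;
    }
    return entry;
  });
}
```

The returned entries have the same `url` and `lastModified` keys as the sitemap function above, so they could be returned from it directly.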

Every JSON-LD type worth shipping (and a prompt that builds them)

The BlogPosting example above is one of seven. Here's the full set we wire up, where each one lives, and what feeds it:

| Schema             | Where                           | Feeds on                                       |
| ------------------ | ------------------------------- | ---------------------------------------------- |
| Organization       | Root layout, once               | Company name, logo, social profiles            |
| WebSite            | Root layout, once               | Site name and URL                              |
| Person             | Author bylines and author pages | Name, role, photo, socials                     |
| BlogPosting        | Every post                      | Title, description, both dates, author, image  |
| BreadcrumbList     | Every nested page               | Derived from the URL structure                 |
| FAQPage            | Pages with real FAQs            | Your existing FAQ content                      |
| Service or Product | Offering pages                  | Name, description, provider                    |

The one people skip is the most important: Organization.sameAs with your social links is how engines connect your X, LinkedIn and GitHub presence to your domain. That entity graph is what gets a brand cited by name instead of as "one source".
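As a sketch of how that could be derived from site config rather than typed by hand: the `OrgConfig` shape and `buildOrganizationJsonLd` name below are assumptions for illustration. Empty, malformed, and duplicate links are dropped so a half-filled config never produces a junk sameAs array.

```typescript
// Hypothetical site-config shape.
interface OrgConfig {
  name: string;
  url: string;
  logoUrl?: string;
  socials: Array<string | null | undefined>;
}

function buildOrganizationJsonLd(config: OrgConfig): Record<string, unknown> {
  // Keep only http(s) URLs and drop duplicates.
  const sameAs: string[] = [];
  for (const link of config.socials) {
    if (!link || !/^https?:\/\//.test(link)) continue;
    if (sameAs.indexOf(link) === -1) sameAs.push(link);
  }

  const org: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: config.name,
    url: config.url,
  };
  // Leave properties out entirely when there's no data for them.
  if (config.logoUrl) org["logo"] = config.logoUrl;
  if (sameAs.length > 0) org["sameAs"] = sameAs;
  return org;
}
```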

Don't fill any of this in by hand. Paste this into Claude Code, Cursor, or whatever agent lives in your repo, and answer its questions:

You're going to wire complete JSON-LD into this codebase. Interview me
first, one question at a time, then implement.

Ask me, in order:
1. Company: legal name, brand name, site URL, logo URL.
2. Socials: links for X, LinkedIn, GitHub, YouTube, Facebook, Instagram
   (skip any we don't have). These become Organization.sameAs.
3. Contact: support email or contact page URL.
4. Authors: for each content author, their name, job title, photo URL,
   personal site, and social profiles. These become Person schemas.
5. Content types: check the repo first, then confirm with me which exist
   (blog, services, case studies, FAQs, products).

Then implement, deriving everything from data that already exists in the
codebase (frontmatter, CMS fields, site config) rather than hardcoding:
- Organization + WebSite once in the root layout
- Person on author bylines, referenced from BlogPosting.author
- BlogPosting on posts, datePublished/dateModified from real fields
- BreadcrumbList derived from URL structure on nested pages
- FAQPage only where actual FAQ content exists
- Service or Product on offering pages

Rules: one shared helper module, rendered as
<script type="application/ld+json">. No editor-facing JSON fields. No
invented data: if I didn't give it to you and the repo doesn't have it,
leave the property out. When done, list every URL + schema pair so I can
spot-check them in Google's Rich Results Test.

Ten minutes of answering questions and the whole entity graph is done, derived from your real data, with nothing for editors to maintain. Spot-check the output in the Rich Results Test before you ship.


The new half (content negotiation)

Now the actually new bit. Here's what each piece does on robotostudio.com.

The Accept header tells you who's asking

The HTTP Accept header is how a client tells the server which formats it can handle. Browsers send text/html. Agents that want markdown send text/markdown. Claude Code happens to send text/plain. Same URL, different format, no client-side knowledge required.

In next.config.ts:

const acceptMarkdown = {
  type: "header" as const,
  key: "accept",
  value:
    "(?=.*(?:text/plain|text/markdown))(?!.*text/html.*(?:text/plain|text/markdown)).*",
};

// Returned from the config's rewrites(): `async rewrites() { return { beforeFiles }; }`
const beforeFiles = [
  { source: "/blog/:slug",      has: [acceptMarkdown], destination: "/api/md/blog/:slug" },
  { source: "/services/:slug",  has: [acceptMarkdown], destination: "/api/md/services/:slug" },
  { source: "/migration/:slug", has: [acceptMarkdown], destination: "/api/md/migration/:slug" },
  { source: "/case-study/:slug",has: [acceptMarkdown], destination: "/api/md/case-study/:slug" },
  { source: "/careers/:slug",   has: [acceptMarkdown], destination: "/api/md/careers/:slug" },
  { source: "/",                has: [acceptMarkdown], destination: "/api/md/pages/home" },
];

The regex looks scary but it's defensive. It matches only when text/markdown or text/plain is present and no text/html appears ahead of them in the Accept header. Browsers that happen to list text/markdown;q=0.5 after HTML still get HTML. Agents that ask for markdown first, or for nothing but markdown, get markdown. Borrowed from vercel-labs/markdown-to-agents.
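If you'd rather see the regex behave than trust a description of it, here's a small check you can run outside Next.js. One assumption: the pattern is tested against the whole header value, so the sketch anchors it with ^ and $. The helper name `wantsMarkdown` is ours for illustration.

```typescript
// Same pattern as the `value` in the rewrite config above.
const acceptMarkdownPattern =
  "(?=.*(?:text/plain|text/markdown))(?!.*text/html.*(?:text/plain|text/markdown)).*";

// Assumption: the pattern must match the full header value, so anchor it.
const acceptMarkdownRegex = new RegExp(`^${acceptMarkdownPattern}$`);

function wantsMarkdown(acceptHeader: string): boolean {
  return acceptMarkdownRegex.test(acceptHeader);
}

// Sample Accept headers and whether they should be routed to markdown.
const acceptCases: Array<[string, boolean]> = [
  ["text/markdown", true],
  ["text/plain", true],
  ["text/markdown, text/html;q=0.8", true],
  // A typical browser header: no markdown or plain text at all.
  ["text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", false],
  // Markdown listed after HTML: the client still gets HTML.
  ["text/html, text/markdown;q=0.5", false],
  // A bare wildcard never mentions markdown, so it gets HTML too.
  ["*/*", false],
];
```

Note the last case: a client that sends only `*/*` is served HTML, because the pattern requires one of the two text types to be named explicitly.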

Markdown pair routes: we no longer recommend them

The first version of this post recommended giving every page a .md twin: /blog/some-post for browsers, /blog/some-post.md for agents. We shipped them on robotostudio.com. Both search engines have since told everyone to stop.

Google's official guidance, Optimizing your website for generative AI features on Google Search, is blunt: "You don't need to create new machine readable files, AI text files, markup, or Markdown to appear in generative AI search." The same guide tells you to reduce duplicate content because it wastes crawler resources. John Mueller went further on Bluesky, calling markdown pages for LLMs "such a stupid idea": why would a crawler want a page no user sees, when LLMs have parsed HTML since the beginning?

Bing's Fabrice Canel made the crawl budget case in one line: "really want to double crawl load? We'll crawl anyway to check similarity." That's the part that should worry you. Every .md twin is a second URL per page. Bing crawls it anyway to compare it against the canonical, so a 1,000-page site just became a 2,000-URL crawl for zero ranking benefit. X-Robots-Tag: noindex keeps the twins out of the index (we shipped that from day one) but it doesn't stop the crawler visiting them.

The good news: pair routes were always the redundant half of the pattern. Content negotiation on the canonical URL does the same job without minting new URLs. Googlebot asks for text/html and gets HTML; an agent asking for text/markdown gets markdown; one URL, no duplicates, no extra crawl. If you've already shipped .md twins, stop advertising them (sitemaps, Link headers, llms.txt) and let the Accept header carry the load.

The homepage advertises the agent-facing surfaces in an RFC 8288 Link header:

Link: </llms.txt>; rel="describedby"; type="text/plain",
      </llms-full.txt>; rel="service-doc"; type="text/plain",
      </sitemap.xml>; rel="sitemap"; type="application/xml"

An agent crawling the homepage gets a structured map of where everything lives without parsing HTML.
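Building that header value from a list keeps it in step with the rest of the site config. A minimal sketch, with `LinkTarget` and `buildLinkHeader` as hypothetical names:

```typescript
// One entry per advertised resource.
interface LinkTarget {
  href: string;
  rel: string;
  type?: string;
}

// Format targets as a single RFC 8288 Link header value:
// <target>; rel="..."; type="...", <target>; ...
function buildLinkHeader(targets: LinkTarget[]): string {
  return targets
    .map((t) => {
      const params = [`rel="${t.rel}"`];
      if (t.type) params.push(`type="${t.type}"`);
      return `<${t.href}>; ${params.join("; ")}`;
    })
    .join(", ");
}

const agentLinks = buildLinkHeader([
  { href: "/llms.txt", rel: "describedby", type: "text/plain" },
  { href: "/llms-full.txt", rel: "service-doc", type: "text/plain" },
  { href: "/sitemap.xml", rel: "sitemap", type: "application/xml" },
]);
```

The resulting string can be attached to the homepage through the `headers()` option in the Next.js config, with `Link` as the key.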

llms.txt is just a sitemap with intent

/llms.txt is the convention proposed by Jeremy Howard: a markdown file at the root that tells agents what's on the site, in priority order, linking to the canonical URL of each page:

# Roboto Studio

> Our mission is to create the best editorial experiences on the web...

## For AI agents

Send `Accept: text/markdown` against any URL on this site to get
clean markdown at the same address.

## Services
- [Sanity CMS](https://robotostudio.com/services/sanity)

## Blog Posts (Recent)
- [Latest post](https://robotostudio.com/blog/...)

Think of it as a sitemap that talks. We also publish llms-full.txt, which is the entire content of every page concatenated. Useful for agents that want to embed the whole site once instead of re-fetching.
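Generating the file from the same data as the sitemap stops it going stale. A rough sketch of a generator, assuming hypothetical `LlmsSection` and `LlmsLink` shapes and a helper called `buildLlmsTxt`; it only covers the heading, summary, and link-list parts shown above.

```typescript
// Hypothetical shapes for illustration.
interface LlmsLink {
  title: string;
  url: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

function buildLlmsTxt(siteName: string, summary: string, sections: LlmsSection[]): string {
  const lines: string[] = [`# ${siteName}`, "", `> ${summary}`];
  for (const section of sections) {
    // Skip sections with nothing to list rather than emitting empty headings.
    if (section.links.length === 0) continue;
    lines.push("", `## ${section.heading}`);
    for (const link of section.links) {
      lines.push(`- [${link.title}](${link.url})`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Order the `sections` array by priority before passing it in; the function writes them in the order given.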

One caveat for balance: Google's generative AI guide lists llms.txt among the files you don't need, and that's accurate for Google's own AI features. The agents that read it are the non-Google ones: Anthropic's fetcher, Perplexity's crawler, and most coding agents. Ship it for them, not for AI Overviews.

The markdown actually has to be clean

This is the part that takes the actual time. If your content is MDX (markdown with React components inside), you can't just serve the file. Agents will choke on <InlineCTA> and <Newsletter> tags.

Our markdown-route.ts walks the MDX:

  • Strips JSX components, keeping their text children
  • Preserves fenced code blocks intact (delimiter-length-aware, so a 3-backtick example inside a 4-backtick fence survives)
  • Preserves inline code spans (CommonMark equal-delimiter rule)
  • Removes import/export statements
  • Renders the frontmatter as a readable header (title, description, author, date)
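To give a feel for the shape of that pass, here's a heavily simplified sketch. It is not our markdown-route.ts: `stripMdx` is a hypothetical name, it works line by line, and it only covers three of the steps above (fence-aware pass-through, import/export removal, component tag stripping). It doesn't protect inline code spans, handle multi-line imports, or cope with `>` inside JSX props, so treat it as a starting point.

```typescript
interface Fence {
  char: string; // "`" or "~"
  length: number; // length of the opening run
}

function stripMdx(source: string): string {
  const out: string[] = [];
  let fence: Fence | null = null;

  for (const line of source.split("\n")) {
    // A fence line starts with a run of 3+ backticks or tildes.
    const match = /^\s{0,3}(`{3,}|~{3,})/.exec(line);
    const run = match ? (match[1] ?? "") : "";

    if (fence !== null) {
      // Inside a code block: copy verbatim.
      out.push(line);
      // Close only on a bare run of the same character that is at least
      // as long as the opener, so shorter nested fences survive.
      if (run.charAt(0) === fence.char && run.length >= fence.length && line.trim() === run) {
        fence = null;
      }
      continue;
    }

    if (run !== "") {
      fence = { char: run.charAt(0), length: run.length };
      out.push(line);
      continue;
    }

    // Drop single-line MDX import/export statements.
    if (/^(import|export)\s/.test(line)) continue;

    // Remove capitalised component tags, keeping any text between them.
    out.push(line.replace(/<\/?[A-Z][A-Za-z0-9.]*(?:\s[^<>]*)?\/?>/g, ""));
  }

  return out.join("\n");
}
```

Lowercase HTML tags are left alone on purpose; only capitalised component names are treated as JSX.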

Then it serves with:

Content-Type: text/markdown; charset=utf-8
Cache-Control: public, max-age=3600, s-maxage=86400
X-Robots-Tag: noindex, nofollow

The X-Robots-Tag matters. The /api/md/* destinations are real URLs anyone can fetch directly, and without the header Google can index them as duplicate copies competing with the canonical HTML pages.
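A small sketch of keeping those headers in one place, so the not-found path carries the noindex header too. `buildMarkdownResponse` and `MarkdownResponseParts` are hypothetical names, and the 404 body is our assumption.

```typescript
interface MarkdownResponseParts {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Pass the converted markdown, or null when the slug doesn't exist.
function buildMarkdownResponse(markdown: string | null): MarkdownResponseParts {
  if (markdown === null) {
    return {
      status: 404,
      headers: {
        "Content-Type": "text/plain; charset=utf-8",
        "X-Robots-Tag": "noindex, nofollow",
      },
      body: "Not found",
    };
  }
  return {
    status: 200,
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=3600, s-maxage=86400",
      "X-Robots-Tag": "noindex, nofollow",
    },
    body: markdown,
  };
}
```

Inside a route handler the parts map straight onto the standard constructor: `new Response(parts.body, { status: parts.status, headers: parts.headers })`.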


How do I not go broke because of crawlers

Remember that GPTBot spike? The answer, of course, is caching.

[Figure: Vercel bot observability for robotostudio.com, from the Edge Requests tab in Vercel Observability: 1.5M edge requests in 30 days, with per-bot cache hit rates.]

That's 30 days of bot traffic on robotostudio.com (well over 10% of our traffic). The reason we don't go broke is the column on the right. Notice how the meta crawler sits at 83% cached, bingbot at 82%, googlebot at 74%, chatgpt-user at 70%. That's due to aggressive caching.

Bots re-request the same URLs constantly, so when those hits land on the CDN instead of your rendering pipeline, a 10k spike costs roughly nothing.
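To put rough numbers on that, here's a back-of-envelope sketch. It assumes, purely for illustration, that a 10k-request spike sees the same hit rate as one of the 30-day figures above; `originRequests` is a hypothetical helper.

```typescript
// Requests that miss the CDN and reach your rendering pipeline.
function originRequests(totalRequests: number, cacheHitRate: number): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  return Math.round(totalRequests * (1 - cacheHitRate));
}

const spike = 10000;
const missesAt83 = originRequests(spike, 0.83); // 1700
const missesAt70 = originRequests(spike, 0.7); // 3000
const missesUncached = originRequests(spike, 0); // 10000
```

At an 83% hit rate only about a sixth of the spike reaches a function, and that remainder is the part you actually pay compute for.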

The mechanics are boring on purpose. If a route can be static, make it static: a bot hitting a prerendered page is a CDN hit, not a function invocation. Content pages should be SSG or ISR, and the markdown route gets real Cache-Control headers (you've already seen ours ship s-maxage=86400 above). Reserve dynamic rendering for pages that genuinely need it, because that's the traffic crawler spikes can actually hurt.

The table lives in the Edge Requests tab of Vercel Observability, which breaks traffic down by individual bot and bot category, AI crawlers included. Or skip the clicking: open your project in the Vercel dashboard and throw /observability/edge-requests?period=30d&tab=botName on the end of the URL, and this exact page pops up. Look at it before assuming agent traffic is what's burning your budget: in our table the SEO audit tooling crawls harder than ChatGPT does.

What to skip

A non-exhaustive list of things being sold as AEO (or GEO) that you can ignore:

  • .md twin URLs for every page. Google and Bing have both advised against separate markdown pages (see above). Content negotiation on the canonical URL covers the same agents without doubling your crawl footprint.
  • "AEO schema." There's no separate schema for AEO. Schema.org is the same standard SEO uses. Add FAQPage if you have FAQs, HowTo for genuine how-tos, Article for posts. That's it.
  • AEO courses. The playbook is two paragraphs of HTTP and a sitemap. Don't pay for it.
  • FAQ stuffing. Writing fake FAQs for the sake of FAQ schema is the new keyword stuffing. Google restricted FAQ rich results to authoritative government and health sites in 2023, so the rich-result payoff is gone for most sites anyway. Write FAQs because they're useful, not because they get cited.
  • Worrying about which agent's leaderboard you're on. ChatGPT, Claude, Perplexity, and Gemini all use slightly different retrieval strategies. The signal that matters across all of them is the same one Google's used for fifteen years: do real people cite you?

Did it work?

It's been a few weeks. Two observations.

Crawler traffic is huge, referral traffic is small. The bots showed up immediately: 16K chatgpt-user requests in 30 days, the GPTBot spike from earlier, the whole observability table. Humans clicking through from AI answers are a different story, around 14 visits per month from Claude, Perplexity, and ChatGPT combined in our PostHog data. The sequence runs crawling first, citations second, clicks last, and we're at stage one. Doing this work now means you're ready for the slope when it arrives, and you get crawled cheaply in the meantime (see the caching section).

The work itself was small. A few rewrites in next.config.ts, a route handler, a markdown converter, and an llms.txt generator. If your site is on Next.js 16 it's an afternoon's work. If it's on something else, the same pattern works on any framework that supports header-conditional routing.

That's the whole playbook from someone who's actually done it, whichever acronym you file it under. Skip the courses.

Frequently asked questions

What's the difference between AEO and GEO?
Nothing practical. AEO (answer engine optimization) came out of the SEO industry; GEO (generative engine optimization) came out of a 2023 academic paper. Both describe getting your content cited by AI engines like ChatGPT, Claude, and Perplexity. The tactics are identical: SEO fundamentals plus serving agents clean markdown.
Is AEO different from SEO?
Mostly no. The retrieval signals AI engines use, real backlinks, clear titles, structured content, accurate meta, are the same signals Google has used for years. The genuinely new part is serving agents a cleaner version of your content on request, called content negotiation. Everything else is SEO with a new logo.
How do I implement AEO/GEO on a Next.js site?
Four pieces. Metadata with fallbacks via generateMetadata, JSON-LD derived from your data at render time, a sitemap with real lastmod dates, and content negotiation: a rewrite in next.config.ts that serves markdown when an agent sends Accept: text/markdown. Add llms.txt for discovery, but skip separate .md URLs per page; Google and Bing advise against duplicate markdown pages.
Do I need an llms.txt file?
It helps. llms.txt is a markdown index at the root of your site that tells agents what's worth crawling, in priority order. By itself it won't move rankings. It makes citation easier, and a growing number of agents look for it. Cost to produce: a route handler. Recommended.
Should I create .md versions of my pages for AI crawlers?
No. Google's generative AI guidance says you don't need markdown versions of pages, and Bing's Fabrice Canel warned they double crawl load because Bing crawls the duplicates anyway to check similarity. Serve markdown through content negotiation on the canonical URL instead: same address, no duplicate URLs, no wasted crawl budget.
Should I block AI crawlers from my site?
Depends on your goal. If you want to be cited in AI answers, you have to let citation-friendly agents crawl. Robots.txt lets you control which agents see what. Most sites trying to grow organic visibility should let the major engines in (Google, Claude, ChatGPT, Perplexity, Gemini) and only block scrapers that don't attribute.
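As a sketch of what that control looks like, here's a tiny robots.txt builder. `buildRobotsTxt` and `RobotsRule` are hypothetical names, and `ExampleScraper` is a placeholder user agent, not a real crawler; check each vendor's documentation for the tokens they actually publish.

```typescript
interface RobotsRule {
  userAgent: string;
  allow?: string[];
  disallow?: string[];
}

function buildRobotsTxt(rules: RobotsRule[], sitemapUrl?: string): string {
  // One block per user agent, separated by blank lines.
  const blocks = rules.map((rule) => {
    const lines = [`User-agent: ${rule.userAgent}`];
    for (const path of rule.allow ?? []) lines.push(`Allow: ${path}`);
    for (const path of rule.disallow ?? []) lines.push(`Disallow: ${path}`);
    return lines.join("\n");
  });
  if (sitemapUrl) blocks.push(`Sitemap: ${sitemapUrl}`);
  return blocks.join("\n\n") + "\n";
}

const robotsTxt = buildRobotsTxt(
  [
    { userAgent: "GPTBot", allow: ["/"] },
    { userAgent: "ExampleScraper", disallow: ["/"] }, // placeholder name
  ],
  "https://example.com/sitemap.xml",
);
```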

About the Authors

Jono Alford

Founder of Roboto Studio, specializing in headless CMS implementations with Sanity and Next.js. Passionate about building exceptional editorial experiences and helping teams ship faster.

Sne Tripathi

Account Executive

Account Executive at Roboto Studio, bridging the gap between client needs and technical solutions. Ensures every project delivers real business value.

Hrithik Prasad

Senior Full-stack Developer

Senior Full-stack Developer with expertise in React, Next.js, and Sanity CMS. Loves building performant web applications and sharing knowledge through technical content.

Tope Akintola

Frontend Developer

Frontend Developer with a sharp eye for interaction design and component architecture. Brings ideas to life in the browser with a focus on speed, polish, and maintainability.
