CMS to MDX migration
Migrate content out of a headless CMS into MDX files on disk, with frontmatter for structured fields and components for everything richer than Markdown. These patterns come from migrating production sites (50+ blog posts, case studies, service pages, careers, marketing pages) and document the failure modes so you can avoid them.
Architecture
One migration script per content type, all sharing the same skeleton:
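A minimal sketch of that skeleton, assuming the documents have already been fetched with the CMS client. The names `SourceDoc`, `MigrationSteps`, and `migrateDocs` are hypothetical, and the fetch step is left out because it depends on the SDK in use.

```typescript
// Hypothetical skeleton: transform each fetched document and write it,
// skipping anything already on disk and recording failures instead of aborting.
interface SourceDoc {
  _id: string;
  slug: string;
}

interface MigrationSteps<T extends SourceDoc> {
  toMdx: (doc: T) => string; // frontmatter + body for one document
  exists: (slug: string) => boolean; // is the target file already on disk?
  write: (slug: string, mdx: string) => void; // persist one file
}

interface MigrationResult {
  written: string[];
  skipped: string[];
  failed: { slug: string; error: string }[];
}

function migrateDocs<T extends SourceDoc>(
  docs: T[],
  steps: MigrationSteps<T>,
): MigrationResult {
  const result: MigrationResult = { written: [], skipped: [], failed: [] };
  for (const doc of docs) {
    if (steps.exists(doc.slug)) {
      result.skipped.push(doc.slug);
      continue;
    }
    try {
      steps.write(doc.slug, steps.toMdx(doc));
      result.written.push(doc.slug);
    } catch (err) {
      // One bad document should not stop the rest of the collection.
      result.failed.push({ slug: doc.slug, error: String(err) });
    }
  }
  return result;
}
```

Each content type then supplies its own `toMdx` and reuses everything else.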
Target layout is one directory per content type, one file per document:
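As an illustration, the layout can be expressed as a path helper. The `content` root, the `.mdx` extension, and the function name `targetPath` are assumptions, not something the original specifies.

```typescript
import * as path from "node:path";

// Assumed layout: content/<contentType>/<slug>.mdx
const CONTENT_ROOT = "content";

function targetPath(contentType: string, slug: string): string {
  // Reject slugs that could escape the content type directory.
  if (!/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(slug)) {
    throw new Error(`Unsafe slug: ${slug}`);
  }
  return path.posix.join(CONTENT_ROOT, contentType, `${slug}.mdx`);
}
```

With these assumptions, a blog post with the slug `hello-world` lands at `content/blog/hello-world.mdx`.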
Each file's frontmatter is validated by a schema (Zod or similar) at build time. A draft: true/false flag in frontmatter replaces the CMS publish state; the route's static params include drafts only in development.
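A sketch of the draft rule, assuming a frontmatter shape with `title`, `slug`, and `draft`. The names `PostFrontmatter` and `staticSlugs` are hypothetical, and schema validation itself (Zod or similar) is not shown.

```typescript
interface PostFrontmatter {
  title: string;
  slug: string;
  draft: boolean;
}

// Returns the slugs a static route should pre-render.
// Drafts are included only when isDevelopment is true.
function staticSlugs(posts: PostFrontmatter[], isDevelopment: boolean): string[] {
  return posts
    .filter((post) => isDevelopment || !post.draft)
    .map((post) => post.slug);
}
```

A route would typically call it as `staticSlugs(posts, process.env.NODE_ENV === "development")`.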
Core principles
Use the CMS API client directly, never an MCP or chat-sized interface
Batch queries return more data than conversational tooling can handle. Write a script against the CMS SDK or API (@sanity/client, Contentful's contentful package, the WordPress REST API) and run it with tsx.
Make every run resumable
Maintain a progress JSON and save it after each item:
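The original progress format is not shown here, so the shape below is an assumption: a list of completed document IDs plus a map of failures. `loadProgress`, `saveProgress`, and `markDone` are hypothetical helper names.

```typescript
import * as fs from "node:fs";

// Hypothetical progress shape.
interface Progress {
  completed: string[];
  failed: Record<string, string>; // document id -> error message
}

function loadProgress(file: string): Progress {
  if (!fs.existsSync(file)) return { completed: [], failed: {} };
  return JSON.parse(fs.readFileSync(file, "utf8")) as Progress;
}

function saveProgress(file: string, progress: Progress): void {
  // Write to a temp file and rename, so a crash mid-write cannot corrupt the JSON.
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(progress, null, 2));
  fs.renameSync(tmp, file);
}

// Call after each item so an interrupted run loses at most one document.
function markDone(file: string, progress: Progress, id: string): void {
  if (!progress.completed.includes(id)) progress.completed.push(id);
  delete progress.failed[id];
  saveProgress(file, progress);
}
```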
Skip documents whose target file already exists. You will re-run the script many times; idempotency is what makes that safe.
Scale batching to the collection
For 50+ items, batch (size 10) with a short delay between batches to respect rate limits. For fewer than 10 items, skip batching entirely and fetch everything in one query.
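One way to sketch this, with a hypothetical `processInBatches` helper. The 500 ms delay is an assumed value to tune against the CMS rate limit, and the helper batches any collection of 10 or more items, which is a simplification of the thresholds above.

```typescript
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("Batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function processInBatches<T>(
  items: T[],
  handle: (item: T) => Promise<void>,
  batchSize = 10,
  delayMs = 500, // assumed delay, not a documented limit
): Promise<void> {
  // Small collections: one pass, no batching and no delay.
  if (items.length < 10) {
    for (const item of items) await handle(item);
    return;
  }
  const batches = chunk(items, batchSize);
  for (let i = 0; i < batches.length; i++) {
    await Promise.all(batches[i].map(handle));
    // Pause between batches, but not after the last one.
    if (i < batches.length - 1) await sleep(delayMs);
  }
}
```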
Keep CMS asset URLs initially
Do not migrate images to new storage as part of the content migration. CMS CDN URLs keep working after the content moves, as long as the CMS project itself stays live; image re-hosting is a separate, later pass. Coupling them doubles the failure surface of both.
Store internal links as placeholders, resolve later
References to other CMS documents cannot be resolved to URLs until everything has migrated. Emit a placeholder and post-process:
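The placeholder format is not specified here, so this sketch assumes Markdown links of the form `[text](ref:<documentId>)`. `REF_LINK` and `resolveRefs` are hypothetical names.

```typescript
// Hypothetical placeholder format: [text](ref:<documentId>)
const REF_LINK = /\]\(ref:([A-Za-z0-9._-]+)\)/g;

function resolveRefs(
  mdx: string,
  urlById: Map<string, string>,
): { output: string; unresolved: string[] } {
  const unresolved: string[] = [];
  const output = mdx.replace(REF_LINK, (match: string, id: string) => {
    const url = urlById.get(id);
    if (url === undefined) {
      // Leave the placeholder in place so it is easy to find later.
      unresolved.push(id);
      return match;
    }
    return `](${url})`;
  });
  return { output, unresolved };
}
```

Run it once every content type has migrated, with `urlById` built from the IDs and slugs recorded during migration. A non-empty `unresolved` list points at references to documents that were never migrated.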
Normalize inconsistent field values
Years of editing leave fields with inconsistent casing and formats. Write a normalizer per messy field instead of trusting the data:
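A sketch for one such field. The `category` field and its alias table are invented for illustration; the real table comes from listing the distinct values in your own data.

```typescript
// Hypothetical aliases for a hand-edited "category" field.
const CATEGORY_ALIASES = new Map<string, string>([
  ["case study", "case-study"],
  ["case studies", "case-study"],
  ["case-study", "case-study"],
  ["news", "news"],
  ["company news", "news"],
]);

function normalizeCategory(raw: unknown): string {
  if (typeof raw !== "string") {
    throw new Error(`Expected a string category, got ${typeof raw}`);
  }
  // Trim, lowercase, and collapse whitespace and underscores to single spaces.
  const key = raw.trim().toLowerCase().replace(/[\s_]+/g, " ");
  const normalized = CATEGORY_ALIASES.get(key);
  if (normalized === undefined) {
    // Fail loudly so new variants get added to the table instead of leaking through.
    throw new Error(`Unknown category: "${raw}"`);
  }
  return normalized;
}
```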
Escape frontmatter strings properly
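Titles and descriptions contain colons, quotes, and newlines that break unquoted YAML values. A minimal sketch, relying on the fact that a JSON string literal is also a valid YAML double-quoted scalar; `yamlString` and `toFrontmatter` are hypothetical helpers and handle only flat string and boolean fields.

```typescript
// JSON.stringify escapes quotes, backslashes, and newlines, and its output
// is valid YAML double-quoted scalar syntax.
function yamlString(value: string): string {
  return JSON.stringify(value);
}

function toFrontmatter(fields: Record<string, string | boolean>): string {
  const lines = Object.entries(fields).map(([key, value]) =>
    typeof value === "string"
      ? `${key}: ${yamlString(value)}`
      : `${key}: ${String(value)}`,
  );
  return ["---", ...lines, "---"].join("\n");
}
```

Always quoting strings is simpler than deciding case by case whether a value is safe unquoted.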
Expect schema churn
Your frontmatter schema will change during migration: fields go optional when the CMS data is sparse, defaults appear for missing values, new optional fields surface in old documents. Treat schema edits as part of the migration, never as scope creep.
Rich text to Markdown gotchas
These are the conversion bugs that actually shipped, and the fixes:
MDX comments. HTML comments (<!-- -->) are a parse error in MDX. Emit JSX comments ({/* */}) instead. Use them to mark unknown block types for later review rather than dropping content silently.
Whitespace-only spans. A span containing only a space but carrying a bold mark must still emit its space, otherwise "working on" becomes "workingon":
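A sketch of span rendering that keeps the space, using a simplified span shape loosely modelled on Portable Text. The `RichSpan` type and the mark names `strong` and `em` are assumptions. It also moves leading and trailing whitespace outside the markers, which is an addition beyond the fix described above.

```typescript
// Simplified span shape; an assumption, not the CMS's exact type.
interface RichSpan {
  text: string;
  marks: string[]; // e.g. ["strong"], ["em"]
}

const MARK_WRAPPERS = new Map<string, string>([
  ["strong", "**"],
  ["em", "*"],
]);

function renderSpan(span: RichSpan): string {
  // Whitespace-only span: emit the whitespace, drop the marks.
  if (span.text.trim() === "") return span.text;

  // Keep edge whitespace outside the markers so "**word **" never appears.
  const [, lead, core, trail] = span.text.match(/^(\s*)([\s\S]*?)(\s*)$/)!;
  let out = core;
  for (const mark of span.marks) {
    const wrapper = MARK_WRAPPERS.get(mark);
    if (wrapper !== undefined) out = `${wrapper}${out}${wrapper}`;
  }
  return `${lead}${out}${trail}`;
}

function renderSpans(spans: RichSpan[]): string {
  return spans.map(renderSpan).join("");
}
```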
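Empty marks. Bold or italic wrapping nothing produces ** ** and * *. Clean with .replace(/\*\*\s+\*\*/g, " ") and .replace(/\*\s+\*/g, " "). Apply these to each span's own output rather than to the joined text: the same patterns also match the gap between two adjacent emphasised runs, so "**a** *b*" would become "**a* b*".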
Adjacent JSX tags. Two inline components back to back (</Highlight><Highlight>) confuse the MDX parser. Insert a space between them in a cleanup pass.
Unknown block types. Emit {/* Unknown block type: foo */} and keep going. Review the comments at the end rather than aborting mid-collection.
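The comment and adjacent-tag fixes can be sketched as a single cleanup pass. `unknownBlockComment` and `cleanupMdx` are hypothetical names, and the adjacent-tag rule assumes component names start with a capital letter.

```typescript
function unknownBlockComment(blockType: string): string {
  // Strip anything that could close the JSX comment early.
  const safe = blockType.replace(/\*\//g, "");
  return `{/* Unknown block type: ${safe} */}`;
}

function cleanupMdx(mdx: string): string {
  return mdx
    // HTML comments are a parse error in MDX; convert them to JSX comments.
    .replace(
      /<!--([\s\S]*?)-->/g,
      (_match: string, body: string) => `{/*${body.replace(/\*\//g, "")}*/}`,
    )
    // Put a space between a closing component tag and the next opening one.
    .replace(/(<\/[A-Z][A-Za-z0-9.]*>)(?=<[A-Z])/g, "$1 ");
}
```

The pass is plain string replacement, so it also rewrites matches inside fenced code samples. Skip those files or handle fences separately if the content includes them.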
Component mapping
Pages built with a CMS page builder become MDX files that are a flat sequence of components.
Match the CMS block's props exactly. If the CMS block has { title: string }, the MDX component takes a title prop with that name. Do not "improve" the API mid-migration, for example by turning title into children; every renamed prop is a class of silent rendering failure across every migrated file.
No wrapper containers. Page-builder content is edge to edge by design. The MDX body is component, component, component, with each component owning its internal spacing. Adding a layout <div> around migrated blocks is the single most common way to break the design.
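The two rules above can be sketched as a serializer that passes props through unchanged and emits a flat sequence. The `PageBlock` shape and the naming convention in `componentName` are assumptions, and only string, number, and boolean props are handled.

```typescript
// Hypothetical page-builder block: a type name plus flat props.
interface PageBlock {
  type: string; // e.g. "hero" or "feature-grid"
  props: Record<string, string | number | boolean>;
}

function componentName(blockType: string): string {
  // "feature-grid" and "featureGrid" both become "FeatureGrid".
  return blockType
    .split(/[-_\s]+/)
    .filter(Boolean)
    .map((part) => part[0].toUpperCase() + part.slice(1))
    .join("");
}

function blockToMdx(block: PageBlock): string {
  // Prop names are copied as-is; JSON.stringify yields a valid JS literal
  // for use inside a JSX expression attribute.
  const attrs = Object.entries(block.props).map(
    ([key, value]) => `${key}={${JSON.stringify(value)}}`,
  );
  return `<${[componentName(block.type), ...attrs].join(" ")} />`;
}

function pageToMdx(blocks: PageBlock[]): string {
  // Flat sequence: no wrapper element around the blocks.
  return blocks.map(blockToMdx).join("\n\n");
}
```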
Automate repeated chrome. If the design wants a divider between every section, write a wrapper component that inserts it between children instead of hand-placing dividers in every file.
Verification
- Build the site with the full migrated content set; the frontmatter schema catches structural misses.
- Add a reference validation step that fails the build when an MDX file links to a slug that does not exist (the post-processed internal links make this checkable).
- Diff rendered page text against the live CMS page for a sample of each content type. Headings and FAQ-style content are where conversion loss hides.
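The reference validation step could look like the sketch below. It assumes internal links are root-relative Markdown links such as `[text](/blog/my-post)` and that the set of valid URLs is built from the migrated files; `MdxFile` and `findBrokenLinks` are hypothetical names.

```typescript
interface MdxFile {
  path: string;
  body: string;
}

// Captures the path of a root-relative Markdown link, up to any query or hash.
const INTERNAL_LINK = /\]\((\/[^)\s#?]*)/g;

function findBrokenLinks(files: MdxFile[], knownUrls: Set<string>): string[] {
  const errors: string[] = [];
  for (const file of files) {
    INTERNAL_LINK.lastIndex = 0; // reset the shared global regex per file
    let match: RegExpExecArray | null;
    while ((match = INTERNAL_LINK.exec(file.body)) !== null) {
      // Treat "/blog/post/" and "/blog/post" as the same URL.
      const url = match[1].replace(/\/$/, "") || "/";
      if (!knownUrls.has(url)) errors.push(`${file.path}: broken link ${url}`);
    }
  }
  return errors;
}
```

In a build script, print the returned errors and exit with a non-zero code when the list is not empty. Root-relative image paths match the same pattern, so either include them in `knownUrls` or filter them out first.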
About this skill
Maintained by Roboto Studio, a UK agency specialising in headless CMS builds and migrations. It distills our own production migration from Sanity to MDX on disk. If you would rather have it done for you: robotostudio.com/services/cms-migration.
Licensed MIT. Wow, I can't believe people are actually using these. Tell me if it worked: yo@robotostudio.com