SKILL.md · MIT · sha256:8a114b9cda77d3931f48ea3c2900c85f2451a9a2ecf922c5f694f278d2e6e7e0

CMS to MDX migration

Migrate content out of a headless CMS into MDX files on disk, with frontmatter for structured fields and components for everything richer than Markdown. These patterns come from migrating production sites (50+ blog posts, case studies, service pages, careers, marketing pages) and encode the failure modes so you skip them.

Architecture

One migration script per content type, all sharing the same skeleton:

1. Config: CMS credentials, content dir, progress file path
2. Types: the CMS document shape and a MigrationProgress interface
3. Progress tracking: load/save a JSON progress file
4. Rich text conversion: CMS rich text to Markdown (shared across types)
5. Type-specific processing: slug to filename, frontmatter builder
6. File operations: write file, check existence
7. Main: fetch, loop, process, save progress after every item
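The Main step can be sketched as a single loop. This is a minimal sketch, not the authors' script: `CmsDoc`, `MigrationDeps` and `runMigration` are hypothetical names, and the CMS query, file I/O and progress saving are passed in as functions so the same loop serves every content type. The progress object uses the same MigrationProgress shape described under progress tracking.

```typescript
// Minimal sketch of the shared skeleton: fetch, loop, process, save progress
// after every item. All names here are hypothetical.
interface CmsDoc {
  _id: string;
  slug: string;
}

interface MigrationProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
  skipped: number;
  errors: Array<{ id: string; slug: string; error: string }>;
}

interface MigrationDeps<T extends CmsDoc> {
  fetchAll: () => Promise<T[]>; // query the CMS
  targetPath: (doc: T) => string; // slug -> filename
  render: (doc: T) => string; // frontmatter + converted body
  exists: (path: string) => boolean; // file existence check
  write: (path: string, contents: string) => void; // file write
  saveProgress: (p: MigrationProgress) => void; // progress persistence
}

async function runMigration<T extends CmsDoc>(deps: MigrationDeps<T>): Promise<MigrationProgress> {
  const docs = await deps.fetchAll();
  const progress: MigrationProgress = {
    total: docs.length, processed: 0, successful: 0, failed: 0, skipped: 0, errors: [],
  };
  for (const doc of docs) {
    const path = deps.targetPath(doc);
    try {
      if (deps.exists(path)) {
        progress.skipped++; // idempotent: never overwrite an earlier run's output
      } else {
        deps.write(path, deps.render(doc));
        progress.successful++;
      }
    } catch (err) {
      // One bad document is recorded and the loop moves on.
      progress.failed++;
      progress.errors.push({ id: doc._id, slug: doc.slug, error: String(err) });
    }
    progress.processed++;
    deps.saveProgress(progress); // persist after every item so a crash loses nothing
  }
  return progress;
}
```

Injecting the dependencies keeps the loop identical across blog posts, case studies and pages; only `fetchAll`, `targetPath` and `render` differ per content type.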

Target layout is one directory per content type, one file per document:

content/
├── blog/*.mdx
├── case-studies/*.mdx
├── services/*.mdx
└── pages/*.mdx

Each file's frontmatter is validated by a schema (Zod or similar) at build time. A draft: true/false flag in frontmatter replaces the CMS publish state; the route's static params include drafts only in development.
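A minimal sketch of the draft rule, assuming a Next.js App Router style `generateStaticParams` that returns `{ slug }` objects and frontmatter already parsed into a hypothetical `Entry` shape. The environment is a parameter (defaulting to `process.env.NODE_ENV`) so the rule is easy to test.

```typescript
// Hypothetical shape of a parsed, schema-validated frontmatter entry.
interface Entry {
  slug: string;
  draft: boolean;
}

// Drafts are routable in development only; production builds skip them.
function staticParamsFor(
  entries: Entry[],
  nodeEnv: string | undefined = process.env.NODE_ENV,
): Array<{ slug: string }> {
  const includeDrafts = nodeEnv === "development";
  return entries
    .filter((e) => includeDrafts || !e.draft)
    .map((e) => ({ slug: e.slug }));
}
```

Inside a route you would return something like `staticParamsFor(await loadEntries())`, where `loadEntries` is a hypothetical stand-in for whatever reads and validates the content directory.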

Core principles

Use the CMS API client directly, never an MCP or chat-sized interface

Batch queries return more data than conversational tooling can handle. Write a script against the CMS's own SDK or API (@sanity/client, Contentful's contentful package, or the WordPress REST API) and run it with tsx.
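As one concrete case, here is a sketch of pulling every post from the WordPress REST API with plain `fetch` (Node 18+). The `/wp-json/wp/v2/posts` endpoint, the `per_page`/`page` query parameters and the `X-WP-TotalPages` response header are WordPress's own; `fetchAllWpPosts` and the injectable `fetcher` parameter are hypothetical, added so the loop can be tested without a live site.

```typescript
// Minimal response shape we rely on; the global fetch's Response satisfies it.
type Fetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

async function fetchAllWpPosts<T>(
  siteUrl: string,
  fetcher: Fetcher = fetch,
  perPage = 100, // WordPress caps per_page at 100
): Promise<T[]> {
  const all: T[] = [];
  let page = 1;
  let totalPages = 1;
  do {
    const url = `${siteUrl}/wp-json/wp/v2/posts?per_page=${perPage}&page=${page}`;
    const res = await fetcher(url);
    if (!res.ok) throw new Error(`WordPress API ${res.status} on page ${page}`);
    // WordPress reports the number of pages in the X-WP-TotalPages header.
    totalPages = Number(res.headers.get("X-WP-TotalPages") ?? "1");
    all.push(...((await res.json()) as T[]));
    page++;
  } while (page <= totalPages);
  return all;
}
```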

Make every run resumable

Maintain a progress JSON and save it after each item:

interface MigrationProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
  skipped: number;
  errors: Array<{ id: string; slug: string; error: string }>;
}

Skip documents whose target file already exists. You will re-run the script many times; idempotency is what makes that safe.
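A minimal sketch of the progress file and the existence check using Node's `fs` module. `loadProgress`, `saveProgress` and `shouldSkip` are hypothetical helper names; the interface is the one above.

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

interface MigrationProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
  skipped: number;
  errors: Array<{ id: string; slug: string; error: string }>;
}

const emptyProgress = (): MigrationProgress => ({
  total: 0, processed: 0, successful: 0, failed: 0, skipped: 0, errors: [],
});

// Resume from the previous run's file, or start from zero if there is none.
function loadProgress(file: string): MigrationProgress {
  if (!existsSync(file)) return emptyProgress();
  return { ...emptyProgress(), ...JSON.parse(readFileSync(file, "utf8")) };
}

// Rewrite the whole file after every item; it is small, so this is cheap.
function saveProgress(file: string, progress: MigrationProgress): void {
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, JSON.stringify(progress, null, 2));
}

// Idempotency check: a document whose target MDX file exists is skipped.
function shouldSkip(targetFile: string): boolean {
  return existsSync(targetFile);
}
```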

Scale batching to the collection

For 50+ items, batch (size 10) with a short delay between batches to respect rate limits. For fewer than 10 items, skip batching entirely and fetch everything in one query.
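The batching rule can be sketched as a paging loop. `fetchPage` is a hypothetical callback you would implement with your CMS's offset/limit or slice query, and the 500 ms default delay is an assumption rather than a figure from this guide.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch a collection one batch at a time. A page shorter than batchSize
// means the end of the collection has been reached.
async function fetchInBatches<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[]>,
  batchSize = 10,
  delayMs = 500,
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += batchSize) {
    const page = await fetchPage(offset, batchSize);
    all.push(...page);
    if (page.length < batchSize) break; // last (possibly empty) page
    await sleep(delayMs); // pause between batches to stay under rate limits
  }
  return all;
}
```

For a small collection, skip this helper and issue the single query directly.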

Keep CMS asset URLs initially

Do not migrate images to new storage as part of the content migration. CMS CDN URLs keep working after the content moves; image re-hosting is a separate, later pass. Coupling them doubles the failure surface of both.

References to other CMS documents cannot be resolved to URLs until everything has migrated. Emit a placeholder and post-process:

result = `[${text}](internal:${ref.documentId})`;
// second pass after all types are migrated: internal:abc123 -> /blog/some-post
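A minimal sketch of that second pass. It assumes you already have a `urlById` map from CMS document IDs to final URLs, for example built from a hypothetical `cmsId` frontmatter field recorded during the first pass. IDs missing from the map are left in place and reported rather than guessed.

```typescript
// Rewrite every "](internal:<id>)" link target using the ID -> URL map.
function resolveInternalLinks(
  mdx: string,
  urlById: Map<string, string>,
): { output: string; unresolved: string[] } {
  const unresolved: string[] = [];
  const output = mdx.replace(/\]\(internal:([^)\s]+)\)/g, (match, id: string) => {
    const url = urlById.get(id);
    if (!url) {
      unresolved.push(id);
      return match; // keep the placeholder so a later check can flag it
    }
    return `](${url})`;
  });
  return { output, unresolved };
}
```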

Normalize inconsistent field values

Years of editing leave fields with inconsistent casing and formats. Write a normalizer per messy field instead of trusting the data:

function normalizeJobType(type: string | undefined): string {
  if (!type) return "Full-time";
  const lower = type.toLowerCase().replace(/[^a-z]/g, "");
  if (lower.includes("part")) return "Part-time";
  if (lower.includes("contract")) return "Contract";
  return "Full-time";
}

Escape frontmatter strings properly

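Frontmatter values are written as double-quoted YAML strings. Escape backslashes before quotes, otherwise a literal backslash in CMS text is read as a YAML escape sequence:

// Backslashes first, then double quotes; newlines fold to spaces.
const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\r?\n/g, " ");
lines.push(`${key}: "${escaped}"`);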
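A sketch of a complete frontmatter builder (the "frontmatter builder" in the skeleton) built around that escaping. The value types it supports and the YAML flow-list format for arrays are assumptions; `buildFrontmatter` and `yamlString` are hypothetical names.

```typescript
type FrontmatterValue = string | number | boolean | string[] | undefined;

// Double-quoted YAML scalar: escape backslashes, then quotes; fold newlines.
function yamlString(value: string): string {
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\r?\n/g, " ");
  return `"${escaped}"`;
}

// Strings are quoted, numbers and booleans are written bare, string arrays
// become flow lists, and undefined values are omitted so optional schema
// fields stay absent instead of becoming empty strings.
function buildFrontmatter(fields: Record<string, FrontmatterValue>): string {
  const lines = ["---"];
  for (const [key, value] of Object.entries(fields)) {
    if (value === undefined) continue;
    if (Array.isArray(value)) {
      lines.push(`${key}: [${value.map((v) => yamlString(v)).join(", ")}]`);
    } else if (typeof value === "string") {
      lines.push(`${key}: ${yamlString(value)}`);
    } else {
      lines.push(`${key}: ${value}`);
    }
  }
  lines.push("---");
  return lines.join("\n");
}
```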

Expect schema churn

Your frontmatter schema will change during migration: fields go optional when the CMS data is sparse, defaults appear for missing values, new optional fields surface in old documents. Treat schema edits as part of the migration, never as scope creep.

Rich text to Markdown gotchas

These are the conversion bugs that actually shipped, and the fixes:

MDX comments. HTML comments (<!-- -->) are a parse error in MDX. Emit JSX comments ({/* */}) instead. Use them to mark unknown block types for later review rather than dropping content silently.

Whitespace-only spans. A span containing only a space but carrying a bold mark must still emit its space, otherwise "working on" becomes "workingon":

if (!text.trim() && text) return text;
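A sketch of a span serializer with that guard in place. The `Span` shape is loosely modeled on Sanity Portable Text spans (text plus mark names) and only handles `strong` and `em`. Moving leading and trailing spaces outside the markers is an addition not described above: CommonMark does not treat `** bold**` as bold, because an opening delimiter followed by whitespace cannot open emphasis.

```typescript
interface Span {
  text: string;
  marks: string[];
}

function serializeSpan({ text, marks }: Span): string {
  if (!text) return ""; // nothing to wrap, so no empty ** ** pair
  if (!text.trim()) return text; // whitespace-only: keep the space, drop the marks
  // Split off surrounding whitespace so the markers hug the words.
  const [, lead, core, trail] = text.match(/^(\s*)(.*?)(\s*)$/s)!;
  let out = core;
  if (marks.includes("em")) out = `*${out}*`;
  if (marks.includes("strong")) out = `**${out}**`;
  return `${lead}${out}${trail}`;
}
```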

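Empty marks. Bold or italic wrapping nothing produces ** ** and * *. Clean with .replace(/\*\*\s+\*\*/g, " ") first and .replace(/\*\s+\*/g, " ") second; in the opposite order the italic pattern consumes the inner stars of ** ** and leaves a stray * * behind.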

Adjacent JSX tags. Two inline components back to back (</Highlight><Highlight>) confuse the MDX parser. Insert a space between them in a cleanup pass.
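A sketch of a cleanup pass covering empty marks and adjacent JSX tags. The italic pattern here is a deliberately tightened variant of the one above: the lookbehind and lookahead stop it from matching the edge of a neighbouring bold run, which the plain `/\*\s+\*/g` would turn from `*a* **b**` into `*a *b**`. The component-name pattern assumes components start with an uppercase letter.

```typescript
function cleanupMarkdown(md: string): string {
  return (
    md
      // Empty bold first: "** **" collapses to a single space.
      .replace(/\*\*\s+\*\*/g, " ")
      // Empty italic, but never a star that belongs to a ** run.
      .replace(/(?<!\*)\*\s+\*(?!\*)/g, " ")
      // Adjacent inline components: "</Highlight><Highlight>" gets a space.
      .replace(/(<\/[A-Z][\w.]*>)(<[A-Z])/g, "$1 $2")
  );
}
```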

Unknown block types. Emit {/* Unknown block type: foo */} and keep going. Review the comments at the end rather than aborting mid-collection.
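A sketch of top-level block conversion with that fallback. The `RichBlock` shape is hypothetical and loosely modeled on Portable Text: a `block` type with a `style`, plus custom types. For brevity, `text` holds inline content that has already been serialized, and only heading styles h2 to h4 are mapped. The `unknown` list collects every fallback for the end-of-run review.

```typescript
interface RichBlock {
  _type: string;
  style?: string;
  text?: string; // inline content, already serialized
}

const headingLevels = new Map([["h2", 2], ["h3", 3], ["h4", 4]]);

function convertBlock(block: RichBlock, unknown: string[]): string {
  if (block._type === "block") {
    const level = block.style ? headingLevels.get(block.style) : undefined;
    return level ? `${"#".repeat(level)} ${block.text ?? ""}` : block.text ?? "";
  }
  // JSX comment, not <!-- -->, which is a parse error in MDX.
  unknown.push(block._type);
  return `{/* Unknown block type: ${block._type} */}`;
}

function convertBlocks(blocks: RichBlock[]): { mdx: string; unknown: string[] } {
  const unknown: string[] = [];
  const mdx = blocks.map((b) => convertBlock(b, unknown)).join("\n\n");
  return { mdx, unknown };
}
```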

Component mapping

Pages built with a CMS page builder become MDX files that are a flat sequence of components.

Match the CMS block's props exactly. If the CMS block has { title: string }, the MDX component takes a title prop with that name. Do not "improve" the API to children mid-migration; every renamed prop is a class of silent rendering failure across every migrated file.

No wrapper containers. Page-builder content is edge to edge by design. The MDX body is component, component, component, with each component owning its internal spacing. Adding a layout <div> around migrated blocks is the single most common way to break the design.
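A sketch of page-builder blocks serialized as a flat MDX body. The block types and component names are hypothetical; the point is that only the component name is mapped, while prop names are copied from the CMS fields unchanged. Writing every prop as a `{JSON}` expression is one convenient choice, not the only one: a JSON literal is a valid JavaScript expression, so the same form works for strings, numbers, booleans, arrays and plain objects.

```typescript
interface PageBlock {
  _type: string;
  [field: string]: unknown;
}

// CMS block type -> component name. Props are never renamed.
const componentFor = new Map<string, string>([
  ["hero", "Hero"],
  ["faq", "Faq"],
]);

function blockToMdx(block: PageBlock): string {
  const name = componentFor.get(block._type);
  if (!name) return `{/* Unknown block type: ${block._type} */}`;
  const props = Object.entries(block)
    // Drop CMS metadata (_type, _key, ...) and absent values.
    .filter(([key, value]) => !key.startsWith("_") && value !== undefined)
    .map(([key, value]) => `${key}={${JSON.stringify(value)}}`);
  return `<${name}${props.length ? " " + props.join(" ") : ""} />`;
}

// The body is the components in order and nothing else: no wrapper <div>.
function pageToMdx(blocks: PageBlock[]): string {
  return blocks.map(blockToMdx).join("\n\n");
}
```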

Automate repeated chrome. If the design wants a divider between every section, write a wrapper component that inserts it between children instead of hand-placing dividers in every file.

Verification

  • Build the site with the full migrated content set; the frontmatter schema catches structural misses.
  • Add a reference validation step that fails the build when an MDX file links to a slug that does not exist (the post-processed internal links make this checkable).
  • Diff rendered page text against the live CMS page for a sample of each content type. Headings and FAQ-style content are where conversion loss hides.
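A sketch of the reference validation step as a pure function. It assumes the caller has built `knownRoutes` from the content directories (for example content/blog/foo.mdx as /blog/foo) and has read each MDX file into a map. It checks only Markdown link targets and `href="..."` attributes that start with `/`, and it also flags any `internal:` placeholder left over from the second pass. Other link forms are not covered.

```typescript
function findBrokenLinks(
  files: Map<string, string>, // file path -> MDX source
  knownRoutes: Set<string>,
): string[] {
  const problems: string[] = [];
  const linkPattern = /\]\((\/[^)\s]*|internal:[^)\s]+)\)|href="(\/[^"]*)"/g;
  for (const [file, source] of files) {
    for (const m of source.matchAll(linkPattern)) {
      const target = m[1] ?? m[2];
      if (target.startsWith("internal:")) {
        problems.push(`${file}: unresolved placeholder ${target}`);
        continue;
      }
      // Compare paths only: drop query string, fragment and trailing slash.
      const path = target.split(/[?#]/)[0].replace(/(.)\/$/, "$1");
      if (!knownRoutes.has(path)) problems.push(`${file}: broken link ${target}`);
    }
  }
  return problems;
}
```

To fail the build, run it in a prebuild script and exit non-zero when the returned list is not empty.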

About this skill

Maintained by Roboto Studio, a UK agency specialising in headless CMS builds and migrations. It distills our own production migration from Sanity to MDX on disk. If you would rather have it done for you: robotostudio.com/services/cms-migration.

Licensed MIT. Wow, I can't believe people are actually using these. Tell me if it worked: yo@robotostudio.com