---
name: cms-to-mdx-migration
description: Migrate content from a headless CMS (Sanity, Contentful, WordPress) to MDX files on disk. Covers migration script architecture, resumable progress tracking, rich text to Markdown conversion gotchas, internal link resolution, frontmatter escaping, and component mapping. Use when asked to migrate off a CMS, convert structured content to MDX, or move a site to content-on-disk.
license: MIT
metadata:
  author: roboto-studio
  version: "1.0.0"
  updated: "2026-06-11"
  homepage: https://robotostudio.com/skills/cms-to-mdx-migration
---

# CMS to MDX migration

Migrate content out of a headless CMS into MDX files on disk, with frontmatter for structured fields and components for everything richer than Markdown. These patterns come from migrating production sites (50+ blog posts, case studies, service pages, careers, marketing pages) and encode the failure modes so you skip them.

## Architecture

One migration script per content type, all sharing the same skeleton:

```text
1. Config: CMS credentials, content dir, progress file path
2. Types: the CMS document shape and a MigrationProgress interface
3. Progress tracking: load/save a JSON progress file
4. Rich text conversion: CMS rich text to Markdown (shared across types)
5. Type-specific processing: slug to filename, frontmatter builder
6. File operations: write file, check existence
7. Main: fetch, loop, process, save progress after every item
```

Target layout is one directory per content type, one file per document:

```text
content/
├── blog/*.mdx
├── case-studies/*.mdx
├── services/*.mdx
└── pages/*.mdx
```

Each file's frontmatter is validated by a schema (Zod or similar) at build time. A `draft: true/false` flag in frontmatter replaces the CMS publish state; the route's static params include drafts only in development.
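As a rough illustration of the draft rule, the route can filter entries before it generates static params. This is a minimal sketch: the `ContentEntry` shape and the `staticSlugs` helper are hypothetical names, and it assumes the environment is read from something like `process.env.NODE_ENV`.

```typescript
// Hypothetical entry shape; a real project would derive it from the frontmatter schema.
interface ContentEntry {
  slug: string;
  draft: boolean;
}

// Returns the slugs a route should pre-render.
// Drafts are included only when the environment is "development".
function staticSlugs(entries: ContentEntry[], nodeEnv: string | undefined): string[] {
  const includeDrafts = nodeEnv === "development";
  return entries
    .filter((entry) => includeDrafts || !entry.draft)
    .map((entry) => entry.slug);
}

const sampleEntries: ContentEntry[] = [
  { slug: "launch-post", draft: false },
  { slug: "unfinished-idea", draft: true },
];

console.log(staticSlugs(sampleEntries, "production")); // only "launch-post"
console.log(staticSlugs(sampleEntries, "development")); // both slugs
```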

## Core principles

### Use the CMS API client directly, never an MCP or chat-sized interface

Batch queries return more data than conversational tooling can handle. Write a script against the CMS SDK or API (`@sanity/client`, Contentful's `contentful` package, the WordPress REST API) and run it with `tsx`.

### Make every run resumable

Maintain a progress JSON and save it after each item:

```typescript
interface MigrationProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
  skipped: number;
  errors: Array<{ id: string; slug: string; error: string }>;
}
```

Skip documents whose target file already exists. You will re-run the script many times; idempotency is what makes that safe.
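One way the loop might look, as a sketch rather than the exact script: `SourceDoc`, `migrateAll`, and the scratch-directory usage are hypothetical. It assumes counters are per run and that file existence, not the progress file, is the resume signal.

```typescript
import { existsSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface MigrationProgress {
  total: number;
  processed: number;
  successful: number;
  failed: number;
  skipped: number;
  errors: Array<{ id: string; slug: string; error: string }>;
}

// Hypothetical minimal document shape; real CMS documents carry more fields.
interface SourceDoc {
  id: string;
  slug: string;
  body: string;
}

function migrateAll(docs: SourceDoc[], contentDir: string, progressPath: string): MigrationProgress {
  const progress: MigrationProgress = {
    total: docs.length, processed: 0, successful: 0, failed: 0, skipped: 0, errors: [],
  };
  for (const doc of docs) {
    const target = join(contentDir, `${doc.slug}.mdx`);
    try {
      if (existsSync(target)) {
        progress.skipped += 1; // already migrated on an earlier run
      } else {
        writeFileSync(target, doc.body);
        progress.successful += 1;
      }
    } catch (err) {
      progress.failed += 1;
      const message = err instanceof Error ? err.message : String(err);
      progress.errors.push({ id: doc.id, slug: doc.slug, error: message });
    }
    progress.processed += 1;
    // Save after every item so a crash loses at most one document of bookkeeping.
    writeFileSync(progressPath, JSON.stringify(progress, null, 2));
  }
  return progress;
}

// Usage: two runs against a scratch directory. The second run only skips.
const scratch = mkdtempSync(join(tmpdir(), "mdx-migration-"));
const sampleDocs: SourceDoc[] = [
  { id: "a1", slug: "first-post", body: "# First" },
  { id: "b2", slug: "second-post", body: "# Second" },
];
const progressFile = join(scratch, "progress.json");
const firstRun = migrateAll(sampleDocs, scratch, progressFile);
const secondRun = migrateAll(sampleDocs, scratch, progressFile);
console.log(firstRun.successful, secondRun.skipped); // 2 2
```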

### Scale batching to the collection

For 50+ items, batch (size 10) with a short delay between batches to respect rate limits. For fewer than 10 items, skip batching entirely and fetch everything in one query.
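A batching helper could be as small as the sketch below. `processInBatches` and `sleep` are hypothetical names, and the 500 ms default delay is an assumption; tune it to the rate limits of the CMS in question.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `worker` over `items` in fixed-size batches, pausing between batches.
// Items inside a batch run concurrently; batches run one after another.
async function processInBatches<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
  batchSize = 10,
  delayMs = 500, // assumed default, not a documented limit of any CMS
): Promise<R[]> {
  const results: R[] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    const batch = items.slice(start, start + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
    const hasMore = start + batchSize < items.length;
    if (hasMore) await sleep(delayMs);
  }
  return results;
}
```

`Promise.all` rejects as soon as one worker rejects, so in this sketch the worker is expected to catch its own errors and record them in the progress file instead of throwing.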

### Keep CMS asset URLs initially

Do not migrate images to new storage as part of the content migration. CMS CDN URLs keep working after the content moves; image re-hosting is a separate, later pass. Coupling them doubles the failure surface of both.

### Store internal links as placeholders, resolve later

References to other CMS documents cannot be resolved to URLs until everything has migrated. Emit a placeholder and post-process:

```typescript
result = `[${text}](internal:${ref.documentId})`;
// second pass after all types are migrated: internal:abc123 -> /blog/some-post
```
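The second pass can be a single regex replace over each file. A minimal sketch, assuming the first pass recorded a map from CMS document id to final site path; `resolveInternalLinks` and that map are hypothetical:

```typescript
// Rewrites `](internal:<id>)` placeholders to real paths.
// Ids missing from the map are left in place and reported, never dropped.
function resolveInternalLinks(
  mdx: string,
  urlsById: Map<string, string>, // hypothetical: CMS document id -> site path
): { output: string; unresolved: string[] } {
  const unresolved: string[] = [];
  const output = mdx.replace(/\]\(internal:([^)\s]+)\)/g, (match: string, id: string) => {
    const url = urlsById.get(id);
    if (url === undefined) {
      unresolved.push(id);
      return match; // keep the placeholder so a later check can flag it
    }
    return `](${url})`;
  });
  return { output, unresolved };
}

const resolved = resolveInternalLinks(
  "See [this post](internal:abc123) and [that one](internal:missing).",
  new Map([["abc123", "/blog/some-post"]]),
);
console.log(resolved.output);
console.log(resolved.unresolved);
```

Leaving unresolved placeholders untouched keeps the failure visible: a leftover `internal:` target is easy to grep for or to fail a build on.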

### Normalize inconsistent field values

Years of editing leave fields with inconsistent casing and formats. Write a normalizer per messy field instead of trusting the data:

```typescript
function normalizeJobType(type: string | undefined): string {
  if (!type) return "Full-time";
  const lower = type.toLowerCase().replace(/[^a-z]/g, "");
  if (lower.includes("part")) return "Part-time";
  if (lower.includes("contract")) return "Contract";
  return "Full-time";
}
```

### Escape frontmatter strings properly

```typescript
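// Backslashes first, then quotes: "\" is an escape character in YAML double-quoted strings.
const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\r?\n/g, " ");
lines.push(`${key}: "${escaped}"`);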
```

### Expect schema churn

Your frontmatter schema will change during migration: fields go optional when the CMS data is sparse, defaults appear for missing values, new optional fields surface in old documents. Treat schema edits as part of the migration, never as scope creep.

## Rich text to Markdown gotchas

These are the conversion bugs that actually shipped, and the fixes:

**MDX comments.** HTML comments (`<!-- -->`) are a parse error in MDX. Emit JSX comments (`{/* */}`) instead. Use them to mark unknown block types for later review rather than dropping content silently.

**Whitespace-only spans.** A span containing only a space but carrying a bold mark must still emit its space, otherwise "working on" becomes "workingon":

```typescript
if (!text.trim() && text) return text;
```
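In context, that guard sits at the top of the span renderer. The sketch below assumes a Portable Text-like span with `strong` and `em` marks; the `Span` shape and `renderSpan` are hypothetical. It also moves leading and trailing whitespace outside the markers, which is an addition beyond the one-line guard: CommonMark does not treat `**bold **` as emphasis.

```typescript
// Hypothetical span shape, loosely modelled on Portable Text spans.
interface Span {
  text: string;
  marks: string[];
}

function renderSpan(span: Span): string {
  const { text, marks } = span;
  // Whitespace-only (or empty) spans are emitted as-is, with no markers around them.
  if (!text.trim()) return text;

  // Keep surrounding whitespace outside the markers so "**bold **" is never produced.
  const lead = text.match(/^\s*/)?.[0] ?? "";
  const trail = text.match(/\s*$/)?.[0] ?? "";
  let core = text.trim();
  if (marks.includes("strong")) core = `**${core}**`;
  if (marks.includes("em")) core = `*${core}*`;
  return lead + core + trail;
}

const sentence = [
  { text: "working", marks: ["strong"] },
  { text: " ", marks: ["strong"] }, // the span that used to vanish
  { text: "on", marks: [] },
].map(renderSpan).join("");
console.log(sentence); // **working** on
```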

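**Empty marks.** Bold or italic wrapping only whitespace produces `** **` and `* *`. Clean with `.replace(/\*\*\s+\*\*/g, " ")` and `.replace(/(?<!\*)\*\s+\*(?!\*)/g, " ")`. The lookarounds matter: a bare `/\*\s+\*/` also matches the boundary in `**bold** *italic*` and turns it into `**bold* italic*`.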

**Adjacent JSX tags.** Two inline components back to back (`</Highlight><Highlight>`) confuse the MDX parser. Insert a space between them in a cleanup pass.
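Both cleanups can live in one post-processing function. A sketch under stated assumptions: `cleanupMarkdown` is a hypothetical name, whitespace is limited to spaces and tabs on the assumption that the pass runs over a whole file (so a blank line between two bold paragraphs is never collapsed), and only a closing tag followed directly by an opening tag is handled.

```typescript
function cleanupMarkdown(markdown: string): string {
  return (
    markdown
      // Empty bold: "** **" -> " "
      .replace(/\*\*[ \t]+\*\*/g, " ")
      // Empty italic: "* *" -> " ". The lookarounds skip asterisks that belong
      // to a neighbouring "**", so "**bold** *italic*" is left alone.
      .replace(/(?<!\*)\*[ \t]+\*(?!\*)/g, " ")
      // Adjacent JSX tags: "</A><B>" -> "</A> <B>"
      .replace(/(<\/[A-Za-z][\w.]*>)(?=<[A-Za-z])/g, "$1 ")
  );
}

console.log(cleanupMarkdown("<Highlight>a</Highlight><Highlight>b</Highlight>"));
// <Highlight>a</Highlight> <Highlight>b</Highlight>
```

Self-closing neighbours such as `<Foo /><Bar />` are not covered by this pattern; extend it if the migrated content contains them.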

**Unknown block types.** Emit `{/* Unknown block type: foo */}` and keep going. Review the comments at the end rather than aborting mid-collection.
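A block dispatcher with that fallback might look like the sketch below. The `_type` discriminator follows Sanity's convention and is an assumption for other CMSs; `Block`, `BlockRenderer`, and `renderBlocks` are hypothetical names.

```typescript
// Hypothetical block shape: a type discriminator plus arbitrary fields.
interface Block {
  _type: string;
  [key: string]: unknown;
}

type BlockRenderer = (block: Block) => string;

// Renders every block, emitting a JSX comment for types without a renderer
// and collecting those types so they can be reviewed after the run.
function renderBlocks(
  blocks: Block[],
  renderers: Map<string, BlockRenderer>,
): { mdx: string; unknownTypes: string[] } {
  const unknown = new Set<string>();
  const parts = blocks.map((block) => {
    const render = renderers.get(block._type);
    if (!render) {
      unknown.add(block._type);
      return `{/* Unknown block type: ${block._type} */}`;
    }
    return render(block);
  });
  return { mdx: parts.join("\n\n"), unknownTypes: Array.from(unknown) };
}

const rendered = renderBlocks(
  [{ _type: "paragraph", text: "Hello" }, { _type: "carousel" }],
  new Map<string, BlockRenderer>([["paragraph", (block) => String(block.text)]]),
);
console.log(rendered.mdx);
console.log(rendered.unknownTypes);
```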

## Component mapping

Pages built with a CMS page builder become MDX files that are a flat sequence of components.

**Match the CMS block's props exactly.** If the CMS block has `{ title: string }`, the MDX component takes a `title` prop with that name. Do not "improve" the API to `children` mid-migration; every renamed prop is a class of silent rendering failure across every migrated file.

**No wrapper containers.** Page-builder content is edge to edge by design. The MDX body is component, component, component, with each component owning its internal spacing. Adding a layout `<div>` around migrated blocks is the single most common way to break the design.

**Automate repeated chrome.** If the design wants a divider between every section, write a wrapper component that inserts it between children instead of hand-placing dividers in every file.

## Verification

- Build the site with the full migrated content set; the frontmatter schema catches structural misses.
- Add a reference validation step that fails the build when an MDX file links to a slug that does not exist (the post-processed internal links make this checkable).
- Diff rendered page text against the live CMS page for a sample of each content type. Headings and FAQ-style content are where conversion loss hides.
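The reference validation step can start as a plain function over file contents. This is a sketch, not a full link checker: `findBrokenLinks` is a hypothetical name, it only inspects Markdown links whose target starts with `/` or with a leftover `internal:` placeholder, and it assumes the set of known paths is built from the content directory.

```typescript
// Returns link targets that are either unknown site paths or
// `internal:` placeholders that the second pass never resolved.
function findBrokenLinks(mdx: string, knownPaths: Set<string>): string[] {
  const broken: string[] = [];
  // Capture the target up to ")", whitespace, "#" or "?".
  const linkPattern = /\]\(((?:\/|internal:)[^)\s#?]*)/g;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(mdx)) !== null) {
    const target = match[1];
    if (target.startsWith("internal:")) {
      broken.push(target);
      continue;
    }
    const path = target.replace(/\/$/, "") || "/"; // ignore a trailing slash
    if (!knownPaths.has(path)) broken.push(path);
  }
  return broken;
}

const brokenLinks = findBrokenLinks(
  "[a](/blog/real) [b](/blog/ghost#top) [c](https://example.com) [d](internal:abc123)",
  new Set(["/blog/real"]),
);
console.log(brokenLinks);
```

In a build script, a non-empty result for any file would be the signal to exit with a non-zero status.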

## About this skill

Maintained by [Roboto Studio](https://robotostudio.com), a UK agency specialising in headless CMS builds and migrations. It distills our own production migration from Sanity to MDX on disk. If you would rather have it done for you: [robotostudio.com/services/cms-migration](https://robotostudio.com/services/cms-migration).

Licensed MIT. Wow, I can't believe people are actually using these. Tell me if it worked: yo@robotostudio.com
