How we migrate content without breaking stuff

Ever wondered how Roboto Studio lifts and shifts hundreds of pages during a website migration? We'll guide you through our process and share some tips along the way.

Hrithik, Senior Full-stack Developer

Content migration isn't just about moving files from one location to another. Whether you're dealing with SharePoint content migration, moving from WordPress to a headless CMS, or executing a complete website migration, it's about preserving stories, maintaining relationships, and ensuring that years of valuable content continues to serve its purpose in a modern, scalable environment.

Here's how we've refined our approach to make these transitions seamless and, more importantly, to avoid breaking stuff.

The challenge: Legacy content, modern expectations

We've taken on a lot of migration projects over the years, and to be frank, they're never easy; there's always nuance. So, when an institution as influential as the George W. Bush Presidential Center approached us, we knew we had to get it right the first time.

Their legacy WordPress website contained years of articles, author profiles, and interconnected content, all of which needed to be preserved and published on their new Sanity-powered platform.

The stakes were high. This wasn't just content; it was digital history that needed to be preserved, enhanced, and made more discoverable than ever before. Like many website migrations we've handled, success depended on having a solid migration strategy from day one.

Our comprehensive content migration plan

Step 1: Smart content extraction

We begin by building custom scrapers in Node.js, using libraries such as Axios and Cheerio. The right tool depends on how the website is built. Cheerio works well when the site is predominantly server-side rendered (SSR), because it excels at parsing static HTML. It does not execute JavaScript, though, so it is a poor choice for sites that render most of their content client-side.
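One rough way to make that call before writing a scraper is to check how much readable text the raw HTML response already contains. The sketch below is illustrative only and uses no external libraries; `visibleTextLength`, `pickScraper`, and the 500-character threshold are hypothetical names and values, not part of any library.

```typescript
// Hypothetical helper: estimate how much readable text the raw HTML contains
// once scripts, styles, and tags are stripped out.
function visibleTextLength(html: string): number {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.length;
}

type ScraperKind = "static-html" | "headless-browser";

// Assumed threshold: pages with very little text in the raw response are
// probably rendered in the browser, so a static HTML parser would miss content.
function pickScraper(html: string, minChars = 500): ScraperKind {
  return visibleTextLength(html) >= minChars ? "static-html" : "headless-browser";
}

// Usage (Node 18+ ships a global fetch):
// const html = await (await fetch("https://example.com/articles/1")).text();
// console.log(pickScraper(html));
```

A heuristic like this only gives a first hint. Spot-checking a few page types by hand is still worthwhile, since a site can mix server-rendered and client-rendered sections.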

This automated approach eliminates the tedious (and error-prone) process of manual copying and pasting. More importantly, it ensures that we capture everything with perfect consistency—a critical component of any website content migration checklist.

Step 2: Creating a single source of truth

Once we've gathered all the raw materials, we compile everything into what we call our "SSOT" (Single Source of Truth). While this could be a simple CSV file, we typically structure it as a TypeScript file instead. Why? Because it gives us version control, static typing, and a rock-solid foundation for everything that follows.

This SSOT becomes our go-to file for passing updates along—a stable, verifiable dataset that we can reference throughout the entire website content migration process. It's an essential part of our migration strategy that prevents data loss and ensures accountability.
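To illustrate what a typed SSOT buys you, here is a minimal sketch. The record shapes, field names, and the `validateSsot` helper are assumptions made for this example; a real project would carry many more fields.

```typescript
// Hypothetical record shapes for the example.
interface SourceAuthor {
  id: string;
  name: string;
}

interface SourceArticle {
  sourceUrl: string;   // URL on the legacy site
  title: string;
  bodyHtml: string;
  authorIds: string[]; // must match SourceAuthor.id values
  publishedAt: string; // ISO 8601 date string
}

interface Ssot {
  authors: SourceAuthor[];
  articles: SourceArticle[];
}

// Example data; the compiler rejects records with missing or mistyped fields.
const ssot: Ssot = {
  authors: [{ id: "jane-doe", name: "Jane Doe" }],
  articles: [
    {
      sourceUrl: "https://legacy.example.com/2019/05/hello-world/",
      title: "Hello world",
      bodyHtml: "<p>First post.</p>",
      authorIds: ["jane-doe"],
      publishedAt: "2019-05-01T09:00:00Z",
    },
  ],
};

// Returns human-readable problems; an empty list means the data is consistent.
function validateSsot(data: Ssot): string[] {
  const problems: string[] = [];
  const authorIds = new Set(data.authors.map((a) => a.id));
  const seenUrls = new Set<string>();
  for (const article of data.articles) {
    if (seenUrls.has(article.sourceUrl)) {
      problems.push(`Duplicate sourceUrl: ${article.sourceUrl}`);
    }
    seenUrls.add(article.sourceUrl);
    for (const id of article.authorIds) {
      if (!authorIds.has(id)) {
        problems.push(`Unknown author "${id}" in ${article.sourceUrl}`);
      }
    }
    if (Number.isNaN(Date.parse(article.publishedAt))) {
      problems.push(`Invalid publishedAt in ${article.sourceUrl}`);
    }
  }
  return problems;
}

console.log(validateSsot(ssot)); // prints []
```

Static types catch shape errors at compile time, while a runtime check like this covers what types cannot express, such as duplicate URLs or dangling author references.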

Step 3: AI-powered content enhancement

Here's where things get exciting. Raw scraped data is often messy—inconsistent HTML, missing metadata, you name it. We leverage AI to clean, enhance, and structure the content intelligently.

Our AI pipeline can programmatically summarize articles, generate SEO-friendly meta descriptions, suggest relevant tags and categories, and even identify key entities like people and organizations mentioned in the text. Because language models are built around predicting and generating text, they are well suited to this kind of summarizing and tagging work.

The result? We save hundreds of hours of manual work while actually improving the quality and discoverability of the content. It beats manually typing thousands of lines of code... I wonder if we've done this before.
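As a rough sketch of what one enhancement step could look like, the example below takes the model call as an injected function so it does not assume any particular AI provider. `GenerateText`, `enhanceArticle`, the prompts, and the 160-character limit are all hypothetical choices for illustration.

```typescript
// The model call is injected, so any provider (or a test stub) can be plugged in.
type GenerateText = (prompt: string) => Promise<string>;

interface Enhanced {
  metaDescription: string;
  tags: string[];
  needsReview: boolean; // true when the model output failed a basic check
}

const MAX_META_LENGTH = 160; // assumed limit for meta descriptions

async function enhanceArticle(
  title: string,
  bodyText: string,
  generate: GenerateText,
): Promise<Enhanced> {
  const rawMeta = await generate(
    `Write a meta description under ${MAX_META_LENGTH} characters for "${title}":\n${bodyText}`,
  );
  const rawTags = await generate(
    `List up to five topic tags for this article, comma-separated:\n${bodyText}`,
  );

  // Normalize whitespace in the description.
  const metaDescription = rawMeta.trim().replace(/\s+/g, " ");

  // Lowercase, trim, drop empties, de-duplicate, and cap at five tags.
  const cleanedTags = rawTags
    .split(",")
    .map((t) => t.trim().toLowerCase())
    .filter((t) => t.length > 0);
  const tags = Array.from(new Set(cleanedTags)).slice(0, 5);

  // Model output is not trusted blindly: anything that looks off is flagged
  // so a human can check it during QA.
  const needsReview =
    metaDescription.length === 0 ||
    metaDescription.length > MAX_META_LENGTH ||
    tags.length === 0;

  return { metaDescription, tags, needsReview };
}
```

Keeping the model behind a plain function type makes the step easy to test with a stub and easy to swap out later. The `needsReview` flag gives the QA pass a short list of items to look at first.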

Step 4: The migration engine

With clean, enhanced data in hand, we fire up our core migration script—another Node.js application that connects directly to the Sanity API. This is where the magic happens:

Data mapping: We meticulously map fields from our master list to the corresponding fields in Sanity's content models. Every piece of data finds its perfect home, following the same principles we use for Sanity, Contentful or, hell, even SharePoint if you really have to.

Asset handling: The script downloads all images from the old site, uploads them to Sanity's high-performance asset CDN, and correctly links them within the new content. No broken images, no missing files, you can sleep at night.

Relationship building: Perhaps most importantly, the script creates relationships between documents. Articles get linked to their authors, categories connect to related content, and we build a rich, interconnected content graph that makes everything more discoverable. This is probably the hardest but most important part of the migration.

SEO preservation: We automatically create redirects from old URLs to their new homes, preserving SEO rankings and ensuring a seamless user experience. This is often the most critical aspect of website migrations from an SEO perspective.

The entire process is transactional: we bundle all changes for a single article into one operation. It's all or nothing, which prevents partial or corrupted data from being entered into the new system.
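The mapping, relationship, redirect, and all-or-nothing ideas above can be sketched as a function that turns one source record into a single mutation payload. This is a simplified illustration, not our actual script: the `article`, `author`, and `redirect` schema types, the field names, the `/articles/` URL pattern, and the helpers `docId` and `buildArticleMutations` are all assumptions. The `{ mutations: [...] }` shape with `createOrReplace` follows Sanity's HTTP mutation format, where the mutations in one request are applied as a single transaction.

```typescript
// Hypothetical input: one article from the source-of-truth file.
interface ArticleRecord {
  sourceUrl: string;
  slug: string;
  title: string;
  authorSlugs: string[];
  imageAssetId?: string; // assumed: asset ID returned by an earlier image upload
}

type SanityDoc = { _id: string; _type: string; [field: string]: unknown };
type Mutation = { createOrReplace: SanityDoc };

// Deterministic IDs make re-runs idempotent: the same source record always
// maps to the same document instead of creating duplicates.
const docId = (type: string, slug: string): string =>
  `${type}-${slug.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "")}`;

function buildArticleMutations(article: ArticleRecord): { mutations: Mutation[] } {
  const articleDoc: SanityDoc = {
    _id: docId("article", article.slug),
    _type: "article", // hypothetical schema type
    title: article.title,
    slug: { _type: "slug", current: article.slug },
    // References point at author documents by ID. Those documents must already
    // exist (or be created in the same transaction) for the commit to succeed.
    authors: article.authorSlugs.map((slug) => ({
      _type: "reference",
      _ref: docId("author", slug),
      _key: slug, // array items need a unique _key
    })),
    // Only attach an image when an uploaded asset ID is available.
    ...(article.imageAssetId
      ? { mainImage: { _type: "image", asset: { _type: "reference", _ref: article.imageAssetId } } }
      : {}),
  };

  const redirectDoc: SanityDoc = {
    _id: docId("redirect", article.slug),
    _type: "redirect", // hypothetical schema type
    source: new URL(article.sourceUrl).pathname, // old path on the legacy site
    destination: `/articles/${article.slug}`,    // assumed new URL pattern
    permanent: true,
  };

  // One payload per article: the article and its redirect succeed or fail together.
  return { mutations: [{ createOrReplace: articleDoc }, { createOrReplace: redirectDoc }] };
}
```

With the official `@sanity/client` package, the same bundle would typically be expressed by chaining `createOrReplace` calls on `client.transaction()` and finishing with `commit()`. Building the payload as plain data first keeps the mapping logic testable without touching a real dataset.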

Step 5: Human-powered quality assurance

No automated process is complete without the human touch. After the migration script finishes, our team conducts thorough QA. We review the migrated content in Sanity Studio and on the live website, checking for formatting issues, broken links, data inconsistencies, and overall content integrity.
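Some of those checks can be pre-screened by a script so reviewers spend their time on judgment calls. As one hedged example, the sketch below lists internal links that point at neither a migrated page nor a known redirect source. `MigratedDoc`, `findBrokenInternalLinks`, and the regex-based link extraction are assumptions for illustration; a production check would more likely use a real HTML parser.

```typescript
interface MigratedDoc {
  path: string;     // new URL path, e.g. "/articles/hello-world"
  bodyHtml: string; // rendered body to scan for links
}

interface BrokenLink {
  from: string; // page containing the link
  href: string; // link target that could not be resolved
}

// Treat "/a/" and "/a" as the same path; leave the root path "/" alone.
const normalize = (p: string): string => (p.length > 1 ? p.replace(/\/+$/, "") : p);

// Collect internal links (href starting with a single "/") that do not resolve
// to a migrated page or a known redirect source. Fragments and query strings
// are ignored when comparing.
function findBrokenInternalLinks(
  docs: MigratedDoc[],
  redirectSources: string[] = [],
): BrokenLink[] {
  const known = new Set(
    docs.map((d) => normalize(d.path)).concat(redirectSources.map((s) => normalize(s))),
  );
  const broken: BrokenLink[] = [];
  for (const doc of docs) {
    const hrefPattern = /href="(\/(?!\/)[^"#?]*)/g;
    let match: RegExpExecArray | null;
    while ((match = hrefPattern.exec(doc.bodyHtml)) !== null) {
      if (!known.has(normalize(match[1]))) {
        broken.push({ from: doc.path, href: match[1] });
      }
    }
  }
  return broken;
}
```

A report like this does not replace reviewing pages in Sanity Studio and on the live site. It simply narrows down where to look first.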

This final step is our commitment to ensuring a seamless transition, guaranteeing that the content audiences see is accurate, meta-rich, and functional. It's the same rigorous approach we apply to all our website migrations, regardless of the source platform.

Beyond migration: building for the future

Our content migration process isn't just a one-time event. It's a living playbook that we continuously refine and adapt for future needs. Whether we're handling a complex enterprise content migration or a straightforward CMS migration, we follow the same core principles: preserve what matters, enhance where possible, and build for the future.

The Presidential Leadership Scholars migration was particularly rewarding because we weren't just moving content—we were giving it new life in a modern, scalable environment where it can continue to inspire and educate.

What we've learned about website content migration

Every migration teaches us something new. Sometimes it's a technical insight about handling edge cases in legacy HTML. Other times, it's a process improvement that saves hours on future projects. We've learned that the best migrations strike a balance between automation and human insight, and that taking the time to enhance content during the migration process pays dividends in the long run.

Our migration strategy has evolved to include comprehensive planning phases, detailed content audits, and robust testing procedures. These elements are now standard parts of our website migration checklist, ensuring consistent results across all projects.

Ready to make a move?

If you're sitting on valuable content that's trapped in an aging system, we'd love to help you set it free. Whether you need to migrate from a janky platform that stopped being supported around the dot-com boom or want a complete overhaul of your current website, we'll take care of it.

Got questions about your own content migration challenges? We're always excited to talk through complex technical problems. Or, if you know exactly what you want, you can use our migration helper to hit the ground running.

