Case study
Slingshot Bio
Roboto converged Slingshot Bio's WordPress and Shopify sites into one headless Shopify build on Next.js and Sanity, instrumented end to end and AI-ready.

The work your team keeps doing by hand? An AI agent can take it. We build custom AI automation and agentic workflows on Vercel that you're not afraid to push to prod.
Forward deployed engineers for AI
You've already prototyped the AI feature. We rebuild it to run in production: durable workflows on Vercel that survive deployments, retry failed steps, and keep running under real traffic. It's the same pattern that briefs our sales team within a minute of a form submit, fills a content calendar overnight, and keeps a product catalogue current without anyone opening a spreadsheet.
My best experience with a consulting company. The results were delivered faster than expected and with top quality. Jono ensured I understood the process and suggested a great approach. Both execution and communication were flawless.
CEO at Topaz Labs
Real systems, not demos
Every workflow below runs in production. They survive server restarts, retry failed steps automatically, and pause for external events without consuming compute. Here's what that looks like for real problems.
Your marketing team knows they should publish more. They don't have the hours. Here's what we build: a workflow that connects to the Ahrefs API, pulls your keyword gaps and ranking opportunities, then generates research briefs for each topic. A second workflow takes those briefs, researches the subject using AI, writes a first draft, and pushes it to your CMS as a draft post.
Set the whole thing on a cron schedule. Monday morning, your editor opens Sanity and finds five draft posts waiting for review, each targeted at a keyword your competitors rank for and you don't. The AI did the research and the first draft. Your writer does the thinking and the polish.
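The "keywords your competitors rank for and you don't" selection can be sketched as a simple filter. This is an illustration only, not the Ahrefs API: the `KeywordRanking` shape, the top-10 cut-off, and the sample data are all assumptions.

```typescript
// Hypothetical shape for ranking data; a real Ahrefs response will differ.
interface KeywordRanking {
  keyword: string;
  // Position in search results, or null when the site does not rank at all.
  position: number | null;
}

// Return keywords where the competitor ranks within `cutoff` and we do not.
function findKeywordGaps(
  ours: KeywordRanking[],
  competitor: KeywordRanking[],
  cutoff = 10, // assumed threshold for "ranking"
): string[] {
  const weRankFor = new Set(
    ours
      .filter((r) => r.position !== null && r.position <= cutoff)
      .map((r) => r.keyword),
  );
  return competitor
    .filter((r) => r.position !== null && r.position <= cutoff)
    .map((r) => r.keyword)
    .filter((keyword) => !weRankFor.has(keyword));
}

// Made-up sample data.
const gaps = findKeywordGaps(
  [
    { keyword: "headless shopify", position: 3 },
    { keyword: "sanity migration", position: 42 },
  ],
  [
    { keyword: "headless shopify", position: 5 },
    { keyword: "sanity migration", position: 2 },
    { keyword: "durable workflows", position: 7 },
  ],
);
console.log(gaps);
```

Each keyword in the result would become one research brief in the next stage of the pipeline.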
Each step in the pipeline retries independently. If the Ahrefs API rate-limits you, that step waits and retries. If the LLM call fails, it tries again without re-fetching the keyword data. Deploy a code update while a draft is mid-generation? The workflow finishes on the old version.
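To make the independent retries concrete, here is a rough sketch in plain TypeScript. It is not the Workflow Development Kit's API: `runStep`, the attempt limit, and the simulated failure are assumptions chosen to show that a failing draft step can retry without the keyword step running again.

```typescript
// Plain-TypeScript illustration of per-step retries; NOT the Workflow Development Kit API.
async function runStep<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error; // a real system would wait (back off) before retrying
    }
  }
  throw lastError;
}

// Counters that record how often each step body actually ran.
const calls = { fetchKeywords: 0, draft: 0 };

async function pipeline(): Promise<string> {
  // Step 1: stands in for the keyword fetch. Succeeds first time.
  const keywords = await runStep(async () => {
    calls.fetchKeywords++;
    return ["durable workflows"];
  });

  // Step 2: stands in for the LLM call. Fails once, then succeeds.
  return runStep(async () => {
    calls.draft++;
    if (calls.draft < 2) throw new Error("simulated LLM failure");
    return `Draft about ${keywords[0]}`;
  });
}
```

Because the retry loop wraps each step separately, the second step's failure never reaches the first step, which is the behaviour described above.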
If you operate in multiple markets, the same pipeline can generate localised versions of each post. The workflow takes your approved English draft, translates it, adjusts examples and references for the target region, and pushes each version to the correct locale in your CMS. One editorial review produces content for every market you sell into.
When someone submits a contact form on our site, a workflow kicks off within seconds. It extracts the domain from their email, scrapes their company's website for context, then sends everything to Claude. The AI researches the company, looks at what they do, checks for recent news or funding rounds, and generates a structured brief. That brief lands in our Slack within a minute of the form submission.
The entire thing is about 40 lines of TypeScript. Each step uses a "use step" directive, so if Claude's API is slow or Slack returns a 500, that individual step retries without re-scraping the website. We use this ourselves, every day.
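A stripped-down sketch of that shape, under stated assumptions, might look like the following. The `Deps` interface and every function name are hypothetical, and the scraping, model, and Slack calls are injected as stand-ins rather than real clients. The "use step" directive is the one named above; in plain TypeScript it is an inert string and only takes effect inside a runtime that understands it.

```typescript
// Sketch only: `Deps` and all function names here are hypothetical.
interface Deps {
  scrape: (url: string) => Promise<string>;
  research: (context: string) => Promise<string>; // e.g. a model call
  notify: (message: string) => Promise<void>; // e.g. a Slack post
}

// Pull the company domain out of a submitted email address.
function domainFromEmail(email: string): string | null {
  const at = email.lastIndexOf("@");
  if (at < 1 || at === email.length - 1) return null;
  return email.slice(at + 1).trim().toLowerCase();
}

async function scrapeSite(deps: Deps, domain: string): Promise<string> {
  "use step"; // inert outside a workflow runtime
  return deps.scrape(`https://${domain}`);
}

async function writeBrief(deps: Deps, context: string): Promise<string> {
  "use step";
  return deps.research(context);
}

async function postBrief(deps: Deps, brief: string): Promise<void> {
  "use step";
  await deps.notify(brief);
}

// The orchestration: each awaited call is a separate unit of work.
async function enrichLead(deps: Deps, email: string): Promise<string | null> {
  const domain = domainFromEmail(email);
  if (domain === null) return null; // nothing to research
  const context = await scrapeSite(deps, domain);
  const brief = await writeBrief(deps, context);
  await postBrief(deps, brief);
  return brief;
}
```

Splitting the work into three small functions is what lets a slow model call or a failed Slack post be retried on its own, without scraping the site again.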
For clients, we extend this pattern to push enrichment data into their CRM, score leads based on company fit, and trigger different follow-up sequences depending on what the AI finds. A SaaS company can automatically route enterprise leads to their sales team and self-serve leads to a product tour.
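As a rough illustration of the routing idea, the sketch below scores an enriched lead and picks a follow-up path. The `EnrichedLead` fields, the scoring weights, and the threshold are invented for the example; a real engagement would derive them from the client's own definition of company fit.

```typescript
// Hypothetical enrichment output and scoring rules, for illustration only.
interface EnrichedLead {
  company: string;
  employeeCount: number;
  recentlyFunded: boolean;
}

type Route = "sales-team" | "product-tour";

// Add up simple fit signals into a score.
function scoreLead(lead: EnrichedLead): number {
  let score = 0;
  if (lead.employeeCount >= 200) score += 2; // assumed "enterprise" signal
  if (lead.recentlyFunded) score += 1;
  return score;
}

// Leads at or above the threshold go to sales; the rest get the self-serve tour.
function routeLead(lead: EnrichedLead, threshold = 2): Route {
  return scoreLead(lead) >= threshold ? "sales-team" : "product-tour";
}
```

For example, a 500-person company would route to `"sales-team"`, while a recently funded 12-person startup would score 1 and route to `"product-tour"`.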
Workflows are great for request-response pipelines. Some problems need an agent that lives in the background, watches for changes, decides what to do, and acts. Stale comparison pages that need fact-checking against a competitor's docs. Brand citation alerts in ChatGPT and Perplexity. An llms.txt file that should regenerate every time the product changes.
We build these on eve, Vercel's open-source agent framework. eve treats an agent as ordinary files in a TypeScript repo: a markdown system prompt, typed tool definitions, and defineSchedule for the cron that wakes it up. Sessions are durable by default, built on the same workflow engine as our pipelines, so an agent halfway through a task survives a crash or a deploy. When it needs to run code, it does so inside an isolated Vercel Sandbox, never against your production environment. The same agent runs locally under eve dev and on Vercel in production. Schedules trigger it. Remote webhooks trigger it. Other agents trigger it.
Credentials stay out of the codebase. When an agent posts to Slack, opens a pull request, or writes to your CRM, it asks Vercel Connect for a short-lived, scoped token at runtime instead of reading a long-lived secret from an environment variable. The keys stay centralised and auditable, and you revoke access at the provider rather than by redeploying.
Roboto's own CMS migration pages are kept current by a weekly background agent that scrapes the source CMS's docs, diffs against our YAML, and opens a pull request when something is out of date. The agent we built for ourselves is the same agent we ship to clients. If you want one watching your competitor's pricing, your product changelog, your brand citations across the AI search surfaces, we build it on the same foundation.
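The "diff against our YAML" step could be sketched roughly as below. This is not the agent's actual code: it assumes both the stored YAML and the scraped docs have already been parsed into flat key/value records, and the `Facts` and `StaleField` types are hypothetical.

```typescript
// Flat key/value facts about a CMS, as they might look after parsing YAML.
type Facts = Record<string, string>;

interface StaleField {
  key: string;
  ours: string | undefined; // undefined when we have no value recorded yet
  theirs: string;
}

// List every field whose scraped value differs from what we have stored.
function findStaleFields(ours: Facts, scraped: Facts): StaleField[] {
  const stale: StaleField[] = [];
  for (const key of Object.keys(scraped)) {
    const theirs = scraped[key];
    const mine = ours[key];
    if (theirs !== undefined && mine !== theirs) {
      stale.push({ key, ours: mine, theirs });
    }
  }
  return stale;
}

// Only bother opening a pull request when something actually changed.
function shouldOpenPullRequest(stale: StaleField[]): boolean {
  return stale.length > 0;
}
```

The list of stale fields doubles as the body of the pull request, so the reviewer sees exactly which values changed and what they changed from.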
An agent without evals is a demo. The moment you connect it to real data, a prompt change can break it silently. We build eval pipelines alongside every production agent: golden datasets of inputs your agent should handle, deterministic checks for the parts you can grade with code, and LLM-as-judge grading for the parts you can't.
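The deterministic half of such a pipeline can be sketched in a few lines. Everything here is an assumption made for illustration: the `GoldenCase` shape, the synchronous stand-in agent, and the example checks. LLM-as-judge grading is left out because it depends on a model call.

```typescript
// One entry in a golden dataset: an input plus code-gradable checks.
interface GoldenCase {
  input: string;
  // Each check returns true when the output is acceptable.
  checks: Array<(output: string) => boolean>;
}

interface EvalReport {
  passed: number;
  total: number;
  failures: string[]; // inputs whose output failed at least one check
}

// Run every golden case through the agent and grade it with its checks.
function runEvals(agent: (input: string) => string, dataset: GoldenCase[]): EvalReport {
  const failures: string[] = [];
  for (const testCase of dataset) {
    const output = agent(testCase.input);
    const ok = testCase.checks.every((check) => check(output));
    if (!ok) failures.push(testCase.input);
  }
  return { passed: dataset.length - failures.length, total: dataset.length, failures };
}

// Stand-in for a real agent, which would call a model.
const toyAgent = (input: string): string => `Brief for ${input}`;

const report = runEvals(toyAgent, [
  {
    input: "Acme",
    checks: [(o) => o.indexOf("Acme") !== -1, (o) => o.length < 200],
  },
  {
    // An empty company name should not produce a usable brief.
    input: "",
    checks: [(o) => o.trim().length > "Brief for".length],
  },
]);
console.log(report);
```

Here the second case fails on purpose, showing how a report surfaces the specific inputs that need attention instead of a single pass/fail flag.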
Every prompt change runs through the eval suite before it ships. Every model upgrade gets scored before you switch over. Regressions get caught before your editor reviews a bad draft or your sales team gets a wrong brief. The eval suite is the closest thing agentic systems have to a test suite, and it's the difference between "the agent worked last week" and "the agent works".
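One simple way to express that gate, assuming eval results are stored as a pass/fail flag per case, is to block a change whenever a previously passing case now fails. The `Results` type and function names below are hypothetical.

```typescript
// Eval outcome per golden case: case id -> did it pass?
type Results = Record<string, boolean>;

// A regression is a case that passed on the baseline but not on the candidate.
function findRegressions(baseline: Results, candidate: Results): string[] {
  return Object.keys(baseline).filter(
    (id) => baseline[id] === true && candidate[id] !== true,
  );
}

// Gate a prompt change or model upgrade on having no regressions.
function canShip(baseline: Results, candidate: Results): boolean {
  return findRegressions(baseline, candidate).length === 0;
}
```

A case that is missing from the candidate run counts as a regression here, which is a deliberately conservative choice: a skipped case should not be mistaken for a passing one.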
We treat evals as a productised add-on. They can be the starting point of an engagement if you already have an agent in production that isn't trustworthy, or they can be built alongside a new agent from day one.
eve
Vercel's open-source agent framework, and what our background agents actually run on. An agent is a directory of files in your repo, so it ships through the same branches, pull requests, and preview deploys as the rest of your code. Durable sessions, sandboxing, human approvals, and evals all come built in.
Vercel Workflows
The Workflow Development Kit gives every step automatic retries, durable state, and replay-on-deploy. Open source, runs anywhere, but pairs cleanly with Vercel Fluid Compute.
Vercel AI SDK
Provider-agnostic model calls, structured outputs via Zod, tool use, streaming, and observability. Switch models without rewriting the agent.
Vercel Sandbox
Isolated microVMs for code execution inside an agent. Lets agents run shell commands, clone repos, and execute generated code without giving them production access.
Vercel Connect
Short-lived, scoped credentials for agents. Rather than long-lived secrets sitting in environment variables, an agent requests a token at runtime to act in Slack, GitHub, Salesforce, or any OAuth or API-key service. Access stays scoped per project and revocable at the provider.
PostHog LLM analytics
Every model call, every tool invocation, every conversation captured for review. Quality regressions and cost blowouts get caught the day they happen, not the week after.
Delivery model
Agentic systems live inside your codebase, your content models, and your observability stack. They need daily iteration on prompts, tools, and editorial review.
So we ship them the way Palantir, OpenAI, and Anthropic ship AI: by embedding senior engineers directly into your team for the life of the engagement.
A Roboto FDE engagement puts one or two senior engineers into your codebase with the same access as your own team. Shared Slack channel, repo write access, on-call posture for the agents we ship. Weekly demos, daily-or-better async updates, and a documented playbook your team owns when the engagement winds down. We transfer knowledge as we go, so nothing depends on a rushed handover at the end.
The cadence matters because agentic systems aren't ship-and-walk-away projects. Prompt iteration runs daily once the agent hits real traffic. Tool integrations break in ways nobody predicts at scoping. Editorial review needs context only your team can give. An external vendor on a weekly call is too slow for that loop.
Engagements run a minimum of eight to twelve weeks so the build, observe, and iterate cycle has room to play out. Most settle into a rolling monthly retainer once the first systems are live and the next ones are queued.
CMS migrations, headless Shopify builds, and Contentful implementations all ship cleanly as projects with handoff boundaries. Agentic work doesn't. The output is your voice, your data, and your customer-facing automation, and it changes weekly based on what real usage reveals.
That's why FDE applies to agentic workflows and AEO engagements, not to every service Roboto offers. If you're after a fixed-scope build with a clean handoff, our project teams handle that. If you're after an agent or workflow that needs to keep getting better in production, you want the embedded model.
A Vercel partner agency, shipping production agents and workflows on Vercel's AI infrastructure since the first release of the Workflow Development Kit.
One or two senior Roboto engineers join your Slack, your repo, and your standups for the life of the engagement. We treat your codebase like ours: branches, pull requests, code review, deploys. Weekly demos cover what shipped, what's queued, and what decisions need your input. Daily async updates keep you ahead of the work. Minimum engagement is eight to twelve weeks; most settle into a rolling monthly retainer once the first agents are live.
Most AI automation agencies sell no-code workflows on Zapier, Make, or n8n. Useful for prototypes, fragile in production. We ship typed TypeScript on Vercel's durable workflow runtime, with eval pipelines, observability, and version control. The systems we build live inside your codebase, get reviewed in your pull request flow, and survive your deploys. Different toolchain, different reliability bar.
eve is Vercel's open-source agent framework, and it's what our production agents run on. It treats an agent as a directory of files in a TypeScript repo: a markdown system prompt, typed tools, schedules, and skills. The framework handles the parts every serious agent needs anyway, like durable sessions that survive a deploy, sandboxed code execution, human-in-the-loop approvals, OpenTelemetry tracing, and an eval harness. Because an agent is just code in your repo, it gets versioned, reviewed, and deployed like everything else you ship. We run our own background agents on eve, so the patterns we bring to your build are ones we've already debugged on ourselves.
A program that saves its progress as it runs. If the server crashes, it picks up from the last completed step instead of starting over. Traditional server code loses everything on restart. Durable workflows don't. Think of it like a save point in a game. That makes them the most stable way to run AI systems, or anything else that has to wait for an API response, a human approval, or a scheduled delay and still finish reliably. For AI orchestration specifically, where you're chaining multiple model calls, tool lookups, and external APIs together, durability turns a fragile chain into one that finishes even when a step fails.
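The save-point idea can be sketched with a small journal of completed steps. This is a toy model, not how any particular runtime is implemented: the in-memory `Journal` stands in for durable storage, and the crash is simulated with a thrown error.

```typescript
// In-memory stand-in for a durable store; a real runtime would persist this.
type Journal = Map<string, unknown>;

// Run a step once; on later runs, replay its saved result instead.
async function step<T>(journal: Journal, name: string, fn: () => Promise<T>): Promise<T> {
  if (journal.has(name)) return journal.get(name) as T; // already completed
  const result = await fn();
  journal.set(name, result); // the "save point"
  return result;
}

// Records which step bodies actually executed, across all runs.
const executions: string[] = [];

async function workflow(journal: Journal, crashAfterFirstStep: boolean): Promise<string> {
  const data = await step(journal, "fetch", async () => {
    executions.push("fetch");
    return "raw data";
  });
  if (crashAfterFirstStep) throw new Error("simulated crash");
  return step(journal, "summarise", async () => {
    executions.push("summarise");
    return `summary of ${data}`;
  });
}
```

Running `workflow` once with a crash and then again with the same journal executes "fetch" only once: the second run replays the saved result and carries on from the step that never completed.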
No. eve and the Workflow Development Kit are both open source, and the AI SDK is portable across model providers, so the core runs on AWS, Google Cloud, or your own servers. Vercel gives you zero-config deployment, managed Sandbox for code execution, and Connect for runtime credentials, which is why we default to it. The architecture travels; the managed convenience is what you'd trade away by self-hosting.
If it has an API, we can wire it in. We've built integrations with Ahrefs, PostHog, Slack, Sanity, various CRMs, payment processors, and custom internal tools. Each integration is a step in a workflow, so it gets automatic retries and error handling for free.
Software that acts on its own with human oversight. An agentic workflow might research a lead, draft a blog post, or classify a support ticket without anyone clicking a button. A human reviews the output before it goes live. The agent handles the grunt work; people handle the judgment.
Every workflow we build has a human review step where it matters. AI drafts the blog post, your editor approves it. AI enriches the lead, your sales rep reads the brief. AI classifies the ticket, your support team sees the suggestion. We don't ship workflows where AI output goes straight to your customers without a check. We also build eval pipelines for every agent we put into production, so quality degradation gets caught before your team feels it.
Most teams we work with have a working prototype that needs to become production-grade. That usually means adding durability so it doesn't break on deploy, observability so you can debug failures, evals so quality stays measurable, and proper error handling so one bad API response doesn't tank the pipeline. We audit what you have and figure out the fastest path to reliable.
Get started
The content backlog, the unenriched leads, the catalogue nobody can keep current. Pick the one costing your team the most and we'll scope it.
Tell us what you're building. We reply within one working day, and Jono or someone on the team picks up every message personally.
Functional, beautiful websites that you actually want to edit