



Content automation without the slop


Content automation earns its bad name when a model makes decisions that should be rules. Here's where we draw the line, built on the Vercel AI SDK.


A bot that writes on a schedule sounds like a slop machine. Set it running, walk away, come back to a hundred posts that read like the same press release. That fear is reasonable. Most automated content is bad.

We run a scheduled agent anyway, and the corrections it ships are more reliable than the ones we used to make by hand. The difference is one rule: the model never makes a decision that should be a rule. Everything in this post is an argument for that line, plus the parts that matter once you start building real tooling: how to enforce the line in code, how to test it, and what to do when it breaks. We build all of this on the Vercel AI SDK, and we'll show you why.

First, the framing most teams get wrong. Content automation is not "let AI write your blog posts." Writing is maybe 30% of the time cost of publishing. The other 70% is research, briefs, formatting, fact-checking, SEO checks, and review. Those steps are mechanical and predictable, and they're where content dies, because nobody wants to spend Tuesday afternoon copying metadata into form fields.

Slop is a determinism problem

Here's a definition worth keeping: slop is what you get when a stochastic model makes a decision that should have been deterministic.

Why a language model is the wrong place for a rule

A language model is a probability machine. Ask it the same question twice and you can get two different answers. That variance is a feature when you want it to phrase a sentence, and a liability when you want it to decide whether a competitor's price changed. Point that randomness at a factual or structural decision and you've built a slop generator. Not because the model is dumb, but because you gave a dice-roll a job that needed a rule.

The trap is that the model is good enough to fool you. It will confidently pick a topic, assert a price, and write 800 words around it. Nothing in the output says "I guessed." So the failure is silent, and it scales. One bad post is an embarrassment. A pipeline of them is a reputation.

The decision-sorting exercise

Before you write a line of agent code, list every decision in the workflow and sort each one into two piles: rule or judgement.

"Which page do we update this week?" is a rule (the stalest one). "Did this vendor change their pricing?" is judgement. "Is the new price formatted correctly?" is a rule. "How do we phrase the correction?" is judgement. Do this honestly and the architecture falls out of it: the rules become code, the judgement calls go to the model, and the model runs inside the rules rather than above them.

Most teams skip this step and hand the whole workflow to one big prompt. That's the original sin of bad AI tooling.
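One way to keep the sorting exercise honest is to write it down as data before any agent code exists. The sketch below is illustrative only: the type names and the `sortDecisions` helper are made up for this example, and the four entries are the ones named above.

```typescript
// Hypothetical inventory of workflow decisions, sorted into the two piles.
type Pile = "rule" | "judgement";

interface WorkflowDecision {
  question: string;
  pile: Pile;
}

const decisions: WorkflowDecision[] = [
  { question: "Which page do we update this week?", pile: "rule" },
  { question: "Did this vendor change their pricing?", pile: "judgement" },
  { question: "Is the new price formatted correctly?", pile: "rule" },
  { question: "How do we phrase the correction?", pile: "judgement" },
];

// Rules become code; judgement calls become narrowly scoped model tasks.
function sortDecisions(all: WorkflowDecision[]): Record<Pile, string[]> {
  const piles: Record<Pile, string[]> = { rule: [], judgement: [] };
  for (const decision of all) {
    piles[decision.pile].push(decision.question);
  }
  return piles;
}

const sorted = sortDecisions(decisions);
console.log(sorted.rule.length, "rules,", sorted.judgement.length, "judgement calls");
```

If a decision is hard to place, that is usually a sign it is really two decisions, one of each kind.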

The two lanes of a content pipeline

Every pipeline has two lanes running side by side. One is deterministic and owns control flow. The other is the model, doing the parts that genuinely need reading and writing.

  DETERMINISTIC RAIL                      MODEL (constrained)
  ┌──────────────────┐
  │ schedule fires   │
  │ pick stalest page│ ──── context ───▶  read sources, decide
  │ (oldest first)   │                    what changed, draft copy
  └──────────────────┘ ◀─── proposal ───
  ┌──────────────────┐
  │ validate edit    │  reject if no source, wrong shape,
  │ open DRAFT pr    │  empty diff, or off-limits file
  │ ping a human     │
  └──────────────────┘

The model sits inside the rail. It receives a tightly scoped task and hands back a structured proposal. It never decides when to run, which page to touch, or whether the result goes live. The rail calls the model the way your code calls any other unreliable dependency: with a timeout, a schema on the response, and a plan for when it returns nonsense.
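To make "a schema on the response, and a plan for when it returns nonsense" concrete, here is a dependency-free sketch of the receiving end. It is a hand-rolled stand-in, not the SDK's mechanism: `ModelProposal` and `parseModelReply` are hypothetical names, and the timeout part is left out to keep the example synchronous.

```typescript
// Shape the rail expects back. Field names are illustrative assumptions.
interface ModelProposal {
  changed: boolean;
  proposedValue: string | null;
  sourceQuote: string;
}

// Returns a typed proposal, or null when the reply is nonsense.
// The caller treats null as "no finding" rather than trying to salvage it.
function parseModelReply(rawReply: string): ModelProposal | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawReply);
  } catch {
    return null; // not JSON at all, e.g. chatty prose
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;

  const changed = candidate["changed"];
  if (typeof changed !== "boolean") return null;

  const rawValue = candidate["proposedValue"];
  const proposedValue =
    typeof rawValue === "string" ? rawValue : rawValue === null ? null : undefined;
  if (proposedValue === undefined) return null; // missing or wrong type

  const sourceQuote = candidate["sourceQuote"];
  if (typeof sourceQuote !== "string" || sourceQuote.trim() === "") return null;

  return { changed, proposedValue, sourceQuote };
}
```

The design choice worth noting is the return type: there is no partial success, so a half-formed reply cannot leak further down the rail.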

A worked example: keeping comparison pages honest

Here's a real one. We publish comparison pages that help people pick between content platforms before a migration. Vendors change pricing, rename tiers, and ship features constantly, so comparison pages rot. A page that was accurate in March quietly lies by June.

What the agent does each week

So we run a small agent on a weekly schedule whose entire job is to keep those pages current. Each run it does the same five things, in the same order:

  1. Finds the single most out-of-date comparison, picked by timestamp.
  2. Reads the current page so it knows what we already claim.
  3. Fetches the vendor's pricing page, changelog, and a few community threads.
  4. Proposes specific field changes, each with a cited source quote.
  5. Opens a draft pull request and pings a human in Slack.

Steps 1, 2, and 5 are deterministic plumbing. Steps 3 and 4 are the model's job. Nothing about that order is left to the model's discretion, which is the point.
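Step 1 is the easiest to show in code, because it is nothing more than a sort. A minimal sketch, assuming each page carries a `lastReviewedAt` ISO timestamp (that field name and the `ComparisonPage` type are assumptions for this example):

```typescript
interface ComparisonPage {
  slug: string;
  lastReviewedAt: string; // ISO 8601 timestamp; the field name is an assumption
}

// Deterministic step 1: the page with the oldest review timestamp wins.
// Ties break on slug so the same input always produces the same pick.
function pickStalest(pages: ComparisonPage[]): ComparisonPage | null {
  if (pages.length === 0) return null;
  const oldestFirst = [...pages].sort((a, b) => {
    const byAge = Date.parse(a.lastReviewedAt) - Date.parse(b.lastReviewedAt);
    return byAge !== 0 ? byAge : a.slug.localeCompare(b.slug);
  });
  return oldestFirst[0] ?? null;
}
```

The tie-break matters more than it looks: without it, two pages reviewed on the same day could swap places between runs, and the pick would stop being a rule.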

Where the decisions land

  The pipeline decides (rules in code)                        │ The model decides (judgement)
  ────────────────────────────────────────────────────────────┼───────────────────────────────────
  When it runs (a timer, not a mood)                          │ What a vendor's page says now
  Which page: the stalest, picked by timestamp                │ Whether a fact genuinely changed
  That edits are surgical field patches, never rewrites       │ A confidence level for each change
  That every claim carries a source quote or it's dropped     │ Which sources are worth trusting
  That tone is off-limits: facts can change, opinions can't   │ The wording of a corrected fact
  That a no-op run opens nothing                              │
  That it can touch only those files                          │
  That a human publishes, never the agent                     │

Every row in the left column is a decision we refused to delegate. The model is strong at "read this pricing page and tell me what the Team plan costs now." It's unreliable at "decide which of forty pages to edit and push it live." So we only ask it the first kind of question.

Building the rails with the Vercel AI SDK

This is where the Vercel AI SDK earns its place. We've tried the heavier agent frameworks, and we keep coming back to the AI SDK for one reason: it makes the deterministic boundary a type, not a hope. If you're building AI tooling and you want one recommendation from this post, it's this stack. Here's how the rails get built.

Make the model answer in a shape

Don't ask the model "did the price change?" and parse prose. Make it fill a schema. Structured output turns a judgement into a shape, and a malformed or hedging answer can't get through, because it won't match the schema.

import { generateText, Output } from "ai";
import { z } from "zod";

const { output } = await generateText({
  model: "anthropic/claude-haiku-4.5",
  output: Output.object({
    schema: z.object({
      field: z.enum(["pricing", "tier-name", "shipped-feature"]),
      changed: z.boolean(),
      proposedValue: z.string().nullable(),
      confidence: z.enum(["high", "medium", "low"]),
      sourceQuote: z.string().min(1), // no quote, no finding
    }),
  }),
  prompt: researchContext,
});

That sourceQuote: z.string().min(1) is doing real work. A finding without a cited snippet can't be represented, so the model can't propose a change without attaching the evidence a reviewer needs to check it. The schema, not the prompt, enforces "cite or stay quiet." Note what it does and doesn't guarantee: the quote is non-empty, not genuine, so something downstream still has to confirm the quote exists in the source. This is the single most useful habit in building AI tooling: every time you're tempted to write "and make sure you..." in a prompt, ask whether it should be a field in a schema instead.
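A non-empty string is not the same thing as a real quote, so a rail can go one step further and check that the cited snippet literally appears in the text it fetched. The sketch below is one possible way to do that, not something the post's pipeline is confirmed to contain; `normalizeText` and `quoteAppearsInSource` are hypothetical helpers.

```typescript
// Collapse whitespace and case so line wraps in the fetched page
// don't cause false misses.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// True only when the cited quote literally appears in the fetched source text.
function quoteAppearsInSource(sourceText: string, sourceQuote: string): boolean {
  const quote = normalizeText(sourceQuote);
  if (quote.length === 0) return false; // an empty quote proves nothing
  return normalizeText(sourceText).includes(quote);
}
```

A finding whose quote fails this check would be dropped the same way as a finding with no quote at all. It is a plain substring match, so a paraphrased quote is rejected too, which is the strict behaviour you want here.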

Give it tools, not file access

The model never touches a file directly. It calls a tool you defined, and the tool's inputSchema validates every argument before your code runs.

import { tool } from "ai";
import { z } from "zod";

const proposeEdit = tool({
  description: "Patch one existing field on a comparison page. Cannot create new fields.",
  inputSchema: z.object({
    slug: z.string().regex(/^[a-z0-9-]+$/), // refuses unsafe targets
    path: z.array(z.string()),              // key path; the schema can't see the page, so execute must reject unknown keys
    value: z.string(),
    source: z.url(),                        // every edit names its evidence
  }),
  execute: async (patch) => openDraftPr(patch), // opens a draft, never publishes
});

The model fills in value. The schema decides whether that value is even allowed to exist. Two details matter when you design tools this way. The description is part of the prompt, so write it like an instruction: ours says "cannot create new fields" because the model needs to know the boundary, not just hit it. And keep each tool narrow. A proposeEdit that patches one field is easy to validate and hard to misuse. A doEverything tool is neither.
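The "cannot create new fields" boundary has to be enforced somewhere, and the input schema alone cannot see the page. A minimal sketch of what that check might look like inside the tool's execute path, assuming a page is a nested object of string fields (`PageData` and `patchExistingField` are hypothetical and not part of the SDK):

```typescript
// A page is nested string fields. This shape is an assumption for the sketch.
type PageData = { [key: string]: string | PageData };

// Returns a patched copy, or null when the path does not already
// lead to an existing string field.
function patchExistingField(page: PageData, path: string[], value: string): PageData | null {
  const head = path[0];
  if (head === undefined) return null; // empty path: nothing to patch

  const current = Object.prototype.hasOwnProperty.call(page, head) ? page[head] : undefined;
  if (current === undefined) return null; // key does not exist: refuse to create it

  const rest = path.slice(1);
  if (rest.length === 0) {
    // Only leaf string fields can be replaced, never whole sections.
    return typeof current === "string" ? { ...page, [head]: value } : null;
  }
  if (typeof current === "string") return null; // path runs past a leaf

  const patchedChild = patchExistingField(current, rest, value);
  return patchedChild === null ? null : { ...page, [head]: patchedChild };
}
```

Returning a copy rather than mutating keeps the original page available for the diff, and a null result gives the caller one unambiguous signal to reject the edit.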

Wire the tools into a ToolLoopAgent and the SDK runs the read-decide-act loop for you. The model's reach is still bounded by exactly the tools you handed it and nothing else.

import { ToolLoopAgent } from "ai";

const editor = new ToolLoopAgent({
  model: "anthropic/claude-haiku-4.5",
  tools: { fetchSource, proposeEdit },
});

Let the SDK reject bad calls

You don't have to write the validation glue yourself. If the model hallucinates a tool that doesn't exist, the SDK raises NoSuchToolError. If it calls a real tool with arguments that don't match the schema, you get an InvalidToolInputError, and your execute function never runs. The bad call is caught at the door.

That's a free guardrail, and it's a big part of why we don't roll our own agent loop. The schema you wrote for documentation is also the schema that enforces correctness at runtime. You define the boundary once and the SDK polices it on every call.

Keep the irreversible action human

One rule sits outside the SDK and matters more than any of them: every change lands as a draft. The pipeline owns the single irreversible action, publishing, and it never lets the model take it.

We proved this works the hard way recently. We deleted six auto-generated drafts in one go because they were generic and thin. The system did its job: the slop got caught at the draft gate, not on the live site. When you're deciding which decisions to keep human, start with the ones you can't take back.

Prove it works, then prove it keeps working

A pipeline you can't test is a pipeline you can't trust. But you can't unit-test prose, and that confuses people into testing nothing. So they ship an agent, watch it work once, and assume it'll keep working. It won't. Prompts drift, models get swapped, a source changes its HTML, and the thing that worked in May quietly degrades by July.

The trick is to test the behaviour, not the writing. We run a small eval suite that checks the agent reaches for the right tools and stops, not that the sentences are good. One case reads, in plain terms: "given a normal week, the agent should pick the stalest page, read it, and make no more than a couple of tool calls before proposing." If a prompt change makes the agent start thrashing or skip the read step, that eval fails, and we find out before a bad PR does.

Think of evals as a regression net for the rails, not a quality score for the content. They answer "is the machine still behaving the way we designed it," which is a deterministic question with a deterministic answer. The quality of the prose is a separate problem, and that one still belongs to a human reviewer.
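A behavioural eval of this kind can be as small as a function over the recorded tool calls. The sketch below assumes you can capture a trace of tool names from a run; `ToolCallRecord`, `evaluateTrace`, and the call budget are hypothetical, while the tool names match the `fetchSource` and `proposeEdit` tools used earlier.

```typescript
interface ToolCallRecord {
  tool: string; // e.g. "fetchSource" or "proposeEdit"
}

interface TraceVerdict {
  pass: boolean;
  reasons: string[];
}

// Checks behaviour, not prose: did the agent read before proposing,
// and did it stay within its call budget before the first proposal?
function evaluateTrace(trace: ToolCallRecord[], maxCallsBeforePropose: number): TraceVerdict {
  const reasons: string[] = [];
  const firstRead = trace.findIndex((call) => call.tool === "fetchSource");
  const firstPropose = trace.findIndex((call) => call.tool === "proposeEdit");

  if (firstPropose !== -1 && (firstRead === -1 || firstRead > firstPropose)) {
    reasons.push("proposed an edit before reading a source");
  }

  // With no proposal at all, every call counts against the budget.
  const callsBeforePropose = firstPropose === -1 ? trace.length : firstPropose;
  if (callsBeforePropose > maxCallsBeforePropose) {
    reasons.push(
      `made ${callsBeforePropose} tool calls before proposing, budget is ${maxCallsBeforePropose}`,
    );
  }

  return { pass: reasons.length === 0, reasons };
}
```

Because the verdict is computed from the trace alone, the same check gives the same answer every time for a given run, which is what makes it usable as a regression test.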

When things fail

Real AI tooling spends most of its code on the unhappy path. The model returns garbage, a fetch times out, a vendor page is behind a login. Plan for it.

Failing closed

The default posture is to do nothing rather than guess. If the researcher can't cite a source, the finding is dropped. If every finding gets dropped, the run ends and opens no pull request. An empty diff is a success, not a failure: it means the page is already correct. The worst thing a content pipeline can do is generate output to justify having run, so we made silence the easy path.
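Failing closed is mostly a matter of making "skip" an ordinary return value instead of an error. A minimal sketch under assumed names (`CitedFinding`, `RunPlan`, and `planRun` are illustrative, not the pipeline's actual types):

```typescript
interface CitedFinding {
  field: string;
  proposedValue: string;
  sourceQuote: string;
}

type RunPlan =
  | { action: "skip"; reason: string }
  | { action: "open-draft-pr"; findings: CitedFinding[] };

// Drops uncited findings first, then decides whether there is anything to do.
function planRun(findings: CitedFinding[]): RunPlan {
  const cited = findings.filter((finding) => finding.sourceQuote.trim().length > 0);
  if (cited.length === 0) {
    // Silence is the easy path: no findings means no pull request.
    return { action: "skip", reason: "no cited findings; page treated as already correct" };
  }
  return { action: "open-draft-pr", findings: cited };
}
```

Modelling the outcome as a two-case union means the caller has to handle the skip branch explicitly, so a quiet run cannot be mistaken for a failed one.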

Retries and roundtrips

The SDK handles the transient failures so you don't have to hand-roll them. Failed model calls retry automatically, twice by default, configurable with maxRetries. When your execute function throws, the SDK turns the error into a tool-error the model can see on its next step, so a recoverable problem (a 404 on one source) becomes a roundtrip rather than a crash.

Structured output gives you the same safety on the way out. If the model genuinely can't produce something that fits the schema, the call throws instead of handing you a half-formed object. You get a loud failure you can catch, not a quiet one that ships. Across the whole pipeline the rule is the same: surface the error, fail closed, and never let a degraded run masquerade as a good one.

Where we keep the model free

Constraints stop slop, but over-constraint creates a different problem. If you template every sentence, you get content that reads like a form letter, which is its own kind of slop.

So we leave the genuinely hard parts loose. Reading a messy pricing page and working out what genuinely changed. Judging whether a Reddit thread reflects a real pain point or one annoyed user. Phrasing a correction so it stays sharp instead of going limp. That's judgement, and judgement is what the model is for. Forcing it into a rigid template there would make the output worse, not safer.

The skill is knowing which lane each decision belongs in. Pricing accuracy is a rule. The sentence that states the price is judgement. Keep those separate and you get reliability and readability at the same time.

The determinism dial

Deterministic versus non-deterministic isn't a switch, it's a dial, and the AI SDK gives you the knobs to set it per call.

The knobs

  • temperature turns the randomness up or down. Low for extraction, higher for drafting.
  • Output.object() versus free text decides whether an answer must fit a shape or can roam.
  • tool() validation decides what the model is physically able to do.
  • stopWhen decides how long the loop is allowed to run before you pull it out.

You set these per task, not per project. The same agent can extract a price at temperature zero and draft a sentence at temperature 0.7 in the same run.
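Setting the knobs per task can be written as a small lookup that each call consults. This is a sketch, not SDK configuration: the two temperatures come from the paragraph above, while `TaskKind`, `CallSettings`, and the `maxSteps` budgets are assumptions you would translate into real call options such as a stopWhen condition.

```typescript
type TaskKind = "extract" | "draft";

interface CallSettings {
  temperature: number;
  structuredOutput: boolean; // true: the answer must fit a schema
  maxSteps: number; // hypothetical loop budget for this task
}

const settingsByTask: Record<TaskKind, CallSettings> = {
  // Extraction: no randomness, schema-bound, a few steps to fetch and read.
  extract: { temperature: 0, structuredOutput: true, maxSteps: 3 },
  // Drafting: some variety in phrasing, free text, a single step.
  draft: { temperature: 0.7, structuredOutput: false, maxSteps: 1 },
};

function settingsFor(task: TaskKind): CallSettings {
  return settingsByTask[task];
}
```

Keeping the table in one place makes the dial positions reviewable in a diff, the same way the rest of the rail is.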

Cheap model, narrow task

Narrow, well-shaped tasks don't need a frontier model. We run a small, fast model for the extraction and patching work, and the AI Gateway means swapping models is a one-line change to a provider/model string. So you can route the cheap, high-volume calls to a small model and save the expensive one for the rare step that genuinely needs more reasoning. Most of building affordable AI tooling is just refusing to use a big model for a small job.

What changed when we automated

Before the pipeline, comparison pages drifted. A wrong price could sit on a page for a quarter before someone noticed, because checking them by hand was nobody's favourite job. Now the stalest page is never more than a week from a fresh review, and the corrections arrive as a tidy diff with sources attached.

The numbers that moved:

  • Staleness: from a quarter of drift down to a seven-day ceiling.
  • Time to a correction PR: minutes of compute, not an afternoon of someone's week.
  • Human review: still 20 to 30 minutes per change, and that's fine. Review is the part we want a person on.

The constraint was never how fast we could type. It was the dozen unloved steps between "this page is probably wrong now" and "the fix is live." Automate those, keep a person on the judgement, and the work gets done.


When content automation doesn't work

Not every content problem is a pipeline problem. Automation pays off when:

  • The process is repetitive and follows a pattern.
  • Quality can be written as rules, not just "make it good."
  • The content is informational, and you have clean data inputs.

It struggles with:

  • Thought leadership that needs lived experience or original research.
  • Content built on interviews, quotes, or primary reporting.
  • Pieces where your angle is the value, not the information.

We still write plenty by hand. The pipeline handles the factual upkeep and the SEO-driven informational gaps. The opinions and the deep dives come from people.

Getting started

You don't need the full system on day one. Start with the step that hurts most:

  1. If research is the bottleneck, schedule a weekly keyword-gap pull and output a scored shortlist. No drafting yet, just data.
  2. If formatting is the bottleneck, template your frontmatter so one command scaffolds a complete file with valid metadata.
  3. If upkeep is the bottleneck, point a scheduled agent at one rotting page type, give it a tool() that can only propose draft edits, and make it cite every change.

Each of those is a weekend project. Add a schema, a tool, and one eval, and you have the start of a pipeline you can trust.
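The formatting option is the quickest to prototype. Here is a minimal sketch of a frontmatter scaffold, assuming MDX files with a YAML header; the field names (`title`, `slug`, `publishedAt`, `draft`) and both helpers are illustrative choices, not a required layout.

```typescript
// Lowercase, hyphen-separated, ASCII-only: safe for file names and URLs.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // trim hyphens from both ends
}

// Builds a complete frontmatter header. New files always start as drafts.
function scaffoldFrontmatter(title: string, publishedAt: string): string {
  return [
    "---",
    `title: ${JSON.stringify(title)}`, // quoted so colons in titles stay valid
    `slug: ${slugify(title)}`,
    `publishedAt: ${publishedAt}`,
    "draft: true",
    "---",
    "",
  ].join("\n");
}
```

Wrapped in a script that writes the result to a new file, this covers the "one command" in option 2. Hard-coding `draft: true` applies the same rule as the rest of the pipeline: publishing stays a human action.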


The line is the craft

The point of content automation is to remove friction from work that's already good when a person does it carefully, so the mechanical parts happen without anyone context-switching into admin mode. Volume was never the goal.

The reason our scheduled agent doesn't produce slop has nothing to do with a clever prompt. It's that the model only ever answers small, well-shaped questions, the SDK rejects the answers that don't fit, and a human owns the one button that matters. Draw that line in the right place, let the tooling enforce it, and automated content can read like someone sat down and wrote it. Because for the parts that count, someone did.

Frequently asked questions

What is content automation?

Content automation uses AI models and scripts to handle the repeatable steps in your content workflow: research, drafting, fact-checking, formatting, and publishing. Done well, it removes manual handoffs without removing human judgement. Done badly, it generates plausible-sounding slop at scale.

Why does automated content usually read like slop?

Because the model is allowed to make decisions that should have been rules. When a language model picks the topic, decides what's true, and publishes the result with no constraints, you get generic, sometimes wrong content. The fix is to move those decisions out of the prompt and into code the model can't override.

What's the difference between deterministic and non-deterministic steps?

Deterministic steps produce the same result every time: a scheduler firing, a file being picked by timestamp, a validation gate rejecting a malformed edit. Non-deterministic steps are the model's judgement: reading a source, deciding what changed, writing a sentence. A good pipeline runs the model inside deterministic rails.

What tools do you need to build a content automation pipeline?

We build on the Vercel AI SDK for orchestration and Zod for the schemas that constrain the model, with the AI Gateway for model routing. Add a research data source (we use the Ahrefs API), a file-based content system (MDX in a Git repo), and a notification channel (Slack) for review triggers.

Can you fully automate publishing?

You can automate roughly everything up to publish: research, drafting, fact-checking, formatting, and opening a pull request. We deliberately keep the publish step human. The model can write anything, but it can't flip a draft live. That one rule catches most of what would otherwise go wrong.

About the Author

Jono Alford

Founder of Roboto Studio, specializing in headless CMS implementations with Sanity and Next.js. Passionate about building exceptional editorial experiences and helping teams ship faster.
