A bot that writes on a schedule sounds like a slop machine. Set it running, walk away, come back to a hundred posts that read like the same press release. That fear is reasonable. Most automated content is bad.
We run a scheduled agent anyway, and the corrections it ships are more reliable than the ones we used to make by hand. The difference is one rule: the model never makes a decision that should be a rule. Everything in this post is an argument for that line, plus the parts that matter once you start building real tooling: how to enforce the line in code, how to test it, and what to do when it breaks. We build all of this on the Vercel AI SDK, and we'll show you why.
First, the framing most teams get wrong. Content automation is not "let AI write your blog posts." Writing is maybe 30% of the time cost of publishing. The other 70% is research, briefs, formatting, fact-checking, SEO checks, and review. Those steps are mechanical and predictable, and they're where content dies, because nobody wants to spend Tuesday afternoon copying metadata into form fields.
Slop is a determinism problem
Here's a definition worth keeping: slop is what you get when a stochastic model makes a decision that should have been deterministic.
Why a language model is the wrong place for a rule
A language model is a probability machine. Ask it the same question twice and you can get two different answers. That variance is a feature when you want it to phrase a sentence, and a liability when you want it to decide whether a competitor's price changed. Point that randomness at a factual or structural decision and you've built a slop generator. Not because the model is dumb, but because you gave a dice-roll a job that needed a rule.
The trap is that the model is good enough to fool you. It will confidently pick a topic, assert a price, and write 800 words around it. Nothing in the output says "I guessed." So the failure is silent, and it scales. One bad post is an embarrassment. A pipeline of them is a reputation.
The decision-sorting exercise
Before you write a line of agent code, list every decision in the workflow and sort each one into two piles: rule or judgement.
"Which page do we update this week?" is a rule (the stalest one). "Did this vendor change their pricing?" is judgement. "Is the new price formatted correctly?" is a rule. "How do we phrase the correction?" is judgement. Do this honestly and the architecture falls out of it: the rules become code, the judgement calls go to the model, and the model runs inside the rules rather than above them.
Most teams skip this step and hand the whole workflow to one big prompt. That's the original sin of bad AI tooling.
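One way to make the sorting exercise concrete is to write the list down as data before any agent code exists. The sketch below is illustrative only: the Decision type and the sortDecisions helper are assumptions, and the four entries are the examples from this section.

```typescript
// Hypothetical inventory of workflow decisions; type and helper names are illustrative.
type Owner = "rule" | "judgement";

interface Decision {
  question: string;
  owner: Owner;
}

const decisions: Decision[] = [
  { question: "Which page do we update this week?", owner: "rule" },
  { question: "Did this vendor change their pricing?", owner: "judgement" },
  { question: "Is the new price formatted correctly?", owner: "rule" },
  { question: "How do we phrase the correction?", owner: "judgement" },
];

// Split the list into the two piles: rules become code,
// judgement calls become narrowly scoped questions for the model.
function sortDecisions(all: Decision[]): Record<Owner, string[]> {
  const piles: Record<Owner, string[]> = { rule: [], judgement: [] };
  for (const d of all) {
    piles[d.owner].push(d.question);
  }
  return piles;
}

const piles = sortDecisions(decisions);
console.log(piles.rule.length, piles.judgement.length); // 2 2
```

Anything that lands in the rule pile is a candidate for a plain function or a schema constraint rather than a sentence in a prompt.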
The two lanes of a content pipeline
Every pipeline has two lanes running side by side. One is deterministic and owns control flow; we call it the rail. The other is the model, doing the parts that genuinely need reading and writing.
The model sits inside the rail. It receives a tightly scoped task and hands back a structured proposal. It never decides when to run, which page to touch, or whether the result goes live. The rail calls the model the way your code calls any other unreliable dependency: with a timeout, a schema on the response, and a plan for when it returns nonsense.
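A minimal sketch of what "call it like an unreliable dependency" can look like, assuming a hypothetical Proposal shape and a callModel function standing in for the real model call. The hand-written isProposal check stands in for a real schema validator, and the 30-second default is an arbitrary choice.

```typescript
// Hypothetical shape of what the model hands back to the rail.
interface Proposal {
  field: string;
  newValue: string;
}

// Structural check standing in for a real schema validator.
function isProposal(x: unknown): x is Proposal {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return typeof o["field"] === "string" && typeof o["newValue"] === "string";
}

// Reject if the work takes longer than `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("model call timed out")), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// The rail: a timeout, a shape check, and "do nothing" as the fallback.
async function askModel(
  callModel: () => Promise<unknown>, // stand-in for the real model call
  ms: number = 30000,
): Promise<Proposal | null> {
  try {
    const raw = await withTimeout(callModel(), ms);
    return isProposal(raw) ? raw : null; // nonsense is treated as "no proposal"
  } catch {
    return null; // timeout or provider error: skip this step
  }
}
```

Returning null rather than throwing keeps the decision about what to do next in the rail, not in the model call.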
A worked example: keeping comparison pages honest
Here's a real one. We publish comparison pages that help people pick between content platforms before a migration. Vendors change pricing, rename tiers, and ship features constantly, so comparison pages rot. A page that was accurate in March quietly lies by June.
What the agent does each week
So we run a small agent on a weekly schedule whose entire job is to keep those pages current. Each run it does the same five things, in the same order:
1. Finds the single most out-of-date comparison, picked by timestamp.
2. Reads the current page so it knows what we already claim.
3. Fetches the vendor's pricing page, changelog, and a few community threads.
4. Proposes specific field changes, each with a cited source quote.
5. Opens a draft pull request and pings a human in Slack.
Steps 1, 2, and 5 are deterministic plumbing. Steps 3 and 4 are the model's job. Nothing about that order is left to the model's discretion, which is the point.
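The first step is the easiest one to show as a rule. A rough sketch, assuming each page record carries a lastVerified timestamp; the field name, the slugs, the dates, and the step names in RUN_ORDER are all made up for illustration.

```typescript
// Hypothetical page record; lastVerified is an assumed field name.
interface ComparisonPage {
  slug: string;
  lastVerified: Date;
}

// The fixed order of a run, written down as data so it cannot drift.
const RUN_ORDER = [
  "pickStalest",
  "readPage",
  "fetchSources",
  "proposeChanges",
  "openDraftPr",
] as const;

// Step 1 as a rule: the stalest page is the one with the oldest timestamp.
function pickStalest(pages: ComparisonPage[]): ComparisonPage | undefined {
  // Copy before sorting so the caller's array is left untouched.
  return [...pages].sort(
    (a, b) => a.lastVerified.getTime() - b.lastVerified.getTime(),
  )[0];
}

// Sample data, invented for the example.
const pages: ComparisonPage[] = [
  { slug: "vendor-a-vs-vendor-b", lastVerified: new Date("2025-03-01") },
  { slug: "vendor-a-vs-vendor-c", lastVerified: new Date("2025-01-15") },
  { slug: "vendor-b-vs-vendor-c", lastVerified: new Date("2025-02-10") },
];

const target = pickStalest(pages);
console.log(target?.slug); // vendor-a-vs-vendor-c
```

Given the same timestamps, this picks the same page every time, which is the property you lose if the model picks the page.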
Where the decisions land
| The pipeline decides (rules in code) | The model decides (judgement) |
|---|---|
| When it runs (a timer, not a mood) | What a vendor's page says now |
| Which page: the stalest, picked by timestamp | Whether a fact genuinely changed |
| That edits are surgical field patches, never rewrites | A confidence level for each change |
| That every claim carries a source quote or it's dropped | Which sources are worth trusting |
| That tone is off-limits: facts can change, opinions can't | The wording of a corrected fact |
| That a no-op run opens nothing | |
| That it can touch only those files | |
| That a human publishes, never the agent | |
Every row in the left column is a decision we refused to delegate. The model is strong at "read this pricing page and tell me what the Team plan costs now." It's unreliable at "decide which of forty pages to edit and push it live." So we only ask it the first kind of question.
Building the rails with the Vercel AI SDK
This is where the Vercel AI SDK earns its place. We've tried the heavier agent frameworks, and we keep coming back to the AI SDK for one reason: it makes the deterministic boundary a type, not a hope. If you're building AI tooling and you want one recommendation from this post, it's this stack. Here's how the rails get built.
Make the model answer in a shape
Don't ask the model "did the price change?" and parse prose. Make it fill a schema. Structured output turns a judgement into a shape, and a malformed or hedging answer can't get through, because it won't match the schema.
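A minimal sketch of such a shape, not the pipeline's actual schema: every field name except sourceQuote is an assumption. The Zod form is shown as a comment, and the same constraints are spelled out as a dependency-free parser so the example runs on its own.

```typescript
// Zod form of the same idea, for reference (not executed here):
//   z.object({
//     field: z.string().min(1),
//     newValue: z.string().min(1),
//     sourceQuote: z.string().min(1),
//     confidence: z.enum(["low", "medium", "high"]),
//   })

type Confidence = "low" | "medium" | "high";

interface Finding {
  field: string;
  newValue: string;
  sourceQuote: string; // empty string is rejected, like z.string().min(1)
  confidence: Confidence;
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

function isConfidence(v: unknown): v is Confidence {
  return v === "low" || v === "medium" || v === "high";
}

// Returns a typed Finding, or null when the answer does not fit the shape.
function parseFinding(raw: unknown): Finding | null {
  if (typeof raw !== "object" || raw === null) return null;
  const o = raw as Record<string, unknown>;
  const field = o["field"];
  const newValue = o["newValue"];
  const sourceQuote = o["sourceQuote"];
  const confidence = o["confidence"];
  if (!isNonEmptyString(field) || !isNonEmptyString(newValue)) return null;
  if (!isNonEmptyString(sourceQuote)) return null; // no citation, no finding
  if (!isConfidence(confidence)) return null;
  return { field, newValue, sourceQuote, confidence };
}
```

A prose answer such as "the price probably changed" never becomes a Finding, because there is no field for it to land in.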
In that schema, a constraint like sourceQuote: z.string().min(1) is doing real work. A finding without a cited snippet can't be represented, so the model can't propose a change it can't back up. The schema, not the prompt, enforces "cite or stay quiet." This is the single most useful habit in building AI tooling: every time you're tempted to write "and make sure you..." in a prompt, ask whether it should be a field in a schema instead.
Give it tools, not file access
The model never touches a file directly. It calls a tool you defined, and the tool's inputSchema validates every argument before your code runs.
The model fills in value. The schema decides whether that value is even allowed to exist. Two details matter when you design tools this way. The description is part of the prompt, so write it like an instruction: ours says "cannot create new fields" because the model needs to know the boundary, not just hit it. And keep each tool narrow. A proposeEdit that patches one field is easy to validate and hard to misuse. A doEverything tool is neither.
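A dependency-free sketch of a narrow tool in this style. The NarrowTool interface only mirrors the description, input schema, and execute pieces discussed above and is not the SDK's own type. The list of patchable fields is hypothetical.

```typescript
// Fields the agent may patch; anything else is unrepresentable. Hypothetical list.
const PATCHABLE_FIELDS = ["price", "tierName", "featureList"] as const;
type PatchableField = (typeof PATCHABLE_FIELDS)[number];

interface ProposeEditInput {
  field: PatchableField;
  value: string;
  sourceQuote: string;
}

// Illustrative stand-in for a tool definition, not the SDK's type.
interface NarrowTool<I, O> {
  description: string; // read by the model, so phrased as an instruction
  parseInput: (raw: unknown) => I | null; // plays the role of the input schema
  execute: (input: I) => O;
}

const proposeEdit: NarrowTool<ProposeEditInput, string> = {
  description: "Propose a new value for one existing field. Cannot create new fields.",
  parseInput(raw) {
    if (typeof raw !== "object" || raw === null) return null;
    const o = raw as Record<string, unknown>;
    const value = o["value"];
    const sourceQuote = o["sourceQuote"];
    // Only a field from the allow-list survives parsing.
    const known = PATCHABLE_FIELDS.find((f) => f === o["field"]);
    if (known === undefined) return null;
    if (typeof value !== "string") return null;
    if (typeof sourceQuote !== "string" || sourceQuote === "") return null;
    return { field: known, value, sourceQuote };
  },
  // Records a proposal only; nothing is written to disk or published here.
  execute: (input) => `draft patch: ${input.field} -> ${input.value}`,
};
```

The model supplies the value, but the allow-list decides which fields exist at all, so "invent a new field" is not something the tool can express.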
Wire the tools into a ToolLoopAgent and the SDK runs the read-decide-act loop for you. The model's reach is still bounded by exactly the tools you handed it and nothing else.
Let the SDK reject bad calls
You don't have to write the validation glue yourself. If the model hallucinates a tool that doesn't exist, the SDK raises NoSuchToolError. If it calls a real tool with arguments that don't match the schema, you get an InvalidToolInputError, and your execute function never runs. The bad call is caught at the door.
That's a free guardrail, and it's a big part of why we don't roll our own agent loop. The schema you wrote for documentation is also the schema that enforces correctness at runtime. You define the boundary once and the SDK polices it on every call.
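To see what that policing amounts to, here is a hand-rolled sketch of the same two checks. It is not the SDK's implementation: UnknownToolError and BadToolInputError are local stand-ins for NoSuchToolError and InvalidToolInputError, and the GuardedTool shape is an assumption.

```typescript
// Local stand-ins for the SDK's NoSuchToolError and InvalidToolInputError.
class UnknownToolError extends Error {}
class BadToolInputError extends Error {}

// Illustrative tool shape: a validity check plus the code it protects.
interface GuardedTool {
  accepts: (raw: unknown) => boolean;
  execute: (raw: unknown) => string;
}

// Both checks happen before execute is ever reached.
function dispatch(tools: Map<string, GuardedTool>, name: string, rawArgs: unknown): string {
  const tool = tools.get(name);
  if (tool === undefined) throw new UnknownToolError(`no tool named ${name}`);
  if (!tool.accepts(rawArgs)) throw new BadToolInputError(`bad input for ${name}`);
  return tool.execute(rawArgs);
}

let executed = 0; // counts how often protected code actually ran

const tools = new Map<string, GuardedTool>([
  [
    "proposeEdit",
    {
      accepts: (raw) =>
        typeof raw === "object" &&
        raw !== null &&
        typeof (raw as Record<string, unknown>)["field"] === "string",
      execute: () => {
        executed += 1;
        return "proposed";
      },
    },
  ],
]);
```

A Map is used for the registry so that a hallucinated name like "toString" cannot resolve to an inherited object property.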
Keep the irreversible action human
One rule sits outside the SDK and matters more than any of them: every change lands as a draft. The pipeline owns the single irreversible action, publishing, and it never lets the model take it.
We proved this works the hard way recently. We deleted six auto-generated drafts in one go because they were generic and thin. The system did its job: the slop got caught at the draft gate, not on the live site. When you're deciding which decisions to keep human, start with the ones you can't take back.
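The draft gate can be sketched as two functions with different owners. All names here are hypothetical, and a real pipeline would gate publishing on its review system rather than on a string argument.

```typescript
type ChangeStatus = "draft" | "published";

interface Change {
  page: string;
  status: ChangeStatus;
  approvedBy?: string;
}

// The only thing the agent side can create: a draft.
function openDraft(page: string): Change {
  return { page, status: "draft" };
}

// Publishing lives in the pipeline and needs a named human reviewer.
function publish(change: Change, reviewer: string): Change {
  if (reviewer.trim() === "") {
    throw new Error("a human reviewer is required to publish");
  }
  return { ...change, status: "published", approvedBy: reviewer };
}
```

The point of the split is that no tool handed to the model wraps publish, so the irreversible step is simply absent from its vocabulary.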
Prove it works, then prove it keeps working
A pipeline you can't test is a pipeline you can't trust. But you can't unit-test prose, and that confuses people into testing nothing. So they ship an agent, watch it work once, and assume it'll keep working. It won't. Prompts drift, models get swapped, a source changes its HTML, and the thing that worked in May quietly degrades by July.
The trick is to test the behaviour, not the writing. We run a small eval suite that checks the agent reaches for the right tools and stops, not that the sentences are good. One case reads, in plain terms: "given a normal week, the agent should pick the stalest page, read it, and make no more than a couple of tool calls before proposing." If a prompt change makes the agent start thrashing or skip the read step, that eval fails, and we find out before a bad PR goes out.
Think of evals as a regression net for the rails, not a quality score for the content. They answer "is the machine still behaving the way we designed it," which is a deterministic question with a deterministic answer. The quality of the prose is a separate problem, and that one still belongs to a human reviewer.
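A behavioural eval of this kind can be as small as a function over a recorded list of tool calls. The sketch below is an assumption about how such a check might look: the tool names and the threshold of three calls before proposing are invented for the example.

```typescript
// A recorded run: the ordered list of tool names the agent called.
type Trace = string[];

interface EvalResult {
  pass: boolean;
  reasons: string[];
}

// Checks behaviour only: right first step, read before propose, bounded call count.
function evalNormalWeek(trace: Trace, maxCallsBeforePropose: number = 3): EvalResult {
  const reasons: string[] = [];
  const read = trace.indexOf("readPage");
  const propose = trace.indexOf("proposeEdit");

  if (trace[0] !== "pickStalest") reasons.push("did not start by picking the stalest page");
  if (read === -1) reasons.push("skipped the read step");
  if (propose === -1) reasons.push("never proposed");
  if (read !== -1 && propose !== -1 && read > propose) reasons.push("proposed before reading");

  // Number of calls made before the first proposal (or all calls if none was made).
  const callsBefore = propose === -1 ? trace.length : propose;
  if (callsBefore > maxCallsBeforePropose) {
    reasons.push(`thrashing: ${callsBefore} calls before proposing`);
  }

  return { pass: reasons.length === 0, reasons };
}

console.log(evalNormalWeek(["pickStalest", "readPage", "proposeEdit"]).pass); // true
```

Nothing here reads the generated prose, which is why the result is the same on every run for the same trace.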
When things fail
Real AI tooling spends most of its code on the unhappy path. The model returns garbage, a fetch times out, a vendor page is behind a login. Plan for it.
Failing closed
The default posture is to do nothing rather than guess. If the researcher can't cite a source, the finding is dropped. If every finding gets dropped, the run ends and opens no pull request. An empty diff is a success, not a failure: it means the page is already correct. The worst thing a content pipeline can do is generate output to justify having run, so we made silence the easy path.
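Failing closed is mostly a matter of making "no output" a normal return value. A small sketch under assumed names (CandidateFinding, planPullRequest); the real pipeline's types are not shown in this post.

```typescript
interface CandidateFinding {
  field: string;
  newValue: string;
  sourceQuote?: string;
}

interface PullRequestPlan {
  title: string;
  changes: CandidateFinding[];
}

// Returns null when nothing citable remains; the caller then opens no pull request.
function planPullRequest(page: string, findings: CandidateFinding[]): PullRequestPlan | null {
  // Drop every finding that has no source quote, or only whitespace.
  const cited = findings.filter((f) => (f.sourceQuote ?? "").trim().length > 0);
  if (cited.length === 0) return null; // empty diff: the run ends quietly
  return {
    title: `Update ${page}: ${cited.length} cited change(s)`,
    changes: cited,
  };
}
```

Because null is the ordinary result of an uneventful week, there is no code path that rewards the run for producing something.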
Retries and roundtrips
The SDK handles the transient failures so you don't have to hand-roll them. Failed model calls retry automatically, twice by default, configurable with maxRetries. When your execute function throws, the SDK turns the error into a tool-error the model can see on its next step, so a recoverable problem (a 404 on one source) becomes a roundtrip rather than a crash.
Structured output gives you the same safety on the way out. If the model genuinely can't produce something that fits the schema, the call throws instead of handing you a half-formed object. You get a loud failure you can catch, not a quiet one that ships. Across the whole pipeline the rule is the same: surface the error, fail closed, and never let a degraded run masquerade as a good one.
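The two behaviours described in this section, retrying a failed call and turning a thrown tool error into something the model can read, can be sketched by hand as below. This is an illustration of the idea rather than the SDK's code, and it is kept synchronous for brevity even though real model calls are asynchronous.

```typescript
// Retry a flaky call up to maxRetries extra times (the default of 2 matches the text).
function withRetries<T>(call: () => T, maxRetries: number = 2): T {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return call();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // every attempt failed: fail loudly
}

type ToolOutcome =
  | { type: "tool-result"; output: string }
  | { type: "tool-error"; error: string };

// A throwing tool becomes data the model can read on its next step.
function runTool(execute: () => string): ToolOutcome {
  try {
    return { type: "tool-result", output: execute() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { type: "tool-error", error: message };
  }
}
```

The distinction matters: a transient failure is retried invisibly, a tool failure is surfaced to the model as information, and a call that still fails after its retries propagates to the caller.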
Where we keep the model free
Constraints stop slop, but over-constraint creates a different problem. If you template every sentence, you get content that reads like a form letter, which is its own kind of slop.
So we leave the genuinely hard parts loose. Reading a messy pricing page and working out what genuinely changed. Judging whether a Reddit thread reflects a real pain point or one annoyed user. Phrasing a correction so it stays sharp instead of going limp. That's judgement, and judgement is what the model is for. Forcing it into a rigid template there would make the output worse, not safer.
The skill is knowing which lane each decision belongs in. Pricing accuracy is a rule. The sentence that states the price is judgement. Keep those separate and you get reliability and readability at the same time.
The determinism dial
Deterministic versus non-deterministic isn't a switch, it's a dial, and the AI SDK gives you the knobs to set it per call.
The knobs
- temperature turns the randomness up or down. Low for extraction, higher for drafting.
- Output.object() versus free text decides whether an answer must fit a shape or can roam.
- tool() validation decides what the model is physically able to do.
- stopWhen decides how long the loop is allowed to run before you pull it out.
You set these per task, not per project. The same agent can extract a price at temperature zero and draft a sentence at temperature 0.7 in the same run.
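One way to keep per-task settings honest is to write them down as a table. The sketch below is a plain data structure, not an SDK call signature: the task names and the maxSteps values are assumptions, while the two temperatures come from the paragraph above.

```typescript
// Hypothetical per-task settings; the keys echo the knobs, not exact SDK options.
interface TaskSettings {
  temperature: number;
  output: "object" | "text"; // must the answer fit a shape, or may it roam?
  maxSteps: number; // how long the loop may run before it is stopped
}

type Task = "extractPrice" | "draftSentence";

const SETTINGS: Record<Task, TaskSettings> = {
  extractPrice: { temperature: 0, output: "object", maxSteps: 3 },
  draftSentence: { temperature: 0.7, output: "text", maxSteps: 1 },
};

function settingsFor(task: Task): TaskSettings {
  return SETTINGS[task];
}

console.log(settingsFor("extractPrice").temperature); // 0
```

Keeping the table in one place makes it obvious when a drafting temperature has leaked into an extraction call.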
Cheap model, narrow task
Narrow, well-shaped tasks don't need a frontier model. We run a small, fast model for the extraction and patching work, and the AI Gateway means swapping models is a one-line change to a provider/model string. So you can route the cheap, high-volume calls to a small model and save the expensive one for the rare step that genuinely needs more reasoning. Most of building affordable AI tooling is just refusing to use a big model for a small job.
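Routing by step can be a lookup table of provider/model strings. The identifiers below are placeholders, not real model names, and the step names are assumptions.

```typescript
type Step = "extract" | "patch" | "reason";

// Placeholder identifiers in provider/model form; substitute real ones.
const MODEL_FOR_STEP: Record<Step, string> = {
  extract: "example-provider/small-fast-model",
  patch: "example-provider/small-fast-model",
  reason: "example-provider/large-model",
};

// Guard against typos: every entry must look like "provider/model".
function modelFor(step: Step): string {
  const id = MODEL_FOR_STEP[step];
  if (!/^[^/\s]+\/[^/\s]+$/.test(id)) {
    throw new Error(`malformed model id for ${step}: ${id}`);
  }
  return id;
}
```

With the mapping in one table, moving a step to a different model is a change to a single string.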
What changed when we automated
Before the pipeline, comparison pages drifted. A wrong price could sit on a page for a quarter before someone noticed, because checking them by hand was nobody's favourite job. Now the stalest page is never more than a week from a fresh review, and the corrections arrive as a tidy diff with sources attached.
The numbers that moved:
- Staleness: from a quarter of drift down to a seven-day ceiling.
- Time to a correction PR: minutes of compute, not an afternoon of someone's week.
- Human review: still 20 to 30 minutes per change, and that's fine. Review is the part we want a person on.
The constraint was never how fast we could type. It was the dozen unloved steps between "this page is probably wrong now" and "the fix is live." Automate those, keep a person on the judgement, and the work gets done.
When content automation doesn't work
Not every content problem is a pipeline problem. Automation pays off when:
- The process is repetitive and follows a pattern.
- Quality can be written as rules, not just "make it good."
- The content is informational, and you have clean data inputs.
It struggles with:
- Thought leadership that needs lived experience or original research.
- Content built on interviews, quotes, or primary reporting.
- Pieces where your angle is the value, not the information.
We still write plenty by hand. The pipeline handles the factual upkeep and the SEO-driven informational gaps. The opinions and the deep dives come from people.
Getting started
You don't need the full system on day one. Start with the step that hurts most:
- If research is the bottleneck, schedule a weekly keyword-gap pull and output a scored shortlist. No drafting yet, just data.
- If formatting is the bottleneck, template your frontmatter so one command scaffolds a complete file with valid metadata.
- If upkeep is the bottleneck, point a scheduled agent at one rotting page type, give it a tool() that can only propose draft edits, and make it cite every change.
Each of those is a weekend project. Add a schema, a tool, and one eval, and you have the start of a pipeline you can trust.
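For the formatting case in the list above, a scaffold can be a single function that refuses to emit incomplete metadata. This is a sketch under assumptions: the four frontmatter keys are illustrative, and your site generator may expect different ones.

```typescript
// Turn a title into a URL-safe slug: lowercase, hyphens, no stray edges.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build a frontmatter block; throws rather than writing a file with gaps.
function scaffold(title: string, description: string, date: string): string {
  if (title.trim() === "" || description.trim() === "") {
    throw new Error("title and description are required");
  }
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error("date must look like YYYY-MM-DD");
  }
  // JSON.stringify yields double-quoted strings, which are also valid YAML scalars.
  return [
    "---",
    `title: ${JSON.stringify(title)}`,
    `slug: ${JSON.stringify(slugify(title))}`,
    `date: ${JSON.stringify(date)}`,
    `description: ${JSON.stringify(description)}`,
    "---",
    "",
  ].join("\n");
}

console.log(slugify("Vendor A vs. Vendor B!")); // vendor-a-vs-vendor-b
```

The validation is the useful part: the scaffold encodes "valid metadata" as a rule once, so nobody has to remember it on a Tuesday afternoon.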
The line is the craft
The point of content automation is to remove friction from work that's already good when a person does it carefully, so the mechanical parts happen without anyone context-switching into admin mode. Volume was never the goal.
The reason our scheduled agent doesn't produce slop has nothing to do with a clever prompt. It's that the model only ever answers small, well-shaped questions, the SDK rejects the answers that don't fit, and a human owns the one button that matters. Draw that line in the right place, let the tooling enforce it, and automated content can read like someone sat down and wrote it. Because for the parts that count, someone did.

