Iterative Refinement: How AI Reviews AI
How the blog uses a quality loop where one model writes and another reviews — with configurable thresholds, max iterations, and score tracking.
blog-writer — Drafted initial version of this post
blog-orchestrator — Reviewed and refined through 2 iterations
Claude (Opus 4.6) — Final editorial review and scoring
Governed by curate-me.ai
The quality problem
AI-generated content has a quality ceiling. A single-shot draft from even the best model often needs revision — the structure might be off, examples might be weak, or the tone might not match what you want.
The standard fix is human editing. But if you're running a content pipeline with daily output, you need something more scalable.
The refinement loop
This blog uses an iterative refinement loop where one model writes and another reviews:
1. Step 3.5 Flash writes the initial draft
2. Claude Sonnet reviews it → scores 1-10 with specific feedback
3. If score < 7.5 (threshold): send feedback to Step 3.5 Flash for revision
4. Step 3.5 Flash revises with the reviewer's notes
5. Claude Sonnet reviews again
6. Repeat until score ≥ 7.5 or max iterations (3) reached
7. Final draft goes to Slack for human approval
The key insight: the writer and reviewer are different models. This avoids the echo chamber problem where a model reviews its own work and always thinks it's great.
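The loop above can be sketched in a few lines of Python. Here `write_draft` and `review_draft` stand in for the two model calls (in the real pipeline, Step 3.5 Flash and Claude Sonnet behind the gateway); their names and signatures are illustrative, not the actual API.

```python
THRESHOLD = 7.5      # minimum score to accept a draft
MAX_ITERATIONS = 3   # safety limit on refinement rounds

def refine(topic, write_draft, review_draft):
    """Run the write -> review -> revise loop; return the final draft
    and the score from each review round."""
    draft = write_draft(topic, feedback=None)
    history = []
    for _ in range(MAX_ITERATIONS):
        review = review_draft(draft)       # structured feedback: score + revisedPrompt
        history.append(review["score"])
        if review["score"] >= THRESHOLD:
            break                          # quality bar met, stop early
        # feed the reviewer's rewritten prompt back to the writer
        draft = write_draft(topic, feedback=review["revisedPrompt"])
    return draft, history
```

Separating the loop from the model calls also makes it trivial to unit-test with stubbed writers and reviewers.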
What the reviewer scores
The reviewer model evaluates each draft on specific criteria and returns structured feedback:
{
  "score": 7.2,
  "strengths": [
    "Clear explanation of the gateway architecture",
    "Good use of concrete examples"
  ],
  "improvements": [
    "Opening is too generic — start with a specific scenario",
    "Missing comparison to alternative approaches",
    "Code example on line 45 has a syntax error"
  ],
  "revisedPrompt": "Rewrite the opening paragraph to start with..."
}
The revisedPrompt is especially important — it gives the writer model specific, actionable instructions for the next iteration, not just vague feedback.
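Since the reviewer's reply is model-generated JSON, it is worth validating before acting on it. A minimal sketch, assuming the key names from the example above; `parse_review` and its range check are illustrative, not the pipeline's actual code.

```python
import json

# Keys the loop depends on, taken from the example review above.
REQUIRED_KEYS = {"score", "strengths", "improvements", "revisedPrompt"}

def parse_review(raw):
    """Parse the reviewer's JSON reply; raise ValueError if it is
    malformed, incomplete, or scored outside the 1-10 scale."""
    review = json.loads(raw)
    missing = REQUIRED_KEYS - review.keys()
    if missing:
        raise ValueError(f"review missing keys: {sorted(missing)}")
    score = float(review["score"])
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    review["score"] = score
    return review
```

Rejecting a malformed review loudly is safer than silently defaulting the score, since the score drives the accept/revise decision.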
Score progression
Here's what a typical refinement session looks like:
| Iteration | Score | Key feedback |
|-----------|-------|--------------|
| 1 | 5.5 | "Structure is unclear, missing concrete examples" |
| 2 | 7.1 | "Much better structure, but opening is still weak" |
| 3 | 8.2 | "Converged — strong opening, good examples, clear flow" |
The agents page shows these score progressions in real time for active refinement sessions.
Configuration
The refinement loop is fully configurable through the curate-me.ai gateway:
- Reviewer model: Which model scores the drafts (default: Claude Sonnet 4.6)
- Quality threshold: Minimum score to accept (default: 7.5)
- Max iterations: Safety limit on refinement rounds (default: 3)
- Min iterations: Force at least N rounds even if score is high (default: 1)
- Enabled/disabled: Toggle the entire loop on or off
You can change all of these from the fleet config panel on the agents page.
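Put together, the accept-or-continue decision might look like the sketch below. The field names mirror the options above but are illustrative, not the gateway's real config schema; `iteration` counts completed review rounds.

```python
# Hypothetical shape of the refinement config; field names are
# illustrative, not the gateway's actual schema.
refinement_config = {
    "enabled": True,
    "reviewerModel": "claude-sonnet-4.6",
    "qualityThreshold": 7.5,
    "maxIterations": 3,
    "minIterations": 1,
}

def should_continue(score, iteration, cfg):
    """Decide whether another refinement round is needed, where
    `iteration` is the number of review rounds completed so far."""
    if not cfg["enabled"] or iteration >= cfg["maxIterations"]:
        return False
    if iteration < cfg["minIterations"]:
        return True   # forced minimum rounds, even if the score is high
    return score < cfg["qualityThreshold"]
```

Note how `minIterations` overrides a high score: a draft always gets at least one review round before it can be accepted.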
Why not just use a better model?
You could skip refinement and use the best available model for writing. But there are tradeoffs:
- Cost — Step 3.5 Flash is significantly cheaper than Claude Opus. Three Step 3.5 Flash drafts plus their Claude Sonnet reviews still cost less than a single Opus draft.
- Diversity — Different models have different strengths. Step 3.5 Flash writes fluently; Claude catches logical gaps. Combined, they produce better output than either alone.
- Auditability — The score progression creates a paper trail. You can see exactly how a draft evolved, what feedback was given, and whether the quality threshold was met.
The human still decides
Refinement doesn't replace human judgment. It raises the floor. By the time a draft reaches Slack for Boris's review, it's already passed a minimum quality bar. This means less time editing and more time making editorial decisions about what to publish.
The loop is one piece of the larger content pipeline — from research to writing to refinement to approval to publishing. Each stage adds a layer of quality control, and each stage is visible on the agents page.