
Live Build — Ship Code to Production in 3 Minutes

Watch Claude Code write real code in real-time, auto-deploy it to production, and verify it with screenshots — all from a single plain-English description. Here's how the 14-stage pipeline works and what the numbers actually look like.

March 31, 2026 · 5 min read
AI Collaboration

Claude (Sonnet 4.6) wrote this post

Total AI cost: $0.04

Governed by curate-me.ai

Plain English In, Production Deploy Out

The Live Build demo at its-boris.com/demos/live-build is the simplest possible interface for the most complex possible backend: a text box and a button. You describe a task in plain English — "add a dark mode toggle to the header" — and the pipeline does the rest. Claude Code writes the code live in your browser, the changes flow through automated review and governance, a PR is created and merged, and your site is running the new code in production. No terminal. No git commands. No human in the loop unless you want one.

The whole thing takes about three minutes in fast mode.

The 14-Stage Pipeline

Behind that text box is a pipeline with fourteen discrete stages. Each one has a specific job:

  1. Governance check — task description is scanned against org-level policy before a single token is spent. Rate limits, cost caps, and topic restrictions enforced here.
  2. Knowledge injection — relevant context from the knowledge base is injected into the worker's system prompt so the agent knows your codebase conventions before writing.
  3. Task decomposition — the orchestrator breaks the plain-English description into concrete subtasks with clear acceptance criteria. This is where ambiguity gets resolved.
  4. Worker execution — Claude Code runs in an isolated Docker container, writing code against the actual repo. Watch the diffs appear in real-time on the demo page.
  5. Cost tracking — every token is metered per-stage. The running total updates live so you always know what you're spending.
  6. PII scan — the diff is scanned for accidentally included secrets, credentials, or personal data before anything is committed.
  7. Code review — an automated reviewer checks the diff for correctness, style, and logic errors. It can request changes, which feeds back into the retry loop.
  8. Retry — if review fails, the pipeline retries with the feedback incorporated. Most tasks succeed on the first attempt.
  9. PR creation — a pull request is opened with a structured description, cost breakdown, and links to the governance audit trail.
  10. Deploy — on merge, docker compose up -d --build runs and the pipeline waits for the health check to pass.
  11. Screenshot verification — Playwright captures screenshots of key pages after deploy. Visual regressions are flagged for human review.
  12. Rollback — if the health check fails or screenshots catch a regression, the pipeline reverts to the previous image automatically.
  13. Audit trail — every decision, every token, every stage timing is written to the governance log. You can replay exactly what happened and why.
  14. Notification — a Slack message goes out with the summary, cost, and PR link whether the pipeline succeeded or failed.
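The core of stages 4, 7, and 8 — worker execution, automated review, and the retry loop that feeds reviewer feedback back to the worker — can be sketched in a few lines. This is an illustration of the control flow, not the demo's actual code; the function names and the retry budget are assumptions.

```python
def run_pipeline(task, write_code, review, max_retries=2):
    """Worker/review/retry core of the pipeline (illustrative names).

    write_code(task, feedback) -> diff   (stage 4: worker execution)
    review(diff) -> (passed, feedback)   (stage 7: automated review)
    """
    feedback = None
    for attempt in range(1 + max_retries):
        diff = write_code(task, feedback)   # stage 4: worker writes against the repo
        passed, feedback = review(diff)     # stage 7: reviewer checks the diff
        if passed:
            return {"status": "review_passed", "diff": diff, "attempts": attempt + 1}
        # stage 8: loop again with the reviewer's feedback incorporated
    return {"status": "needs_human", "attempts": max_retries + 1}
```

Most tasks exit on the first iteration; the bounded loop just keeps a stubborn failure from burning tokens forever.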

Real Numbers from the Demo

As of today: 62 tasks processed, 29 completed, $5.29 total spend — roughly $0.18 per completed task including all orchestration, review, and governance overhead, not just raw LLM calls.
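The per-task figure is straight division over the completed tasks, since orchestration, review, and governance costs are already folded into the total:

```python
total_spend = 5.29       # dollars, all stages included
completed_tasks = 29
cost_per_completed = round(total_spend / completed_tasks, 2)  # ≈ 0.18
```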

8 tasks reached production via auto-deploy, each with Playwright screenshots captured and verified. The screenshot archive is on the demo page — you can see exactly what changed after each deploy.
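The verification step boils down to comparing each post-deploy screenshot against its baseline and flagging any page that changed. A minimal sketch, assuming screenshots arrive as PNG bytes keyed by page name — the real pipeline drives Playwright and may use perceptual diffing rather than this crude byte-level hash:

```python
import hashlib

def verify_screenshots(baseline: dict, current: dict) -> list:
    """Return the pages whose post-deploy screenshot differs from baseline.

    Inputs are {page_name: png_bytes} maps. A new page with no baseline
    is also flagged, since there is nothing to verify it against.
    """
    flagged = []
    for page, png in current.items():
        base = baseline.get(page)
        if base is None or hashlib.sha256(base).digest() != hashlib.sha256(png).digest():
            flagged.append(page)  # visual change: queue for human review
    return flagged
```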

Governance and PII scanning together account for less than 5% of total spend. Worker execution dominates, as expected, but automated review often catches issues that would have cost more to fix later.

Fast Mode: From 10 Minutes to 3

The default pipeline includes a human-review gate before deploy — the right default for production systems you care about.

Fast mode skips that gate for changes that pass automated review with high confidence. The pipeline goes straight from review passed to merge to deploy, cutting execution time from roughly 10 minutes to under 3 minutes wall-clock.

Fast mode is controlled per-org in the pipeline config:

deploy:
  fastMode: true
  reviewThreshold: 0.92  # confidence score required to skip human review

Below the threshold, the pipeline still stops for human approval. Above it, it ships.
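The gate described by that config reduces to one decision function. A sketch, assuming the reviewer emits a confidence score in [0, 1] alongside its verdict; the state names are illustrative:

```python
def deploy_decision(review_passed: bool, confidence: float,
                    fast_mode: bool, review_threshold: float = 0.92) -> str:
    """Choose the next pipeline step after automated review."""
    if not review_passed:
        return "retry"                 # feedback goes back to the worker
    if fast_mode and confidence >= review_threshold:
        return "auto_deploy"           # high confidence: skip the human gate
    return "await_human_approval"      # default path: stop for manual review
```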

The Approve and Deploy Button

Fast mode is opt-in, and even with it enabled you always have a manual override. The Approve and Deploy button on the demo page lets you take control of any task at any point in the pipeline — pause it, review the diff, and decide whether to proceed. Click it and the pipeline continues from where it stopped; decline and it closes the PR without deploying.
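The button's behavior is a small state transition: it only acts on a task that is paused for review, and resolves it one of two ways. Hypothetical state names — the demo's internal states may differ:

```python
def handle_override(pipeline_state: str, decision: str) -> str:
    """Resolve a paused task after a human clicks Approve or Decline."""
    if pipeline_state != "paused_for_review":
        return pipeline_state        # override only applies to paused tasks
    if decision == "approve":
        return "resume_deploy"       # continue from where the pipeline stopped
    return "pr_closed"               # decline: close the PR without deploying
```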

This is the core tension in fully automated pipelines: speed vs. the ability to catch the 5% of cases where the agent did something unexpected. The Approve and Deploy button is the escape hatch — a few seconds of attention in exchange for never worrying that a bad deploy went live before you saw it.

Automation Without Abdication

The Live Build demo tests a specific claim: that governance and verification can be tight enough that you'd trust the pipeline to ship without watching every PR. Sixty-two tasks in, the claim is holding. Eight auto-deploys, all clean. Rollback hasn't fired once.

But the pipeline doesn't ask for blind trust. Every stage is logged, every cost tracked, every deploy verified with screenshots, and the override button is always one click away. That's the version of automation worth building — not "set it and forget it," but "run it fast and know you can stop it."

The code that runs this demo is the same code running this blog. Every feature you see here shipped through the same 14-stage pipeline.
