Tags: dev-team, pipeline, claude-code, automation, deploy, ci-cd, docker

From GitHub Issue to Production Deploy in 3 Minutes — Building an AI Dev Team Pipeline

How we built a fully automated pipeline where a GitHub issue triggers AI decomposition, Docker containers running Claude Code Opus write the code, automated review catches bugs, and a configurable deploy pipeline pushes to production. Real numbers: PR #1235 in 179 seconds, $0.22 total cost.

March 24, 2026 · 9 min read
AI Collaboration

Claude (Opus 4.6): Co-architect; implemented pipeline executor, pipeline steps, per-org config, deploy templates

Claude Code (Sonnet 4.6): First successful autopilot PR (2 lines of code, sidebar fix)

Total AI cost: $0.22

Governed by curate-me.ai

The pitch that writes itself

What if you could point your AI dev team at a GitHub issue and walk away? Not "generate some boilerplate" — the full loop: decompose the task, spin up isolated Docker containers, write production code with Claude Code, run automated review, merge the PR, deploy to your VPS, verify the health check passes, close the issue, and notify Slack. All from a single button click on a dashboard.

We built that. And then we used it on itself.

The first successful PR

- 179s: time from issue to PR
- $0.22: total pipeline cost
- 2: lines changed (the sidebar fix)
- 0: human interventions (fully autonomous)

PR #1235 was small on purpose. A sidebar navigation link was missing from the dev-team settings page. The issue described it in one sentence. The autopilot decomposed it, spun up a Docker container with Claude Code Sonnet, found the file, added two lines, pushed a branch, created the PR, and posted the review — all in under three minutes.

The cost? $0.22. And that was before we figured out the Max subscription routing trick that brought it to $0.

How the pipeline actually works

| Time | Stage | What happens |
|---|---|---|
| 0s | Issue picked up | Dashboard shows GitHub board issues. Click "Start" on any Ready issue. |
| 2s | Task decomposed | Haiku analyzes the issue body and breaks it into subtasks with file paths and roles. |
| 5s | Container started | Docker container spins up with the Claude Code CLI, repo cloned, branch created. |
| 10-180s | Code written | Claude Code Opus/Sonnet writes code inside the container. VNC streaming available for live watching. |
| 180s | PR created | Git push, PR opened against the base branch, review comment posted on the issue. |
| 185s | Review runs | Configurable reviewer: LLM (Haiku), Claude Code Opus, or Codex GPT-5.4. |
| 190s | Deploy pipeline | If the review approves: merge PR, SSH deploy, health check, close issue, Slack notification. |

Each step is its own module. The decomposer is an LLM call. The worker is a Docker container. The reviewer is pluggable. The deploy pipeline is configurable per-org with 9 step types. Nothing is hardcoded to our setup.

The $0 cost trick

This is the part that surprised us. We expected to spend $5-10 per task on LLM calls. Instead we spent $0.

Cost routing strategy

| Component | Model | Route | Cost |
|---|---|---|---|
| Decomposer | Haiku 4.5 | Gateway → Max subscription | $0.00 |
| Worker | Sonnet/Opus 4.6 | Claude Code CLI → OAuth token → api.claude.ai | $0.00 |
| Reviewer (LLM) | Haiku 4.5 | Gateway → Max subscription | $0.00 |
| Reviewer (Codex) | GPT-5.4 | Codex CLI → ChatGPT Plus | $0.00 |
| Reviewer (Claude) | Opus 4.6 | Claude Code CLI → OAuth token | $0.00 |

The key discovery: Claude Code CLI detects OAuth tokens (prefix sk-ant-oat01-*) and routes them to api.claude.ai — the consumer API that supports Sonnet and Opus under the Max subscription. The raw Anthropic API (api.anthropic.com) only supports Haiku with OAuth tokens. This one prefix check is the difference between $0 and $5 per task.
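The routing rule itself is simple enough to sketch. This is an illustrative stand-in for what the CLI does internally, not its actual code; the function name is made up:

```python
def resolve_base_url(token: str) -> str:
    """Pick an API endpoint based on the token prefix, mirroring the
    behavior described above: OAuth tokens go to the consumer API
    (which serves Sonnet and Opus under the Max subscription), while
    anything else goes to the raw Anthropic API (Haiku only via OAuth)."""
    if token.startswith("sk-ant-oat01-"):
        return "https://api.claude.ai"
    return "https://api.anthropic.com"
```

One prefix check, two very different model menus.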

If you're running Claude Code in containers, pass the OAuth token as the ANTHROPIC_API_KEY environment variable — don't mount auth.json. We learned this the hard way three separate times before creating a permanent memory note about it.

What went wrong (and what we learned)

The OpenRouter cost leak

We were routing decomposition calls through OpenRouter at $0.03 per call. Not a lot, but it adds up when you're iterating. The fix was switching to gateway_anthropic mode — the Anthropic SDK pointed at our own gateway, which routes through the Max subscription.

But the Anthropic SDK appends /v1/messages to whatever base URL you give it. So gateway_url/v1/anthropic became gateway_url/v1/anthropic/v1/messages — a 404. The fix: set base_url to the gateway root, not the anthropic path.

Then another bug: the gateway's extract_provider_auth() was forwarding our gateway API key to Anthropic as the provider key. Anthropic rejected it. Fix: use a placeholder api_key="cm-gateway-passthrough" that the gateway recognizes and strips.
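The double-path bug is easy to reproduce in miniature. The helper below mimics the SDK's behavior of joining its endpoint path onto whatever base_url you configure (the gateway hostname is illustrative):

```python
def sdk_request_url(base_url: str) -> str:
    """Mimic the Anthropic SDK appending its /v1/messages endpoint
    to the configured base_url."""
    return base_url.rstrip("/") + "/v1/messages"

# Wrong: base_url already contains the provider path, so it doubles up.
broken = sdk_request_url("https://gateway.example.com/v1/anthropic")
# -> https://gateway.example.com/v1/anthropic/v1/messages (the 404)

# Right: base_url is the gateway root; the gateway routes /v1/messages
# onward and strips the "cm-gateway-passthrough" placeholder key itself.
working = sdk_request_url("https://gateway.example.com")
# -> https://gateway.example.com/v1/messages
```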

Three bugs, three layers of indirection. This is the kind of thing you don't notice until you have to change it. And then you notice it hard.

The Docker-in-Docker auth problem

Our worker containers are launched from inside the B2B API container. Docker volume mounts reference the host filesystem, not the parent container's filesystem. So -v ./auth.json:/home/user/.claude/auth.json maps to... nothing.

We fixed it with an explicit host path: /home/curateme/.config/claude-code/auth.json. But then Claude Code still said "Not logged in" because the auth.json file format changed between versions.

The real fix was simpler: read the OAuth token from auth.json in the parent process and pass it as ANTHROPIC_API_KEY. No mount needed. Claude Code reads the env var, sees the sk-ant-oat01-* prefix, and routes to the consumer API.
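A minimal sketch of that fix, assuming the auth.json path from above and an "access_token" field (the field name and worker image name are assumptions, not the real code):

```python
import json
import subprocess

def read_oauth_token(auth_path: str) -> str:
    """Read the OAuth token in the parent process, where auth.json is readable."""
    with open(auth_path) as f:
        return json.load(f)["access_token"]  # field name is an assumption

def build_worker_cmd(token: str, image: str, prompt: str) -> list[str]:
    # No volume mount: the sk-ant-oat01-* token in the environment is
    # all Claude Code needs to route to the consumer API.
    return [
        "docker", "run", "--rm",
        "-e", f"ANTHROPIC_API_KEY={token}",
        image,
        "claude", "-p", prompt,
    ]

def launch_worker(prompt: str) -> None:
    token = read_oauth_token("/home/curateme/.config/claude-code/auth.json")
    subprocess.run(build_worker_cmd(token, "autopilot-worker", prompt), check=True)
```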

We made this exact mistake three times across different parts of the codebase. The lesson: don't mount files into Docker containers when you can pass environment variables. Mounts are fragile across DinD, permissions, and file format changes.

The model that doesn't exist via OAuth

We set the default model to Sonnet for the decomposer. It worked locally. In production, the Anthropic API returned 400. Why? OAuth tokens on api.anthropic.com only support Haiku. Sonnet and Opus are only available through api.claude.ai (the consumer API path that Claude Code CLI uses).

The fix: decomposer and reviewer use Haiku (fast, free via Max sub). Workers use Claude Code CLI with Opus/Sonnet (also free via Max sub, but through the consumer API).

The configurable deploy pipeline

After the PR is reviewed and approved, what happens next? For us: merge, SSH deploy, health check, close issue. For a customer on Vercel: merge, webhook deploy, health check, close issue. For someone on GitHub Actions: merge, dispatch workflow, close issue.

Pipeline step types (9 built-in)

| Step type | What it does |
|---|---|
| merge_pr | Squash merge via GitHub API |
| deploy_ssh | SSH command to any host |
| deploy_webhook | Vercel, Railway, custom |
| deploy_github_action | Dispatch any workflow |
| verify_health | Poll endpoint until 200 |
| close_issue | Close + move board to Done |
| notify_slack | Deploy notification |
| notify_webhook | Custom webhook |
| custom_script | Run any shell command |

Each step is data (stored in MongoDB), not code. An org configures their pipeline once in the dashboard settings, and every approved PR flows through it automatically. Steps have configurable failure modes: stop the pipeline, warn and continue, or silently continue.
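An executor for steps-as-data with those three failure modes can be sketched like this. The step schema ("type", "config", "on_failure") and the handler registry are assumptions about the shape, not the actual implementation:

```python
def run_pipeline(steps: list[dict], handlers: dict) -> None:
    """Run data-driven pipeline steps with per-step failure modes:
    "stop" aborts the pipeline, "warn" logs and continues,
    "continue" swallows the error silently."""
    for step in steps:
        try:
            handlers[step["type"]](step.get("config", {}))
        except Exception as exc:
            mode = step.get("on_failure", "stop")
            if mode == "stop":
                raise  # abort the rest of the pipeline
            if mode == "warn":
                print(f"warning: {step['type']} failed: {exc}")
            # mode == "continue": move on to the next step
```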

We shipped four pre-built templates: VPS SSH (our setup), Vercel, GitHub Actions, and Blog SSH (for its-boris.com). Creating a custom pipeline is drag-and-drop in the settings page.
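For illustration, the VPS SSH template might be stored as a list of step documents along these lines. Every field name and value here is an assumption about the stored shape, not the actual MongoDB schema:

```python
# Hypothetical shape of a stored deploy template: an ordered list of
# step documents, each with a type, optional config, and failure mode.
VPS_SSH_TEMPLATE = [
    {"type": "merge_pr", "config": {"method": "squash"}},
    {"type": "deploy_ssh",
     "config": {"host": "vps.example.com",
                "command": "cd /opt/app && git pull && systemctl restart app"}},
    {"type": "verify_health",
     "config": {"url": "https://example.com/health", "timeout_s": 120},
     "on_failure": "stop"},
    {"type": "close_issue"},
    {"type": "notify_slack",
     "config": {"channel": "#deploys"},
     "on_failure": "continue"},
]
```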

Per-org multi-repo support

The original system was hardcoded to Curate-Me-ai/platform. That's fine when you're the only user. But we have three repos now:

Configured repositories

| Org | Repo | Board | Deploy Target |
|---|---|---|---|
| Curate-Me | Curate-Me-ai/platform | Project #1 (5 columns) | VPS SSH + health check |
| Its Boris Blog | its-boris/blog | Project #1 (3 columns) | SSH to /opt/its-boris-blog |
| Margin Mandy | its-boris/marginmandy | Project #2 (3 columns) | SSH to /opt/marginmandy |

Each org stores its GitHub config in MongoDB: owner, repo, project board ID, status field IDs, and board column option IDs. The board issues endpoint dynamically switches between organization and user GraphQL queries based on an is_user flag — because GitHub's API uses different query roots for org-owned vs user-owned projects.
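The query-root switch looks roughly like this. The query strings are abbreviated; the field names follow GitHub's GraphQL API, and the selection function is an illustrative stand-in for the real endpoint:

```python
# Org-owned projects live under the `organization` root...
ORG_QUERY = """
query($login: String!, $number: Int!) {
  organization(login: $login) { projectV2(number: $number) { id } }
}
"""

# ...user-owned projects under the `user` root.
USER_QUERY = """
query($login: String!, $number: Int!) {
  user(login: $login) { projectV2(number: $number) { id } }
}
"""

def board_query(is_user: bool) -> str:
    """Pick the GraphQL query root based on the org config's is_user flag."""
    return USER_QUERY if is_user else ORG_QUERY
```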

This was one of those changes where the diff looks small but the blast radius is huge. Every endpoint that touches GitHub issues — from-issue, mark-done, deploy, board/issues — had to be updated to thread the org config through.

The pluggable engine architecture

Not every task needs Opus. Not every review needs a container.

Worker and reviewer engines

| Engine | Type | Best For | Cost |
|---|---|---|---|
| Claude Code | Container | Complex multi-file changes | $0 (Max sub) |
| Codex | Container | Large codebase refactors | $0 (ChatGPT Plus) |
| Aider | Container | Quick fixes, pair programming | Varies |
| Gemini CLI | Container | Google ecosystem work | Varies |
| Copilot CLI | Container | GitHub-native workflows | Varies |
| LLM (Haiku) | SDK call | Fast decomposition and review | $0 (Max sub) |
| Codex Review | Container | GPT-5.4 code review | $0 (ChatGPT Plus) |
| Claude Code Review | Container | Opus-level code review | $0 (Max sub) |

Each engine implements WorkerEngineProtocol: install_script(), invoke_command(), auth_env_vars(). Adding a new engine is one class and one registry entry. The reviewer uses a similar factory pattern — get_reviewer(engine, model) returns the right implementation.
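A sketch of that protocol and registry. The three method names come from the post; the method bodies, registry, and placeholder token are illustrative:

```python
from typing import Protocol

class WorkerEngineProtocol(Protocol):
    def install_script(self) -> str: ...
    def invoke_command(self, prompt: str) -> list[str]: ...
    def auth_env_vars(self) -> dict[str, str]: ...

class ClaudeCodeEngine:
    def install_script(self) -> str:
        return "npm install -g @anthropic-ai/claude-code"

    def invoke_command(self, prompt: str) -> list[str]:
        return ["claude", "-p", prompt]

    def auth_env_vars(self) -> dict[str, str]:
        # Placeholder: the real token is read from auth.json by the parent.
        return {"ANTHROPIC_API_KEY": "sk-ant-oat01-..."}

# Adding an engine is one class plus one registry entry.
ENGINES: dict[str, type] = {"claude_code": ClaudeCodeEngine}

def get_engine(name: str) -> WorkerEngineProtocol:
    return ENGINES[name]()
```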

Per-org defaults are configurable from the dashboard settings page. Per-task overrides are supported for when you want Opus on a complex task but Sonnet on a quick fix.

What's next

The pipeline works end-to-end for the Curate-Me platform. We just hooked up the its-boris.com blog and Margin Mandy. The container image needs work — the OpenClaw gateway startup blocks the Claude Code CLI, which means the first few container tasks fail until we fix the entrypoint.

But the architecture is right. The pipeline is hookable. The engines are pluggable. The deploy steps are configurable. And the whole thing costs $0 on a Max subscription.

The uncomfortable question: if an AI can go from a one-sentence GitHub issue to a deployed production fix in three minutes for free, what does that mean for how we think about task prioritization? We used to batch small fixes because the context-switching cost was too high. Now the cost is zero. Every "we'll get to it later" item on the backlog is a candidate for autonomous resolution.

We're not there yet — the success rate on complex tasks is still low, and the container startup needs to be more reliable. But for the simple fixes that make up 60% of any backlog? The pipeline is ready.

Everything described here is running in production at dashboard.curate-me.ai. The deploy pipeline, per-org config, and engine selection are all available in the dev-team settings page. If you're interested in hooking up your own repo, the setup takes about 10 minutes — create an org, configure the GitHub board IDs, and pick a deploy template.
