From GitHub Issue to Production Deploy in 3 Minutes — Building an AI Dev Team Pipeline
How we built a fully automated pipeline where a GitHub issue triggers AI decomposition, Docker containers running Claude Code Opus write the code, automated review catches bugs, and a configurable deploy pipeline pushes to production. Real numbers: PR #1235 in 179 seconds, $0.22 total cost.
Claude (Opus 4.6) — Co-architect, implemented pipeline executor, pipeline steps, per-org config, deploy templates
Claude Code (Sonnet 4.6) — First successful autopilot PR — 2 lines of code, sidebar fix
Governed by curate-me.ai
The pitch that writes itself
What if you could point your AI dev team at a GitHub issue and walk away? Not "generate some boilerplate" — the full loop: decompose the task, spin up isolated Docker containers, write production code with Claude Code, run automated review, merge the PR, deploy to your VPS, verify the health check passes, close the issue, and notify Slack. All from a single button click on a dashboard.
We built that. And then we used it on itself.
The first successful PR
PR #1235 was small on purpose. A sidebar navigation link was missing from the dev-team settings page. The issue described it in one sentence. The autopilot decomposed it, spun up a Docker container with Claude Code Sonnet, found the file, added two lines, pushed a branch, created the PR, and posted the review — all in under three minutes.
The cost? $0.22. And that was before we figured out the Max subscription routing trick that brought it to $0.
How the pipeline actually works
Each step is its own module. The decomposer is an LLM call. The worker is a Docker container. The reviewer is pluggable. The deploy pipeline is configurable per-org with 9 step types. Nothing is hardcoded to our setup.
The $0 cost trick
This is the part that surprised us. We expected to spend $5-10 per task on LLM calls. Instead we spent $0.
Cost routing strategy
| Component | Model | Route | Cost |
|---|---|---|---|
| Decomposer | Haiku 4.5 | Gateway → Max subscription | $0.00 |
| Worker | Sonnet/Opus 4.6 | Claude Code CLI → OAuth token → api.claude.ai | $0.00 |
| Reviewer (LLM) | Haiku 4.5 | Gateway → Max subscription | $0.00 |
| Reviewer (Codex) | GPT-5.4 | Codex CLI → ChatGPT Plus | $0.00 |
| Reviewer (Claude) | Opus 4.6 | Claude Code CLI → OAuth token | $0.00 |
The key discovery: Claude Code CLI detects OAuth tokens (prefix sk-ant-oat01-*) and routes them to api.claude.ai — the consumer API that supports Sonnet and Opus under the Max subscription. The raw Anthropic API (api.anthropic.com) only supports Haiku with OAuth tokens. This one prefix check is the difference between $0 and $5 per task.
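The routing rule boils down to one prefix check. Here's a hypothetical sketch of the logic as we understand it — the host constants and function name are ours, not the CLI's internals:

```python
# Sketch of the token-prefix routing described above. Names are illustrative.
CONSUMER_API = "https://api.claude.ai"      # Max subscription: Haiku, Sonnet, Opus
PLATFORM_API = "https://api.anthropic.com"  # OAuth tokens here: Haiku only

def route_for_token(token: str) -> str:
    """Return the API host a given token should be sent to."""
    if token.startswith("sk-ant-oat01-"):
        # Max-subscription OAuth token -> consumer API, full model lineup
        return CONSUMER_API
    return PLATFORM_API
```

Get the prefix wrong — or strip it somewhere in your plumbing — and every Sonnet/Opus call lands on the metered API instead.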
If you're running Claude Code in containers, pass the OAuth token as the ANTHROPIC_API_KEY environment variable — don't mount auth.json. We learned this the hard way three separate times before creating a permanent memory note about it.

What went wrong (and what we learned)
The OpenRouter cost leak
We were routing decomposition calls through OpenRouter at $0.03 per call. Not a lot, but it adds up when you're iterating. The fix was switching to gateway_anthropic mode — the Anthropic SDK pointed at our own gateway, which routes through the Max subscription.
But the Anthropic SDK appends /v1/messages to whatever base URL you give it. So gateway_url/v1/anthropic became gateway_url/v1/anthropic/v1/messages — a 404. The fix: set base_url to the gateway root, not the anthropic path.
Then another bug: the gateway's extract_provider_auth() was forwarding our gateway API key to Anthropic as the provider key. Anthropic rejected it. Fix: use a placeholder api_key="cm-gateway-passthrough" that the gateway recognizes and strips.
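Both gateway bugs reduce to a few lines. This is a simplified sketch, not our actual gateway code — the gateway URL is a placeholder, and `sdk_request_url` just models the SDK's path-appending behavior:

```python
GATEWAY_ROOT = "https://gateway.example.com"  # placeholder URL for illustration

def sdk_request_url(base_url: str) -> str:
    # Models the Anthropic SDK: it appends /v1/messages to whatever base_url you give it.
    return base_url.rstrip("/") + "/v1/messages"

# Bug 1: pointing base_url at the gateway's anthropic path doubles the prefix -> 404.
bad = sdk_request_url(GATEWAY_ROOT + "/v1/anthropic")

# Fix: base_url is the gateway root, so the SDK adds the path exactly once.
good = sdk_request_url(GATEWAY_ROOT)

# Bug 2 fix: give the SDK a placeholder key the gateway recognizes and strips,
# so the real gateway API key is never forwarded to Anthropic as a provider key.
client_kwargs = {
    "base_url": GATEWAY_ROOT,
    "api_key": "cm-gateway-passthrough",
}
```

The general shape of the lesson: when you stack an SDK on a gateway on a provider, every layer has opinions about paths and keys, and those opinions compose badly by default.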
Three bugs, three layers of indirection. This is the kind of thing you don't notice until you have to change it. And then you notice it hard.
The Docker-in-Docker auth problem
Our worker containers are launched from inside the B2B API container. Docker volume mounts reference the host filesystem, not the parent container's filesystem. So -v ./auth.json:/home/user/.claude/auth.json maps to... nothing.
We fixed it with an explicit host path: /home/curateme/.config/claude-code/auth.json. But then Claude Code still said "Not logged in" because the auth.json file format changed between versions.
The real fix was simpler: read the OAuth token from auth.json in the parent process and pass it as ANTHROPIC_API_KEY. No mount needed. Claude Code reads the env var, sees the sk-ant-oat01-* prefix, and routes to the consumer API.
We made this exact mistake three times across different parts of the codebase. The lesson: don't mount files into Docker containers when you can pass environment variables. Mounts are fragile across DinD, permissions, and file format changes.
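The env-var approach can be sketched like this. The `access_token` field name is an assumption about auth.json's current format, and the image name is made up:

```python
import json

def build_worker_cmd(auth_path: str, image: str = "devteam-worker") -> list:
    """Build the docker run command for a worker container.

    The OAuth token is read in the parent process and passed as an env var.
    No volume mount — this survives Docker-in-Docker, where -v paths resolve
    against the host filesystem, not the parent container's. Note: the
    'access_token' field name is our assumption about auth.json's schema.
    """
    with open(auth_path) as f:
        token = json.load(f)["access_token"]
    return [
        "docker", "run", "--rm",
        "-e", f"ANTHROPIC_API_KEY={token}",  # sk-ant-oat01-* prefix -> consumer API
        image,
    ]
```

The command list would then go to `subprocess.run` or your container orchestration layer of choice.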
The model that doesn't exist via OAuth
We set the default model to Sonnet for the decomposer. It worked locally. In production, the Anthropic API returned 400. Why? OAuth tokens on api.anthropic.com only support Haiku. Sonnet and Opus are only available through api.claude.ai (the consumer API path that Claude Code CLI uses).
The fix: decomposer and reviewer use Haiku (fast, free via Max sub). Workers use Claude Code CLI with Opus/Sonnet (also free via Max sub, but through the consumer API).
The configurable deploy pipeline
After the PR is reviewed and approved, what happens next? For us: merge, SSH deploy, health check, close issue. For a customer on Vercel: merge, webhook deploy, health check, close issue. For someone on GitHub Actions: merge, dispatch workflow, close issue.
Pipeline step types (9 built-in)
Each step is data (stored in MongoDB), not code. An org configures their pipeline once in the dashboard settings, and every approved PR flows through it automatically. Steps have configurable failure modes: stop the pipeline, warn and continue, or silently continue.
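A minimal sketch of steps-as-data with the three failure modes — the step names and document shape are illustrative, not our actual MongoDB schema:

```python
# An org's pipeline, stored as data. "on_failure" is one of: stop, warn, continue.
PIPELINE = [
    {"type": "merge_pr",     "on_failure": "stop"},
    {"type": "ssh_deploy",   "on_failure": "stop"},
    {"type": "health_check", "on_failure": "warn"},
    {"type": "close_issue",  "on_failure": "continue"},
]

def run_pipeline(steps, execute):
    """Run each step via execute(step) -> bool; return a (type, outcome) log."""
    log = []
    for step in steps:
        if execute(step):
            log.append((step["type"], "ok"))
            continue
        mode = step["on_failure"]
        log.append((step["type"], f"failed:{mode}"))
        if mode == "stop":
            break  # halt the rest of the pipeline
        # "warn" also notifies (e.g. Slack); "continue" is silent. Both proceed.
    return log
```

Because the pipeline is a list of dicts, swapping `ssh_deploy` for `webhook_deploy` or `dispatch_workflow` is an edit in the dashboard, not a code change.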
We shipped four pre-built templates: VPS SSH (our setup), Vercel, GitHub Actions, and Blog SSH (for its-boris.com). Creating a custom pipeline is drag-and-drop in the settings page.
Per-org multi-repo support
The original system was hardcoded to Curate-Me-ai/platform. That's fine when you're the only user. But we have three repos now:
Configured repositories
| Org | Repo | Board | Deploy Target |
|---|---|---|---|
| Curate-Me | Curate-Me-ai/platform | Project #1 (5 columns) | VPS SSH + health check |
| Its Boris Blog | its-boris/blog | Project #1 (3 columns) | SSH to /opt/its-boris-blog |
| Margin Mandy | its-boris/marginmandy | Project #2 (3 columns) | SSH to /opt/marginmandy |
Each org stores its GitHub config in MongoDB: owner, repo, project board ID, status field IDs, and board column option IDs. The board issues endpoint dynamically switches between organization and user GraphQL queries based on an is_user flag — because GitHub's API uses different query roots for org-owned vs user-owned projects.
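The org-vs-user switch looks roughly like this — the query is heavily simplified (the real one also fetches status fields and column options), but the `organization`/`user` root split matches GitHub's GraphQL API:

```python
def project_query(owner: str, number: int, is_user: bool) -> str:
    """Build a ProjectV2 lookup query with the correct GraphQL root.

    GitHub roots project queries at organization(login:) for org-owned
    boards and user(login:) for user-owned ones. Simplified field set.
    """
    root = "user" if is_user else "organization"
    return (
        f'query {{ {root}(login: "{owner}") {{ '
        f"projectV2(number: {number}) {{ id title }} }} }}"
    )
```

So `its-boris/blog` (a user-owned project) queries `user(login: "its-boris")` while the platform repo queries `organization(login: ...)` — same endpoint, different root, and an `is_user` flag in the org config decides which.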
This was one of those changes where the diff looks small but the blast radius is huge. Every endpoint that touches GitHub issues — from-issue, mark-done, deploy, board/issues — had to be updated to thread the org config through.
The pluggable engine architecture
Not every task needs Opus. Not every review needs a container.
Worker and reviewer engines
| Engine | Type | Best For | Cost |
|---|---|---|---|
| Claude Code | Container | Complex multi-file changes | $0 (Max sub) |
| Codex | Container | Large codebase refactors | $0 (ChatGPT Plus) |
| Aider | Container | Quick fixes, pair programming | Varies |
| Gemini CLI | Container | Google ecosystem work | Varies |
| Copilot CLI | Container | GitHub-native workflows | Varies |
| LLM (Haiku) | SDK call | Fast decomposition and review | $0 (Max sub) |
| Codex Review | Container | GPT-5.4 code review | $0 (ChatGPT Plus) |
| Claude Code Review | Container | Opus-level code review | $0 (Max sub) |
Each engine implements WorkerEngineProtocol: install_script(), invoke_command(), auth_env_vars(). Adding a new engine is one class and one registry entry. The reviewer uses a similar factory pattern — get_reviewer(engine, model) returns the right implementation.
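The protocol and registry can be sketched as follows. The three method names come from above; the signatures, install/invoke strings, and registry shape are our illustrative guesses:

```python
from typing import Dict, Protocol, Type

class WorkerEngineProtocol(Protocol):
    """The three methods every engine implements; signatures are assumed."""
    def install_script(self) -> str: ...
    def invoke_command(self, task: str) -> str: ...
    def auth_env_vars(self) -> Dict[str, str]: ...

class ClaudeCodeEngine:
    """Illustrative engine; the exact commands/flags are assumptions."""
    def install_script(self) -> str:
        return "npm install -g @anthropic-ai/claude-code"
    def invoke_command(self, task: str) -> str:
        return f'claude -p "{task}"'  # non-interactive "print" mode
    def auth_env_vars(self) -> Dict[str, str]:
        return {"ANTHROPIC_API_KEY": "sk-ant-oat01-placeholder"}  # OAuth token

# Adding a new engine really is one class and one registry entry.
ENGINE_REGISTRY: Dict[str, Type] = {"claude-code": ClaudeCodeEngine}

def get_engine(name: str) -> WorkerEngineProtocol:
    return ENGINE_REGISTRY[name]()
```

The reviewer side mirrors this: `get_reviewer(engine, model)` is the same factory pattern keyed on engine name.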
Per-org defaults are configurable from the dashboard settings page. Per-task overrides are supported for when you want Opus on a complex task but Sonnet on a quick fix.
What's next
The pipeline works end-to-end for the Curate-Me platform. We just hooked up the its-boris.com blog and Margin Mandy. The container image needs work — the OpenClaw gateway startup blocks the Claude Code CLI, which means the first few container tasks fail until we fix the entrypoint.
But the architecture is right. The pipeline is hookable. The engines are pluggable. The deploy steps are configurable. And the whole thing costs $0 on a Max subscription.
The uncomfortable question: if an AI can go from a one-sentence GitHub issue to a deployed production fix in three minutes for free, what does that mean for how we think about task prioritization? We used to batch small fixes because the context-switching cost was too high. Now the cost is zero. Every "we'll get to it later" item on the backlog is a candidate for autonomous resolution.
We're not there yet — the success rate on complex tasks is still low, and the container startup needs to be more reliable. But for the simple fixes that make up 60% of any backlog? The pipeline is ready.
Everything described here is running in production at dashboard.curate-me.ai. The deploy pipeline, per-org config, and engine selection are all available in the dev-team settings page. If you're interested in hooking up your own repo, the setup takes about 10 minutes — create an org, configure the GitHub board IDs, and pick a deploy template.