retrospective · testing · production · bugs · engineering

The Week We Stopped Building Features

74 commits, 30 fixes, 4,700 new tests. After a marathon sprint, we spent three days doing nothing but making the platform actually work.

March 28, 2026 · 9 min read
AI Collaboration

Claude (Opus 4.6): Co-author; all implementation, test generation, bug investigation

Claude Code (Sonnet 4.6): Test wave execution, dashboard refactoring

Total AI cost: $0.12

Governed by curate-me.ai

The hangover

Three days ago, we shipped the marathon session: 4 sites, 50+ features, 150+ agent runs, one weekend. It felt like the most productive session we'd ever had. Parallel worktree agents running 6 features at a time. Terminal recordings. Chat widgets. Blog redesigns. Everything was shipping.

Then we tried to make it work in production.

  • 74 commits in 3 days
  • 30 bug fixes (41% of all commits)
  • 4,700+ new tests (across 6 waves)
  • 9 new features (12% of all commits)

Forty-one percent of the work since that marathon was fixing things the marathon broke. This is the post about those three days — the unglamorous part nobody writes about, the part where the platform actually became real.

Day 1: The bugs were everywhere

March 25 started as a feature day. We shipped multi-repository support for the dev-team pipeline, a Quick Connect flow for GitHub repos, and a board integration that fetches issues directly from connected repos. Six feature commits before lunch.

Then the bug reports started.

40 CORS failures

Fourteen dashboard API clients were making gateway requests to localhost:8002 in production. The root cause was simple: NEXT_PUBLIC_GATEWAY_URL had never been set as an environment variable on the VPS. Every file that used it fell back to the development default.

The fix was also simple — use /gw-api relative paths that get rewritten by Next.js to the gateway. But the fix had to be applied across 14 separate files in two batches, because we kept finding more.
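For illustration, a rewrite rule along these lines is what makes relative paths work. This is a sketch, not the project's actual config; GATEWAY_INTERNAL_URL is an invented server-side variable (only the /gw-api prefix and port 8002 come from the post):

```typescript
// next.config.ts (sketch). The gateway URL is resolved on the server, so
// the browser only ever requests its own origin and CORS never applies.
const gateway = process.env.GATEWAY_INTERNAL_URL ?? "http://localhost:8002";

const nextConfig = {
  async rewrites() {
    return [
      {
        // Browser calls fetch("/gw-api/..."); Next.js proxies it to the
        // gateway, so there is no absolute URL to misconfigure per env.
        source: "/gw-api/:path*",
        destination: `${gateway}/:path*`,
      },
    ];
  },
};

export default nextConfig;
```

Unlike a NEXT_PUBLIC_* variable baked into the client bundle, a server-side rewrite fails visibly at deploy time rather than silently in the browser.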

The CORS cascade

  • 10am: First CORS bug found in secrets-vault API client
  • 11am: 14 more CORS bugs found across gateway API files
  • 2pm: Fix batch 1 switches 14 files to relative /gw-api paths
  • 4pm: 12 MORE CORS bugs found in remaining dashboard clients
  • 5pm: Fix batch 2: all dashboard API clients now use relative URLs
  • 6pm: 3 gateway-side bugs: rate limit, sliding-window count, port exhaustion

Forty CORS bugs. All with the same root cause. All invisible in development because localhost:8002 works fine when the gateway is running locally.

The auth key that never existed

Twelve dashboard files were reading authentication tokens from a localStorage key called cm_gateway_key. That key had never been set by any part of the application. The canonical key was dashboard_access_token, set by the auth flow.

This meant twelve pages — traces, fleet timeline, cost attribution, runner status — had been silently failing to authenticate since they were written. They worked in development because the dev server doesn't enforce auth the same way. In production, they returned empty data or 401 errors.

The fix was a find-and-replace across 12 files. The lesson was harder to absorb.
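A sketch of the structural version of that lesson (everything here except the dashboard_access_token key name is a hypothetical helper): make the key name exist in exactly one module, with storage injected so it can be tested outside a browser.

```typescript
// auth-token.ts — illustrative sketch, not the project's actual code.
export const ACCESS_TOKEN_KEY = "dashboard_access_token";

// Minimal Storage-like interface; window.localStorage satisfies it.
export interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

export function getAccessToken(store: KeyValueStore): string | null {
  // Every caller goes through this function; a typo'd key name like
  // "cm_gateway_key" can no longer hide in twelve separate files.
  return store.getItem(ACCESS_TOKEN_KEY);
}

export function setAccessToken(store: KeyValueStore, token: string): void {
  store.setItem(ACCESS_TOKEN_KEY, token);
}
```

In the browser the call sites pass window.localStorage directly; in tests, any Map-backed object works.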

Null safety: the .toFixed() epidemic

This one took eight separate commits to resolve. The dashboard had hundreds of places where numerical values from API responses were passed directly to .toFixed(2). When the API returned null or undefined — which it does for any resource that hasn't accumulated cost data yet — JavaScript threw Cannot read properties of null (reading 'toFixed').

We found these in costs pages, runner detail pages, gateway logs, session recording pages, prompt editor components, and fleet timelines. Every page that displayed a cost or a latency number was potentially broken.
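The fix for this class of crash can be sketched as a single guarded formatter (the function name and the "n/a" fallback are assumptions, not the project's actual helper):

```typescript
// Sketch of a null-safe replacement for bare value.toFixed(2) calls.
export function fmtFixed(
  value: number | null | undefined,
  digits = 2,
  fallback = "n/a",
): string {
  // The typeof check narrows away null/undefined, and Number.isFinite
  // rejects NaN and Infinity, so .toFixed() only ever sees a real number.
  return typeof value === "number" && Number.isFinite(value)
    ? value.toFixed(digits)
    : fallback;
}
```

The helper matters less than the policy: null handling lives in one audited function instead of hundreds of call sites.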

Day 2: The pivot to testing

By March 26, the pattern was clear. We were not going to find these bugs by using the dashboard manually. We needed systematic coverage.

  • Backend coverage before: 20%
  • Backend coverage after: 43%
  • Passing tests: 17,560 (+10,539)
  • New test files: 506

Six waves of test generation, each targeting a different layer of the stack:

Test waves — what got covered

  • Wave 1: Runner CP + enterprise (960 tests): runner control plane, agents, DB, routes
  • Wave 2: Dashboard (1,000 tests): validation, auth, API hooks, exports
  • Wave 3: Billing + gateway (800 tests): billing, config, workers, models
  • Wave 4: Models + routes (800 tests): routes, agents, DB, webhooks
  • Wave 5: Services + middleware (1,000 tests): integrations, utils, checkpoint
  • Wave 6: Gateway modules (140 tests): coverage script, runner CP

But here is the part nobody tells you about AI-generated tests: 11 of the generated test files were broken. Not failing — broken. Import errors that prevented the entire test suite from collecting. Tests that referenced functions that didn't exist. Tests for modules that had been rewritten.

We deleted them across three separate cleanup commits. The broken tests were generated by agents running against stale snapshots of the codebase, and no one had run them before merging. That is the same lesson as the marathon — speed without verification compounds debt.

The testing push also forced a refactor. The test configuration was a mess: fixtures duplicated across files, thresholds misaligned, no-op fixtures that did nothing. We extracted everything into reusable modules and standardized the infrastructure. This was the boring, correct work that makes the next 4,700 tests possible.

Day 2.5: The architecture reckoning

Between test waves, we broke up the monoliths.

lib/api.ts was 3,236 lines — a single file containing every API call the dashboard makes. It was the third-largest file in the entire monorepo. We split it into 20+ specialized modules: agents-api.ts, billing-api.ts, costs-api.ts, gateway-admin.ts, webhooks-api.ts, and so on.

The template builder modal was 1,494 lines. We extracted it into 7 step components (BasicsStep, InputsStep, WorkspaceStep). The trace viewer was 1,083 lines. The agent chat panel was 831 lines. Each one became a set of focused modules.

  • Files refactored in one PR: 90
  • Lines added (modular components): 8,849
  • Lines removed (monoliths broken up): 7,738
  • Largest file before the split: 3,236 lines (lib/api.ts)

Net change: +1,111 lines. Breaking up monoliths always adds a few lines for the module boundaries and re-exports. That is fine. What matters is that each module can now be tested, reviewed, and reasoned about independently.
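Assuming the split kept the old import path alive through a barrel file (the post doesn't say; fetchCosts and the @/lib alias are invented for illustration), those extra boundary lines look roughly like this:

```typescript
// lib/api.ts after the split — illustrative only. Each domain now lives
// in its own focused module, and this barrel re-exports them so existing
// call sites such as `import { fetchCosts } from "@/lib/api"` keep
// compiling while new code imports the specific module directly.
export * from "./agents-api";
export * from "./billing-api";
export * from "./costs-api";
export * from "./gateway-admin";
export * from "./webhooks-api";
// ...remaining domain modules
```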

Day 3: The ?? 0 saga

On March 27, after the test waves landed, we tried to build the dashboard for production. Next.js 15 has gotten stricter about TypeScript. It flagged 519 instances of a pattern like this:

// value is already guaranteed to be a number by the type system
const display = (value ?? 0).toFixed(2);

The ?? 0 is a nullish coalescing operator — a fallback to zero if the value is null or undefined. But the type system says the value is already a number. Next.js now treats this as unreachable code: a type error.

So we bulk-removed all 519 instances with a single sed command.

The build passed. We deployed. And then 13 pages broke.

The ?? 0 incident

  • Morning: Next.js flags 519 unreachable ?? 0 patterns as type errors
  • 11am: Bulk removal: sed strips all 519 instances
  • 11:30am: Build passes, deployed to production
  • 12pm: 13 pages broken (values that CAN be null despite the types)
  • 12:15pm: Reverted the bulk removal entirely
  • 1pm: Surgical removal: 506 safe instances removed, 13 kept

The problem: some of those values came through optional chaining — stats?.costs?.today ?? 0. TypeScript sees the final type as number | undefined, but the intermediate ?? 0 is what protects against the undefined case. Removing it means .toFixed() crashes when stats is null.

519 instances looked identical. 506 were genuinely safe to remove. 13 were load-bearing. You cannot tell which is which without reading each one.
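The trap is easy to reproduce in isolation. In this sketch (types and field names are invented to mirror the stats example above), the declared type swears the value is a number, but the runtime payload disagrees:

```typescript
// Sketch: a type that lies. The interface claims `today` always exists...
interface Stats {
  costs: { today: number };
}

// ...but the JSON for a brand-new resource omits it, and the cast at the
// parse boundary means the compiler never finds out.
const fresh = JSON.parse('{"costs": {}}') as Stats;

// To the type checker this ?? 0 is unreachable code; at runtime it is the
// only thing preventing undefined.toFixed(2) from throwing.
const display = (fresh.costs.today ?? 0).toFixed(2); // "0.00"
```

Remove the fallback and the expression type-checks identically but crashes the page, which is exactly the failure mode of the 13 load-bearing instances.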

The lesson is specific: bulk automated refactors that look purely mechanical are not purely mechanical. Types lie. Optional chains create invisible null paths. The only safe approach was surgical: read each instance, trace the value's origin, decide.

What we actually built

Not nothing. Nine feature commits survived between the fixes:

  • Machine Registry: BYOVM runners now have ownership and sharing semantics. Personal machines, shared machines, pool machines. A dashboard page for managing them, an install wizard, and a machine picker in the runner creation modal.
  • Cloud Runner + VPS monitoring: Capacity monitoring for multi-machine deployments.
  • Command palette: Replaced the broken header search with a real cmdk-powered Cmd+K palette. 40 searchable pages, 6 quick actions, contextual suggestions.
  • Landing page redesign: Collapsed 13 sections to 7, removed 3,700 lines, new headline.
  • Security hardening: 45 findings across 4 severity waves.

But the ratio tells the story. 9 features, 30 fixes, 16 test commits. For every feature we shipped, we fixed three things and wrote tests for two more.

What I learned

The marathon was real. 50+ features shipped. But they shipped into a codebase that was not ready to receive them. The CORS bugs existed before the marathon — we just never noticed because we were moving too fast to test in production. The auth key mismatch was written months ago. The null safety issues had been accumulating since the dashboard's first draft.

Speed creates debt. AI-assisted development creates debt faster, because the agents can generate code faster than you can verify it works. The marathon proved we could ship 50 features in a weekend. The week after proved that shipping is not the same as working.

I don't regret the marathon. Those features needed to exist. But if I were doing it again, I would have stopped at feature 30 and spent the second day writing tests instead of feature 50.

Twenty percent backend coverage is not a testing strategy. It is an absence of one.

The platform now has 17,560 passing backend tests at 43% coverage, 1,000+ dashboard tests, and a modular architecture that can actually be maintained. None of that is visible to users. All of it is why the next marathon will not require a week of cleanup afterward.
