CI/CD: 15 min → under 5 min

The problem

Four engineers, one merge queue, 15-minute deploys. The math: even with perfect coordination, a serialized queue capped the team at ~32 deploys per 8-hour work day (480 minutes / 15). In practice it was closer to 6, because the pipeline was so slow that people batched changes, which in turn made bisecting failures harder.
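The ~32 ceiling is simple arithmetic: a serialized queue can only finish as many runs as fit end to end in the day. A minimal sketch, assuming an 8-hour work day (which is what 32 × 15 minutes works out to); maxSerializedDeploys is a hypothetical name used only for illustration.

```typescript
// Upper bound on deploys when one merge queue serializes every pipeline run.
// Assumption: an 8-hour work day; pass a different dayMinutes to change it.
const WORK_DAY_MINUTES = 8 * 60;

function maxSerializedDeploys(pipelineMinutes: number, dayMinutes = WORK_DAY_MINUTES): number {
  if (pipelineMinutes <= 0) throw new RangeError("pipelineMinutes must be positive");
  // Only complete runs count, so round down.
  return Math.floor(dayMinutes / pipelineMinutes);
}

console.log(maxSerializedDeploys(15)); // 32
```

This is a ceiling, not a forecast: it ignores failed runs, reruns, and the batching behaviour described above, all of which push the real number down.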

Symptoms anyone shipping in 2026 will recognise:

Cold pnpm install on every job (~3 minutes)
All tests ran serially in a single Node process (~5 minutes)
Deploy to Vercel after every merge to main, even doc-only changes
No CODEOWNERS, no branch protection beyond "1 approval"
PR descriptions were a free-form mess

What I changed (in order of impact)

1. Lockfile-keyed install cache

Adding an actions/cache step keyed on the hash of pnpm-lock.yaml cut install time from ~3 minutes to ~25 seconds on cache hits. This alone got us most of the way to the under-5-minute target.
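What makes the cache step work is that the key is derived from the lockfile contents: an unchanged lockfile restores the cache, and any dependency change produces a new key. A sketch of that property, assuming a key shaped like <os>-pnpm-<hash of pnpm-lock.yaml>; pnpmCacheKey is hypothetical, and GitHub's hashFiles() uses its own hashing scheme, so these digests will not match the real ones.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: same lockfile -> same key (cache hit),
// any lockfile change -> new key (cache miss, fresh install, new cache saved).
function pnpmCacheKey(runnerOs: string, lockfileContents: string): string {
  const digest = createHash("sha256").update(lockfileContents).digest("hex");
  return `${runnerOs}-pnpm-${digest}`;
}

const before = pnpmCacheKey("Linux", "lockfileVersion: '9.0'\n");
const same = pnpmCacheKey("Linux", "lockfileVersion: '9.0'\n");
const bumped = pnpmCacheKey("Linux", "lockfileVersion: '9.0'\n# a dependency changed\n");

console.log(before === same); // true: unchanged lockfile restores the cache
console.log(before === bumped); // false: a dependency change produces a fresh key
```

Including the OS in the key keeps caches from different runner images apart, since some installed packages contain platform-specific binaries.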

2. Parallel test shards

I split the test suite into 4 shards using a GitHub Actions matrix strategy and Jest's built-in --shard flag. Each shard runs in parallel on its own runner. Wall-clock time for tests: 5 minutes → 1 minute 20 seconds.
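Jest's --shard flag takes an index/count pair (for example --shard=2/4), and each matrix job passes its own index. Every runner must compute the same split independently, so the split has to be deterministic. The sketch below is a simplified model of one way to do that, similar in spirit to Jest's default sequencer but not its actual code; shardTests is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hash each test path, sort by hash, then cut the sorted list into
// shardCount contiguous slices. shardIndex is 1-based, like --shard=1/4.
function shardTests(testPaths: string[], shardIndex: number, shardCount: number): string[] {
  if (shardCount < 1 || shardIndex < 1 || shardIndex > shardCount) {
    throw new RangeError(`invalid shard ${shardIndex}/${shardCount}`);
  }
  const shardSize = Math.ceil(testPaths.length / shardCount);
  return testPaths
    .map((p) => ({ p, hash: createHash("sha1").update(p).digest("hex") }))
    .sort((a, b) => (a.hash < b.hash ? -1 : a.hash > b.hash ? 1 : 0))
    .slice(shardSize * (shardIndex - 1), shardSize * shardIndex)
    .map((entry) => entry.p);
}

// The 4 matrix jobs never overlap and together cover the whole suite.
const files = ["a", "b", "c", "d", "e", "f", "g", "h"].map((x) => `${x}.test.ts`);
for (let i = 1; i <= 4; i++) {
  console.log(`shard ${i}/4:`, shardTests(files, i, 4));
}
```

A split like this balances file counts, not runtimes: one slow test file can still make its shard the long pole.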

Heads up

Sharding only saves time if your tests are independent of each other. We had two flaky tests that depended on shared filesystem state; they had to be fixed before sharding actually helped.
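One common fix for tests that share filesystem state is to give each test its own scratch directory, so nothing one test writes can leak into another regardless of which shard runs it. A minimal sketch; withTempDir is a hypothetical helper, not something Jest provides, and it only handles synchronous test bodies.

```typescript
import { mkdtempSync, rmSync, writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: a fresh, unique directory per call instead of a shared
// path like ./tmp. Synchronous callbacks only; async bodies would need await.
function withTempDir<T>(fn: (dir: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "test-"));
  try {
    return fn(dir);
  } finally {
    // Clean up even if the test body throws.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Each call sees only the state it created itself.
const result = withTempDir((dir) => {
  const file = join(dir, "fixture.json");
  writeFileSync(file, JSON.stringify({ ok: true }));
  return JSON.parse(readFileSync(file, "utf8")) as { ok: boolean };
});
console.log(result.ok); // true
```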

3. Conditional builds via Turborepo

Migrated the monorepo to Turborepo with task-level caching. Now, if you change a doc file in the marketing package, the app build's inputs are unchanged, so it hits the cache and is skipped entirely. Deploys for non-app changes finish in under 90 seconds.
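Turborepo decides whether to run a task by hashing the task's inputs and looking that hash up in its cache. The sketch below is a much-simplified model of that idea, not Turborepo's implementation: the real hash folds in more, such as dependencies' hashes and declared environment variables. The package paths, taskHash, runTask, and the in-memory cache are all hypothetical.

```typescript
import { createHash } from "node:crypto";

type Files = Record<string, string>; // path -> file contents

// Hash a task from its package's own files (sorted so order doesn't matter).
function taskHash(pkg: string, task: string, files: Files): string {
  const h = createHash("sha256").update(`${pkg}#${task}\n`);
  for (const path of Object.keys(files).sort()) {
    h.update(path).update("\0").update(files[path]).update("\0");
  }
  return h.digest("hex");
}

const cache = new Map<string, string>(); // hash -> cached output (stand-in for a real cache)

function runTask(pkg: string, task: string, files: Files, build: () => string): "hit" | "miss" {
  const key = taskHash(pkg, task, files);
  if (cache.has(key)) return "hit"; // inputs unchanged: skip the work
  cache.set(key, build());
  return "miss";
}

const app: Files = { "apps/web/src/index.ts": "export const x = 1;" };
let marketing: Files = { "apps/marketing/docs/intro.md": "# Hello" };
runTask("web", "build", app, () => "web build output");
runTask("marketing", "build", marketing, () => "marketing build output");

// Edit a doc in the marketing package only.
marketing = { "apps/marketing/docs/intro.md": "# Hello, world" };
console.log(runTask("web", "build", app, () => "web build output")); // hit
console.log(runTask("marketing", "build", marketing, () => "marketing build output")); // miss
```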

4. Deploy-on-tag, not deploy-on-merge

Merges to main no longer trigger a production deploy. Instead, a release workflow triggered by a tagged commit kicks off the deploy pipeline. This decouples "merging is safe" from "going live": the team merges fearlessly, and releases happen daily by explicit decision.
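In GitHub Actions, the GITHUB_REF environment variable holds the full ref that triggered a run, such as refs/tags/v1.4.0 for a tag push or refs/heads/main for a branch push. A minimal sketch of a guard a release job could run before deploying, assuming a hypothetical vMAJOR.MINOR.PATCH tag convention; shouldDeployToProduction is an illustrative name.

```typescript
// Assumed tag convention: v<major>.<minor>.<patch>, e.g. refs/tags/v1.4.0.
const RELEASE_TAG = /^refs\/tags\/v\d+\.\d+\.\d+$/;

function shouldDeployToProduction(ref: string | undefined): boolean {
  return ref !== undefined && RELEASE_TAG.test(ref);
}

const ref = process.env.GITHUB_REF;
if (shouldDeployToProduction(ref)) {
  console.log(`Deploying ${ref} to production`);
} else {
  console.log(`Skipping production deploy for ${ref ?? "(no ref)"}`);
}
```

The workflow trigger itself (on: push: tags) does most of the filtering; a guard like this only makes an accidental branch-triggered run skip the deploy instead of shipping.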

5. Review velocity, not just CI

CI speed matters less than human review speed. I added:

CODEOWNERS: auto-requests review from the owner of each path, so no more "hey can someone review this?" in Slack
PR template with required sections: What, Why, How tested, and a screenshot for UI changes
Auto-merge on green for trivial PRs (dependency bumps, typos, doc changes), gated by labels
Branch hygiene bot: stale branches deleted after 14 days; closed PRs cleaned up nightly
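The label-gated auto-merge and 14-day stale-branch rules boil down to two small predicates. A sketch under stated assumptions: the label names, measuring staleness by last commit date, and protecting main are hypothetical choices, and wiring this into GitHub (through a bot or a scheduled workflow) is left out.

```typescript
// Hypothetical label names for "trivial" PRs.
const AUTO_MERGE_LABELS = new Set(["deps", "typo", "docs"]);
const STALE_AFTER_DAYS = 14;
const DAY_MS = 24 * 60 * 60 * 1000;

interface PullRequest {
  labels: string[];
  checksGreen: boolean; // all required checks passed
}

// Trivial-change label AND green CI; anything else waits for a human reviewer.
function eligibleForAutoMerge(pr: PullRequest): boolean {
  return pr.checksGreen && pr.labels.some((l) => AUTO_MERGE_LABELS.has(l));
}

interface Branch {
  name: string;
  lastCommitAt: Date; // assumption: staleness measured from the last commit
}

function staleBranches(branches: Branch[], now: Date, protectedNames = ["main"]): string[] {
  const cutoff = now.getTime() - STALE_AFTER_DAYS * DAY_MS;
  return branches
    .filter((b) => !protectedNames.includes(b.name) && b.lastCommitAt.getTime() < cutoff)
    .map((b) => b.name);
}

const now = new Date("2026-03-01T00:00:00Z");
console.log(eligibleForAutoMerge({ labels: ["docs"], checksGreen: true })); // true
console.log(eligibleForAutoMerge({ labels: ["feature"], checksGreen: true })); // false
const stale = staleBranches(
  [
    { name: "main", lastCommitAt: new Date("2025-12-01T00:00:00Z") },
    { name: "fix/typo", lastCommitAt: new Date("2026-02-01T00:00:00Z") },
    { name: "feat/new-nav", lastCommitAt: new Date("2026-02-25T00:00:00Z") },
  ],
  now,
);
console.log(stale.join(", ")); // fix/typo
```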

Half the "CI is slow" complaint was "reviews are slow" in disguise. Fix the human loop and you don't need another gigabyte of cache.

The result

Deploy time: 15 minutes → 4:30 average (best case under 90 seconds for cached builds)
Review cycle: PR-open to PR-merged dropped from ~7 hours to ~3.5 hours (−50%)
Throughput: ~4 PRs/day team-wide → ~8 PRs/day sustained
Confidence: zero cross-team Slack threads asking "is the pipeline ok?" in the 6 weeks after rollout

What I'd do differently

I'd set up the deployment dashboard first. We rebuilt the pipeline blind for the first two weeks, with no shared visibility into per-job duration. Once I added a Vercel Analytics + GitHub Actions durations widget to a shared Notion doc, the rest of the team started flagging slow steps proactively. I should have done that in week one.
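A per-job duration view doesn't need much: GitHub's REST API for listing a workflow run's jobs returns a name plus started_at and completed_at timestamps for each job. A minimal sketch that ranks jobs by duration, assuming the job records have already been fetched; slowestJobs and the sample data are hypothetical and illustrative, not measurements from this pipeline.

```typescript
interface JobRecord {
  name: string;
  started_at: string;
  completed_at: string | null; // null while the job is still running
}

// Drop unfinished jobs, compute durations, slowest first.
function slowestJobs(jobs: JobRecord[]): { name: string; seconds: number }[] {
  return jobs
    .filter((j): j is JobRecord & { completed_at: string } => j.completed_at !== null)
    .map((j) => ({
      name: j.name,
      seconds: (Date.parse(j.completed_at) - Date.parse(j.started_at)) / 1000,
    }))
    .sort((a, b) => b.seconds - a.seconds);
}

// Made-up sample input.
const sample: JobRecord[] = [
  { name: "install", started_at: "2026-01-10T09:00:00Z", completed_at: "2026-01-10T09:00:30Z" },
  { name: "test (1/4)", started_at: "2026-01-10T09:00:30Z", completed_at: "2026-01-10T09:02:00Z" },
  { name: "deploy", started_at: "2026-01-10T09:02:00Z", completed_at: null },
];
for (const job of slowestJobs(sample)) {
  console.log(`${job.name}: ${job.seconds}s`);
}
// test (1/4): 90s
// install: 30s
```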

I'd also enable the auto-merge label sooner. The first month was too cautious — we manually merged trivial PRs out of habit, eating review time on changes that didn't need a human eye.