The problem
Four engineers, one merge queue, 15-minute deploys. Math: with deploys running one at a time and an 8-hour work day, even perfect coordination caps the team at 32 deploys per day (480 minutes / 15). In practice it was closer to 6, because the pipeline was so slow that people batched changes, which in turn made bisecting failures harder.
Symptoms anyone shipping in 2026 will recognise:
- Cold pnpm install on every job (~3 minutes)
- All tests ran serially in a single Node process (~5 minutes)
- Deploy to Vercel after every merge to main, even doc-only changes
- No CODEOWNERS, no branch protection beyond "1 approval"
- PR descriptions were a free-form mess
What I changed (in order of impact)
1. Per-package install cache
Adding an actions/cache step keyed on the hash of pnpm-lock.yaml brought installs from ~3 minutes cold to ~25 seconds on cache hits. This alone got us most of the way to the under-5-minute target.
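For readers who want a starting point, here is a minimal sketch of a lockfile-keyed cache. It is not the exact workflow from this project: the job name, step names, action versions, Node and pnpm versions, and the choice to cache the pnpm store directory are all assumptions.

```yaml
# Sketch only: job/step names and all version numbers are assumptions.
jobs:
  install:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Ask pnpm where its content-addressable store lives on this runner.
      - name: Locate the pnpm store
        id: pnpm-store
        shell: bash
        run: echo "path=$(pnpm store path --silent)" >> "$GITHUB_OUTPUT"
      # The key changes only when the lockfile changes, so most runs are hits.
      - name: Restore the pnpm store
        uses: actions/cache@v4
        with:
          path: ${{ steps.pnpm-store.outputs.path }}
          key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}
          restore-keys: |
            ${{ runner.os }}-pnpm-
      - run: pnpm install --frozen-lockfile
```

The restore-keys fallback lets a run with a changed lockfile start from the most recent store instead of from nothing. actions/setup-node also has a built-in `cache: pnpm` input that may be enough for simpler setups.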
2. Parallel test shards
I split the test suite into 4 shards using a GitHub Actions matrix strategy and Jest's built-in --shard flag. Each shard runs in parallel on its own runner. Wall-clock for tests: 5 minutes → 1 minute 20 seconds.
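A sketch of how a 4-way shard matrix can be wired up. The job name, action versions, and tool versions are assumptions; the --shard=index/count form is Jest's own syntax, with a 1-based index.

```yaml
# Sketch only: job name and version numbers are assumptions.
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # let every shard finish so all failures are visible
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: pnpm install --frozen-lockfile
      # Each matrix job runs one quarter of the test files.
      - run: pnpm exec jest --shard=${{ matrix.shard }}/4
```

Wall-clock time is set by the slowest shard plus install time, which is why the install cache and the sharding compound.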
3. Conditional builds via Turborepo
Migrated the monorepo to Turborepo with task-level caching. Now if you change a doc file in the marketing package, the app build is cached and skipped entirely. Deploys for non-app changes finish in under 90 seconds.
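One way to persist Turborepo's task cache between CI runs is sketched below. This is an assumption about the setup, not a description of the actual pipeline: the job name, versions, and the `.turbo` cache directory are illustrative, and a remote cache would work as well.

```yaml
# Sketch only: job name, versions, and the .turbo directory are assumptions.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: pnpm install --frozen-lockfile
      # Restore the newest earlier cache; save a fresh one under this commit's SHA.
      - name: Restore the Turborepo cache
        uses: actions/cache@v4
        with:
          path: .turbo
          key: ${{ runner.os }}-turbo-${{ github.sha }}
          restore-keys: |
            ${{ runner.os }}-turbo-
      # Tasks whose inputs are unchanged are replayed from cache instead of rebuilt.
      - run: pnpm exec turbo run build --cache-dir=.turbo
```

Passing --cache-dir explicitly keeps the cached path and the path Turborepo writes to in agreement regardless of the version's default.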
4. Deploy-on-tag, not deploy-on-merge
Merges to main no longer trigger a production deploy. Instead, a release workflow that runs on a tagged commit kicks off the deploy pipeline. This decouples "merging is safe" from "going live": the team merges fearlessly, and releases happen daily by explicit decision.
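A tag-triggered release workflow can look roughly like this. The `v*` tag pattern, the secret names, and deploying through the Vercel CLI are assumptions for illustration, not the confirmed setup.

```yaml
# Sketch only: tag pattern, secret names, and versions are assumptions.
name: release
on:
  push:
    tags:
      - "v*"                    # runs only when a tag such as v1.4.0 is pushed
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      VERCEL_ORG_ID: ${{ secrets.VERCEL_ORG_ID }}
      VERCEL_PROJECT_ID: ${{ secrets.VERCEL_PROJECT_ID }}
      VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
        with:
          version: 9
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Production deploy of the tagged commit via the Vercel CLI.
      - run: pnpm dlx vercel deploy --prod --token="$VERCEL_TOKEN"
```

With this trigger, pushing a tag is the explicit release decision; merges to main only run CI.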
5. Review velocity, not just CI
CI speed matters less than human review speed. I added:
- CODEOWNERS: auto-assigns the right reviewer per path; no more "hey can someone review this?" in Slack
- PR template with required sections: What, Why, How tested, and a screenshot for UI changes
- Auto-merge on green for trivial PRs (deps bumps, typos, doc changes) gated by labels
- Branch hygiene bot: stale branches deleted after 14 days; closed PRs cleaned up nightly
Half the "CI is slow" complaint was "reviews are slow" in disguise. Fix the human loop and you don't need another gigabyte of cache.
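The label-gated auto-merge is the easiest of these to reproduce. Below is a minimal sketch, assuming a label named `automerge` (hypothetical), squash merges, and a repository that has auto-merge enabled with required status checks on main.

```yaml
# Sketch only: the label name and merge method are assumptions.
name: automerge
on:
  pull_request:
    types: [labeled]
permissions:
  contents: write
  pull-requests: write
jobs:
  enable-automerge:
    # Only act when the hypothetical "automerge" label is applied.
    if: github.event.label.name == 'automerge'
    runs-on: ubuntu-latest
    steps:
      # Queues the merge; GitHub completes it once required checks pass.
      - run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The workflow only enables auto-merge; branch protection still decides when the PR is actually allowed to land.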
The result
- Deploy time: 15 minutes → 4:30 average (best case under 90 seconds for cached builds)
- Review cycle: PR-open to PR-merged dropped from ~7 hours to ~3.5 hours (−50%)
- Throughput: ~4 PRs/day team-wide → ~8 PRs/day sustained
- Confidence: zero cross-team Slack threads asking "is the pipeline ok?" in the 6 weeks after rollout
What I'd do differently
I'd set up the deployment dashboard first. We rebuilt the pipeline blind for the first two weeks — no shared visibility into per-job duration. Once I added a Vercel Analytics + GitHub Actions durations widget to a shared Notion doc, the rest of the team started flagging slow steps proactively. Should have done that in week one.
I'd also enable the auto-merge label sooner. The first month was too cautious — we manually merged trivial PRs out of habit, eating review time on changes that didn't need a human eye.