
Engineering Strategy & Vision

How to write strategy docs that guide decisions and vision docs that align teams long-term.


TL;DR

  • Strategy tackles problems. Diagnosis + policies + actions. Timeframe: 3-12 months
  • Vision paints the future. Where you're headed in 2-3 years. One per area, max
  • Write 5 design docs first. Then pull patterns out. That's your strategy
  • Write 5 strategies first. Then forecast 2 years ahead. That's your vision
  • Bottom-up beats top-down. Synthesize what teams already wrote in RFCs instead of mandating from above
  • The test is usefulness. If teams don't reference it when making decisions, rewrite or delete it

Strategy vs Vision

| | Strategy | Vision |
| --- | --- | --- |
| Purpose | Solve a specific problem | Align teams long-term |
| Tone | Practical, decisive | Aspirational, directional |
| Timeframe | 3-12 months | 2-3 years |
| Detail | Accurate, specific | Illustrative, broad strokes |
| Quantity | Many (one per problem) | Few (one per area, max) |
| Tradeoffs | Explicit (we choose X over Y) | Implicit (we move toward X) |
| Use case | "How do we scale the API?" | "What does engineering look like in 2028?" |

When to write strategy: You have a specific challenge. Database is slow. Deploys break production. No one knows how to onboard.

When to write vision: Teams making independent decisions need alignment. Backend, frontend, and mobile solving auth differently. Three teams building separate notification systems.


How to Write a Strategy

Structure

A strategy has three parts (Richard Rumelt framework):

  1. Diagnosis — What's the problem? What constraints define it?
  2. Guiding Policies — What tradeoffs will we make? What rules apply across teams?
  3. Coherent Actions — What specific steps implement the policies?

Example: API Performance Strategy

Diagnosis

API response time degraded from 200ms (Jan 2025) to 3.2s (Feb 2026). 40% of requests time out. Root cause: database queries grew from 12 per request to 87 (N+1 queries). No caching. No query budget enforcement.

Guiding Policies

  1. Query budget per endpoint: Max 10 database queries. Enforced via middleware that errors at 11
  2. Cache-first for reads: All GET endpoints check Redis before Postgres
  3. Async for writes: POST/PUT return immediately, process via job queue
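
As a rough illustration of policy 1, the sketch below counts queries for one request and raises on the 11th. It is a minimal sketch, not the strategy's actual middleware: `QueryCounter`, `QueryBudgetExceeded`, and the idea of calling `record()` from a database hook are hypothetical names chosen for this example.

```python
MAX_QUERIES_PER_REQUEST = 10  # budget stated in policy 1


class QueryBudgetExceeded(RuntimeError):
    """Raised when one request issues more queries than the budget allows."""


class QueryCounter:
    """Counts queries for a single request; create one per request."""

    def __init__(self, limit: int = MAX_QUERIES_PER_REQUEST) -> None:
        self.limit = limit
        self.count = 0

    def record(self, sql: str) -> None:
        # Assumed to be called from wherever queries are issued (hypothetical hook).
        self.count += 1
        if self.count > self.limit:
            raise QueryBudgetExceeded(
                f"query #{self.count} exceeds the budget of {self.limit}: {sql}"
            )


counter = QueryCounter()
for user_id in range(10):
    counter.record(f"SELECT * FROM orders WHERE user_id = {user_id}")
# Ten queries fit the budget; an eleventh call to record() would raise.
```

Failing loudly at query 11 makes an N+1 pattern visible in development instead of in production latency graphs.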

Coherent Actions

  1. Add query counter middleware (Alice, by Feb 25)
  2. Implement Redis caching for /users, /orders, /products (Bob, by Mar 3)
  3. Refactor checkout to use job queue (Charlie, by Mar 10)
  4. Set alert: error if any endpoint >500ms p95 (Alice, by Feb 27)

Target: API response time <500ms p95 by Mar 15. Measure daily, review Mar 20.
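
To make the daily measurement concrete, here is a small sketch of a p95 check against the 500ms target. It assumes latency samples are already collected per endpoint as lists of milliseconds, and it uses the nearest-rank definition of a percentile; a monitoring system may compute p95 differently. The function names are illustrative.

```python
from typing import Dict, List

TARGET_P95_MS = 500  # target from the strategy


def p95(samples_ms: List[float]) -> float:
    """Return the 95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def endpoints_over_target(
    latencies: Dict[str, List[float]], target_ms: float = TARGET_P95_MS
) -> List[str]:
    """List endpoints whose p95 latency exceeds the target, sorted by name."""
    return sorted(ep for ep, samples in latencies.items() if p95(samples) > target_ms)


latencies = {
    "/users": [100] * 19 + [900],   # one slow request out of 20
    "/orders": [100] * 18 + [900, 900],  # two slow requests out of 20
}
print(endpoints_over_target(latencies))  # ['/orders']
```

With 20 samples, a single outlier does not move p95, but two do, which is why `/orders` is flagged and `/users` is not.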

Three Questions Your Policies Must Answer

  1. Resource allocation: Where do we spend engineering time? (60% features, 30% tech debt, 10% ops)
  2. Fundamental rules: What standards apply to all teams? (All endpoints <500ms, 80% test coverage, zero secrets in code)
  3. Decision-making: How do teams make choices? (RFC for >5 day projects, tech lead approval for new dependencies, weekly arch review)
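
A fundamental rule is easiest to enforce when it can be written as a check. The sketch below turns the 80% coverage standard into a gate. The 75% blocking floor is borrowed from the Good vs Bad Policies table; the "warn" band between the two numbers is an assumption made for this example, as is the function name.

```python
TARGET_COVERAGE_PCT = 80.0  # standard that applies to all teams
BLOCK_BELOW_PCT = 75.0      # floor below which a PR is blocked


def coverage_gate(line_coverage_pct: float) -> str:
    """Classify a PR's line coverage as 'block', 'warn', or 'pass'."""
    if line_coverage_pct < BLOCK_BELOW_PCT:
        return "block"
    if line_coverage_pct < TARGET_COVERAGE_PCT:
        return "warn"  # assumed behaviour: allowed, but flagged for follow-up
    return "pass"


print(coverage_gate(72.5))  # block
print(coverage_gate(78.0))  # warn
print(coverage_gate(85.0))  # pass
```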

Good vs Bad Policies

| Bad (vague) | Good (specific) |
| --- | --- |
| "Improve performance" | "All endpoints <500ms p95. Add budget alert at 400ms" |
| "Write more tests" | "80% line coverage. PR blocked below 75%" |
| "Better code quality" | "Linter enforced. Max cyclomatic complexity 10. No functions >50 lines" |
| "Prioritize tech debt" | "30% of sprint capacity = tech debt. Tracked per team in weekly report" |

How to Write It

Bottom-up (recommended):

  1. Collect 5 recent RFCs or design docs your team wrote
  2. Pull out common patterns: "We keep choosing Redis over in-memory cache. Why?"
  3. Write the diagnosis based on recurring problems
  4. Extract policies from decisions teams already made
  5. List actions to align everyone on those policies

Top-down (only if you have no RFCs yet):

  1. Interview 5 engineers: "What's slowing you down?"
  2. Find the common thread (e.g., "deploys break production weekly")
  3. Write diagnosis
  4. Workshop policies with 3-5 senior engineers
  5. Share draft org-wide, 3-day feedback window
  6. Finalize and commit to 2-month review

Common Mistakes

| Mistake | Fix |
| --- | --- |
| No tradeoffs stated | Make it explicit: "We choose speed over cost" or "We choose reliability over features" |
| Too many policies (>5) | Pick the 3 that matter most. Delete the rest |
| Actions without owners or dates | Every action: name + deadline. No "team will handle" |
| Strategy without a problem | If there's no diagnosis, you don't need a strategy |
| Writing it alone | Workshop diagnosis with 3-5 people. Get dissenter feedback before publishing |
| No review date | Set 2-month check-in. If strategy didn't change decisions, delete it |
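
Several of these mistakes are mechanical enough to catch before publishing. The sketch below models a strategy as plain data and flags a missing diagnosis, more than 5 policies, actions without an owner or date, and a missing review date. The `Strategy` and `Action` classes and the `lint_strategy` function are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    description: str
    owner: Optional[str] = None
    due: Optional[str] = None  # free-form date such as "Feb 25"


@dataclass
class Strategy:
    diagnosis: str
    policies: List[str]
    actions: List[Action]
    review_date: Optional[str] = None


def lint_strategy(strategy: Strategy) -> List[str]:
    """Return a list of problems; an empty list means no mechanical issues."""
    problems = []
    if not strategy.diagnosis.strip():
        problems.append("missing diagnosis")
    if len(strategy.policies) > 5:
        problems.append(f"too many policies ({len(strategy.policies)} > 5)")
    for action in strategy.actions:
        if not action.owner or not action.due:
            problems.append(f"action without owner or date: {action.description}")
    if not strategy.review_date:
        problems.append("no review date")
    return problems


draft = Strategy(
    diagnosis="API p95 degraded from 200ms to 3.2s",
    policies=["Query budget per endpoint", "Cache-first for reads", "Async for writes"],
    actions=[
        Action("Add query counter middleware", owner="Alice", due="Feb 25"),
        Action("Refactor checkout to use job queue"),  # no owner, no date
    ],
    review_date="Mar 20",
)
for problem in lint_strategy(draft):
    print(problem)
```

A check like this cannot judge whether tradeoffs are stated or whether dissenters were heard; those still need a human review.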

How to Write a Vision

Structure

A vision has seven parts (Will Larson framework):

  1. Vision statement — 1-2 sentences. Aspirational. Repeat it everywhere
  2. Value proposition — How this benefits users and the company
  3. Capabilities — What the team/product must deliver to achieve the vision
  4. Solved constraints — Problems that go away in the future state
  5. Future constraints — New problems you'll face
  6. Reference materials — Appendix with supporting docs, metrics, research
  7. Narrative — 1-page story tying it all together

Example: Engineering Org Vision (2026-2028)

Vision Statement

By 2028, any engineer ships production code in their first week. Deploys are invisible. Incidents self-heal.

Value Proposition

  • For engineers: Onboarding takes 2 days, not 2 months. Deploy 10x/day without fear
  • For the company: Ship features 3x faster. Downtime <0.1% annually. Hire faster (no 6-month ramp)

Capabilities

  1. Self-service infrastructure: Engineers provision databases, queues, caches via UI. No tickets
  2. Automated rollbacks: Canary deploy catches errors, rolls back in <60 seconds
  3. Codified onboarding: Click "New Engineer" → laptop configured, access granted, first PR ready

Solved Constraints (2026 problems that go away)

  • Manual deploys (currently 2 hours, break 30% of the time)
  • Ticket-based provisioning (currently 3-day wait for a database)
  • Tribal knowledge onboarding (currently 6 weeks before first production commit)

Future Constraints (new problems)

  • Infrastructure cost up 40% (self-service = engineers over-provision)
  • Monitoring alert fatigue (auto-rollback creates 10x more alerts)
  • Security review backlog (engineers ship faster than security can audit)

Reference Materials

Narrative

Today, a new engineer waits 6 weeks to ship their first production code. They submit 14 tickets: laptop access, VPN, database credentials, deploy permissions. Each ticket takes 2-3 days. When they finally deploy, it's manual — copy commands from a wiki, hope nothing breaks, roll back by hand if it does. 30% of deploys break something.

By 2028, that same engineer clicks "New Hire Onboarding" on day one. Laptop arrives pre-configured. They clone the repo, run make dev, and see the app locally in 5 minutes. They push a small fix — a typo in the UI. CI runs tests automatically. Deploy goes to 5% of users (canary). Metrics stay green. Canary expands to 100% in 10 minutes. The engineer shipped production code in their first week.

This happens because we built three things: (1) Self-service infrastructure — engineers click "New Postgres DB" and get one in 30 seconds. No tickets. (2) Automated rollbacks — canary deploy watches error rate, latency, key metrics. Spike detected = instant rollback. (3) Codified onboarding — every setup step is code, not a wiki. Click a button, everything provisions.

The tradeoff: infrastructure cost goes up 40%. Engineers over-provision because it's easy. We're okay with that — engineer time is more expensive than servers. We add cost dashboards so teams see their spend and optimize later.

By 2028, shipping code is invisible. Engineers focus on features, not deploys.

Good vs Bad Vision Statements

| Bad (vague) | Good (specific) |
| --- | --- |
| "We'll be the best engineering org" | "By 2028, any engineer ships production code in their first week. Deploys are invisible. Incidents self-heal." |
| "Improve developer experience" | "Onboarding takes 2 days, not 2 months. Deploy 10x/day without fear" |
| "Innovate and scale" | "Self-service infrastructure. Automated rollbacks. Codified onboarding." |

How to Write It

  1. Write 5 strategies first. You can't forecast without solving current problems
  2. Project 2 years ahead: What do those strategies enable? "If we solve API performance this quarter, what's possible in 2028?"
  3. Interview 10 engineers: "What's your ideal workflow in 2 years?"
  4. Draft vision statement. Test with 5 people. If they can't repeat it back, simplify
  5. Write capabilities. Each one should be measurable (not "better DX" but "onboarding in 2 days")
  6. List solved and future constraints. Be honest about tradeoffs (cost, complexity, etc.)
  7. Write narrative last. Tell the story of before/after. Make it concrete
  8. Share org-wide. 1-week feedback window. Finalize. Revisit annually

Common Mistakes

| Mistake | Fix |
| --- | --- |
| Too many visions (one per team) | One vision per area. Engineering = 1 vision, not 5 |
| Buzzword soup ("leverage synergies") | Write like you talk. If you wouldn't say it in a meeting, cut it |
| No tradeoffs stated | Honest visions admit costs: "This will increase infra spend 40%" |
| Past tense ("We built X") | Present tense. Write as if 2028 is now: "Engineers deploy 10x/day" |
| Vision without buy-in | Workshop with 10+ people before publishing. Dissenters should feel heard |
| Never revisited | Review annually. Scrap it if teams don't reference it |

Strategy Template

Use this when you have a specific problem to solve. Copy the template linked below.

File: strategy-[problem-name]-[year-quarter].md

Example: strategy-api-performance-2026-q1.md

See template →


Vision Template

Use this when teams need long-term alignment. Max one per area.

File: vision-[area]-[end-year].md

Example: vision-engineering-2028.md
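
If documents are created by script, both naming patterns (this one and the strategy pattern above) can be generated consistently. The helper below is a sketch; `slugify`, `strategy_filename`, and `vision_filename` are hypothetical names, and lowercasing with hyphens is inferred from the two examples given.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def strategy_filename(problem: str, year: int, quarter: int) -> str:
    """Build strategy-[problem-name]-[year-quarter].md."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return f"strategy-{slugify(problem)}-{year}-q{quarter}.md"


def vision_filename(area: str, end_year: int) -> str:
    """Build vision-[area]-[end-year].md."""
    return f"vision-{slugify(area)}-{end_year}.md"


print(strategy_filename("API Performance", 2026, 1))  # strategy-api-performance-2026-q1.md
print(vision_filename("Engineering", 2028))           # vision-engineering-2028.md
```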

See template →


Decision Framework: Do You Need Strategy, Vision, or Neither?

| Situation | Write... |
| --- | --- |
| Database queries are slow | Strategy (diagnosis: N+1 queries; policies: query budget; actions: add caching) |
| Teams building duplicate systems | Vision (paint the future: shared platform, self-service, consolidated tools) |
| One-time decision (which DB to use) | RFC / Design Doc (not strategy: it's a single choice, not a recurring pattern) |
| Everything is fine | Nothing (don't write strategy/vision unless there's a clear problem or alignment gap) |
| New team forming | Vision (align on what you're building toward) |
| Quarterly planning | Strategy if you have recurring problems; nothing if problems are one-offs |

Rule of thumb:

  • 5+ teams making similar decisions? → Vision (align them long-term)
  • Recurring problem every quarter? → Strategy (set policies to prevent it)
  • One-time decision? → Design doc / RFC (not strategy)
  • No problem? → Nothing (don't write docs for the sake of docs)
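
The rule of thumb reads naturally as a short decision function. This is only a sketch: the function name and parameters are made up for the example, and checking the rules in the order listed (vision first) is an assumption for cases where more than one condition holds.

```python
def recommend_document(
    teams_making_similar_decisions: int,
    recurring_problem: bool,
    one_time_decision: bool,
) -> str:
    """Map a situation to 'vision', 'strategy', 'rfc', or 'nothing'."""
    if teams_making_similar_decisions >= 5:
        return "vision"    # align teams long-term
    if recurring_problem:
        return "strategy"  # set policies to prevent the recurrence
    if one_time_decision:
        return "rfc"       # a single choice belongs in a design doc / RFC
    return "nothing"       # no problem, no document


print(recommend_document(6, False, False))  # vision
print(recommend_document(2, True, False))   # strategy
print(recommend_document(1, False, True))   # rfc
print(recommend_document(1, False, False))  # nothing
```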

How to Know if It's Working

For Strategy

| Signal | What it means |
| --- | --- |
| Teams reference it in RFCs | ✅ Working: strategy guides decisions |
| No one mentions it after 2 weeks | ❌ Not working: rewrite or delete |
| Policies get violated without pushback | ❌ Not enforced: add accountability or scrap the policy |
| Teams ask "Does this align with strategy?" | ✅ Working: it's a decision-making tool |

Review strategy after 2 months. If it didn't change behavior, delete it.
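
One cheap input to the 2-month review is a count of RFCs that mention the strategy. The sketch below assumes RFC texts are already loaded into a dictionary keyed by file name; the RFC names and the `rfcs_citing` function are hypothetical. A citation count is a rough signal, not proof that the strategy changed a decision.

```python
from typing import Dict, List


def rfcs_citing(strategy_name: str, rfc_texts: Dict[str, str]) -> List[str]:
    """Return the names of RFCs whose text mentions the strategy, sorted."""
    needle = strategy_name.lower()
    return sorted(name for name, text in rfc_texts.items() if needle in text.lower())


rfc_texts = {
    "rfc-checkout-queue.md": "Follows strategy-api-performance-2026-q1: writes go async.",
    "rfc-search-index.md": "Adds a new search index for products.",
    "rfc-orders-cache.md": "Per Strategy-API-Performance-2026-Q1, reads check Redis first.",
}
citing = rfcs_citing("strategy-api-performance-2026-q1", rfc_texts)
print(f"{len(citing)} of {len(rfc_texts)} RFCs cite the strategy: {citing}")
```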

For Vision

| Signal | What it means |
| --- | --- |
| Teams cite it when explaining roadmap | ✅ Working: vision aligns decisions |
| New hires can repeat the vision statement | ✅ Working: it's memorable |
| No one talks about it after launch | ❌ Not working: too vague or irrelevant |
| Teams build features that contradict vision | ❌ Not working: vision doesn't guide tradeoffs |

Review vision annually. If teams aren't referencing it, rewrite or retire it.


Progression Framework: Who Writes What

| Level | Strategy | Vision |
| --- | --- | --- |
| Junior - Mid | Contributes to diagnosis (interviews, data). Reviews drafts | Gives feedback on vision drafts |
| Senior | Writes strategy for team-level problems. Leads RFC synthesis | Reviews vision. Suggests capabilities |
| Staff | Writes strategy for org-level problems. Workshops with 3+ teams | Co-writes vision with Principal. Drives buy-in |
| Principal | Owns 1-year technical roadmap. Publishes quarterly strategies | Writes vision. Represents it to execs and board |

See Competency Matrix for behavior expectations:

  • Senior: "Thinks 2-3 quarters ahead. Proposes where to invest and where to cut" (Strategy)
  • Staff: "Owns 1-year technical roadmap. Ships features while reducing tech debt quarter over quarter" (Strategy)
  • Principal: "Defines company-wide technical strategy. 2-3 year horizon" (Vision)

Common Mistakes Across Both

| Mistake | Fix |
| --- | --- |
| Writing alone | Workshop with 5-10 people. Get dissenter feedback early |
| No examples or metrics | Every claim needs a number. "Slow" → "3.2s response time" |
| Defending the status quo | Strategies/visions should change something. If not, don't write it |
| No owner or deadline | Every action: name + date. No "team will handle" |
| Publishing without feedback window | Share draft, collect input for 3-7 days, finalize |
| Never reviewing impact | Set 2-month (strategy) or 1-year (vision) review. Delete if not useful |
