Engineering Mental Models
How to think about building systems that scale.
TL;DR
Core Mindsets:
- Separate deploy from release - Ship code != change user experience
- Dual write migrations - Change systems without breaking them
- Make things reversible - Fast forward, instant rollback
- Fail gracefully - Stop, don't cascade
- Optimize for iteration speed - Perfect later, working now
- Build for observability - You can't fix what you can't see
- Design for failure - Everything will fail, plan for it
- Keep it simple - Every abstraction has a cost
Separate Deploy from Release
What it is: Deploying code ≠ changing user experience.
The insight:
- Deploy = technical process (build, test, roll out code)
- Release = business decision (change user-facing features)
These are different operations. Treating them the same creates unnecessary friction.
How to think about it:
- Ship code to production multiple times per day (deploy)
- Control feature visibility independently (release)
- Product and marketing decide when users see changes, not the engineering deploy schedule
Example:
- Bad: "We can't launch the feature until Friday because that's when we deploy"
- Good: "Code shipped Monday. Marketing turns on the feature Friday at 10am for launch."
Why it matters: Speed and safety. Deploy fast, release deliberately.
When to use: Every feature. Wrap in feature flags by default.
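A minimal sketch of the deploy/release split, assuming an in-memory flag store. `FlagStore`, `render_checkout`, and the flag name `new_checkout` are hypothetical names for illustration; a real setup would more likely read flags from a config service or a flag provider.

```python
class FlagStore:
    """Hypothetical in-memory flag store (illustration only)."""

    def __init__(self):
        self._enabled = set()

    def is_on(self, flag):
        return flag in self._enabled

    def turn_on(self, flag):
        self._enabled.add(flag)

    def turn_off(self, flag):
        self._enabled.discard(flag)


def render_checkout(flags):
    # The new code path is already deployed, but stays dark until released.
    if flags.is_on("new_checkout"):
        return "new checkout"
    return "old checkout"


flags = FlagStore()
before_release = render_checkout(flags)  # Monday: code deployed, flag off
flags.turn_on("new_checkout")            # Friday: release by flipping the flag
after_release = render_checkout(flags)
flags.turn_off("new_checkout")           # Turning it off needs no redeploy
after_rollback = render_checkout(flags)
```

The point of the design: the deploy puts both code paths in production, and the release is only a data change.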
Dual Write Migrations
What it is: Change systems without breaking them.
The insight: Don't switch systems in one step. Write to both, switch reads gradually.
How to think about it:
- Old system is source of truth
- Add writes to new system (dual write)
- Backfill existing data into new system (dual writes only cover new records)
- Verify new system matches old
- Switch reads to new system (now source of truth)
- Stop writing to old system
- Delete old system
Example: Moving from MongoDB to PostgreSQL:
- Week 1: Write both, read Mongo (old is truth)
- Week 2: Write both, read Postgres (new is truth, old is backup)
- Week 3: Write Postgres only, delete Mongo
Why it matters: Zero-downtime migrations. While both systems receive writes, rollback is just switching reads back to the old system.
When to use: Database migrations, system rewrites, API changes.
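A rough sketch of the dual-write steps, using two plain dicts as stand-ins for MongoDB and PostgreSQL. `DualWriteStore` and its methods are hypothetical, and real code would also need to handle a write that succeeds in one system and fails in the other. Records that predate dual writes exist only in the old system, so the sketch copies them over before switching reads.

```python
class DualWriteStore:
    """Writes go to both stores; reads come from the current source of truth."""

    def __init__(self, old, new, read_from_new=False):
        self.old = old
        self.new = new
        self.read_from_new = read_from_new

    def write(self, key, value):
        self.old[key] = value
        self.new[key] = value

    def read(self, key):
        source = self.new if self.read_from_new else self.old
        return source[key]

    def mismatches(self):
        """Keys whose values differ between the two stores (verification step)."""
        keys = self.old.keys() | self.new.keys()
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))


mongo = {"user:1": "ada"}   # stand-in for the old system, with existing data
postgres = {}               # stand-in for the new system, initially empty
store = DualWriteStore(mongo, postgres)

store.write("user:2", "grace")             # new writes land in both systems
gaps_before_backfill = store.mismatches()  # older records are still missing
postgres.update(mongo)                     # one-off backfill of existing data
gaps_after_backfill = store.mismatches()
store.read_from_new = True                 # switch reads; set False to roll back
```

Only stop writing to the old system once `mismatches()` has stayed empty long enough to trust the new one.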
Make Things Reversible
What it is: Fast forward, instant rollback.
The insight: Most changes should be reversible in <1 minute.
How to think about it:
- Ship new code to production (not live yet)
- Turn on for 1% of users
- If it breaks, turn off instantly
- If it works, ramp to 100%
Example:
- Bad: Deploy breaks production, need 2 hours to roll back and redeploy
- Good: Feature breaks production, turn off flag in 30 seconds
Why it matters: Reduces fear of shipping. Encourages experimentation.
When to use: Every deploy. Every feature.
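One way to sketch the 1% ramp is to hash each user into a stable bucket and compare it with a rollout percentage. `bucket`, `is_enabled`, and the flag name are hypothetical; hashing with SHA-256 is an assumption here, not something the approach requires.

```python
import hashlib


def bucket(user_id, flag):
    """Map a user to a stable bucket in 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id, flag, rollout_percent):
    # rollout_percent = 0 is the instant off switch; 100 is fully released.
    return bucket(user_id, flag) < rollout_percent


users = [f"user-{i}" for i in range(1000)]
enabled_at_1 = {u for u in users if is_enabled(u, "new_checkout", 1)}
enabled_at_50 = {u for u in users if is_enabled(u, "new_checkout", 50)}
enabled_at_0 = {u for u in users if is_enabled(u, "new_checkout", 0)}
```

Because the bucket is derived from the user ID, a user who sees the feature at 1% keeps seeing it as the percentage ramps up, and setting the percentage to 0 turns it off for everyone.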
Fail Gracefully
What it is: When something breaks, stop. Don't cascade.
The insight: Failing services should fail fast, not slow down everything.
How to think about it:
- If payment API is down, stop calling it
- Return cached data or error message
- Don't let one slow service make your whole system slow
Example:
- Bad: Payment API times out after 30 seconds. Every checkout request waits 30 seconds before failing.
- Good: After 5 consecutive failures, stop calling payment API for a cool-down period. Return error immediately.
Why it matters: One failure shouldn't cascade to everything. Protect the rest of the system.
When to use: Calling external APIs, microservices, third-party dependencies.
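The "stop calling it" behavior is commonly implemented as a circuit breaker. The sketch below is a minimal version under assumptions: the threshold of 5 comes from the example above, while the 30-second cool-down, the class names, and `flaky_payment_api` are made up for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that keeps failing."""


class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call; one more failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the count
        return result


attempts = []  # records each real call made to the dependency


def flaky_payment_api():
    attempts.append(1)
    raise TimeoutError("payment API timed out")


breaker = CircuitBreaker(max_failures=5)
outcomes = []
for _ in range(7):
    try:
        breaker.call(flaky_payment_api)
    except CircuitOpenError:
        outcomes.append("fast-fail")
    except TimeoutError:
        outcomes.append("timeout")
```

After the fifth timeout, the remaining requests fail immediately without touching the payment API at all.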
Optimize for Iteration Speed
What it is: Ship fast, learn fast, fix fast.
The insight: Perfect code that ships in 6 months loses to working code that ships in 1 week.
How to think about it:
- Ship MVP fast
- Measure what breaks
- Fix what matters
- Ignore what doesn't
Example:
- Bad: "Let's build the perfect architecture before launch. 6 months to ship."
- Good: "Ship monolith in 1 week. If we hit 10M users, then split into microservices."
Why it matters: Speed of iteration beats perfection. Learn from real users, not assumptions.
When to use: Always. Default to fast iteration.
Build for Observability
What it is: You can't fix what you can't see.
The insight: Logs, metrics, traces should be built in from day 1, not added later.
How to think about it:
- Every critical path should be measurable
- Know when things break before users tell you
- Understand what's slow and why
Example:
- Bad: "We don't know why checkouts are slow. No metrics."
- Good: "Payment API at 3.2s, database query at 0.8s, rest negligible. Fix payment API."
Why it matters: Can't improve what you don't measure. Can't debug what you can't see.
When to use: From day 1. Not an afterthought.
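A small sketch of making a critical path measurable, assuming timings are kept in an in-process dict. `timed`, `timings`, and the step names are hypothetical, and the `time.sleep` calls stand in for real work; a production system would more likely export these numbers to a metrics backend.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # step name -> list of durations in seconds


@contextmanager
def timed(step):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step].append(time.perf_counter() - start)


def checkout():
    with timed("payment_api"):
        time.sleep(0.02)   # stand-in for the payment API call
    with timed("db_query"):
        time.sleep(0.005)  # stand-in for the database query


checkout()
for step, durations in timings.items():
    print(f"{step}: {sum(durations):.3f}s over {len(durations)} call(s)")
```

With per-step numbers in hand, "checkouts are slow" turns into a specific step to fix.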
Design for Failure
What it is: Everything will fail. Plan for it.
The insight: Assume services will be down. Assume networks will be slow. Design for it.
How to think about it:
- What happens if this API is down?
- What happens if this database is slow?
- Can we degrade gracefully instead of breaking completely?
Example:
- Bad: Payment API down = users can't browse products
- Good: Payment API down = users can browse, add to cart, checkout disabled with message
Why it matters: Failures are inevitable. Graceful degradation is better than total outage.
When to use: Designing every system, especially critical paths.
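The browse-but-no-checkout example might look like the sketch below. `payment_is_healthy`, `build_page`, and the two ping functions are hypothetical; the health check is assumed to be any callable that raises when the payment API is unreachable.

```python
def payment_is_healthy(ping):
    """Treat any exception from the health check as 'payment is down'."""
    try:
        ping()
        return True
    except Exception:
        return False


def build_page(ping_payment_api):
    # Browsing and the cart do not depend on payments, so they stay available.
    page = {"browse": True, "cart": True}
    if payment_is_healthy(ping_payment_api):
        page["checkout"] = {"enabled": True}
    else:
        page["checkout"] = {
            "enabled": False,
            "message": "Checkout is temporarily unavailable.",
        }
    return page


def ping_ok():
    return None


def ping_down():
    raise ConnectionError("payment API unreachable")


healthy_page = build_page(ping_ok)
degraded_page = build_page(ping_down)
```

The failure is contained to the one feature that truly depends on the broken service.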
Keep It Simple
What it is: Simple systems are easier to understand, debug, and maintain.
The insight: Complexity kills. Every abstraction has a cost.
How to think about it:
- Start simple (monolith)
- Add complexity only when necessary (microservices when monolith can't scale)
- Ask: "What's the simplest thing that could work?"
Example:
- Bad: "Let's use microservices from day 1 because that's best practice"
- Good: "Ship monolith. Split into microservices at 10M users if needed."
Why it matters: Simple systems ship faster, break less, and are easier to fix.
When to use: Always. Default to simplicity.
Inspired by Charity Majors and other engineering leaders.