
Why "100 more servers" did not get faster — 6 scale-out pitfalls

2026-05-14 · 10 min
scale-out · pitfalls · ops · production

Problems horizontal scale alone cannot fix: single DB bottleneck, cold cache spike, microservice chain latency, sticky sessions, connection pools, cascading failures. Each demonstrated in the simulator.

“Traffic grew. Just add more servers, right?” is the most common claim at the whiteboard. In production, six pitfalls regularly keep horizontal scaling from paying off.

1. Single-DB bottleneck

App instances multiply, but all writes still hit one primary DB. Read replicas help reads; writes serialize. Solution: read/write split, sharding, write-through cache for idempotent operations, eventual move to NewSQL (CockroachDB / Spanner-class).
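A back-of-the-envelope model of the plateau, as a minimal sketch: write throughput is whatever the app tier can issue, capped by what the single primary can absorb. The function name and the numbers (200 writes/s per instance, 5000 writes/s for the primary) are illustrative assumptions, not measurements.

```python
def cluster_write_throughput(instances: int,
                             per_instance_wps: float,
                             primary_capacity_wps: float) -> float:
    """Sustained writes/sec: app-tier capacity capped by the single primary DB."""
    return min(instances * per_instance_wps, primary_capacity_wps)


if __name__ == "__main__":
    # Hypothetical capacities, chosen only to make the plateau visible.
    for n in (10, 25, 50, 100):
        print(n, cluster_write_throughput(n, 200.0, 5000.0))
```

Under these assumed numbers, going from 25 to 100 instances adds nothing: the cap is the primary, which is why the fixes target the write path rather than the app tier.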

2. Cold cache spike

Add an app instance — its in-process cache starts empty. Every request misses for the first few minutes, hammering the DB. Solution: warm caches on startup, distributed cache (Redis) shared across instances, request coalescing on misses.
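Request coalescing can be sketched as follows: when many requests miss on the same key at once, only the first one loads from the DB and the rest wait for its result. `CoalescingCache` and its `loader` callback are hypothetical names for illustration; a production version would also need eviction and TTLs.

```python
import threading
from typing import Any, Callable, Dict


class CoalescingCache:
    """In-process cache where concurrent misses on one key share a single load."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader              # assumed to hit the DB
        self._lock = threading.Lock()
        self._values: Dict[str, Any] = {}
        self._inflight: Dict[str, threading.Event] = {}

    def get(self, key: str) -> Any:
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._inflight.get(key)
            leader = event is None
            if leader:
                # First caller to miss becomes the leader and does the load.
                event = threading.Event()
                self._inflight[key] = event

        if leader:
            try:
                value = self._loader(key)
                with self._lock:
                    self._values[key] = value
                return value
            finally:
                # Wake followers whether the load succeeded or failed.
                with self._lock:
                    del self._inflight[key]
                event.set()

        event.wait()
        with self._lock:
            if key in self._values:
                return self._values[key]
        # The leader's load failed; try again (one caller becomes the new leader).
        return self.get(key)
```

With this shape, a burst of identical misses on a cold instance costs one DB query per key instead of one per request.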

3. Microservice chain latency

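One user request fans out to N microservices. For a sequential chain, latencies add: end-to-end latency is roughly the sum of every hop's latency plus one RTT per hop, so each new service in the chain adds ~5-10ms minimum. For a parallel fan-out, the request waits for the slowest branch, and the more branches there are, the more likely at least one of them lands in its tail. Solution: critical-path budget, parallel fan-out where possible, batched aggregation, careful service decomposition.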
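The arithmetic behind a critical-path budget, as a rough sketch: sequential hops add up, parallel branches cost the slowest one, and the chance that at least one of N calls is a tail call grows with N. The function names and sample latencies are made up for illustration, and the probability formula assumes the calls are independent, which real services often are not.

```python
from typing import Sequence


def sequential_latency_ms(hops_ms: Sequence[float], rtt_ms: float) -> float:
    """Calls made one after another: every hop's latency plus one RTT per hop."""
    return sum(hops_ms) + rtt_ms * len(hops_ms)


def parallel_latency_ms(branches_ms: Sequence[float], rtt_ms: float) -> float:
    """Calls made concurrently: the slowest branch plus a single RTT."""
    return max(branches_ms) + rtt_ms


def prob_any_tail(n_calls: int, quantile: float = 0.99) -> float:
    """Probability that at least one of n independent calls exceeds its own
    latency at the given quantile (for example its p99)."""
    return 1.0 - quantile ** n_calls


if __name__ == "__main__":
    hops = [20.0, 35.0, 15.0]              # hypothetical per-service latencies
    print(sequential_latency_ms(hops, 1.0))
    print(parallel_latency_ms(hops, 1.0))
    print(round(prob_any_tail(10), 4))
```

With ten independent downstream calls, close to one request in ten sees at least one p99-or-worse call, so a user-facing p99 is driven by tail behaviour well beyond any single service's p99.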

4. Sticky sessions

In-memory session state pins a user to one instance: that instance becomes a hot spot, scaling becomes uneven, and deploys risk dropping sessions. Sticky routing (HAProxy cookie / source IP) keeps such sessions working but preserves the pinning, so treat it as a stopgap. Solution: external session store (Redis) or stateless JWT (short TTL + refresh token). Modern default: JWT.
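To show why a signed, short-lived token removes the pinning, here is a simplified stand-in for a JWT built only on the standard library: any instance holding the shared secret can verify the token without looking up session state. This is not a spec-compliant JWT; `SECRET`, `issue_token`, and `verify_token` are hypothetical names, and a real service would use a maintained JWT library and proper key management.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret-change-me"  # hypothetical shared secret, for illustration only


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(payload: str) -> str:
    return _b64(hmac.new(SECRET, payload.encode("ascii"), hashlib.sha256).digest())


def issue_token(user_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return 'payload.signature' carrying the user id and an expiry time."""
    issued = time.time() if now is None else now
    claims = {"sub": user_id, "exp": issued + ttl_seconds}
    payload = _b64(json.dumps(claims).encode("utf-8"))
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id if the signature is valid and the token is unexpired."""
    parts = token.split(".")
    if len(parts) != 2:
        return None
    payload, signature = parts
    # Constant-time comparison of the presented and the recomputed signature.
    if not hmac.compare_digest(signature.encode("utf-8"), _sign(payload).encode("utf-8")):
        return None
    claims = json.loads(_unb64(payload))
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None
```

Because verification needs only the secret and the clock, the load balancer is free to send each request to any instance.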

5. Connection pool exhaustion

Each app instance opens its own DB pool (say 50 conns). 100 instances → 5000 connections, exhausting the DB’s max_connections. Solution: connection pooler middleware (PgBouncer for PostgreSQL / ProxySQL for MySQL / managed DB proxy) that multiplexes the apps' 5000 client connections onto ~100 actual DB connections.
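The connection arithmetic can be checked with a few lines. The 50-connection pool and 100 instances come from the example above; the `max_connections` value of 500 and the 10 reserved admin connections are assumptions picked for illustration.

```python
def total_app_connections(instances: int, pool_size: int) -> int:
    """Connections the app tier opens when every instance fills its own pool."""
    return instances * pool_size


def max_pool_size_per_instance(max_connections: int, reserved: int, instances: int) -> int:
    """Largest per-instance pool that fits under the DB limit without a pooler."""
    return max((max_connections - reserved) // instances, 0)


if __name__ == "__main__":
    print(total_app_connections(100, 50))            # demand from the app tier
    print(max_pool_size_per_instance(500, 10, 100))  # what the DB could afford each
```

Without a pooler, the assumed limit leaves each of the 100 instances only a handful of connections, which is why multiplexing in front of the DB is usually the more practical fix than shrinking every pool.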

6. Cascading failure

One downstream slows down. Upstream callers retry, and each retry holds a thread while it waits. Soon every app instance has all its threads stuck on the slow downstream, and even healthy endpoints stop responding. Solution: circuit breaker, bulkhead, request timeouts, retries with backoff and jitter, load shedding.
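A circuit breaker, sketched minimally: after a run of consecutive failures it stops calling the downstream and fails fast, then lets a single trial call through once a cool-down has passed. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the thresholds are arbitrary, and this version is not thread-safe; it only shows the state machine.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without contacting the downstream."""


class CircuitBreaker:
    def __init__(self,
                 failure_threshold: int = 3,
                 reset_timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout_s:
                # Open: fail fast so no thread waits on the slow downstream.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: half-open, let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                # Trip the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The point of failing fast is that rejected calls release their thread immediately, so healthy endpoints on the same instance keep being served.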

Summary

Horizontal scale solves stateless CPU- and IO-bound work. The six pitfalls above require dedicated architectural work. The simulator’s “scale-out single DB” preset demonstrates pitfall #1 directly: add app instances and watch DB utilization saturate while throughput plateaus.

🧪 Try it in the simulator

The scenarios in this post are runnable in the simulator. Turn the knobs and watch the result change.

© 2025-2026 serversim · Architecture simulation tool