Introduction
Have you ever watched a dependency fall over because your system did “the same correct thing” a thousand times at once?
That pattern is the thundering herd problem. Everything looks fine, then a synchronized wave of clients stampedes a shared bottleneck, knocking it down.
I care because it causes pain for users and developers. Users face slow pages and errors, while developers get paged and spend hours chasing “what changed” when the real issue is normal behavior lining up in time.
What a thundering herd is
A thundering herd occurs when many clients wake up simultaneously to do the same work against a single dependency.
You usually see it around a shared trigger:
- A cache entry expires.
- A token expires, and everybody refreshes.
- A cron job starts on the hour.
- A deployment restarts a fleet, and every instance reconnects at once.
Each client's work is correct on its own; the timing is the bug.
The classic mechanism: cache expiration becomes a stampede
The most common thundering herd story is a cache.
- A hot cache key expires.
- Many requests arrive and miss the cache simultaneously.
- All of them recompute the same value.
- The database (or another dependency) gets hammered.
This is sometimes called a cache stampede.
If the recompute is expensive or the database is already near capacity, the herd pushes it into saturation: tail latency spikes and requests start timing out.
If clients retry on those timeouts, the overload compounds into a second-order failure known as a retry storm. See What Is a Retry Storm?
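Here is a minimal sketch of that naive read-through pattern, assuming a hypothetical in-process cache dict and a load_from_db helper standing in for the expensive query; it is an illustration of the miss window, not a production cache.

```python
import time

cache = {}  # hypothetical in-process cache: key -> (value, expires_at)

def load_from_db(key):
    """Stand-in for the expensive query; this is the shared bottleneck."""
    time.sleep(0.2)  # pretend this costs 200 ms of database work
    return f"value-for-{key}"

def get(key, ttl_s=60):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]  # fresh hit
    # Miss: every concurrent caller that lands here recomputes the same value.
    # When a hot key expires, hundreds of requests can reach this branch in
    # the same instant -- that simultaneous recompute is the stampede.
    value = load_from_db(key)
    cache[key] = (value, time.time() + ttl_s)
    return value
```

Nothing here is wrong for a single caller; the failure only appears when many callers hit the miss branch at the same moment.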
Why it gets so bad so fast
Thundering herds are about synchronization and shared bottlenecks.
One expensive recompute is fine; a thousand at once is not, and most systems lack a 1,000x capacity buffer for synchronized bursts.
This is why the symptoms often look like:
- Request rate to the dependency spikes, even though user traffic is “normal”.
- Latency percentiles (p95, p99) jump first, then errors follow.
- Database load jumps, connection pools saturate, queues grow.
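To make the saturation concrete, here is a back-of-envelope sketch; the pool size, recompute cost, and burst size are made-up numbers, not measurements.

```python
pool_size = 20        # database connections available (assumed)
recompute_s = 0.05    # seconds of database work per recompute (assumed)
misses = 1000         # requests that miss the hot key at the same instant

throughput = pool_size / recompute_s            # ~400 recomputes per second
drain_s = misses * recompute_s / pool_size      # time to work through the burst

print(f"steady throughput: {throughput:.0f}/s")
print(f"last request in the burst waits ~{drain_s:.1f}s")  # ~2.5 s: hello, p99
```

A burst that would be invisible if spread over a minute instead parks a multi-second queue in front of the database, and the slowest requests in that queue are exactly what p99 measures.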
If you want a deeper model for saturation and tails, see Fundamentals of Software Performance.
What reduces thundering herd risk
I see mitigation as two moves: break the synchronization and avoid the duplicate work.
- Add jitter where timing matters so expirations and retries stop lining up. AWS explains why jitter is essential in Exponential Backoff and Jitter. A sketch of jittered TTLs and backoff follows this list.
- Coalesce duplicate work so a single request recomputes the value and the others wait for the shared result; this is sometimes called request coalescing or single-flight. A sketch follows the list as well.
- Limit concurrency at the bottleneck to prevent one hot path from consuming all workers and connections. Bulkheads and pool limits serve this purpose.
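Here is a minimal sketch of jitter in the two spots mentioned above, an expiration TTL and a retry delay; the base values are arbitrary, and the backoff follows the "full jitter" shape described in the AWS post.

```python
import random

def jittered_ttl(base_ttl_s=300, spread=0.10):
    """Spread expirations across +/-10% so hot keys stop expiring together."""
    return base_ttl_s * random.uniform(1 - spread, 1 + spread)

def full_jitter_backoff(attempt, base_s=0.1, cap_s=10.0):
    """Sleep a random amount up to the exponential cap, rather than the exact
    cap, so retrying clients spread out instead of returning in lockstep."""
    return random.uniform(0, min(cap_s, base_s * 2 ** attempt))
```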
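And here is a sketch of request coalescing layered on the naive get from earlier: one leader recomputes each key while the other callers wait for the shared result, which also caps recompute concurrency at one per key. The helper names are hypothetical.

```python
import threading
import time

cache = {}                     # key -> (value, expires_at)
key_locks = {}                 # key -> Lock, one per hot key
key_locks_guard = threading.Lock()

def _lock_for(key):
    # Tiny critical section whose only job is to hand out one lock per key.
    with key_locks_guard:
        return key_locks.setdefault(key, threading.Lock())

def get_coalesced(key, recompute, ttl_s=60):
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                    # fresh hit: no locking needed
    with _lock_for(key):                   # at most one recompute per key
        entry = cache.get(key)             # re-check: a leader may have refilled it
        if entry is not None and entry[1] > time.time():
            return entry[0]
        value = recompute(key)             # the single leader does the work
        cache[key] = (value, time.time() + ttl_s)
        return value
```

Across processes or hosts the same idea usually needs a shared lock or a short-lived lease in the cache itself, but the shape is identical: one leader recomputes, everyone else waits for the result.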
Add parallelism with care
Parallelism can cut latency, but it also adds load on shared dependencies and can worsen tail behavior.
If you add concurrency, verify you did not create:
- A thundering herd.
- A retry storm.
- A shared bottleneck that now saturates faster.
Where to go next
For adjacent ideas that put thundering herds in context:
- Read What Is a Retry Storm? for how retries amplify overload.
- Read Fundamentals of Software Performance for saturation, queueing, and tail latency.
- Read Site Reliability Engineering for reliability patterns, failure modes, and how overload cascades through systems.
References
- Thundering herd problem, for a baseline definition and common causes.
- Exponential Backoff and Jitter, for why jitter reduces synchronized bursts.
- Site Reliability Engineering, for foundational reliability concepts, overload behavior, and system design patterns.
- Fundamentals of Software Performance, for saturation, queueing, and tail latency.
