Introduction
Most distributed systems do not fail because one request is expensive. They fail because too many requests arrive at once, the system cannot keep up, and work piles up until everything times out.
Backpressure is the concept that stops that pileup from becoming a full outage.
What backpressure is
Backpressure is a signal or mechanism that tells an upstream component to slow down because a downstream component is at or near capacity.
In plain terms: it is how a system says, “Stop sending me so much work, I’m choking.”
Why backpressure matters
Without backpressure, upstream systems keep pushing work into a saturated downstream. That creates queues, long waits, and tail latency spikes.
Once timeouts start, retries often show up, and that can turn overload into a feedback loop. If you want the adjacent failure mode, see What Is a Retry Storm?.
A simple mental model: the input rate must match the service rate
Every component has a maximum rate it can handle.
If work arrives faster than it can be processed, the component has only a few options:
- Queue the work and make users wait.
- Reject some work.
- Shed load intentionally.
Backpressure is what makes “reject” and “shed” coordinated instead of chaotic.
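The mental model above can be sketched as a toy discrete-time simulation (illustrative numbers only, not a real scheduler): work arrives faster than it can be served, and the only question is whether the excess queues without bound or is rejected at a cap.

```python
# Toy simulation: arrivals at 12 jobs/sec into a server that handles
# 10 jobs/sec. With no limit the backlog grows forever; with a cap of 20,
# excess work is rejected and the backlog stays bounded.
def simulate(seconds, arrival_rate, service_rate, cap=None):
    queued, rejected = 0, 0
    for _ in range(seconds):
        queued += arrival_rate              # work arrives
        if cap is not None and queued > cap:
            rejected += queued - cap        # shed what we cannot hold
            queued = cap
        queued = max(0, queued - service_rate)  # work gets served
    return queued, rejected

print(simulate(30, 12, 10))          # (60, 0): backlog grows without bound
print(simulate(30, 12, 10, cap=20))  # (10, 50): bounded backlog, explicit rejection
```

The unbounded run hides the overload as ever-growing wait time; the capped run surfaces it as rejections the upstream can react to.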
Respect backpressure
When a downstream component says “slow down”, the upstream needs a mechanism that actually reduces its sending rate. Without one, it keeps pushing work into a saturated system and retrying when requests fail.
Backpressure includes rate limits, queue limits, load shedding, and signals like HTTP 429 with Retry-After.
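Respecting a 429 with Retry-After can be sketched like this. This is a minimal illustration, not a real HTTP client: `send` is a hypothetical callable standing in for one request, returning a status code and the server's Retry-After hint in seconds.

```python
import time

def call_with_backoff(send, max_attempts=3):
    """Call `send()`; on a 429, wait as long as the server asks before retrying.

    `send` is a hypothetical stand-in for an HTTP request; it returns
    (status_code, retry_after_seconds).
    """
    for _ in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        time.sleep(retry_after)  # honor the server's pacing signal
    return 429  # still overloaded: give up instead of hammering the server

# Simulated server: saturated on the first call, recovered on the second.
responses = iter([(429, 0.01), (200, None)])
print(call_with_backoff(lambda: next(responses)))  # 200
```

The key point is that the client treats 429 as a command to slow down, not as a transient error to retry immediately.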
What backpressure looks like in real systems
Backpressure shows up in different layers, but the shape is consistent: a downstream communicates capacity and the upstream adapts.
Common examples:
- An HTTP API returns HTTP 429 (Too Many Requests) and tells clients when to retry using Retry-After.
- A queue refuses new messages when it reaches a configured limit.
- A worker pool is full, so a service rejects new requests instead of accepting and timing out later.
- A database connection pool saturates, and callers block or fail fast.
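The full-worker-pool case above can be modeled with a counting semaphore. This is a hypothetical sketch, not a production server: a bounded semaphore tracks free worker slots, and a request that finds no free slot is rejected immediately rather than accepted and timed out later.

```python
import threading

# Two worker slots; a third concurrent request gets an immediate rejection.
slots = threading.BoundedSemaphore(2)

def handle(request):
    """Serve the request if a worker slot is free; otherwise fail fast."""
    if not slots.acquire(blocking=False):
        return "503 rejected"  # explicit backpressure to the caller
    try:
        return f"200 handled {request}"
    finally:
        slots.release()  # free the slot for the next request

# Simulate two requests already in flight, then a third arriving.
slots.acquire(blocking=False)
slots.acquire(blocking=False)
print(handle("r3"))  # "503 rejected"
```

Failing fast here is what lets the caller react (back off, retry elsewhere, degrade) while the work is still fresh, instead of waiting on a request that was doomed to time out.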
The trade-off
Backpressure feels harsh because it says “no” or “not right now”.
But the alternative is usually worse:
- Everybody waits, and users experience slow failures.
- Timeouts trigger retries, and traffic multiplies.
- The system spends effort doing work it cannot finish.
Backpressure turns uncontrolled waiting into controlled rejection.
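Controlled rejection in miniature: a bounded queue (here Python's standard `queue.Queue` with a `maxsize`) says “no” up front instead of letting callers wait on work it cannot finish.

```python
import queue

# A bounded queue: capacity 2, and the third submission is rejected.
work = queue.Queue(maxsize=2)

def submit(job):
    """Accept the job if there is room; otherwise reject immediately."""
    try:
        work.put_nowait(job)
        return True
    except queue.Full:
        return False  # a fast, explicit signal that the caller must back off

print([submit(j) for j in ("a", "b", "c")])  # [True, True, False]
```

The rejected caller learns about the overload in microseconds, while an unbounded queue would have made it wait until a timeout fired.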
Where to go next
If you want adjacent concepts that connect tightly to backpressure:
- Read What Is a Retry Storm?, for how retries amplify overload.
- Read What Is a Thundering Herd?, for synchronized bursts that overwhelm a bottleneck.
- Read Fundamentals of Software Performance, for saturation, queueing, and tail latency.
References
- HTTP 429 Too Many Requests, for the semantics of rate limiting responses and how clients should interpret them.
- The Tail at Scale, for why tail latency dominates large distributed systems when components queue and saturate.
