Introduction

When a system is overloaded, the worst thing it can do is pretend it can handle everything.

Pretending leads to slow failure: timeouts, frustrated users, retries that pile on even more traffic, and ultimately larger outages.

By the end, you’ll understand why: once a system is saturated, accepting more work only increases waiting, which turns into timeouts, retries, and outages.

Load shedding is how you keep “overloaded” from becoming “totally unusable”.

What is load shedding

Load shedding is intentionally rejecting some work during overload so the system can keep serving the work that matters most.

It sounds harsh because it is. It chooses to drop requests deliberately instead of letting the system drown in queues and timeouts.

I compare it to a restaurant that stops seating new tables when the kitchen is backed up. Turning people away feels bad, but waiting two hours and burning out staff is worse.

Why load shedding exists

Every system has a capacity limit. Near that limit, small traffic increases can cause large latency jumps as work queues up.

When I say a system is saturated, I mean it has no slack left. All the workers (threads, processes, database connections, whatever actually does the work) are busy.

Accepting more work than you can finish causes two problems:

  • Users wait longer and often time out.
  • The system wastes effort on work it will not complete.

Load shedding avoids both problems by refusing the excess work up front.

A simple way to think about overload is that latency is mostly service time plus waiting time. Under light load, waiting is near zero. Near saturation, waiting grows fast because requests pile up in queues. That is why small increases in traffic can lead to significant increases in latency.
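
Here is a toy single-queue (M/M/1) model to make that concrete; it is an illustration I am adding, not a measurement of any particular system. With a service rate of mu requests per second and an arrival rate of lam, the average time in the system is 1 / (mu - lam):

```python
# Toy M/M/1 model (illustrative assumption): average time in system
# is 1 / (mu - lam) for service rate mu and arrival rate lam.

def expected_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends waiting plus being served, in seconds."""
    if arrival_rate >= service_rate:
        return float("inf")  # past saturation, the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 100.0  # the server can finish 100 requests per second
for arrival_rate in (50, 80, 90, 95, 99):
    latency_ms = expected_latency(arrival_rate, SERVICE_RATE) * 1000
    print(f"{arrival_rate:>3} req/s -> {latency_ms:7.1f} ms average latency")
```

At half capacity the average is 20 ms; at 99 percent utilization it is a full second. The last few percent of utilization cause most of the waiting.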

A common symptom is worse tail latency. The slowest 1 percent of requests get much slower before the averages look scary.

Load shedding keeps waiting time bounded by limiting how much work is admitted in the first place.

Backpressure signals producers to slow down; load shedding takes over when slowing down is not enough or when a hard boundary is needed.

In practice, systems use both: backpressure to reduce incoming rate, and load shedding to cap work when the system is already saturated.
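
As a minimal sketch of that combination, imagine a bounded queue in front of the workers; the names and the limit of 100 are placeholders:

```python
import queue

# The queue's capacity is the backpressure boundary; rejecting on
# overflow is the load shedding.
work_queue: "queue.Queue[str]" = queue.Queue(maxsize=100)

class Overloaded(Exception):
    """Raised so callers fail fast instead of waiting in line."""

def submit(request: str) -> None:
    try:
        # put_nowait never blocks: if the queue is full, we shed the request.
        work_queue.put_nowait(request)
    except queue.Full:
        raise Overloaded("queue full, try again later") from None
```

Producers that hit Overloaded can back off or surface the error immediately, instead of queuing behind work that will time out anyway.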

If you want the broader definition and framing, see What Is Backpressure?.

Common misconceptions

A few misunderstandings show up a lot:

  • Load shedding is the same as rate limiting. Rate limiting is one way to shed load, but load shedding is broader: it is about priority and protection, so you might drop costly work first even when request rates look normal.
  • Load shedding is the same as autoscaling. Autoscaling adds capacity, but it has delays and limits. Load shedding keeps the system usable while scaling catches up, or when scaling cannot help at all.
  • Load shedding is a failure. It is a failure, but a controlled one. The alternative during overload is usually uncontrolled failure: timeouts and retries that make the outage bigger.

What load shedding looks like in practice

Load shedding shows up as controlled rejection, not mysterious timeouts.

Examples:

  • An API returns HTTP 429 (Too Many Requests) instead of accepting work it cannot finish.
  • A service sheds low-priority traffic first (for example, expensive recommendation calls) while preserving checkout.
  • A queue stops accepting new work when it reaches a hard limit, so producers fail fast.
  • A circuit breaker trips and rejects calls to a failing dependency, so the system stops making the dependency worse.

The details vary, but the principle stays the same: fail fast and preserve capacity.
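
As one concrete sketch of the first example, here is a tiny server built on the Python standard library; the limit of 8 in-flight requests and the sleep standing in for real work are assumptions for illustration:

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_IN_FLIGHT = 8  # hypothetical capacity limit for this sketch
slots = threading.Semaphore(MAX_IN_FLIGHT)

class SheddingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Try to take a slot without waiting; if none is free, shed.
        if not slots.acquire(blocking=False):
            self.send_response(429)  # Too Many Requests
            self.send_header("Retry-After", "1")
            self.end_headers()
            return
        try:
            time.sleep(0.05)  # stand-in for real work
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")
        finally:
            slots.release()

if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), SheddingHandler).serve_forever()
```

The rejected request costs almost nothing to serve, which is exactly the point: capacity stays reserved for the requests that were admitted.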

What you give up when you shed load

Load shedding does not remove errors.

It chooses visible errors now instead of slow failures and a wider outage later.

That trade is often the most humane option for users. A fast “try again later” is less damaging than a spinning page that never loads.

How load shedding prevents retry storms

Retry storms occur when failures trigger retries, which amplify load. Load shedding helps because it reduces wasted work and keeps the system from entering the timeout zone, where retries explode.

If clients receive clear overload signals, such as an HTTP 429 with Retry-After, they can back off instead of hammering harder.
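
A sketch of that client behavior, assuming a hypothetical send callable that returns the response status, headers, and body:

```python
import random
import time

def call_with_backoff(send, max_attempts: int = 5):
    """send is a hypothetical callable returning (status, headers, body)."""
    delay = 0.1  # initial backoff in seconds
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Prefer the server's hint; this sketch assumes Retry-After
        # carries a delay in seconds, not an HTTP date.
        retry_after = headers.get("Retry-After")
        wait = float(retry_after) if retry_after else delay * random.uniform(0.5, 1.5)
        time.sleep(wait)
        delay *= 2  # exponential backoff between attempts
    raise RuntimeError("still overloaded after retries")
```

The jitter matters: without it, clients that were rejected together retry together and recreate the spike.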

For the adjacent failure mode, see What Is a Retry Storm?.

Where to go next

If you remember one thing from me, it is this: load shedding is a choice to protect bounded latency for some requests by making overload visible for others.

If you want the ideas that connect tightly to load shedding, the two posts linked above are the natural next reads:

  • What Is Backpressure?
  • What Is a Retry Storm?
