Linux MD RAID5 patch series targets stripe-cache contention

RAID5 is old enough that it is easy to mistake it for a solved problem. The idea is familiar: spread data and parity across drives so a single disk can fail without taking the array down. The performance warnings are familiar too. Small writes are expensive. Rebuilds are stressful. Write holes need real handling. If you have spent time around storage, none of that sounds new.

But old subsystems keep finding new bottlenecks when the hardware around them changes. Phoronix spotted a new Linux MD RAID5 patch series on LKML that reports improvements of roughly 10-17% in some configurations. The interesting part is not the percentage by itself. It is where the percentage comes from: reducing contention around per-stripe and stripe-cache handling on systems with many CPU cores and many disks.

When storage gets wide enough, the hard part is not always moving bytes. Sometimes it is keeping every worker from standing in the same accounting line.

The stripe cache becomes shared infrastructure

RAID5 operates in stripes. A stripe is the unit where data blocks and parity relate to each other across the array. The kernel has to track which stripes are active, which are dirty, which are being read, which need parity updates, and which can be released back into the cache. That bookkeeping is not decorative. It is how the system keeps parallel I/O coherent.
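
The relationship between data and parity inside one stripe can be sketched with XOR, the standard RAID5 parity operation. This is a minimal illustration rather than the kernel's code: the chunk contents, the 4-drive layout, and the helper names xor_blocks and rmw_parity are made up for the example.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: update parity after changing a single data chunk."""
    return xor_blocks(old_parity, old_data, new_data)


# One stripe on a hypothetical 4-drive array: three data chunks plus parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# If the drive holding d1 fails, the survivors plus parity rebuild it.
rebuilt_d1 = xor_blocks(d0, d2, parity)

# A small write to d0 needs the old data and old parity before it can finish.
new_d0 = b"ZZZZ"
new_parity = rmw_parity(parity, d0, new_d0)
```

The small-write case shows why per-stripe state exists at all: the data chunk and the parity chunk have to change together, so something has to track the stripe while both updates are in flight.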

On a modest array, the cost of that coordination can hide under the disk work. On a larger machine, with more drives and more CPU workers, the coordination itself becomes visible. More worker threads can help only while they are not fighting over the same shared state. Past that point, the storage stack is no longer disk-bound in the simple sense. It is concurrency-bound.

application write
  -> block layer
  -> md raid5 stripe lookup
  -> parity and data update
  -> stripe-cache state change
  -> drive I/O

wide array + many workers:
  shared stripe state can become the hot path

The drives are still doing the physical work, but the stripe cache decides how much parallelism reaches them.
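
One rough way to reason about that ceiling is Amdahl's law, treating the time spent under shared stripe state as the serial fraction of each request. The sketch below is a toy model with made-up fractions, not a measurement of MD RAID5 or of the patch series.

```python
def speedup(workers: int, serial_fraction: float) -> float:
    """Amdahl's law: ideal speedup when part of the work cannot run in parallel."""
    if workers < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("need workers >= 1 and 0 <= serial_fraction <= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Hypothetical fractions of per-request time spent holding shared state.
for serial_fraction in (0.20, 0.10):
    row = [round(speedup(n, serial_fraction), 2) for n in (1, 2, 4, 8)]
    print(f"serial={serial_fraction:.2f} speedup at 1/2/4/8 workers: {row}")
```

In this model a single worker sees no difference between the two fractions, and the gap only opens up as workers are added. Shrinking the serialized part pays off in proportion to how much concurrency is waiting behind it.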

The patch series is about scaling the boring path

The patch set described by Phoronix was posted by Hiroshi Nishida and contains eight changes for Linux's MD RAID5 code. The stated target is not a new feature visible to users. It is lower contention in the internal path that manages stripes and stripe-cache work.

That is exactly the kind of kernel work that tends to matter more than it advertises. A storage stack can have excellent throughput on paper and still leave performance behind if threads spend too much time coordinating access to shared structures. The fixes that move production systems forward are often not spectacular rewrites. They are narrower changes that make hot locks colder, reduce unnecessary shared-state pressure, and let existing worker threads do useful work more often.
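
One general way to make a hot lock colder is lock striping: replace a single lock with several, chosen by hashing the object being touched. The sketch below shows that pattern only. It is not taken from the patch series, and ShardedStripeTable, its shard count, and its method names are hypothetical.

```python
import threading


class ShardedStripeTable:
    """Per-stripe reference counts guarded by several locks instead of one."""

    def __init__(self, shards: int = 8) -> None:
        self._locks = [threading.Lock() for _ in range(shards)]
        self._counts = [dict() for _ in range(shards)]

    def _shard(self, stripe_no: int) -> int:
        # Stripes that map to different shards never contend on the same lock.
        return stripe_no % len(self._locks)

    def get(self, stripe_no: int) -> None:
        """Take a reference on a stripe."""
        i = self._shard(stripe_no)
        with self._locks[i]:
            self._counts[i][stripe_no] = self._counts[i].get(stripe_no, 0) + 1

    def put(self, stripe_no: int) -> None:
        """Drop a reference; forget the stripe once nothing holds it."""
        i = self._shard(stripe_no)
        with self._locks[i]:
            remaining = self._counts[i][stripe_no] - 1
            if remaining:
                self._counts[i][stripe_no] = remaining
            else:
                del self._counts[i][stripe_no]

    def active(self) -> int:
        """Number of stripes currently referenced (visits every shard lock)."""
        total = 0
        for lock, counts in zip(self._locks, self._counts):
            with lock:
                total += len(counts)
        return total


def worker(table: ShardedStripeTable, first: int, count: int) -> None:
    # Each worker touches its own range of stripe numbers.
    for stripe_no in range(first, first + count):
        table.get(stripe_no)
        table.put(stripe_no)


table = ShardedStripeTable(shards=8)
threads = [threading.Thread(target=worker, args=(table, n * 1000, 1000)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Python's interpreter lock means this demonstrates the structure, not a speedup. The trade-off is visible in active(): anything that needs a global view now has to visit every shard, which is part of why this kind of change needs careful review.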

The benchmark shape is also worth reading carefully. At the default single handling-thread setting, Phoronix reports the series as neutral. As worker threads are added, the gains grow, with broad improvement around group_thread_cnt = 4. At group_thread_cnt = 8, write-heavy workloads kept improving while a read-heavy high-concurrency case had already saturated.
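
On a running system, md exposes group_thread_cnt as a sysfs attribute under the array's md directory. A minimal sketch for reading it follows, assuming a RAID4/5/6 array named md0 (a hypothetical name here). The sysfs_root parameter is an addition for this example so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def read_group_thread_cnt(array: str = "md0", sysfs_root: str = "/sys") -> Optional[int]:
    """Return the array's group_thread_cnt, or None if the attribute is absent."""
    attr = Path(sysfs_root) / "block" / array / "md" / "group_thread_cnt"
    try:
        return int(attr.read_text().strip())
    except FileNotFoundError:
        # Not an md RAID4/5/6 array, or no such device on this machine.
        return None


if __name__ == "__main__":
    print(read_group_thread_cnt("md0"))
```

Reading the value is harmless. Changing it on an important array is a tuning decision that deserves its own testing.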

That is not a magic-speedup story. It is a scaling story. The patches are most relevant where the machine has enough concurrency for the old contention point to show up.

Neutral at default is a meaningful result

For kernel storage changes, the absence of a default regression matters. MD RAID arrays are not a toy environment. They sit under filesystems, databases, backup targets, homelabs, media servers, and plenty of infrastructure that nobody wants to discover through data loss or surprise latency spikes.

A patch series that improves many-core, many-disk behavior while staying neutral at the default configuration has a cleaner review argument than one that simply shifts cost from one workload to another. It says the old path does not get worse for ordinary deployments, while wider arrays may get more of the parallelism they already paid for in hardware.

Operational note: this is still code review material, not a production recommendation. The useful takeaway today is the shape of the bottleneck, not an instruction to run unmerged storage patches on important arrays.

Why this matters outside RAID5

The broader lesson applies to almost every mature infrastructure component. Old code paths often assume that the expensive part of the system is obvious. For storage, that usually means the disks. For networking, it means the wire. For databases, it means the query. For CI, it means the build. Then hardware gets faster or wider, and a coordination point that used to be cheap starts setting the ceiling.

More cores expose shared locks and counters that were harmless on smaller machines.
More disks turn metadata coordination into part of the data path.
More worker threads help only until they collide on common state.
More benchmark coverage shows which workloads scale and which ones saturate.

That is why patches like this are worth tracking even if you do not run MD RAID5. They show a recurring systems pattern: once the obvious bottleneck moves, the quiet bookkeeping becomes performance engineering.

The takeaway

The Linux MD RAID5 work is not glamorous. That is the point. It is a reminder that infrastructure performance is often won in the places operators rarely see directly: cache ownership, worker scheduling, lock scope, per-object state, and the shape of the queue that feeds the hardware.

If the series survives review and lands upstream, some wide RAID5 systems may get a useful bump without changing applications, filesystems, or disks. Even if the final numbers change, the diagnosis is durable. Storage systems do not scale just because the machine has more cores and more drives. The shared structures between those cores and drives have to scale too.

RAID5 is old. Contention budgets are not.
