The Race Condition Hiding in the Flush

The bug looked too polite to be dangerous. Requests returned 200 OK. The Images service believed it had sent the response. Logs did not scream. Traces did not point at a crash. The only visible symptom was that large transformed images sometimes arrived with the bottom missing, as if the network had quietly taken a pair of scissors to the body.

Cloudflare's postmortem on a bug in hyper, the Rust HTTP library used by its Images service, is worth reading because it is not really a Cloudflare story. It is a systems story about what happens when a local optimization changes timing just enough to expose an old assumption hiding below the application layer.

The setup was practical. Cloudflare's Images binding lets Workers call image-processing functionality directly. At the end of 2025, Cloudflare reworked that path so the Workers runtime could reach the Images service locally on the same machine, instead of routing the request through FL, Cloudflare's internal front-line intermediary. That made the path faster and gave the Images team more release control.

The lesson is not that faster paths are unsafe. The lesson is that faster paths can change who applies backpressure, and backpressure is where hidden state-machine bugs go to become real.

The failure mode was honest and wrong

The failures were narrow enough to be maddening. They appeared intermittently, mostly for larger image responses, and only on the production path. A customer had nested image processing: one Worker composited large images through the binding, then the result went through a second optimization path. The inner pipeline sometimes returned a successful response whose Content-Length promised megabytes, while the body contained only a fraction of that.

That distinction matters. The system did not lose a request. It lost trust in the end of a stream. If a response says it is 3.3 MB and only a few hundred kilobytes arrive before EOF, the failure is not the application deciding to send less. It is the transport layer ending the conversation before the bytes have actually crossed the boundary.
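One way to make this failure visible on the receiving side is to count the bytes that arrive before EOF and compare them with the declared length. The sketch below is a minimal illustration, not code from Cloudflare or hyper. `BodyCheck` and `check_body` are hypothetical names, and the declared length is assumed to have been parsed from the Content-Length header already.

```rust
use std::cmp::Ordering;
use std::io::{self, Read};

/// Result of comparing a declared Content-Length with the bytes that arrived.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum BodyCheck {
    Complete,
    Truncated { missing: u64 },
    Overrun { extra: u64 },
}

/// Read `body` to EOF, discarding the bytes, and compare the count with
/// the length the response promised.
fn check_body<R: Read>(declared_len: u64, mut body: R) -> io::Result<BodyCheck> {
    // io::copy returns the number of bytes read before EOF.
    let received = io::copy(&mut body, &mut io::sink())?;
    Ok(match received.cmp(&declared_len) {
        Ordering::Equal => BodyCheck::Complete,
        Ordering::Less => BodyCheck::Truncated {
            missing: declared_len - received,
        },
        Ordering::Greater => BodyCheck::Overrun {
            extra: received - declared_len,
        },
    })
}
```

Given a declared length of 1000 and a reader that yields 300 bytes before EOF, `check_body` returns `Truncated { missing: 700 }`. A 200 status alone would never reveal that gap.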

encoded response
  -> hyper internal buffer
  -> socket outbound buffer
  -> reader on the other side

if the reader slows down:
  buffer fills
  flush returns pending
  shutdown must wait

The important state is not only whether the response has been encoded. It is whether the bytes have actually drained.
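The difference between encoded and drained can be modeled in a few lines. This is a toy model under stated assumptions, not hyper's implementation: `Outbound` and its fields are hypothetical, byte counts stand in for real buffers, and the socket buffer is a fixed-capacity counter.

```rust
/// Toy model of the path sketched above: encoded bytes wait in user space
/// until a bounded socket buffer has room, and the reader frees that room.
struct Outbound {
    buffered: usize,        // encoded, still in user-space memory
    in_socket: usize,       // accepted by the socket, not yet read
    socket_capacity: usize, // upper bound on `in_socket`
    read_by_peer: usize,    // bytes the reader has consumed
}

impl Outbound {
    fn new(encoded_len: usize, socket_capacity: usize) -> Self {
        Outbound {
            buffered: encoded_len,
            in_socket: 0,
            socket_capacity,
            read_by_peer: 0,
        }
    }

    /// Move whatever fits into the socket buffer.
    /// Returns true only once nothing is left in user space.
    fn flush(&mut self) -> bool {
        let room = self.socket_capacity - self.in_socket;
        let n = self.buffered.min(room);
        self.buffered -= n;
        self.in_socket += n;
        self.buffered == 0
    }

    /// The reader consumes up to `n` bytes, freeing room for the next flush.
    fn peer_reads(&mut self, n: usize) {
        let n = n.min(self.in_socket);
        self.in_socket -= n;
        self.read_by_peer += n;
    }

    /// Bytes that have left user space, whether or not the reader has them yet.
    fn drained(&self) -> usize {
        self.in_socket + self.read_by_peer
    }
}
```

With 1000 encoded bytes and a 256-byte socket buffer, the first `flush` returns false and `drained()` reports 256. Until `peer_reads` is called, further flushes make no progress, which is the state in which shutting down would truncate the response.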

Strace found the place where the truth changed

Cloudflare worked inward through the system. Reproductions removed customer-specific details. Timeout theories failed. Testing newer hyper versions did not help. Local tests could not trigger the bug, even under load. Distributed traces narrowed the problem to the Images service return path, but still left the crucial question unanswered: did the process actually write the full response to the socket?

The useful tool was strace, because it records what the process asks the kernel to do. In successful requests, Cloudflare saw repeated writes as the socket accepted chunks, followed by shutdown after the data had drained. In failing requests, the pattern was brutally short: one write, then shutdown. In one example from the post, about 219 KB left the process out of a roughly 14.9 MB response.

That is the kind of evidence application telemetry often cannot produce. The service can log that it handed the response to an HTTP library. The library can believe the body is complete because encoding is done. The tracing system can show a happy logical request. None of that proves the kernel received every byte.

The bug was a discarded pending state

The core bug lived in hyper's HTTP/1 connection lifecycle. Hyper had buffered the response, tried to flush it to the socket, and then moved toward shutdown. Under ordinary conditions, the socket accepted everything quickly, so the distinction between buffered and flushed was invisible.

With the new local reader, large responses sometimes filled the socket buffer. The flush operation then returned Poll::Pending, which means the write could not complete yet and the task must be polled again once the socket can accept more data. But the loop discarded that result with a pattern equivalent to:

let _ = self.poll_flush(cx)?;

A pending flush is not a completed flush. Throwing that signal away made the connection look finished while bytes were still waiting.
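It helps to see exactly what that pattern keeps and what it drops. Applied to a `Poll<io::Result<()>>`, the `?` operator returns early on an error and otherwise yields a `Poll<()>`, and `let _` then throws that value away, Pending included. The sketch below uses only the standard library. The function names are hypothetical and the flush is a stub that always reports a full socket, so this illustrates the pattern rather than reproducing hyper's code.

```rust
use std::io;
use std::task::{ready, Poll};

/// Stand-in for a flush that cannot finish because the socket is full.
/// (Hypothetical stub; a real poll method also takes a task context.)
fn poll_flush_full_socket() -> Poll<io::Result<()>> {
    Poll::Pending
}

/// The discarding shape: `?` surfaces errors, `let _` drops everything else,
/// so a Pending flush looks the same as a finished one.
fn flush_discarding() -> Poll<io::Result<&'static str>> {
    let _ = poll_flush_full_socket()?;
    Poll::Ready(Ok("proceeding to shutdown"))
}

/// The respecting shape: `ready!` returns Pending to the caller,
/// so nothing after it runs until the flush has completed.
fn flush_respecting() -> Poll<io::Result<&'static str>> {
    ready!(poll_flush_full_socket())?;
    Poll::Ready(Ok("proceeding to shutdown"))
}
```

With the stub above, `flush_discarding` returns `Ready` and moves on, while `flush_respecting` returns `Pending`. The two functions differ by one macro call, which is part of why this kind of bug survives review.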

Once the request side no longer wanted to read more, hyper could proceed to shutdown even though its write buffer still held response data. The client then received an EOF and a partial image. Nothing had to crash. No timeout had to fire. The state machine simply mistook encoded for delivered.

The upstream fix landed in hyper PR #4018. The important change is conceptually small: before the HTTP/1 transport shuts down, it must flush pending buffered data. In simplified terms, shutdown waits on the flush instead of cutting off the socket while bytes remain in user-space memory.
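The shape of that fix can be sketched with a toy connection whose shutdown is gated on its flush. This is an assumption-laden illustration, not the code from the hyper PR: `ToyConn`, its fields, and `drive_shutdown` are hypothetical, the poll methods take no task context, and the simulated socket simply accepts a fixed number of bytes per poll.

```rust
use std::io;
use std::task::{ready, Poll};

/// Toy connection: a user-space write buffer in front of a simulated socket
/// that accepts at most `accept_per_poll` bytes each time it is polled.
struct ToyConn {
    write_buf: Vec<u8>,
    accept_per_poll: usize,
    delivered: usize,
    shut_down: bool,
}

impl ToyConn {
    fn new(body: Vec<u8>, accept_per_poll: usize) -> Self {
        assert!(accept_per_poll > 0, "the toy socket must make progress");
        ToyConn { write_buf: body, accept_per_poll, delivered: 0, shut_down: false }
    }

    /// Hand bytes to the "socket"; Pending while anything is still buffered.
    fn poll_flush(&mut self) -> Poll<io::Result<()>> {
        let n = self.write_buf.len().min(self.accept_per_poll);
        self.write_buf.drain(..n);
        self.delivered += n;
        if self.write_buf.is_empty() {
            Poll::Ready(Ok(()))
        } else {
            Poll::Pending
        }
    }

    /// Shutdown waits on the flush: it cannot complete while bytes remain.
    fn poll_shutdown(&mut self) -> Poll<io::Result<()>> {
        ready!(self.poll_flush())?;
        self.shut_down = true;
        Poll::Ready(Ok(()))
    }
}

/// Poll shutdown until it completes and report how many polls it took.
/// A real runtime would sleep until woken instead of looping.
fn drive_shutdown(conn: &mut ToyConn) -> io::Result<usize> {
    let mut polls = 0;
    loop {
        polls += 1;
        match conn.poll_shutdown() {
            Poll::Ready(res) => return res.map(|()| polls),
            Poll::Pending => continue,
        }
    }
}
```

For a 1000-byte body and a socket that takes 300 bytes per poll, shutdown completes on the fourth poll with all 1000 bytes delivered. After the first poll the connection is still open with 300 bytes out, which is the moment the buggy path would have closed it.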

Why this belongs in a runbook

This bug is a neat technical puzzle, but the operational lesson is bigger than Rust, Cloudflare, or image transformations. Many production systems have layers that report success before the next layer has made that success durable. Queues accept jobs before workers finish them. Filesystems acknowledge writes before remote replicas settle. HTTP libraries buffer bodies before sockets drain. Control planes accept intent before the data plane catches up.

The danger is not buffering. The danger is letting a status word cross a boundary without carrying the condition that made it true.

Application traces tell you what the program believed happened.
Kernel traces can tell you what the process actually asked the operating system to do.
Backpressure tests prove whether success still holds when the next layer slows down.
Deterministic reproducers turn timing ghosts into reviewable code.
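A backpressure test does not need a network. A sink that accepts a limited number of bytes per call and periodically refuses to accept any is enough to make the slow-reader case repeatable. The sketch below is one possible harness built on `std::io::Write`. `ThrottledSink`, `send_once`, and `send_all` are hypothetical names, and the every-second-call WouldBlock schedule is an arbitrary choice made so the run is deterministic.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical test sink standing in for a slow reader: it accepts at most
/// `chunk` bytes per call and reports WouldBlock on every second call.
struct ThrottledSink {
    received: Vec<u8>,
    chunk: usize,
    calls: usize,
}

impl ThrottledSink {
    fn new(chunk: usize) -> Self {
        ThrottledSink { received: Vec::new(), chunk, calls: 0 }
    }
}

impl Write for ThrottledSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        if self.calls % 2 == 0 {
            return Err(ErrorKind::WouldBlock.into());
        }
        let n = buf.len().min(self.chunk);
        self.received.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Buggy shape: one write attempt, then treat the response as sent.
fn send_once<W: Write>(sink: &mut W, body: &[u8]) -> io::Result<usize> {
    sink.write(body)
}

/// Draining shape: keep writing until every byte has been accepted.
fn send_all<W: Write>(sink: &mut W, body: &[u8]) -> io::Result<usize> {
    let mut sent = 0;
    while sent < body.len() {
        match sink.write(&body[sent..]) {
            Ok(0) => return Err(ErrorKind::WriteZero.into()),
            Ok(n) => sent += n,
            // A real event loop would wait for writability instead of spinning.
            Err(e) if e.kind() == ErrorKind::WouldBlock => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(sent)
}
```

Against a sink with `chunk` set to 219 and a 1000-byte body, `send_once` delivers 219 bytes and reports success, while `send_all` delivers all 1000. The same sink against an unthrottled writer would pass both, which is why a test without backpressure cannot tell the two apart.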

Cloudflare's most useful detail is that observing the bug too broadly made it disappear. Adding enough tracing overhead changed the timing between flush and shutdown. That is a familiar shape in concurrency work: the act of looking can become part of the schedule. When a failure vanishes under instrumentation, it does not mean the failure was imaginary. It may mean the instrumentation accidentally fixed the race for one run.

The takeaway

The web is full of polite acknowledgments that are only true inside one layer. 200 OK does not mean the whole body arrived. A completed encoder does not mean the socket drained. A buffered response does not mean the reader has it.

Cloudflare found this one because the team kept moving downward until the evidence reached the syscall layer. That is the discipline worth carrying forward. When the bug is invisible to your usual dashboards, do not only add more dashboards. Ask which layer is allowed to lie by omission.

Sometimes the missing bytes are not in the logs because the logs are standing above the place where they disappeared.