B2B Engineering Insights & Architectural Teardowns

Live Origin at Netflix: Segment Quality Control and Write Isolation Under Load

In live streaming, an error is not gradual degradation but an immediate, user-facing incident. Netflix addresses this by moving quality control and prioritization directly into the origin layer.

The core constraint is where VOD approaches stop working. In live, there is no time buffer: a segment must be encoded, delivered, and cached within seconds, so any write delay or segment defect is immediately visible to the viewer. The system also faces resource contention: segment writing, CDN reading, and traffic spikes (especially at the live edge) all happen simultaneously. At the scale of tens of millions of users, even brief storage failures or extra origin requests lead to cascading degradation.

Netflix chose an architecture where Live Origin is not just a storage endpoint, but an active decision-making layer. The key choice is duplicating live pipelines across different regions and selecting the “first valid” segment. This reduces the likelihood of delivering defective video without complex synchronization between pipelines. The second important trade-off is abandoning dynamic manifests in favor of segment templates with fixed durations. This makes system behavior predictable and allows the origin to compute when a segment “should” appear, but requires strict discipline in encoding and timing.
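With fixed segment durations, the origin can derive a segment's URL and expected publish time purely from arithmetic on the template, with no dynamic manifest. The following sketch illustrates the idea; the function names, the 2-second duration, and the URL pattern are illustrative assumptions, not Netflix's actual scheme.

```python
from datetime import datetime, timedelta, timezone

# Illustrative fixed duration from the segment template (assumption).
SEGMENT_DURATION = timedelta(seconds=2)

def latest_published_index(stream_start: datetime, now: datetime) -> int:
    """Index of the newest segment that should already be fully published.

    Segment i covers [i*D, (i+1)*D) and publishes at (i+1)*D, so the
    latest published index is floor(elapsed / D) - 1.
    """
    elapsed = now - stream_start
    return int(elapsed / SEGMENT_DURATION) - 1

def expected_publish_time(stream_start: datetime, index: int) -> datetime:
    """When segment `index` should appear at the origin."""
    return stream_start + (index + 1) * SEGMENT_DURATION

def segment_url(stream_id: str, index: int) -> str:
    # Hypothetical URL pattern shared by the packager (PUT) and CDN (GET).
    return f"/live/{stream_id}/seg-{index:08d}.m4s"

start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# 7 seconds in: segments 0, 1, 2 are published; segment 3 is still encoding.
idx = latest_published_index(start, start + timedelta(seconds=7))
```

Because both sides compute the same schedule, the origin can distinguish "not yet due" from "late or missing" without consulting a manifest.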

The implementation is built around a simple but tightly controlled contract: the packager writes segments via HTTP PUT, and the CDN reads them via HTTP GET at the same URLs. The intelligence lives behind this plain interface. The packager attaches metadata about defects, and on each request the origin selects the best segment across the redundant pipelines. If a segment is not yet ready, the origin does not always respond with 404: at the live edge it may hold the connection open until publication, cutting unnecessary retry traffic. The CDN (Open Connect) optimizes further: it caches even 404s with a precise TTL and filters out impossible requests using the segment template. For millisecond-level accuracy, Netflix extended standard HTTP caching with custom headers.
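The read-side selection can be sketched as follows: each redundant pipeline PUTs its copy of a segment with defect metadata attached, and on GET the origin returns the least defective copy, preferring a defect-free one. The field names (`pipeline`, `defects`) and the defect counter are assumptions for illustration, not Netflix's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SegmentCopy:
    pipeline: str   # which redundant pipeline produced this copy
    data: bytes     # segment payload
    defects: int    # hypothetical defect score reported by the packager

def select_best(copies: list[SegmentCopy]) -> Optional[SegmentCopy]:
    """Pick the best copy of a segment across redundant pipelines.

    Returns None if no pipeline has published yet; the caller then
    decides whether to hold the connection (live edge) or return 404.
    """
    if not copies:
        return None
    # A defect-free copy wins; otherwise fall back to the least defective.
    return min(copies, key=lambda c: c.defects)

copies = [
    SegmentCopy("pipeline-a", b"...", defects=2),
    SegmentCopy("pipeline-b", b"...", defects=0),
]
best = select_best(copies)  # the defect-free copy from pipeline-b
```

The point of the design is that this choice happens per request at read time, so no cross-pipeline synchronization is needed on the write path.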

Storage adds a separate layer of complexity. S3 proved insufficiently predictable for a 2-second write SLA. Netflix switched to an abstraction on top of Cassandra with chunking of large objects and local quorum to withstand availability zone failures and maintain high write availability. This improved latency but revealed a conflict: during peak reads (Origin Storm), writes degraded. The solution was a write-through cache (EVCache), which absorbs almost all read traffic, preserving write stability. In parallel, full path isolation was implemented: separate compute stacks, distinct read/write clusters, and independent scaling circuits.
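The write-through pattern described above can be reduced to a minimal sketch: every write lands in both the cache and the backing store, so subsequent reads (the "Origin Storm") never touch the store. Here `Store` stands in for the Cassandra-backed layer and a plain dict stands in for EVCache; chunking, quorum, and TTLs are omitted, and all names are illustrative.

```python
class Store:
    """Stand-in for the Cassandra-backed segment store."""
    def __init__(self):
        self._data = {}
        self.reads = 0  # instrumentation: how often reads reach storage

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        self.reads += 1
        return self._data.get(key)

class WriteThroughOrigin:
    """Write-through cache: writes populate the cache, reads are absorbed."""
    def __init__(self, store: Store):
        self.store = store
        self.cache = {}  # stand-in for EVCache

    def put(self, key: str, value: bytes) -> None:
        self.store.put(key, value)  # durable write first
        self.cache[key] = value     # then populate the cache synchronously

    def get(self, key: str):
        if key in self.cache:
            return self.cache[key]  # CDN read traffic stops here
        value = self.store.get(key) # cache miss: fall through once
        if value is not None:
            self.cache[key] = value
        return value
```

Because the cache is filled on the write path rather than on read misses, a read storm immediately after publication generates zero load on the store, which is what preserves write latency during peaks.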

As a result, the system manages not only data but also priorities. Segment writes are always critical, followed by live-edge reads, then DVR. Under overload, the origin applies prioritized rate limiting and even deliberately returns 503 with a TTL so that clients back off instead of hammering it with retries. This combination of predictable templates, redundant pipelines, write isolation, and aggressive caching lets the system handle tens of millions of concurrent streams without degrading the user experience. Metrics show a significant reduction in storage latency and resilience under extreme read throughput; the solution costs more, but that is an intentional trade in favor of reliability.
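The priority scheme can be sketched as a per-tier budget with shedding: writes are never limited, live-edge reads get most of the remaining capacity, DVR gets the rest, and overflow is answered with a 503 carrying a Retry-After so clients pause before retrying. The tier names, budget values, and window mechanics are assumptions for illustration.

```python
import time

# Hypothetical per-second budgets by request class (illustrative values).
DEFAULT_BUDGETS = {"write": float("inf"), "live_edge": 800, "dvr": 200}

class PrioritizedLimiter:
    """Fixed-window rate limiter with per-priority budgets."""
    def __init__(self, budgets=DEFAULT_BUDGETS, window: float = 1.0):
        self.budgets = budgets
        self.window = window
        self.counts = {}
        self.window_start = time.monotonic()

    def admit(self, tier: str):
        """Return (status, headers) for a request of the given tier."""
        now = time.monotonic()
        if now - self.window_start >= self.window:
            self.counts.clear()          # start a fresh window
            self.window_start = now
        used = self.counts.get(tier, 0)
        if used >= self.budgets[tier]:
            # Shed load with an explicit TTL so clients back off.
            return 503, {"Retry-After": "1"}
        self.counts[tier] = used + 1
        return 200, {}
```

Writes have an infinite budget here, mirroring the rule that segment writing is always critical; reads are the only traffic that can be shed.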
