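Server-side sharded list and watch changes how Kubernetes controllers consume data from the API server: instead of every replica receiving every event, each replica receives only its own share. It is an attempt to lift the scaling ceiling that controllers hit when working with high-cardinality resources.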

When Kubernetes clusters grow to tens of thousands of nodes, controllers hit scalability limits not where one would typically expect. The problem arises at the list/watch interaction level with the API server. Each instance of a horizontally scalable controller receives the full stream of events for resources like Pods. It then spends CPU, memory, and network resources on deserialization, discarding the majority of objects. Horizontal scaling does not reduce the processing cost per replica. It simply multiplies the overall expense. This is a classic case where the system scales in terms of the number of instances but not in terms of efficiency.

The solution in Kubernetes v1.36 is server-side sharded list and watch. This is an alpha feature that moves filtering to the API server level. Instead of each controller filtering the stream locally, the server sends only relevant events. Each replica of the controller receives its segment of data, defined through shardSelector. This reduces unnecessary traffic and eliminates duplicate work. The trade-off here is clear: the logic for interacting with the API becomes more complex, and there is a dependency on a new feature that is still in alpha stage.

At the implementation level, a shardSelector field is added to ListOptions. The client specifies a hash range through shardRange(). The API server computes a deterministic 64-bit FNV-1a hash based on the selected field and returns only those objects that fall within the range [start, end). This works for both list responses and watch streams. Importantly, the hash function is deterministic across all instances of the API server, ensuring consistent behavior in a distributed configuration.
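The hashing step described above can be sketched with Go's standard library, which ships a 64-bit FNV-1a implementation. This is a minimal illustration, not the API server's code: the helper names hashKey and inRange are hypothetical, the UIDs are made up, and it assumes the field value is hashed as its raw bytes, which the text does not confirm.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey returns the 64-bit FNV-1a hash of a field value such as a UID.
// Assumption: the value is hashed as its raw UTF-8 bytes.
func hashKey(value string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(value)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// inRange reports whether h falls in the half-open interval [start, end).
func inRange(h, start, end uint64) bool {
	return h >= start && h < end
}

func main() {
	const mid = uint64(1) << 63 // midpoint of the 64-bit hash space

	// Hypothetical UIDs, used only to show that the mapping is deterministic.
	for _, uid := range []string{"uid-a", "uid-b", "uid-c"} {
		h := hashKey(uid)
		fmt.Printf("%s hash=0x%016x lowerHalf=%t\n", uid, h, inRange(h, 0, mid))
	}
}
```

Because the hash depends only on the input bytes, any process that runs the same function on the same value gets the same result, which is the property that lets several API server instances agree on which shard an object belongs to.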

Two fields are supported as the hash input: object.metadata.uid and object.metadata.namespace. This limitation constrains the choice of sharding strategy. Controllers that use informers can inject shardSelector through WithTweakListOptions. In the simplest case, with two replicas, the hash space is split in half. If necessary, multiple ranges can be combined with a logical OR to cover non-contiguous segments.

There is an important detail: the API server explicitly signals whether a shard has been applied. A shardInfo field appears in the response. If it is absent, it means the server ignored the shardSelector, and the client received the full dataset. In such a scenario, the controller must be prepared to revert to client-side filtering. This is critical for backward compatibility and stability.
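The fallback described above could look roughly like the following Go sketch. The types object, shardInfo and listResponse are hypothetical stand-ins rather than the real Kubernetes API types. The sketch also assumes the client hashes the UID with the same 64-bit FNV-1a function the server uses, over the raw bytes of the value.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// object, shardInfo and listResponse are hypothetical stand-ins for API types.
type object struct{ UID string }

// shardInfo is present only when the server applied the shard selector.
type shardInfo struct{ Selector string }

type listResponse struct {
	ShardInfo *shardInfo // nil means the server ignored the selector
	Items     []object
}

// hashUID returns the 64-bit FNV-1a hash of a UID.
// Assumption: the raw bytes of the UID are the hash input.
func hashUID(uid string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(uid))
	return h.Sum64()
}

// itemsForShard returns the items this replica should process for [start, end).
// If the server confirmed sharding, the items are used as-is; otherwise the
// full dataset was returned and is filtered on the client.
// Note: a uint64 end cannot express 2^64, so the topmost range needs extra care.
func itemsForShard(resp listResponse, start, end uint64) []object {
	if resp.ShardInfo != nil {
		return resp.Items // already filtered by the server
	}
	var kept []object
	for _, o := range resp.Items {
		if h := hashUID(o.UID); h >= start && h < end {
			kept = append(kept, o)
		}
	}
	return kept
}

func main() {
	// Hypothetical unsharded response: no ShardInfo, full dataset.
	resp := listResponse{Items: []object{{UID: "uid-1"}, {UID: "uid-2"}, {UID: "uid-3"}}}
	mine := itemsForShard(resp, 0, 1<<63)
	fmt.Printf("kept %d of %d items after client-side filtering\n", len(mine), len(resp.Items))
}
```

The same check would apply to watch events: when the server has not confirmed sharding, each event's object is hashed and dropped locally if it falls outside the replica's range.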

In terms of results, the main effect is lower network and CPU load, because less data passes through each controller. The source provides no specific metrics, but the architectural gain is evident: replicas no longer duplicate each other's work, and the load on the API server is reduced. The maturity of the solution remains an open question, however: the feature is in alpha and must be enabled through the ShardedListAndWatch feature gate.

In the industry, this approach has long been discussed as a more accurate model for scaling watchers. Kubernetes is taking a pragmatic step in this direction by moving filtering closer to the data source. This does not eliminate all limitations but removes one of the most costly bottlenecks.
