
Redis proxy for high-load caching and failure control

A Redis proxy becomes a key layer for cache management as load and complexity grow. Let’s explore how a dedicated proxy layer eliminates degradation and stabilizes high-load systems.

The problem does not manifest immediately; it surfaces only when Redis stops being a “transparent” component and starts dictating system behavior. In the described case, degradation began as the number of connections grew and I/O limits were hit. Sharp spikes in client activity created a thundering herd effect: multiple services opened connections simultaneously and overloaded the cluster. At the same time, client-library fragmentation grew, which broke observability and complicated diagnostics. As a result, incidents affected the availability of the entire system, not just the cache layer.

Attempts at local optimizations, such as client-side pooling, brought only temporary relief. They isolated individual Redis failures but did not eliminate the root cause: architecturally, too much logic and responsibility remained on the client side, which created tight coupling and complicated the system’s evolution.
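
For context, client-side pooling of this kind is usually just a few lines of configuration. A minimal sketch with redis-py (the host and pool sizes here are illustrative): it caps connections per process, but every service still manages its own Redis behavior.

    import redis  # redis-py

    # Illustrative client-side pool: limits connections from this one process,
    # but does nothing about fleet-wide connection counts or divergent clients.
    pool = redis.BlockingConnectionPool(
        host="redis.internal",   # placeholder address
        port=6379,
        max_connections=50,      # per-process cap
        timeout=5,               # seconds to wait for a free connection
    )
    client = redis.Redis(connection_pool=pool)

    client.set("greeting", "hello")
    print(client.get("greeting"))  # b'hello'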

The solution is to move control into a separate layer: a Redis proxy. The same approach has previously been applied to databases through SQL-routing proxies; here the idea is carried over to caching. The proxy becomes the single place responsible for:

  • connection management
  • command routing
  • client behavior control
  • observability unification

This is a trade-off: it adds an extra network hop and complicates the infrastructure, but in return it provides control over the system under high load, where client behavior becomes unpredictable.
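
To make the connection-management part concrete, here is a minimal sketch of the multiplexing idea (a toy model, not the article’s implementation): however many clients connect to the proxy, the number of backend connections to Redis stays fixed.

    import queue
    import threading

    BACKEND_CONNECTIONS = 4        # illustrative; a real proxy sizes this per shard
    request_q = queue.Queue()      # (command_args, reply_q) pairs from frontend handlers
    fake_redis = {}                # stand-in for the actual Redis cluster

    def backend_worker():
        # In a real proxy each worker would hold one persistent RESP connection.
        while True:
            args, reply_q = request_q.get()
            if args[0] == "SET":
                fake_redis[args[1]] = args[2]
                reply_q.put("OK")
            elif args[0] == "GET":
                reply_q.put(fake_redis.get(args[1]))

    for _ in range(BACKEND_CONNECTIONS):
        threading.Thread(target=backend_worker, daemon=True).start()

    def handle_client_command(args):
        # Called by the frontend for every parsed command; thousands of client
        # connections can share the same four backend workers above.
        reply_q = queue.Queue(maxsize=1)
        request_q.put((args, reply_q))
        return reply_q.get()

    print(handle_client_command(["SET", "k", "v"]))   # OK
    print(handle_client_command(["GET", "k"]))        # v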

The implementation is built as a stateless service between applications and Redis clusters. The architecture is divided into two layers:

  • frontend: accepts connections, parses Redis commands (RESP)
  • backend: multiplexes connections and executes commands

This separation reduces coupling and allows components to evolve independently. A key detail is the semantic processing of commands. Unlike typical solutions, the proxy understands the structure of requests and can apply runtime rules: filtering, routing, and custom commands.
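
As an illustration of what “semantic processing” means at the frontend, a minimal RESP parser might look like the sketch below (error handling, inline commands, and protocol edge cases are omitted); once a command is decoded into its name and arguments, routing and filtering rules can key off them.

    import io

    def parse_resp_command(stream):
        # A client command arrives as an array of bulk strings, e.g.
        # "*2\r\n$3\r\nGET\r\n$4\r\nuser\r\n" -> ["GET", "user"]
        header = stream.readline()
        if not header.startswith(b"*"):
            raise ValueError("expected an array of bulk strings")
        argc = int(header[1:].rstrip(b"\r\n"))
        args = []
        for _ in range(argc):
            length_line = stream.readline()            # "$<len>\r\n"
            length = int(length_line[1:].rstrip(b"\r\n"))
            args.append(stream.read(length).decode())
            stream.read(2)                             # trailing "\r\n"
        return args

    raw = io.BytesIO(b"*3\r\n$3\r\nSET\r\n$4\r\nuser\r\n$5\r\nalice\r\n")
    print(parse_resp_command(raw))                     # ['SET', 'user', 'alice']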

Configuration is another notable choice. Instead of static files, a Starlark program is executed at runtime to generate the configuration, so system behavior can change without a deployment. From an SRE standpoint, this reduces MTTR: changes can be made faster and more safely.
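
Starlark is syntactically a subset of Python, so a configuration program reads like ordinary code. The sketch below is hypothetical (the entry point, schema, and field names are illustrative, not the proxy’s actual API), but it shows the idea of generating configuration at runtime:

    # Hypothetical Starlark-style config program; re-evaluated by the proxy at
    # runtime, so a rule change takes effect without redeploying the proxy or
    # its clients.
    def route(prefix, cluster, timeout_ms):
        return {"prefix": prefix, "cluster": cluster, "timeout_ms": timeout_ms}

    def build_config(env):
        routes = [
            route("session:", "redis-sessions", 50),
            route("feed:", "redis-feed", 100),
        ]
        if env.get("enable_canary_cluster", False):
            routes.append(route("canary:", "redis-canary", 100))
        return {"routes": routes, "max_client_connections": 10000}

    print(build_config({"enable_canary_cluster": True}))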

Redis Cluster’s CROSSSLOT errors are handled separately. Normally the client has to account for sharding; here the proxy intercepts such scenarios and performs scatter-gather: it splits the pipeline into parts, executes them in parallel, and aggregates the result. To the client this looks like a single operation, which lowers the requirements on client logic and simplifies migrations.
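
A simplified sketch of the scatter-gather idea (sequential here for clarity, whereas the proxy runs the parts in parallel; hash tags are ignored): keys are grouped by hash slot, each group is fetched separately, and the results are merged back into the original order.

    import binascii

    def key_slot(key):
        # Redis Cluster assigns each key to slot CRC16(key) % 16384; crc_hqx
        # uses the same XMODEM CRC16 polynomial. Hash tags ("{...}") are ignored here.
        return binascii.crc_hqx(key.encode(), 0) % 16384

    def scatter_gather_mget(keys, run_mget_on_slot):
        # run_mget_on_slot(slot, keys) stands in for sending MGET to the node
        # that owns the slot; it is injected so the sketch stays self-contained.
        groups = {}
        for index, key in enumerate(keys):
            groups.setdefault(key_slot(key), []).append((index, key))

        results = [None] * len(keys)
        for slot, indexed_keys in groups.items():
            values = run_mget_on_slot(slot, [k for _, k in indexed_keys])
            for (index, _), value in zip(indexed_keys, values):
                results[index] = value
        return results

    # Fake backend: pretend every key stores its own upper-cased name.
    print(scatter_gather_mget(
        ["user:1", "feed:2", "cfg"],
        lambda slot, ks: [k.upper() for k in ks],
    ))
    # -> ['USER:1', 'FEED:2', 'CFG']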

The migration to the proxy was made reversible. Services were switched via configuration, without code changes. Traffic was moved gradually, with instant rollback available through feature flags, and large services were rolled out in phases. This is a typical pattern for de-risking changes to an infrastructure layer.
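
The rollout mechanics are a standard pattern; a generic sketch (not the article’s tooling) of flag-plus-percentage routing might look like this:

    import zlib

    # Illustrative flag values; in practice these live in a config service.
    FLAGS = {"use_redis_proxy": True, "proxy_rollout_percent": 25}

    def route_via_proxy(service_name):
        # Flipping use_redis_proxy to False rolls every service back instantly.
        if not FLAGS["use_redis_proxy"]:
            return False
        # Stable per-service bucket, so a service does not flap between targets.
        bucket = zlib.crc32(service_name.encode()) % 100
        return bucket < FLAGS["proxy_rollout_percent"]

    for svc in ("checkout", "search", "feed"):
        print(svc, "->", "proxy" if route_via_proxy(svc) else "redis-direct")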

Before launch, the system underwent regular stress tests with loads exceeding peak values. This is important: the proxy becomes a critical component, and its failure impacts the entire platform.

Results show that the main gain is not in latency or throughput per se, but in manageability:

  • the thundering herd effect has been eliminated
  • operations (failover, scaling, updates) have become zero-downtime
  • observability has been unified: metrics, logs, and traces are aligned
  • incident diagnosis time has decreased from hours to minutes

Quantitative improvement metrics, beyond the claimed high availability, are not detailed. But the architectural effect is clear: the system moves from the distributed chaos of many clients to centralized control.

An interesting side effect is the abstraction over the backend. The proxy operates above RESP and can switch between different storage systems, including alternatives to Redis. This reduces vendor lock-in and provides flexibility at the infrastructure level.

In a broader context, this confirms the trend: upon reaching a certain scale, infrastructure ceases to be a “detail” and becomes a product. In such systems, the choice between cache-aside or write-through is less critical than the reliability and manageability of the data layer itself.

In this case, a Redis proxy is not just an optimization but a way to regain control over a system whose load and client diversity have already outgrown simple solutions.

Read more – InfoQ
