
Low Latency Systems and Communication Control

Low latency systems are constrained not by the CPU, but by communications. We analyze how architecture reduces latency without sacrificing reliability.

The issue doesn’t appear immediately; it only becomes noticeable once the system reaches the limits of network interaction. In modern distributed systems, an increase in throughput no longer implies a decrease in latency. On the contrary, additional layers of abstraction in the cloud and the growing complexity of processors add delays of their own. Most of the time is spent not on computation, but on message transmission between components. Attempting to solve the problem with hardware (more CPUs, more threads) rarely helps, because the bottleneck remains in communication.

In low latency systems, predictability matters as much as absolute latency. In domains such as trading, the tail latency of an individual operation matters more than the average. Latency also affects recovery: while the system is recovering, it is effectively unavailable. This makes architectural decisions around message exchange crucial. If the system is built as a chain of services with multiple network hops, degradation becomes inevitable.

A practical response to this is separation of concerns and control over communication channels. The approach implemented in LMAX Disruptor shows that the main gain comes not from “new algorithms,” but from decomposing workflows. Logging, decoding, and business logic are separated. Each thread performs one task and does not block. This reduces resource contention and eliminates unnecessary waits. As a result, latency decreases without changing business logic.

Tools like Aeron extend this idea to the level of inter-process communication. They give explicit control over how messages are transmitted: UDP, multicast, or IPC (inter-process communication). The architecture allows different trade-offs to be chosen. IPC reduces latency through shared memory but requires components to be placed on the same host, a typical trade-off between distribution and speed.

Further latency reduction comes from minimizing layers of abstraction. Kernel bypass via DPDK lets an application skip the OS network stack and exchange packets directly with the NIC. This removes per-packet overhead and yields microsecond latencies. However, the cost is increased complexity and reduced portability. This approach is justified only where latency is a key business metric.
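The shape of such a hot path is a userspace poll loop. The sketch below shows the structure only; it requires the DPDK SDK, hugepages, and a NIC bound to a userspace driver, and is not runnable as-is (port and queue setup via `rte_eth_dev_configure` is omitted).

```c
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)     /* set up hugepages, probe devices */
        return -1;
    /* ... port/queue setup with rte_eth_dev_configure(), omitted ... */
    struct rte_mbuf *pkts[BURST];
    for (;;) {
        /* Poll the NIC queue directly: no interrupt, no syscall,
           no kernel network stack on the hot path. */
        uint16_t n = rte_eth_rx_burst(0 /* port */, 0 /* queue */, pkts, BURST);
        for (uint16_t i = 0; i < n; i++) {
            /* handle_packet(pkts[i]); -- application logic */
            rte_pktmbuf_free(pkts[i]);
        }
    }
}
```

The loop also makes the cost explicit: one core is pinned at 100% polling the device, which is exactly the complexity-for-latency trade the text describes.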

Interestingly, the limit of optimization is the abandonment of messages altogether. IPC still requires serialization and deserialization. If components are within the same process, one can switch to function calls. In this case, latency is measured in nanoseconds, and with inlining, it practically disappears. But this radically limits architecture and scalability.

Here arises a key architectural question: can a system be built where the communication channel is chosen dynamically? In some cases — a network, in others — shared memory, in others — direct calls. This approach requires strict control over data flows and often relies on models like replicated state machines and consensus (e.g., Raft). They allow for maintaining consistency while aggressively optimizing communications.

The final conclusion is pragmatic. Low latency systems are not about a single tool. They are about managing boundaries: between threads, processes, and machines. The fewer these boundaries and the more transparent they are, the lower the latency. But each optimization increases system coupling and reduces flexibility. The balance between these factors defines the maturity of the architecture.
