
Single-threaded exchange architecture for deterministic trading

Combining a single-threaded architecture with Raft gives an exchange system determinism and fault tolerance. This matters most when the cost of an error is measured not in latency but in direct financial losses.

The core problem in exchange architecture is the tension between speed and predictability. The requirements pull in different directions: sub-millisecond latency, strict execution fairness, and the ability to restore the market state as of any point in time. Any nondeterminism breaks reproducibility, and without reproducibility neither auditing nor precise incident analysis is possible. Parallelizing order processing increases throughput, but it introduces races and makes system behavior dependent on timing.

The solution is a single-threaded architecture for the core matching engine, a conscious rejection of parallelism in the critical section. All incoming events are processed strictly sequentially, which simplifies the model: identical input always yields identical results. For fault tolerance and availability, Raft consensus is layered on top. It replicates the operation log between nodes and guarantees a consistent state. The trade-off is clear: the core cannot scale across CPU cores, but in return the system gains strong determinism and is simple to reason about.
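The sequential model can be illustrated with a minimal sketch. Everything in it is hypothetical (the `Event` fields, the `Core` class, the balance-style state); it only shows the shape of the idea: one loop, one state, and a sequence check that refuses gaps or reordering, so the same log always produces the same result.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int      # position in the operation log (hypothetical field)
    account: str  # hypothetical key into the state
    delta: int    # hypothetical state change, in integer units


@dataclass
class Core:
    balances: dict = field(default_factory=dict)
    last_seq: int = 0

    def apply(self, event: Event) -> None:
        # Accept only the next entry in the log: no gaps, no reordering.
        if event.seq != self.last_seq + 1:
            raise ValueError(f"expected seq {self.last_seq + 1}, got {event.seq}")
        self.balances[event.account] = self.balances.get(event.account, 0) + event.delta
        self.last_seq = event.seq


def run(log):
    """Apply the whole log in one loop: no threads, no clock reads, no randomness."""
    core = Core()
    for event in log:
        core.apply(event)
    return core


LOG = [Event(1, "alice", 100), Event(2, "bob", 50), Event(3, "alice", -30)]
```

Calling `run(LOG)` twice gives two cores with equal `balances`, because the result depends only on the log contents and their order.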

Determinism is the key principle. It enables several important properties:

  • reproduction of production incidents through log replay
  • zero-downtime rolling deployments, because versions behave identically on the same input
  • precise restoration of the market state at any arbitrary moment
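Log replay and point-in-time restoration can be sketched as follows. The event layout (`ts`, `symbol`, `price`) and the function names are assumptions made for illustration, and the log is assumed to be already ordered by `ts`. The transition function is pure, so replaying the same prefix of the log always yields the same state, and a digest of that state can be compared across replicas or versions.

```python
import hashlib
import json


def apply(state, event):
    """Pure transition: returns a new state and reads nothing but its arguments."""
    new_state = dict(state)
    new_state[event["symbol"]] = event["price"]
    return new_state


def replay(log, until_ts=None):
    """Rebuild state from the log, optionally stopping at a timestamp (inclusive)."""
    state = {}
    for event in log:
        if until_ts is not None and event["ts"] > until_ts:
            break
        state = apply(state, event)
    return state


def digest(state):
    """Stable fingerprint of a state; sort_keys makes the JSON encoding canonical."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


# Hypothetical log, ordered by timestamp.
LOG = [
    {"ts": 1, "symbol": "ABC", "price": 100},
    {"ts": 2, "symbol": "XYZ", "price": 50},
    {"ts": 3, "symbol": "ABC", "price": 101},
]
```

Here `replay(LOG, until_ts=2)` reconstructs the state as it stood before the third event, and comparing `digest(replay(LOG))` across two runs is one simple way to check that they agree.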

The functional model of the exchange reduces to a few basic rules, such as price-time priority. This is a compact formalization of complex behavior: it defines which order executes first when prices cross. Such rules are easy to encode in a sequential model and difficult to encode in a concurrent one.
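Price-time priority means that better-priced resting orders fill first and, at the same price, the earlier order fills first. The sketch below is one minimal way to encode that rule in a sequential model, not the implementation described in the source. The `Order` and `OrderBook` names, integer tick prices, and the trade tuple layout are all assumptions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    side: str   # "buy" or "sell"
    price: int  # limit price in integer ticks (assumption)
    qty: int


class OrderBook:
    def __init__(self):
        self._bids = []  # heap of (-price, seq, order): highest price, then earliest
        self._asks = []  # heap of (price, seq, order): lowest price, then earliest
        self._seq = 0    # arrival counter that provides time priority

    def submit(self, order):
        """Match against the opposite side, then rest any remainder.

        Returns trades as (maker_id, taker_id, price, qty) tuples.
        """
        self._seq += 1
        trades = []
        buying = order.side == "buy"
        opposite = self._asks if buying else self._bids
        while order.qty > 0 and opposite:
            key, _, maker = opposite[0]
            maker_price = key if buying else -key
            crosses = maker_price <= order.price if buying else maker_price >= order.price
            if not crosses:
                break
            fill = min(order.qty, maker.qty)
            trades.append((maker.order_id, order.order_id, maker_price, fill))
            order.qty -= fill
            maker.qty -= fill
            if maker.qty == 0:
                heapq.heappop(opposite)
        if order.qty > 0:
            own = self._bids if buying else self._asks
            key = -order.price if buying else order.price
            heapq.heappush(own, (key, self._seq, order))
        return trades
```

Because `submit` is only ever called from one thread, the arrival counter alone settles ties at the same price. A concurrent version would need locking or another ordering mechanism to give the same guarantee.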

Implementation begins not with optimization but with a test harness. The system is treated as a black box defined by incoming API actions, the resulting state of the order book, and outgoing events. This pins down the expected behavior before any optimization. Next, the core matching logic is isolated and minimized: the less code on the critical path, the easier it is to guarantee its determinism.
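A black-box harness of this kind might look like the sketch below. `ToyEngine` is a deliberately trivial stand-in for the real core (it only tracks resting orders and does no matching), and the action and event tuples are invented for illustration. The point is the harness shape: feed actions in, collect events and a book snapshot, and compare both against fixed expectations.

```python
class ToyEngine:
    """Hypothetical stand-in for the matching core; tracks resting orders only."""

    def __init__(self):
        self.resting = {}

    def handle(self, action):
        kind, order_id = action[0], action[1]
        if kind == "place":
            if order_id in self.resting:
                return [("rejected", order_id)]
            self.resting[order_id] = action[2]
            return [("accepted", order_id)]
        if kind == "cancel":
            if order_id not in self.resting:
                return [("rejected", order_id)]
            del self.resting[order_id]
            return [("cancelled", order_id)]
        raise ValueError(f"unknown action: {kind}")

    def snapshot(self):
        return dict(sorted(self.resting.items()))


def run_scenario(engine_factory, actions):
    """Drive a fresh engine through the actions; return (events, final book state)."""
    engine = engine_factory()
    events = []
    for action in actions:
        events.extend(engine.handle(action))
    return events, engine.snapshot()


SCENARIO = [("place", "o1", 100), ("place", "o2", 101), ("cancel", "o1"), ("cancel", "o9")]
EXPECTED_EVENTS = [("accepted", "o1"), ("accepted", "o2"), ("cancelled", "o1"), ("rejected", "o9")]
EXPECTED_BOOK = {"o2": 101}
```

Once the expectations are fixed, an optimized engine can be passed in as `engine_factory` and must produce exactly the same events and snapshot for the same scenario.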

Challenges arise at the boundary with the outside world. An exchange does more than match orders: it also broadcasts state to participants, stores history, and complies with regulatory requirements. For example, it must be possible to restore the market state with microsecond precision years after the fact. This requires strict persistence of all confirmed events and consistent replicas. Raft addresses the consistency problem but adds coordination overhead.
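The coordination overhead comes from the rule that a log entry counts as confirmed only once a majority of nodes have stored it. The sketch below shows that majority rule in isolation. It is a simplification: real Raft also requires the entry to belong to the leader's current term before it is committed, and the function name and input format here are assumptions.

```python
def majority_commit_index(match_indexes):
    """Highest log index stored on a majority of nodes.

    match_indexes holds, for every node including the leader, the highest
    log index that node is known to have persisted. Simplified: the Raft
    current-term check is intentionally omitted.
    """
    if not match_indexes:
        raise ValueError("cluster must have at least one node")
    ranked = sorted(match_indexes, reverse=True)
    # With n nodes, the entry at position n // 2 (0-based) in descending
    # order is present on at least n // 2 + 1 nodes, which is a majority.
    return ranked[len(ranked) // 2]
```

In a five-node cluster with persisted indexes `[10, 9, 7, 5, 3]`, entries up to index 7 are stored on three nodes, so 7 is the highest index that can be treated as confirmed. The leader has to wait for those acknowledgements before confirming an event.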

Latency distribution gets particular attention. The tails (P99) matter as much as the average. Market participants build strategies around these characteristics, and unpredictable delays lead to financial losses. The sequential model helps smooth out the tails, because it removes the variability caused by thread contention.
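To see why the average alone is misleading, consider a nearest-rank percentile over a set of latency samples. The sample values below are invented for illustration and are not measurements from the system described; the `percentile` helper is likewise a sketch.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in [0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies in microseconds: 98 fast requests and 2 slow outliers.
SAMPLES = [100] * 98 + [5000] * 2

MEAN = statistics.mean(SAMPLES)
P50 = percentile(SAMPLES, 50)
P99 = percentile(SAMPLES, 99)
```

With these made-up samples the mean is 198 µs and the median is 100 µs, while P99 is 5000 µs. A participant whose strategy depends on the worst-case delay sees a very different system than the average suggests.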

The result is a system whose behavior is predictable and reproducible. This makes it possible to:

  • maintain 24/7 availability through replication
  • deploy safely without downtime
  • analyze any incident accurately

The source material does not include specific latency or throughput metrics. The architectural choice, however, is clearly a balancing act: minimal complexity in the core against scalability through horizontal distribution via consensus.

Such solutions have long been discussed in the financial systems industry. Here they are taken to their logical extreme: it is better to sacrifice parallelism than to lose control over the system. Under high load this may seem counterintuitive, but determinism is what becomes the foundation for reliability.

Read more – InfoQ
