Managing the context window in multi-agent systems determines the quality of reasoning and the robustness of investigations. We explore how this is addressed through context separation.

This problem emerges only once an agentic system scales beyond short scenarios. In long investigations, both the number of inference requests and the volume of message history grow. Because LLM APIs are stateless, the entire history must be resent with every call. This quickly runs into the limits of the context window: latency increases, costs rise, and response quality degrades. A multi-agent configuration complicates matters further, because each agent needs its own slice of the system's state. Too little context costs coherence, while too much degrades reasoning quality and encourages confirmation bias.
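A little arithmetic shows why resending the history hurts. If every turn adds roughly k tokens, the n-th call carries about n·k tokens, so the total processed over n calls grows as k·n(n+1)/2, which is quadratic in the number of turns. The sketch below assumes a hypothetical, constant 2,000 tokens per turn and ignores the system prompt. The 8,000-token cap stands in for a compressed state representation of fixed size and is likewise an assumption, not a figure from the source.

```python
def prompt_sizes(turns: int, tokens_per_turn: int) -> list[int]:
    """Prompt size (in tokens) of each call when the full history is resent."""
    sizes = []
    history = 0
    for _ in range(turns):
        history += tokens_per_turn   # this turn's messages join the history
        sizes.append(history)        # a stateless API needs all of it again
    return sizes


# Hypothetical workload: 50 calls, 2,000 new tokens per turn.
full = prompt_sizes(turns=50, tokens_per_turn=2_000)
print(full[-1])    # 100000: the last prompt alone
print(sum(full))   # 2550000: tokens processed over all 50 calls

# Same run, but each prompt is capped at an assumed 8,000-token state budget.
capped = [min(size, 8_000) for size in full]
print(sum(capped))  # 388000
```

Latency and cost scale with those totals, which is why replacing raw history with a bounded representation of state pays off more the longer an investigation runs.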

The solution separates context into three channels: Journal, Review, and Timeline. This is a compromise between completeness and manageability. Instead of sending the full message history, the system sends aggregated representations of the state. The Journal serves as the Director's working memory, recording hypotheses, decisions, and open questions. The Review aggregates the experts' conclusions and verifies their accuracy. The Timeline assembles a single, unified narrative of events. Together, the three channels reduce pressure on the context window while keeping agents aligned with the overall logic of the investigation.
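One way to picture the three channels is as a small typed state object that is rendered into a compact prompt for each call. The class names, fields, the `render_context` helper, the Journal window of ten entries, and the rule of forwarding only verified Review items are all hypothetical choices made for illustration. The source does not describe its data model.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    kind: str  # "hypothesis", "decision" or "question"
    text: str


@dataclass
class ReviewItem:
    expert: str
    conclusion: str
    verified: bool  # set once the conclusion has passed verification


@dataclass
class InvestigationState:
    journal: list[JournalEntry] = field(default_factory=list)
    review: list[ReviewItem] = field(default_factory=list)
    timeline: list[str] = field(default_factory=list)

    def render_context(self, max_journal: int = 10) -> str:
        """Build the compact context an agent receives instead of raw history."""
        lines = ["## Journal"]
        # Only the most recent journal entries are included.
        lines += [f"- [{e.kind}] {e.text}" for e in self.journal[-max_journal:]]
        lines.append("## Review")
        # Only verified conclusions are passed on to other agents.
        lines += [f"- {r.expert}: {r.conclusion}" for r in self.review if r.verified]
        lines.append("## Timeline")
        lines += [f"{i}. {event}" for i, event in enumerate(self.timeline, start=1)]
        return "\n".join(lines)


# Hypothetical investigation data, for illustration only.
state = InvestigationState()
state.journal.append(JournalEntry("hypothesis", "Outage caused by the 09:00 deploy"))
state.journal.append(JournalEntry("question", "Did the cache cluster restart?"))
state.review.append(ReviewItem("logs", "Error rate rose at 09:02", verified=True))
state.review.append(ReviewItem("infra", "Disk was full", verified=False))
state.timeline.append("09:00 deploy of the new release")
state.timeline.append("09:02 error rate rises")
print(state.render_context())
```

The size of the rendered context depends on how many entries each channel holds, not on how many messages have been exchanged. That property is what keeps long runs within the window.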

The implementation relies on a strict decomposition of roles. The Director runs the process and maintains the Journal, which every agent can read as a chronological log. This creates a single source of truth without passing raw message history around. Experts work in narrow domains and return findings linked to artifacts (tool calls). Linking findings to artifacts is not enough on its own, however, because experts can still hallucinate. A Critic is therefore introduced. It has access to the raw data and the methodologies, and it uses them to check conclusions, examining not only the results but also whether they were derived correctly. Finally, conclusions are scored by reliability, so weak hypotheses can be filtered out.
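A minimal sketch of the Critic's check and the reliability scoring follows. It assumes each finding carries the id of the tool call it came from, plus a quoted excerpt. The point values, the threshold of 8, and `TRUSTED_TOOLS` as a stand-in for "accepted methodology" are hypothetical. The string checks are simple placeholders for the Critic's reasoning and only illustrate the control flow: verify against raw data, score, rank, filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    tool: str
    output: str  # raw tool output, which the Critic can read directly


@dataclass
class Finding:
    expert: str
    claim: str
    artifact_id: Optional[str]  # id of the tool call the claim is derived from
    evidence: str               # excerpt the expert cites as support


# Hypothetical allowlist standing in for "accepted methodology".
TRUSTED_TOOLS = {"query_logs", "read_metrics"}


def critic_score(f: Finding, artifacts: dict[str, ToolCall]) -> int:
    """Score 0-10 by re-checking the finding against the raw artifact."""
    call = artifacts.get(f.artifact_id) if f.artifact_id else None
    if call is None:
        return 0                    # not linked to a real tool call
    score = 5                       # grounded in an existing artifact
    if f.evidence and f.evidence in call.output:
        score += 3                  # cited evidence really appears in the raw data
    if call.tool in TRUSTED_TOOLS:
        score += 2                  # derived with an accepted method
    return score


def filter_findings(findings, artifacts, threshold=8):
    """Rank findings by score and drop those below the threshold."""
    scored = sorted(((critic_score(f, artifacts), f) for f in findings),
                    key=lambda pair: pair[0], reverse=True)
    return [(score, f) for score, f in scored if score >= threshold]


# Hypothetical artifacts and findings, for illustration only.
artifacts = {"call-1": ToolCall("query_logs", "09:02 status=500 count=412 host=web-3")}
findings = [
    Finding("logs", "HTTP 500s spiked on web-3 at 09:02", "call-1", "count=412 host=web-3"),
    Finding("infra", "Disk on web-3 was full", None, ""),
    Finding("logs", "web-4 was also affected", "call-1", "host=web-4"),
]
for score, f in filter_findings(findings, artifacts):
    print(score, f.claim)  # 10 HTTP 500s spiked on web-3 at 09:02
```

The unlinked finding scores 0. The finding whose cited evidence is absent from the raw output scores 7 and falls below the threshold. Only the fully supported claim survives.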

The Timeline is a key element. Unlike the Review, it needs no tool access and works only with context that has already been aggregated, which lightens the load on the model and improves reasoning quality. The Timeline forces conclusions into explicit causal relationships and discards those that are inconsistent. In effect, narrative consistency becomes a defense against hallucination: a conclusion that does not fit the overall chain of events is not retained. The system also caps the number of gaps it reports, so the Director is not overwhelmed by secondary uncertainties.
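The Timeline's filtering can be sketched as a single pass over aggregated conclusions, with no tool calls, in line with the point that it works only on already-aggregated context. The sketch assumes each conclusion carries a timestamp and a single `caused_by` link. The `Conclusion` type, `build_timeline`, the treatment of an unknown cause as a "gap", and the default `max_gaps=3` are hypothetical choices, not details given in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conclusion:
    cid: str
    time: int                 # e.g. minutes since the start of the incident
    text: str
    caused_by: Optional[str]  # cid of the conclusion this one follows from


def build_timeline(conclusions: list[Conclusion], max_gaps: int = 3):
    """Keep only conclusions that fit a causal chain; report at most max_gaps gaps."""
    known = {c.cid for c in conclusions}
    chain: list[Conclusion] = []
    in_chain: set[str] = set()
    gaps: list[str] = []
    for c in sorted(conclusions, key=lambda item: item.time):
        if c.caused_by is None or c.caused_by in in_chain:
            # A root observation, or its cause is already an earlier link.
            chain.append(c)
            in_chain.add(c.cid)
        elif c.caused_by not in known:
            # The cause was never established: an open question for the Director.
            gaps.append(f"What explains '{c.text}'? (unknown cause {c.caused_by})")
        # Otherwise the cause comes later in time or was itself discarded,
        # so the conclusion contradicts the chain and is dropped.
    return chain, gaps[:max_gaps]


# Hypothetical conclusions, for illustration only.
conclusions = [
    Conclusion("a", 0, "New release deployed", None),
    Conclusion("b", 2, "Error rate rises", "a"),
    Conclusion("c", 1, "Cache flushed by operator", "d"),  # cause comes later
    Conclusion("d", 5, "Operator starts cleanup", None),
    Conclusion("e", 6, "Disk fills up", "x"),               # cause never found
]
chain, gaps = build_timeline(conclusions)
print([c.text for c in chain])  # ['New release deployed', 'Error rate rises', 'Operator starts cleanup']
print(gaps)                     # ["What explains 'Disk fills up'? (unknown cause x)"]
```

Truncating `gaps` rather than the chain reflects the stated goal. The narrative stays complete, while the list of open questions handed back to the Director stays short.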

The result is robust behavior in long scenarios without resending the full message history. Context is managed through compressed representations rather than by accumulating data. This reduces latency and cost, although no concrete figures are given. More importantly, the agents stay coherent with one another and the quality of their reasoning remains under control. The approach can be seen as an evolutionary step for agent frameworks, one in which state management becomes an architectural concern rather than a side effect.

Such solutions reflect a broader trend: a move away from “infinite context” toward structured memory. Even with hypothetically unlimited context windows, sending the entire history would remain questionable, because an excess of information hinders adaptation to new data. The pragmatic path chosen here is to limit context while increasing its density and relevance.
