
LLM Multi-Agent System Holos and Agentic Web Architecture

LLM-based multi-agent systems are proposed as the foundation of the Agentic Web. Holos is an architecture in which agents scale and coordinate as an ecosystem.

Holos questions the classical hypothesis that AGI will emerge from scaling a single model. The argument rests on inductive limits: any model is constrained by its training data and its optimization, so it tends to settle into local maxima, performing strongly on narrow tasks but unreliably in an open environment. Holos explores an alternative: intelligence as the product of interaction among many agents rather than a single center.

Architecturally, Holos implements an LLM multi-agent system at the level of the web. At its core is a five-layer model in which each layer isolates one class of problems; three of those layers are described here. The Substrate Layer, built on the Nuwa engine, generates agents and activates them lazily through a serverless approach. The Coordination Layer manages a DAG of tasks: it separates planning from the assignment of executors (blind planning), then uses market dispatch with Learning-to-Rank to select an agent. The Value Layer closes the loop through economic incentives, tying the quality of an agent's execution to its future opportunities.
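The lazy activation idea can be illustrated with a minimal sketch. It assumes agents are stored as lightweight profiles and that a live instance is built only on first use; `AgentProfile`, `LiveAgent`, and `AgentRegistry` are hypothetical names for illustration and are not taken from Holos or the Nuwa engine.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AgentProfile:
    """Dormant agent: cheap metadata only, no live runtime attached."""
    agent_id: str
    skills: Tuple[str, ...]


@dataclass
class LiveAgent:
    """Hypothetical live instance built from a profile on first use."""
    profile: AgentProfile

    def run(self, task: str) -> str:
        return f"{self.profile.agent_id} handled: {task}"


class AgentRegistry:
    """Stores profiles and instantiates agents only when a task arrives."""

    def __init__(self) -> None:
        self._profiles: Dict[str, AgentProfile] = {}
        self._live: Dict[str, LiveAgent] = {}

    def register(self, profile: AgentProfile) -> None:
        # Registration stores the profile only; nothing is started here.
        self._profiles[profile.agent_id] = profile

    def activate(self, agent_id: str) -> LiveAgent:
        # Just-in-time instantiation: build on first request, then reuse.
        if agent_id not in self._live:
            self._live[agent_id] = LiveAgent(self._profiles[agent_id])
        return self._live[agent_id]

    def hibernate(self, agent_id: str) -> None:
        # Drop the live instance; the profile stays registered.
        self._live.pop(agent_id, None)

    @property
    def live_count(self) -> int:
        return len(self._live)
```

In this sketch, registering many profiles costs only a dictionary entry each, and resource use tracks the number of agents currently active rather than the number registered.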

A key insight is the attempt to solve three systemic problems of LLM-based multi-agent systems (LaMAS): scaling friction, coordination degradation, and value loss. For scaling, Holos uses a "dormant agents" model: agents are stored as profiles and activated only when needed through JIT instantiation. For coordination, the task DAG undergoes topological validation (Kahn's algorithm), which rejects plans that contain cycles and catches logical planning errors before execution. To prevent toolset homogenization, Holos uses S-MMR, which balances relevance against diversity when assembling a toolset and so reduces the correlation of agent behavior.
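As an illustration of the validation step, here is a minimal sketch of Kahn's algorithm applied to a task graph. The input shape (a mapping from each task to the tasks it depends on) and the function name `topological_order` are assumptions made for this example, not the Holos interface.

```python
from collections import deque
from typing import Dict, List


def topological_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return an execution order for a task graph, or raise ValueError.

    `deps` maps each task to the tasks it depends on.
    """
    # Collect every task, including ones that appear only as a dependency.
    tasks = set(deps)
    for parents in deps.values():
        tasks.update(parents)

    indegree = {t: 0 for t in tasks}
    children: Dict[str, List[str]] = {t: [] for t in tasks}
    for task, parents in deps.items():
        for parent in parents:
            indegree[task] += 1
            children[parent].append(task)

    # Start from tasks with no unmet dependencies (sorted for determinism).
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any task never scheduled sits on a cycle or depends on one.
    if len(order) != len(tasks):
        stuck = sorted(tasks - set(order))
        raise ValueError(f"cycle detected; unscheduled tasks: {stuck}")
    return order
```

A plan such as `{"report": ["analyze"], "analyze": ["fetch"], "fetch": []}` yields an order in which every task follows its dependencies, while a plan where two tasks depend on each other raises `ValueError` instead of being dispatched.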

The market dispatch mechanism deserves separate attention. Instead of a static registry, Holos uses a hybrid approach: active search over embeddings combined with passive bids received via pub/sub channels. The final choice is made by an LTR model (LambdaMART) that takes into account semantics, cost, reputation, and constraints. This turns task distribution into an economic optimization problem rather than plain routing. For long-running tasks, the system shifts from a synchronous model to an event-driven loop: state is serialized and execution resumes in response to events, which reduces resource consumption and increases resilience.
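The hybrid dispatch can be sketched roughly as follows. This is not the LambdaMART ranker described above: a fixed-weight linear score stands in for the learned model, and the weights, the `Candidate` fields, the budget constraint, and the function names are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    agent_id: str
    embedding: Sequence[float]  # capability embedding (assumed precomputed)
    cost: float                 # quoted price for the task
    reputation: float           # assumed to lie in [0.0, 1.0]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dispatch(task_embedding: Sequence[float],
             searched: List[Candidate],
             bids: List[Candidate],
             budget: float,
             top_k: int = 3) -> List[Candidate]:
    if budget <= 0:
        raise ValueError("budget must be positive")
    # Merge active-search hits with passive bids, de-duplicated by agent id.
    pool = {c.agent_id: c for c in list(searched) + list(bids)}
    # Hard constraint first: drop anything over budget.
    feasible = [c for c in pool.values() if c.cost <= budget]

    def score(c: Candidate) -> float:
        # Stand-in for the learned ranker: fixed illustrative weights.
        return (0.6 * cosine(task_embedding, c.embedding)
                + 0.3 * c.reputation
                - 0.1 * (c.cost / budget))

    return sorted(feasible, key=score, reverse=True)[:top_k]
```

The point of the sketch is the shape of the decision: candidates from both channels compete in one pool, constraints filter before ranking, and the ranking trades semantic fit against cost and reputation. A learned ranker would replace the hand-set weights in `score`.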

A practical takeaway for architects: Holos demonstrates a shift from "pipeline agents" to an "agent economy." This matters for systems with high uncertainty and long task lifetimes. Separating planning from execution reduces the cognitive load on the LLM and lowers the likelihood of errors. The serverless agent model makes it possible to scale to millions of entities without linear infrastructure growth. The trade-off is clear, however: the system becomes harder to debug, and quality depends on the ranking mechanisms and incentives, not just on the model.

Information source

arXiv is the largest open preprint repository (since 1991, under the auspices of Cornell), where researchers quickly post working versions of papers. The materials are publicly accessible but do not undergo full peer review, so results should be considered preliminary and, where possible, checked against updated versions or peer-reviewed journals. arxiv.org

View the original research PDF
