Virtual Tensors Eliminate Data Movement in DNNs
Data movement optimization through virtual tensors: how VTC reduces latency and eliminates unnecessary operations in DNN compilation.
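The summary above rests on one general idea: layout operators such as transpose and slice do not have to move data if a tensor can be represented as a view over an existing buffer. The minimal Python sketch below illustrates that general idea, assuming a virtual tensor is simply a shape plus an index-mapping function over a shared flat buffer. The `VirtualTensor` class and its `transpose`, `slice_rows`, and `materialize` methods are hypothetical and are not VTC's actual design or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Index = Tuple[int, ...]


@dataclass(frozen=True)
class VirtualTensor:
    """Hypothetical virtual tensor: a shape plus an index map into a shared flat buffer.

    Layout operators only compose index maps; the buffer is never copied.
    """
    buffer: List[float]
    shape: Tuple[int, ...]
    index_map: Callable[[Index], int]

    @staticmethod
    def from_rows(rows: List[List[float]]) -> "VirtualTensor":
        # Row-major storage of a 2-D tensor in one flat list.
        n, m = len(rows), len(rows[0])
        flat = [x for row in rows for x in row]
        return VirtualTensor(flat, (n, m), lambda idx: idx[0] * m + idx[1])

    def transpose(self) -> "VirtualTensor":
        # Swap the logical axes; the underlying buffer is reused as-is.
        parent = self.index_map
        return VirtualTensor(self.buffer, (self.shape[1], self.shape[0]),
                             lambda idx: parent((idx[1], idx[0])))

    def slice_rows(self, start: int, stop: int) -> "VirtualTensor":
        # Offset the row index instead of copying the selected rows.
        parent = self.index_map
        return VirtualTensor(self.buffer, (stop - start, self.shape[1]),
                             lambda idx: parent((idx[0] + start, idx[1])))

    def __getitem__(self, idx: Index) -> float:
        # Reads go through the composed index map.
        return self.buffer[self.index_map(idx)]

    def materialize(self) -> List[List[float]]:
        # The only point where values are copied into a new layout.
        n, m = self.shape
        return [[self[(i, j)] for j in range(m)] for i in range(n)]


x = VirtualTensor.from_rows([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = x.transpose().slice_rows(1, 3)  # two layout ops, zero copies
print(y.shape)               # (2, 2)
print(y.buffer is x.buffer)  # True: same storage
print(y.materialize())       # [[2.0, 5.0], [3.0, 6.0]]
```

In this sketch the chain of layout operators costs nothing until `materialize` runs (or until a consumer reads elements through the map directly). In general, this trades intermediate copies for extra index arithmetic at read time.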
Infrastructure on ThecoreGrid covers the design, operation, and evolution of the foundational systems that power modern software at scale.
We explore compute, networking, and storage layers, along with virtualization, containers, and cloud platforms in high-load environments. The focus is on production-grade engineering: reliability, fault tolerance, capacity planning, cost efficiency, and secure system design. Topics include Infrastructure as Code, automation, provisioning, multi-region setups, traffic routing, and failure recovery. We analyze real-world trade-offs and operational challenges, supported by BigTech practices, incident post-mortems, and lessons from large-scale infrastructure failures. You’ll find deep dives into observability, performance tuning, and platform reliability under dynamic workloads. Instead of basic setup guides, the Infrastructure tag delivers practical insights for platform engineers, DevOps teams, SREs, and architects responsible for building and maintaining robust, scalable, and efficient infrastructure systems.
FSM-based network configuration benchmark: how NetAgentBench reveals failures of LLM agents in dynamic network scenarios and multi-turn behavior.
Hive federation in a data warehouse: how to move from a monolith to a distributed architecture without downtime or loss of data consistency.
Edge-cloud multi-agent architecture with decentralized management: how to reduce latency and traffic and enhance resilience in mobile automation.
How to design low-latency systems: controlling communication, Disruptor, Aeron, and the trade-offs between speed and architecture.
CPU-free LLM inference: how to remove the CPU from the critical path and stabilize latency in LLM serving architectures.
KV cache optimization in multi-LoRA serving: how ForkKV reduces memory consumption and increases throughput of LLM inference.
Root cause analysis (RCA) hinges on scale and the human factor. Meta’s approach with DrP shows how to turn debugging into a reproducible engineering process. The problem does not surface immediately; it emerges once the system reaches organizational scale. Incidents begin to recur, yet each one is investigated from scratch. Knowledge of where to look for …
The Platform Program split became a key step for Uber when team growth began to hinder development. The decision changed both the architecture and the organization at the same time. The problem surfaced not at the code level but at the level of team interaction. When Uber’s engineering organization grew to about 100 people, the division …
P2P model distribution in Kubernetes with Dragonfly: how to reduce traffic to the origin and accelerate the delivery of large models from Hugging Face and ModelScope.