A selection of architectural insights and releases we read this week.
LLM Infrastructure & Distributed AI
🔹 TENT (Slice Spraying Engine)
A declarative data-distribution engine for disaggregated LLM serving reduces tail latency through adaptive slice placement and resilience to network degradation.
Read the release
🔹 DWDP (Distributed Weight Data Parallelism)
A new parallelism mode for inference on NVL72 scales LLMs without classic tensor/pipeline parallelism bottlenecks by redistributing weights instead of activations.
Read the release
🔹 CRAFT (Cost-aware Expert Allocation)
Optimizing MoE expert placement with layer-level cost in mind reduces inference expenses while maintaining SLAs, an important step toward production MoE economics.
Read the release
🔹 UNIFERENCE
A discrete-event modeling framework for distributed AI allows for reproducible testing of architectural decisions before deployment, including network and scheduling effects.
Read the release
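To make the idea of discrete-event modeling concrete, here is a minimal sketch of the general technique, not UNIFERENCE's actual API. The `Simulator` class, `simulate_fifo_server`, and the fixed `service_time` and `network_delay` parameters are all hypothetical names chosen for illustration. It models requests crossing a network link and queuing at a single FIFO worker.

```python
import heapq


class Simulator:
    """Minimal discrete-event loop (illustrative, not a real framework's API)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # heap of (event_time, sequence_number, callback)
        self._seq = 0     # tie-breaker so simultaneous events run in schedule order

    def schedule(self, delay, callback):
        """Register `callback` to run `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Process events in timestamp order until none remain."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


def simulate_fifo_server(arrivals, service_time, network_delay):
    """Return {request_id: completion_time} for one FIFO worker.

    Assumptions (hypothetical): a constant network delay per request and a
    constant service time; real models would draw both from distributions.
    """
    sim = Simulator()
    completions = {}
    state = {"free_at": 0.0}  # time at which the worker next becomes idle

    def on_arrival(req_id):
        # The request starts when it has arrived and the worker is free.
        start = max(sim.now, state["free_at"])
        state["free_at"] = start + service_time

        def on_done():
            completions[req_id] = sim.now

        sim.schedule(state["free_at"] - sim.now, on_done)

    for req_id, sent_at in enumerate(arrivals):
        # Bind req_id as a default argument to avoid late-binding in the loop.
        sim.schedule(sent_at + network_delay, lambda r=req_id: on_arrival(r))

    sim.run()
    return completions


if __name__ == "__main__":
    print(simulate_fifo_server([0.0, 0.5, 3.0], service_time=1.0, network_delay=0.25))
```

Because the event order depends only on timestamps and a sequence counter, the same inputs always produce the same trace, which is the property that makes this style of modeling reproducible.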
🔹 Spark-LLM-Eval
A distributed LLM evaluation system focusing on the statistical significance of results eliminates typical benchmarking errors on large clusters.
Read the release
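One common way to attach statistical significance to a benchmark comparison is a paired bootstrap confidence interval. The sketch below illustrates that general technique under stated assumptions and is not Spark-LLM-Eval's implementation: the function name `paired_bootstrap_ci`, its parameters, and the per-example 0/1 scores are hypothetical.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-example score difference (A - B).

    Both score lists must cover the same examples in the same order.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")

    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)

    resampled_means = []
    for _ in range(n_resamples):
        # Resample examples with replacement, keeping each A/B pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(mean(sample))
    resampled_means.sort()

    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return mean(diffs), resampled_means[lower_idx], resampled_means[upper_idx]


if __name__ == "__main__":
    # Hypothetical per-example correctness (1 = correct) for two models.
    model_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
    model_b = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
    estimate, low, high = paired_bootstrap_ci(model_a, model_b)
    print(f"difference={estimate:.2f}, 95% CI=[{low:.2f}, {high:.2f}]")
```

If the interval includes zero, the observed gap between the two models may be resampling noise rather than a real difference. Pairing by example matters because it removes variance caused by some examples simply being harder than others.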
GPU Clusters & High-Performance Systems
🔹 Node-Interconnect Multi-Path Balancing
Execution-time planning of network paths eliminates skew in GPU clusters, improving utilization without application-layer changes.
Read the release
🔹 EXaCTz
Lossy compression with guaranteed preservation of topological properties (graph/contour trees) enables aggressive compression of scientific data without losing analytical correctness.
Read the release
Networking & Distributed Systems Theory
🔹 Internet-scale Anomaly Detection (Topology & Routing)
Methods for detecting routing anomalies and congestion at internet scale show how to combine telemetry and inference for real-time network diagnostics.
Read the release
🔹 Density-Delay Law
Formalizing the relationship between event density and delays provides a foundation for the predictable design of event-driven distributed systems.
Read the release
🔹 Online Network Slice Deployment (Multi-domain)
Algorithms for placing network slices under trust constraints allow a multi-operator infrastructure to be managed without centralized control.
Read the release
Cloud Native & Telco
🔹 Serverless5GC
Structuring a 5G core as a set of procedures implemented as functions demonstrates how a serverless approach can be applied to a telecom core, with gains in flexibility and operational cost.
Read the release
Identity & Application Architecture
🔹 Source Known Identifiers
A source-aware, three-tier identification model solves the problem of trust and traceability in distributed applications without centralized identity providers.
Read the release
Agentic Systems & Applied AI
🔹 PayPal Agentic Toolkit + MCP Servers
Infrastructure for agent-driven commerce shows how MCP and tool APIs turn payment systems into a programmable environment for autonomous agents.
Read the release