A selection of architectural insights and releases we read this week.
AI Systems & LLM Infrastructure
🔹 Blink (CPU-free LLM Inference)
A radical shift in serving architecture: fully offloading the inference pipeline to the GPU and SmartNIC eliminates the CPU as a bottleneck, reducing latency and increasing throughput under heavy loads.
Read the release
🔹 ForkKV (Disaggregated KV Cache for Multi-LoRA)
A copy-on-write KV cache allows scaling multi-LoRA inference without a linear increase in memory, making multi-tenant LLM serving significantly more efficient.
Read the release
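The copy-on-write idea can be sketched in a few lines: base KV blocks are shared by every adapter, and a tenant pays for a private copy only when it first writes to a block. The `CowKVCache` class below is illustrative, not ForkKV's actual API.

```python
class CowKVCache:
    """Sketch of a copy-on-write KV cache: shared base blocks, per-tenant overlays."""

    def __init__(self, base_blocks):
        self.base = base_blocks   # shared KV blocks (here: lists of floats)
        self.tenants = {}         # tenant_id -> {block_idx: private copy}

    def fork(self, tenant_id):
        # Forking is O(1): no KV data is copied, only an empty overlay is created.
        self.tenants[tenant_id] = {}

    def read(self, tenant_id, idx):
        # Reads fall through to the shared base unless the tenant has written.
        return self.tenants[tenant_id].get(idx, self.base[idx])

    def append(self, tenant_id, idx, value):
        overlay = self.tenants[tenant_id]
        if idx not in overlay:
            overlay[idx] = list(self.base[idx])  # copy-on-write: first touch copies
        overlay[idx].append(value)

    def private_blocks(self, tenant_id):
        # Memory grows with blocks actually written, not with tenant count.
        return len(self.tenants[tenant_id])
```

This is why memory no longer scales linearly with the number of LoRA adapters: N tenants sharing a common prefix hold one copy of its KV blocks until they diverge.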
🔹 Holos Multi-Agent System
LLM agents are moving toward web-scale orchestration: the paper proposes an architecture in which agents coordinate as a distributed system with explicit roles, memory, and communication.
Read the release
🔹 Anthropic Three-Agent Harness
A practical implementation of multi-agent development: separating roles (planner, coder, verifier) increases the resilience of long-running tasks and reduces quality degradation.
Read the release
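The control flow of such a harness is worth seeing in isolation. In this hedged sketch each role is a plain function rather than an LLM call, so the loop itself is testable; the real harness wires these roles to models, and the function names are ours, not Anthropic's.

```python
def run_harness(planner, coder, verifier, task, max_attempts=3):
    """Planner/coder/verifier loop: the verifier gates the coder's output,
    and failed attempts feed back as context instead of silently degrading."""
    plan = planner(task)
    feedback = None
    for attempt in range(1, max_attempts + 1):
        candidate = coder(plan, feedback)   # coder sees the last failure, if any
        ok, feedback = verifier(candidate)  # verifier decides, never the coder
        if ok:
            return {"result": candidate, "attempts": attempt}
    return {"result": None, "attempts": max_attempts}
```

Separating verification from generation is what makes long-running tasks resilient: a bad attempt produces structured feedback for the next one rather than compounding errors.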
Distributed Systems & HPC
🔹 Reverse Address Translation in Multi-GPU Pods
The study shows that reverse address translation (RAT) can become a hidden bottleneck in scale-up GPU configurations, affecting inter-GPU communication latency, which is critical for LLM training.
Read the release
🔹 Alltoallv RMA in MPI
An analysis of persistent RMA implementations shows how to reduce the overhead of collective operations—key to optimizing communication-heavy HPC workloads.
Read the release
🔹 Minos (GPU Workload Profiling)
A framework systematically links the performance and power profiles of GPU tasks, paving the way for energy-aware scheduling in HPC clusters.
Read the release
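A minimal sketch of what "linking performance and power profiles" enables, assuming per-config (runtime, power) measurements for a task; `pick_config` and the profile shape are hypothetical, not Minos's API.

```python
def pick_config(profiles, max_slowdown=1.10):
    """Pick the GPU config minimizing energy = runtime * power,
    while staying within max_slowdown of the fastest config's runtime.

    profiles: {config_name: (runtime_seconds, power_watts)}
    """
    fastest = min(rt for rt, _ in profiles.values())
    # Keep only configs whose slowdown versus the fastest is acceptable.
    allowed = {name: (rt, pw) for name, (rt, pw) in profiles.items()
               if rt <= fastest * max_slowdown}
    # Among those, choose the lowest-energy config.
    return min(allowed, key=lambda name: allowed[name][0] * allowed[name][1])
```

With a profile like 10 s at 300 W versus 10.8 s at 220 W, an 8% slowdown buys a ~21% energy reduction, which is exactly the trade-off an energy-aware scheduler exploits.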
Cloud Native & Platform Engineering
🔹 OpenFaaS Runtime & Kubernetes Study
The study demonstrates that the choice of runtime and Kubernetes distribution significantly affects the cold-start latency and throughput of serverless functions; tuning requires a systemic approach rather than relying on defaults.
Read the release
🔹 Pinterest Spark Auto Memory Retries
An engineering pattern: automatic retries with memory adaptation reduce OOM errors by 96%, turning unstable batch pipelines into predictable ones.
Read the release
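The pattern itself is simple enough to sketch. Pinterest's system operates inside their Spark platform; here `run_job` and `OOMError` are stand-ins so the escalation logic is visible on its own.

```python
class OOMError(Exception):
    """Stand-in for an out-of-memory job failure."""

def run_with_memory_retries(run_job, base_memory_gb=4, factor=2, max_retries=3):
    """On an OOM failure, resubmit the job with more executor memory
    instead of failing the whole pipeline."""
    memory = base_memory_gb
    for attempt in range(max_retries + 1):
        try:
            return run_job(memory_gb=memory), memory
        except OOMError:
            if attempt == max_retries:
                raise  # escalation exhausted; surface the failure
            memory *= factor  # retry with more memory
```

The design choice worth noting: escalating only on OOM keeps the common case cheap, so pipelines do not pay for worst-case memory on every run.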
🔹 Autonomous AI SRE Agent (Elasticsearch)
An agent automates the end-to-end SRE cycle (deploy → monitor → heal), demonstrating a real transition to self-healing systems without human intervention.
Read the release
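The monitor → heal part of that cycle reduces to a control loop. In this sketch `check_health` and `remediate` are injected stubs (the real agent reasons over Elasticsearch signals with an LLM); only the loop structure is shown.

```python
def sre_loop(check_health, remediate, max_cycles=5):
    """Probe the service each cycle; on an unhealthy status, apply a
    remediation and verify on the next cycle, with no human in the loop."""
    actions = []
    for _ in range(max_cycles):
        status = check_health()
        if status == "healthy":
            return {"healed": True, "actions": actions}
        actions.append(remediate(status))  # act, then re-verify next cycle
    return {"healed": False, "actions": actions}  # escalation point
```

The bounded cycle count matters in practice: an autonomous agent needs a point at which it stops acting and escalates, rather than remediating forever.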
Data Infrastructure & Databases
🔹 Etsy Migration to Vitess
The migration of 1,000 MySQL shards (425 TB) to Vitess confirms its maturity as a control plane for massive sharding and zero-downtime online migrations.
Read the release
Developer Experience & Performance Engineering
🔹 GitHub Diff Performance Engineering
A deep dive into optimizing diff rendering shows that bottlenecks often lie in comparison algorithms and layout rather than I/O—an important lesson for UI engineering on large data sets.
Read the release
Security & Blockchain
🔹 Routing Attacks in Ethereum PoS
Network-layer attacks (routing hijacks, eclipse attacks) remain a real threat to PoS: consensus is only as secure as the network connectivity beneath it.
Read the release