Time series storage at 50M samples per second
Time series storage at 50M samples/sec: multi-tenant architecture, shuffle sharding, and load control in a high-load observability system
Infrastructure on ThecoreGrid covers the design, operation, and evolution of the foundational systems that power modern software at scale.
We explore compute, networking, and storage layers, along with virtualization, containers, and cloud platforms in high-load environments. The focus is on production-grade engineering: reliability, fault tolerance, capacity planning, cost efficiency, and secure system design. Topics include Infrastructure as Code, automation, provisioning, multi-region setups, traffic routing, and failure recovery. We analyze real-world trade-offs and operational challenges, supported by Big Tech practices, incident post-mortems, and lessons from large-scale infrastructure failures. You’ll find deep dives into observability, performance tuning, and platform reliability under dynamic workloads. Instead of basic setup guides, the Infrastructure tag delivers practical insights for platform engineers, DevOps teams, SREs, and architects responsible for building and maintaining robust, scalable, and efficient infrastructure systems.
Seastar output stream now supports mixed writes. An analysis of invariant-based testing and AI debugging in complex state transitions
AI agent memory as an architectural layer. How persistent memory eliminates stateless limitations and impacts system scalability
Cross-site PXC replication in Kubernetes: how to set up DR via Percona Operator and avoid degradation due to latency and flow control
Confidential Containers in Kubernetes: how data-in-use protection works through attestation and TEE without trusting the cluster or its administrators
Containerized PLCs on Linux provide determinism and low latency even under load. An analysis of architecture and trade-offs
How AI code review in CI/CD reduces latency and noise through the orchestration of LLM agents and strict filtering of results
How to build on-prem K3s Kubernetes clusters via k0rdent and Proxmox. Declarative approach, BYOT, and cluster management without manual assembly
AI-driven self-healing networks in telecom: How Telstra automates incident management and reduces recovery time from hours to minutes in cloud infrastructure
AI Infrastructure, GPU Compilers, Agentic Systems, Distributed Systems, High Performance Computing, HPC, Telecommunications, SRE