Grafana observability dashboards: flexible customization
Grafana observability dashboards: how to configure services and perform drill-down analysis without leaving the application, while reducing observability fragmentation
Infrastructure on ThecoreGrid covers the design, operation, and evolution of the foundational systems that power modern software at scale.
We explore compute, networking, and storage layers, along with virtualization, containers, and cloud platforms in high-load environments. The focus is on production-grade engineering: reliability, fault tolerance, capacity planning, cost efficiency, and secure system design. Topics include Infrastructure as Code, automation, provisioning, multi-region setups, traffic routing, and failure recovery. We analyze real-world trade-offs and operational challenges, supported by BigTech practices, incident post-mortems, and lessons from large-scale infrastructure failures. You’ll find deep dives into observability, performance tuning, and platform reliability under dynamic workloads. Instead of basic setup guides, the Infrastructure tag delivers practical insights for platform engineers, DevOps teams, SREs, and architects responsible for building and maintaining robust, scalable, and efficient infrastructure systems.
How optimizing split learning through SFC reduces latency in distributed AI by jointly managing placement and routing
pgBackRest remains a key tool for PostgreSQL backup, but changes surrounding the project raise questions about sustainability and support. A critical part of the stack relies on a small group of maintainers. pgBackRest has long been the de facto standard for PostgreSQL backup and recovery. It is widely used in production and integrated into data …
Edge error handling without diagnostics breaks observability. An analysis of why errors without context block analysis and how this is addressed.
How to perform JUnit 5 migration in a monorepo: automated code transformation, OpenRewrite, and phased change architecture
Single-threaded architecture in exchanges: how determinism and Raft ensure fault tolerance, log replay, and stable latency in high-load systems
Distributed systems trade-offs in real-world architecture: how the cloud changes scaling, and why replication matters more than sharding
A selection of architectural insights and releases we read this week
Infrastructure
🔹 DataCenterGym: A physics-informed simulator for multi-objective data center scheduling. The tool allows modeling and optimizing resource allocation in data centers, taking into account physical constraints and multiple objectives, significantly improving management efficiency.
🔹 Spot-and-Scoot: Investigating spot instance availability. A methodology …
Multi-region architecture through the lens of a sovereign fault domain: how to design high availability for a full region failure
Kubernetes user namespaces in GA: how rootless containers and ID-mapped mounts reduce risks and accelerate startup without chown