DocDB architecture for zero-downtime scaling
DocDB architecture: how Stripe scales databases to 5 million QPS through zero-downtime data movement and strict data control.
Databases on ThecoreGrid explores the design, operation, and scaling of data storage systems in high-load and distributed environments.
We cover relational and NoSQL databases, NewSQL systems, and specialized storage engines, focusing on consistency models, replication, partitioning, and fault tolerance. Topics include query optimization, indexing strategies, transaction management, and performance tuning under real production workloads. We analyze trade-offs between strong and eventual consistency, latency and durability, and operational complexity at scale. Content includes real-world Big Tech practices, incident post-mortems, and lessons from failures in distributed data systems. You’ll also find deep dives into storage internals, caching layers, data lifecycle management, and multi-region deployments. Instead of basic setup guides, the Databases tag delivers practical engineering insights for backend engineers, data engineers, architects, and SRE teams responsible for reliable, scalable, and efficient data persistence.
A Redis proxy becomes a key layer for cache management as load and complexity increase. Let’s explore how an architectural proxy eliminates degradation and stabilizes high-load systems. The problem does not manifest immediately, only at the moment Redis stops being a “transparent” component and starts dictating system behavior. In the described case, degradation began with an … Read more
pgBackRest remains a key tool for PostgreSQL backup, but changes surrounding the project raise questions about sustainability and support. A critical part of the stack relies on a small group of maintainers. pgBackRest has long been the de facto standard for PostgreSQL backup and recovery. It is widely used in production and integrated into data … Read more
Seastar output stream now supports mixed writes. An analysis of invariant-based testing and AI debugging in complex state transitions
Cross-site PXC replication in Kubernetes: how to set up DR via the Percona Operator and avoid degradation due to latency and flow control
Distributed sequence generation replaces database sequences at scale. It removes central bottlenecks while keeping compatibility with existing systems. The problem does not manifest immediately — until the organization attempts to transition from a relational database to a cloud-native storage solution. In this case, over a hundred services relied on database sequences for generating primary keys. … Read more
SKID identifiers: how to combine sortability, security, and zero-lookup verification in distributed systems without dual keys.
MD5 has long been the standard for authentication in PostgreSQL. However, accumulated limitations have led to a gradual phasing out and a transition to a more robust model.
Request timeouts do not always indicate a problem in the database. Often, degradation is hidden in the path between the application and the DB. The problem manifests when database metrics appear stable, but clients experience timeouts. At the observation level, this looks like a contradiction: latency increases while database time remains the same. The reason … Read more