Kubernetes fsGroup as a Hidden Bottleneck: Accelerating Restarts through fsGroupChangePolicy
A slow restart of a stateful service rarely looks like a security configuration issue. Yet that is exactly how a safe Kubernetes default turned into 30 minutes of downtime on every restart. The problem only showed up at scale. Atlantis, which manages Terraform through GitLab merge requests, runs as a singleton StatefulSet and stores state in a …