When the number of containerized services grows faster than the platform team, the bottleneck is not Kubernetes itself but the work of operating it. Generali faced exactly this problem and responded by shifting its focus from managing clusters to managing applications.

The main constraint was operational rather than performance-related. As the microservices portfolio grew and multi-tenant scenarios emerged, so did manual scaling, fragmented security practices, and heavy resource overprovisioning. Different teams followed different standards, which left the security posture inconsistent and complicated compliance. Kubernetes itself worked, but maintaining it was consuming a growing share of engineering time.

Choosing Amazon EKS was the expected move, given its mature AWS integration and the team's existing experience with it. The key decision, however, was adopting EKS Auto Mode, in which AWS takes over node management, updates, add-ons, and part of the day-to-day operational model. This is a deliberate trade-off: the team gives up some control over the infrastructure in exchange for consistency and less toil, and in return gains automatic scaling, uniform environments, and built-in security practices.

In practice, this required adapting processes. Auto Mode regularly updates nodes, including their Bottlerocket-based operating system, and it does so by recreating them rather than patching them in place. To avoid service degradation, the team introduced maintenance windows and tightly configured Pod Disruption Budgets and node disruption budgets. Architectural constraints also became part of the strategy: services must be stateless, pods are immutable, deployments go through Helm, and scaling is handled by the Horizontal Pod Autoscaler (HPA). Secrets live in AWS Secrets Manager and are synchronized into the cluster by External Secrets Operator, so they are never embedded in manifests.

On the network side, AWS Network Firewall filters egress traffic by TLS Server Name Indication (SNI), closing the common gap of unrestricted outbound connections. GuardDuty (including its runtime and audit-log signals) and Amazon Inspector strengthen security further; Inspector maps vulnerabilities to the containers that are actually running, not just to images in the registry.
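The workload guardrails described in this section (Pod Disruption Budgets, node disruption budgets with a maintenance window, HPA scaling, and External Secrets Operator) can be sketched as Kubernetes manifests. The Python below is a minimal illustration, not Generali's actual configuration: the workload name `orders-api`, the namespace `team-orders`, the node pool name, the `aws-secrets-manager` ClusterSecretStore, the Secrets Manager key, and every numeric value are hypothetical. It also assumes that Auto Mode custom node pools follow the Karpenter `karpenter.sh/v1` `NodePool` schema and that the installed External Secrets Operator serves `external-secrets.io/v1` (older releases use `v1beta1`). The script prints a JSON `List`, a form `kubectl apply -f` accepts; in a Helm-based setup like the one described, the same objects would normally live in chart templates.

```python
import json

# Hypothetical identifiers; substitute real workload, namespace, and store names.
APP = "orders-api"
NAMESPACE = "team-orders"
LABELS = {"app": APP}


def pod_disruption_budget(min_available: int) -> dict:
    """Keep at least `min_available` pods running while a node is drained."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{APP}-pdb", "namespace": NAMESPACE},
        "spec": {"minAvailable": min_available, "selector": {"matchLabels": LABELS}},
    }


def node_pool_with_maintenance_window() -> dict:
    """Karpenter-style NodePool whose budgets pause voluntary node replacement."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": "stateless-workloads"},  # hypothetical pool name
        "spec": {
            "template": {"spec": {
                # Assumes the built-in Auto Mode NodeClass named "default".
                "nodeClassRef": {"group": "eks.amazonaws.com", "kind": "NodeClass", "name": "default"},
                "requirements": [
                    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["on-demand"]},
                ],
            }},
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "5m",
                # When several budgets are active at once, the most restrictive applies.
                "budgets": [
                    {"nodes": "10%"},  # always active: disrupt at most 10% of nodes at a time
                    # Mon-Fri from 08:00 UTC for 10 hours: no voluntary disruption at all.
                    {"nodes": "0", "schedule": "0 8 * * mon-fri", "duration": "10h"},
                ],
            },
        },
    }


def horizontal_pod_autoscaler(min_replicas: int, max_replicas: int, cpu_percent: int) -> dict:
    """Scale the Deployment on average CPU utilization (autoscaling/v2)."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{APP}-hpa", "namespace": NAMESPACE},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": APP},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {"name": "cpu", "target": {"type": "Utilization", "averageUtilization": cpu_percent}},
            }],
        },
    }


def external_secret() -> dict:
    """Sync one Secrets Manager value into a Kubernetes Secret; no secret data in the manifest."""
    return {
        "apiVersion": "external-secrets.io/v1",  # v1beta1 on older ESO releases
        "kind": "ExternalSecret",
        "metadata": {"name": f"{APP}-db", "namespace": NAMESPACE},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": "aws-secrets-manager"},
            "target": {"name": f"{APP}-db"},
            "data": [{"secretKey": "password", "remoteRef": {"key": "prod/orders-api/db", "property": "password"}}],
        },
    }


def build_manifests(min_replicas: int = 3, max_replicas: int = 10, min_available: int = 2) -> list:
    # If minAvailable equals the replica floor, evictions are refused whenever the
    # HPA sits at its minimum, which stalls node replacement.
    if min_available >= min_replicas:
        raise ValueError("min_available must be lower than min_replicas")
    return [
        pod_disruption_budget(min_available),
        node_pool_with_maintenance_window(),
        horizontal_pod_autoscaler(min_replicas, max_replicas, cpu_percent=70),
        external_secret(),
    ]


if __name__ == "__main__":
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": build_manifests()}, indent=2))
```

Piping the output to `kubectl apply -f -` would create all four objects, provided the Karpenter and External Secrets CRDs exist and a Deployment labelled `app: orders-api` is running. The guard in `build_manifests` reflects the interplay the section describes: node recreation by Auto Mode only proceeds smoothly if the PDB leaves room to evict at least one pod even when the HPA has scaled down to its minimum.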

Observability and cost management form a separate layer. With CloudWatch and Amazon Managed Grafana, the team gained per-namespace and per-project visibility without having to operate Grafana itself. For cost allocation, EKS-level tags (cluster, namespace, deployment) let costs be mapped to business units, which is critical in a multi-tenant environment.
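Tag-based allocation comes down to grouping cost line items by their Kubernetes attributes and rolling them up to business units. The Python below is a minimal sketch under stated assumptions: the line-item fields (`cluster`, `namespace`, `deployment`, `cost`), the namespace-to-business-unit mapping, and the sample figures are hypothetical placeholders, not the column names of any specific AWS billing export. Costs that cannot be attributed go into an explicit `Unallocated` bucket so they remain visible rather than silently disappearing.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical ownership map maintained by the platform team.
NAMESPACE_TO_UNIT = {
    "team-orders": "Retail",
    "team-claims": "Claims",
}


def allocate_costs(line_items, namespace_to_unit):
    """Roll up per-deployment costs to business units and to per-deployment keys."""
    by_unit = defaultdict(Decimal)
    by_deployment = defaultdict(Decimal)
    for item in line_items:
        # Shared or unmapped namespaces (e.g. kube-system) stay visible as "Unallocated".
        unit = namespace_to_unit.get(item["namespace"], "Unallocated")
        cost = Decimal(item["cost"])  # Decimal avoids float rounding drift in the sums
        by_unit[unit] += cost
        by_deployment[(unit, item["cluster"], item["namespace"], item["deployment"])] += cost
    return dict(by_unit), dict(by_deployment)


if __name__ == "__main__":
    sample = [
        {"cluster": "prod-eks", "namespace": "team-orders", "deployment": "orders-api", "cost": "12.40"},
        {"cluster": "prod-eks", "namespace": "team-orders", "deployment": "orders-worker", "cost": "3.10"},
        {"cluster": "prod-eks", "namespace": "team-claims", "deployment": "claims-api", "cost": "7.25"},
        {"cluster": "prod-eks", "namespace": "kube-system", "deployment": "coredns", "cost": "0.80"},
    ]
    units, _ = allocate_costs(sample, NAMESPACE_TO_UNIT)
    for unit, total in sorted(units.items()):
        print(f"{unit}: {total}")
```

With the sample data, the script prints `Claims: 7.25`, `Retail: 15.50`, and `Unallocated: 0.80`. Keeping the per-deployment breakdown alongside the per-unit totals lets a business unit drill down from its total to the individual workloads that drove it.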

The outcome is less about raw performance and more about redistributing effort. Operational tasks such as patching, upgrades, and scaling moved to the platform, freeing the team to focus on supporting product teams. Security became more consistent, overprovisioning decreased, and correlating signals across tools made incidents easier to investigate. No concrete metrics were disclosed, but from the description this is a classic case of managed Kubernetes reducing toil and making the platform more predictable.

