The tagged storage pattern addresses stale configurations and metadata-service overload in multi-tenant systems. We will look at how it works on AWS and where its trade-offs lie.
The problem does not manifest immediately. It appears once tenants number in the hundreds and configurations begin to change faster than the cache TTL. The classic TTL approach runs into a contradiction: either we accept stale data and risk violating isolation or feature flags, or we invalidate the cache aggressively and overload the metadata service. At that point, the configuration service itself becomes a throughput bottleneck. Data heterogeneity complicates matters further: some configurations need frequent, low-latency reads (a fit for DynamoDB), while others need hierarchies and versioning (a fit for Parameter Store). There is no universal storage here, and trying to force one leads to excessive costs or latency degradation.
The solution is built around the tagged storage pattern. The idea is simple: configuration keys receive prefixes (for example, tenant_config_ or param_config_) that determine which storage the request will go to. This removes the need to choose a “single best” storage and allows routing data based on their access patterns. Within the service, the Strategy Pattern is used to select the backend based on the key. This is a pragmatic compromise: adding a new storage backend does not require rewriting logic, but it introduces an extra abstraction layer and complicates debugging.
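As a sketch, the prefix-based routing might look like the following. The class names and the in-memory stubs are illustrative; only the tenant_config_ / param_config_ prefixes come from the text, and a real implementation would call DynamoDB and Parameter Store instead of local maps:

```typescript
// Strategy interface: every storage backend answers the same contract.
interface ConfigBackend {
  get(key: string): Promise<string | undefined>;
}

// Stub standing in for a DynamoDB-backed strategy (illustrative).
class DynamoBackend implements ConfigBackend {
  private store = new Map<string, string>([["tenant_config_theme", "dark"]]);
  async get(key: string) { return this.store.get(key); }
}

// Stub standing in for a Parameter Store-backed strategy (illustrative).
class ParameterStoreBackend implements ConfigBackend {
  private store = new Map<string, string>([["param_config_db_host", "db.internal"]]);
  async get(key: string) { return this.store.get(key); }
}

class ConfigRouter {
  // Prefix → strategy. Adding a storage means registering one more entry,
  // without touching the routing logic itself.
  private strategies: Array<[prefix: string, backend: ConfigBackend]> = [
    ["tenant_config_", new DynamoBackend()],
    ["param_config_", new ParameterStoreBackend()],
  ];

  async get(key: string): Promise<string | undefined> {
    const match = this.strategies.find(([prefix]) => key.startsWith(prefix));
    if (!match) throw new Error(`No backend registered for key: ${key}`);
    return match[1].get(key);
  }
}
```

The extra indirection is exactly the debugging cost mentioned above: a failed read now has to be traced through the router before reaching the actual storage call.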
The architecture relies on several layers. At the entry level, requests pass through Cognito, WAF, and API Gateway, then via VPC Link to an ALB and on to services running on ECS Fargate. Inside the service layer, NestJS with gRPC is used, which reduces network overhead and improves latency for service-to-service interactions. A key point is tenant isolation: the service does not accept tenantId from the request; it is extracted from the JWT. This eliminates a class of attacks where a client attempts to substitute the tenant context.
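A minimal sketch of that extraction, assuming signature verification has already happened upstream (for example, a Cognito authorizer at API Gateway or a JWKS check in the service) and that the claim is named custom:tenantId, which is an assumption rather than the article's actual claim name:

```typescript
// Derive the tenant context from the JWT payload instead of trusting any
// tenantId field in the request body or query string.
// NOTE: this only decodes; signature verification is assumed to happen
// upstream (Cognito authorizer / JWKS validation).
function tenantIdFromJwt(token: string): string {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Malformed JWT");
  const payload = JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
  const tenantId = payload["custom:tenantId"]; // assumed claim name
  if (typeof tenantId !== "string" || tenantId.length === 0) {
    throw new Error("JWT carries no tenant context");
  }
  return tenantId; // callers never read tenantId from request parameters
}
```

In NestJS this would typically live in a guard or interceptor so that handlers receive an already-resolved tenant context.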
The storage layer is implemented as a multi-backend strategy. DynamoDB uses composite keys for isolation and efficient queries. Parameter Store is organized hierarchically, which simplifies version management. In more complex scenarios, an additional dimension is introduced in the key (for example, service-level), which allows restricting access not only at the tenant level but also at the service level. This is important for systems with different areas of responsibility within a single tenant.
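One way to sketch such a composite-key scheme. The attribute names (pk/sk), separators, and the begins_with scoping are assumptions for illustration, not the article's actual schema:

```typescript
// Composite DynamoDB key: the partition key isolates the tenant, and the
// sort key adds the service-level dimension described above. A single
// Query with KeyConditionExpression
//   pk = :pk AND begins_with(sk, "SVC#billing#")
// then scopes reads to one service inside one tenant.
function configItemKey(tenantId: string, service: string, configKey: string) {
  return {
    pk: `TENANT#${tenantId}`,
    sk: `SVC#${service}#CONFIG#${configKey}`,
  };
}
```

The same tenant/service prefixing idea maps onto Parameter Store as a path hierarchy, e.g. /tenants/acme/billing/..., which is what makes version management straightforward there.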
A separate issue is configuration synchronization. Polling creates unnecessary load and delays, and restarting services to pick up changes causes downtime. Here an event-driven approach is applied: an EventBridge rule watches for changes in Parameter Store and triggers a Lambda that updates the local cache. This narrows the staleness window to seconds and removes the need for polling; configurations propagate without interrupting user sessions.
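A sketch of the Lambda side, assuming the standard "Parameter Store Change" EventBridge event shape. How the Lambda actually reaches each service instance's in-memory cache (for example, via a fan-out topic) is out of scope here and stubbed as a callback:

```typescript
// Shape of the SSM change event as delivered by EventBridge (simplified;
// the real event carries more fields).
interface SsmChangeEvent {
  "detail-type": string;
  detail: { name: string; operation: "Create" | "Update" | "Delete" };
}

// Push-based invalidation: instead of services polling Parameter Store,
// the Lambda reacts to each change and invalidates exactly one key.
// `invalidate` stands in for whatever transport reaches the services.
function makeHandler(invalidate: (paramName: string) => void) {
  return async (event: SsmChangeEvent): Promise<void> => {
    if (event["detail-type"] !== "Parameter Store Change") return;
    invalidate(event.detail.name);
  };
}
```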
Caching is implemented at several levels. In the service’s memory, metadata is stored with keys in the format tenantId:serviceName:configKey. Sensitive data is not cached — it remains in Parameter Store with encryption (SecureString). This is an important balance between performance and security. If necessary, the cache can be moved to Redis or Valkey, but this will add network latency in the range of several milliseconds.
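A minimal sketch of such an in-process cache; the isSensitive flag is an assumed way of marking SecureString-backed values, which the text says must never be cached:

```typescript
// In-memory metadata cache keyed as tenantId:serviceName:configKey.
class ConfigCache {
  private entries = new Map<string, string>();

  private static key(tenantId: string, service: string, configKey: string) {
    return `${tenantId}:${service}:${configKey}`;
  }

  set(tenantId: string, service: string, configKey: string, value: string, isSensitive = false) {
    // SecureString-backed values stay in Parameter Store with encryption;
    // they are deliberately never written to the local cache.
    if (isSensitive) return;
    this.entries.set(ConfigCache.key(tenantId, service, configKey), value);
  }

  get(tenantId: string, service: string, configKey: string) {
    return this.entries.get(ConfigCache.key(tenantId, service, configKey));
  }
}
```

Swapping this Map for a Redis or Valkey client keeps the same interface but, as noted above, trades in-process lookups for a network round trip of a few milliseconds.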
From a security perspective, isolation is ensured at multiple levels: JWT, DynamoDB keys, and service logic. A stricter option is also possible — through STS and Token Vending Machine for issuing temporary IAM credentials at the tenant level. This enhances control (including audit via CloudTrail) but adds latency and operational complexity.
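A sketch of the session policy such a Token Vending Machine might attach when calling sts:AssumeRole, pinning the temporary credentials to one tenant's partition keys via the dynamodb:LeadingKeys condition key. The table name and key format are illustrative assumptions:

```typescript
// Session policy for tenant-scoped temporary credentials: even if service
// code is buggy, these credentials cannot read another tenant's items.
function tenantSessionPolicy(tenantId: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["dynamodb:GetItem", "dynamodb:Query"],
        Resource: "arn:aws:dynamodb:*:*:table/TenantConfig", // assumed table
        Condition: {
          // LeadingKeys restricts every request to partition keys that
          // start with this tenant's prefix.
          "ForAllValues:StringLike": {
            "dynamodb:LeadingKeys": [`TENANT#${tenantId}*`],
          },
        },
      },
    ],
  };
}
```

The serialized policy would be passed as the Policy parameter of the STS AssumeRole call; each issued session is then auditable per tenant in CloudTrail, at the cost of the extra STS round trip mentioned above.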
The result of this approach is a system that scales without a clear bottleneck in the metadata service and does not suffer from stale configurations. While exact metrics of improvement are not provided, the architecture resolves the fundamental conflict between consistency and performance. The trade-off is the complexity of the architecture and the need to maintain multiple storage backends simultaneously.
The tagged storage pattern is not universal. It is justified when:
– there are different types of configurations with varying access patterns
– the number of tenants is growing
– strict isolation and fast updates are important
In other cases, it may be excessive. But as an evolutionary improvement for mature multi-tenant systems, it is a logical step.