B2B Engineering Insights & Architectural Teardowns

Platform health through the lens of developer experience

Platform health cannot be measured by observability alone. The decisive signals are developer experience and actual platform adoption.

The problem does not manifest immediately. The platform may appear stable: metrics are green, uptime is high, alerts are under control. But at some point, teams start to bypass it. Custom scripts, manual processes, and “shadow” workflows emerge. This is a classic signal of a ghost town platform: the system is technically healthy but not utilized. Observability answers the question “what broke,” but does not address “why it is not being used.” This is where the divergence between availability and real value begins.

The solution hinges on changing the evaluation model. Platform health is not availability but utility. The key signal is adoption, not usage. Usage can be enforced through constraints; adoption must be earned. If developers choose the “golden path,” it is because that path is faster and safer than the alternatives. The metrics shift accordingly: onboarding time, self-service rate, the number of teams joining the platform organically. There is also a trade-off to keep in mind: high adoption does not guarantee satisfaction. If a tool is used only because teams were forced to use it, that is hidden degradation.
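A minimal sketch of that distinction, assuming a hypothetical event log in which each provisioning request records whether the team self-served and whether it had a sanctioned alternative. The `ProvisioningEvent` schema and field names are illustrative, not part of any real platform API:

```python
from dataclasses import dataclass

# Hypothetical event record: one row per provisioning request the platform
# handled. The schema is illustrative, not a real platform API.
@dataclass
class ProvisioningEvent:
    team: str
    self_service: bool  # the team used the golden path without filing a ticket
    mandated: bool      # the team had no sanctioned alternative to the platform

def adoption_signals(events: list[ProvisioningEvent]) -> dict[str, float]:
    """Separate voluntary adoption from enforced usage.

    Raw usage counts every event; adoption counts only events where the
    team self-served *and* had a real alternative it chose not to use.
    """
    total = len(events)
    if total == 0:
        return {"self_service_rate": 0.0, "voluntary_adoption_rate": 0.0}
    self_served = sum(1 for e in events if e.self_service)
    voluntary = sum(1 for e in events if e.self_service and not e.mandated)
    return {
        "self_service_rate": self_served / total,
        "voluntary_adoption_rate": voluntary / total,
    }
```

The gap between the two rates is the point: a high self-service rate with a low voluntary rate is exactly the “used because they were forced” case.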

Implementation requires adding a human layer to the measurement system: technical metrics are supplemented with feedback. The simplest tool is a regular developer NPS survey with an open text field. This is where bottlenecks surface: repetitive manual steps, non-obvious errors, inefficient workflows. Feedback loops deserve separate emphasis. DORA research shows that a key factor in developer experience is clear feedback on the results of one's actions: what matters is not the sophistication of the tooling but the predictability of the system. If the platform leaves an engineer in uncertainty during a failure, that is a problem of interaction architecture, not just reliability.
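The NPS arithmetic itself is standard: the share of promoters (scores 9 or 10) minus the share of detractors (0 through 6), with 7 and 8 counting as passives. A minimal scoring sketch for a quarterly developer survey:

```python
def developer_nps(scores: list[int]) -> float:
    """Standard NPS on a -100..100 scale: share of promoters (9-10)
    minus share of detractors (0-6); scores of 7-8 are passives."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# 3 promoters, 5 passives, 2 detractors out of 10 responses -> NPS = 10.0
print(developer_nps([9, 10, 9, 8, 8, 7, 7, 8, 5, 6]))
```

The number itself matters less than the trend and the open text field attached to each score; the text is where the concrete bottlenecks are named.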

Reliability does not disappear, but it changes form. Uptime becomes a baseline, not a goal; what matters more is how the system behaves during failures. Mean Time to Recovery (MTTR) and change failure rate begin to reflect real experience. SLOs shift as well: not “99.9% availability,” but “99% of deployments succeed on the first attempt.” This translates reliability into the realm of developer experience. Another strong signal is toil: manual work that scales with growth directly reduces throughput. A healthy platform systematically absorbs toil.
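A sketch of how those developer-facing SLOs might be computed from a deployment log. The `Deployment` record and its fields are assumptions made for illustration, not a real pipeline API; the metric definitions follow common DORA usage:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical deployment record; field names are illustrative.
@dataclass
class Deployment:
    finished: datetime
    first_attempt_success: bool
    caused_incident: bool
    recovery_minutes: float | None = None  # set when caused_incident is True

def reliability_slos(deploys: list[Deployment]) -> dict[str, float]:
    """Express reliability as developer-facing SLOs instead of raw uptime."""
    n = len(deploys)
    if n == 0:
        return {"change_failure_rate": 0.0, "mttr_minutes": 0.0,
                "first_attempt_success_rate": 0.0}
    failures = [d for d in deploys if d.caused_incident]
    recoveries = [d.recovery_minutes for d in failures
                  if d.recovery_minutes is not None]
    return {
        # Share of deployments that caused an incident in production.
        "change_failure_rate": len(failures) / n,
        # Mean Time to Recovery across incident-causing deployments.
        "mttr_minutes": sum(recoveries) / len(recoveries) if recoveries else 0.0,
        # The SLO from the text: deployments that succeed on the first try.
        "first_attempt_success_rate":
            sum(d.first_attempt_success for d in deploys) / n,
    }
```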

The results of such changes are measured differently. The absence of data is itself a signal of platform immaturity; with metrics in place, the picture becomes manageable. Reduced onboarding time, a higher self-service rate, a falling share of toil: these are indicators that can be linked to business impact. DORA metrics such as lead time for changes and deployment frequency complete the picture on speed. Together, this forms a language for dialogue with the business, not just an engineering status report.
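For reference, a minimal computation of the two DORA speed metrics named above. Pairing each commit with its production deployment is a simplifying assumption for the sketch; real pipelines batch changes:

```python
from datetime import datetime, timedelta
from statistics import median

def lead_time_for_changes(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """DORA lead time: median elapsed time from commit to running in
    production. Each tuple is (commit_time, deploy_time) for one change."""
    return median(deploy - commit for commit, deploy in changes)

def deployment_frequency(deploy_times: list[datetime],
                         window_days: int = 30) -> float:
    """DORA deployment frequency: deployments per day, trailing window."""
    if not deploy_times:
        return 0.0
    cutoff = max(deploy_times) - timedelta(days=window_days)
    return sum(1 for t in deploy_times if t >= cutoff) / window_days
```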

The final shift is organizational: the platform becomes a product. That means a roadmap driven by developer feedback, regular health reports, and clear accountability for improving developer experience. Without this, the platform remains an infrastructure project without demonstrable value, and such initiatives often lose funding because observability alone does not answer the question “so what?”
