Grafana gcx brings observability to the CLI and provides agents with access to production context. This reduces MTTR and eliminates the gap between code and the actual behavior of the system.
The problem stays hidden until a team starts accelerating development with agent tools. Code generation speeds up, but observability remains outside the loop. An engineer or agent writes code without seeing what is happening in production: no latency data, no view of SLOs, no signal of degradation. The result is a gap in which decisions rest on assumptions rather than facts. Every switch to a separate observability tool adds context switching and prolongs incident response.
In response, the approach is pragmatic: move observability into the CLI through Grafana gcx. This gives the agent a single interface in which it can both write code and read the state of the system. The key trade-off is a shift from GUI-oriented interaction to a text-based interface. For agent environments this is natural: models work on a text-in, text-out basis, and a CLI returns predictable exit codes. As a result, gcx becomes the same class of tool as git or kubectl, but for observability and SRE tasks.
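The text-in, text-out contract with predictable exit codes can be illustrated with a minimal Python sketch. It does not call gcx and assumes nothing about gcx commands or flags: the command being run is a stand-in Python one-liner, and the names `CliResult`, `run_cli`, and `next_step` are hypothetical helpers introduced only for this illustration.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliResult:
    exit_code: int
    stdout: str
    stderr: str

    @property
    def ok(self) -> bool:
        # By Unix convention, exit code 0 means success.
        return self.exit_code == 0


def run_cli(argv: list[str], timeout: float = 30.0) -> CliResult:
    """Run a command without a shell and capture its text output."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return CliResult(proc.returncode, proc.stdout.strip(), proc.stderr.strip())


def next_step(result: CliResult) -> str:
    """Decide what the agent does next based only on the exit code."""
    return "parse_output" if result.ok else "report_error"


if __name__ == "__main__":
    # Stand-in command: a Python one-liner, not a real observability CLI.
    demo = run_cli([sys.executable, "-c", "print('latency_p99_ms=120')"])
    print(demo.exit_code, demo.stdout, next_step(demo))
```

The point of the sketch is that the agent never has to interpret a screen: it branches on an integer and reads plain text, which is the property that puts a CLI in the same class as git or kubectl for automation.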
The implementation covers the full observability cycle, from a service with no metrics or alerts to their automatic creation and analysis. gcx does not require an existing observability setup as a prerequisite, which matters because most services start without instrumentation or SLOs. The agent gets primitives for working with metrics, logs, and alerts directly in the terminal. The tool is adapted for agent mode: it strips visual noise, provides a machine-readable command catalog, and requires explicit confirmation for destructive operations. Context support, similar to kubectl, allows working with multiple environments without mutating global state.
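Two of these properties, explicit contexts and confirmation for destructive operations, can be sketched in a few lines of Python. This is an illustration of the general pattern only, not gcx's implementation or interface: the context names, endpoints, the `delete` action, and the `plan_operation` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A named target environment (all values here are hypothetical)."""
    name: str
    endpoint: str


# Hypothetical contexts; a real tool would load these from its config file.
CONTEXTS = {
    "staging": Context("staging", "https://staging.example.invalid"),
    "prod": Context("prod", "https://prod.example.invalid"),
}

# Hypothetical set of actions treated as destructive.
DESTRUCTIVE_ACTIONS = frozenset({"delete"})


class ConfirmationRequired(Exception):
    """Raised when a destructive action is requested without confirmation."""


def plan_operation(action: str, resource: str, context: str, confirm: bool = False) -> str:
    """Describe an operation against an explicitly named context.

    The context is an argument, so no shared "current context" is modified.
    """
    ctx = CONTEXTS[context]  # raises KeyError for an unknown context name
    if action in DESTRUCTIVE_ACTIONS and not confirm:
        raise ConfirmationRequired(f"{action} on {ctx.name} needs confirm=True")
    return f"{action} {resource} @ {ctx.endpoint}"


if __name__ == "__main__":
    print(plan_operation("get", "alerts", "staging"))
    try:
        plan_operation("delete", "alerts", "prod")
    except ConfirmationRequired as exc:
        print("refused:", exc)
    print(plan_operation("delete", "alerts", "prod", confirm=True))
```

Passing the context with each call means two agent sessions can target staging and production at the same time without one changing the other's target, and the confirmation flag makes a destructive call fail closed by default.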
Agent skills form a separate layer. They are sets of instructions for typical tasks: setting up observability, investigating alerts, and working with SLOs and synthetic checks. They reduce uncertainty and the agent's need to guess at CLI capabilities. Importantly, the agent now relies on the current state of the system rather than on static knowledge from training. This changes the form of the interaction itself: questions become tied to real metrics and events.
The result is shorter incident response time and less operational noise. Tasks that previously turned into multi-day tickets can be completed in a single agent session. The cost of each task also drops, thanks to more efficient interaction with the CLI and lower token burn. No specific metrics are provided; the effect is described in terms of faster diagnosis and problem resolution. The key change is that the agent writing code gets the same level of visibility as the on-call engineer.
This approach fits a broader trend: observability as code and a shift toward CLI-first tooling. In agent development, this is not just a convenience but a requirement for tool architecture.