Observability: the degree to which you can ask new questions of your system without having to ship new code or gather new data.
Above is my slightly modified definition of observability, mostly stolen from Charity Majors in Observability: A Manifesto.
Observability is increasingly important. Modern apps and services are more resilient and fail in soft, unpredictable ways. These failures are too far on the edges to appear in charts. For example, an app may perform dramatically worse for one specific user that happens to have a lot of associated database records. This would be hard to identify on a response time chart for apps doing reasonable throughput.
However, understanding observability is increasingly confusing. Sometimes observability appears an equation: observability = metrics + logging + tracing. If a vendor does those three things in a single product, they've made your system observable.
If observability is just metrics, logging, and tracing, that's like saying usability for a modern app is composed of a mobile app, a responsive web app, and an API. Authorize.net has those things. So does Stripe. One is clearly more usable than the other.
I think it's more valuable to think about how your existing monitoring tools can be adapted to ask more questions. There's significant room for this in standalone metrics, logging, and tracing tools.
At Scout, we've been thinking about how we can help folks ask more performance-related questions about their apps. We're not building a custom metrics ingestion system. We're not adding a structured logging service. We're focusing on our slice of the world.
Below I'll share our top-secret observability plan.