Monitoring — the measurement of your system, the gathering of telemetry, and alerting when it behaves anomalously — is key to running large-scale, modern computer systems. But what many developers today don’t realise is that monitoring can be a key part of your design cycle too.
Monitoring can be an integral part of bringing up a piece of software, and what you learn from that monitoring can be fed back into your design.
Perhaps you’re trying to determine how much effort to put into various error-handling code and systems. A key factor in that decision could be which errors occur most frequently. Rather than only reasoning about the errors your system might be exposed to, you could write a small program that measures the actual error rates. In other words, use monitoring to learn about the environment in which your software and system will run. No more guessing.
It can be even simpler. If you’re unsure whether a particular error occurs at all, don’t rely on reasoning alone: study and measure your environment, and feed the results back into your design.
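As a rough illustration of the "small program that measures error rates" idea, here is a minimal sketch in Python. It assumes failures surface as exceptions and classifies them by exception type. `ErrorRateMonitor` and `make_simulated_operation` are hypothetical names invented for this example, and the flaky operation is simulated rather than a real call.

```python
from collections import Counter
from typing import Any, Callable, Dict, List


class ErrorRateMonitor:
    """Hypothetical helper: counts attempts and failures, keyed by exception type."""

    def __init__(self) -> None:
        self.calls = 0
        self.errors: Counter = Counter()

    def observe(self, operation: Callable[[], Any]) -> Any:
        """Run one attempt; record (and swallow) any exception so the probe keeps going."""
        self.calls += 1
        try:
            return operation()
        except Exception as exc:
            self.errors[type(exc).__name__] += 1
            return None

    def rates(self) -> Dict[str, float]:
        """Fraction of attempts that failed with each error type, most frequent first."""
        if self.calls == 0:
            return {}
        return {name: count / self.calls for name, count in self.errors.most_common()}


def make_simulated_operation(outcomes: List[str]) -> Callable[[], str]:
    """Hypothetical stand-in for a real call (network request, disk write, ...)."""
    remaining = iter(outcomes)

    def operation() -> str:
        outcome = next(remaining)
        if outcome == "timeout":
            raise TimeoutError("simulated timeout")
        if outcome == "refused":
            raise ConnectionRefusedError("simulated refusal")
        return "ok"

    return operation


monitor = ErrorRateMonitor()
simulated = make_simulated_operation(["ok", "timeout", "ok", "refused", "timeout"])
for _ in range(5):
    monitor.observe(simulated)
print(monitor.rates())  # {'TimeoutError': 0.4, 'ConnectionRefusedError': 0.2}
```

In a real study you would wrap the actual operation and let the probe run long enough for rare errors to appear. A zero count for an error type after a long run is itself useful evidence about whether that error occurs in practice.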
Perhaps you need to understand the bandwidth, latency, and performance of the environment. Again — start by simply monitoring and measuring, and feed those findings back into your design and implementation.
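For latency, a similarly small probe can time repeated calls and summarise the distribution. The sketch below assumes the thing you care about can be wrapped in a Python callable. `time_calls`, `percentile` and `summarize` are hypothetical helpers, and the percentile uses the nearest-rank definition, which is only one of several conventions. Bandwidth measurement is not covered here.

```python
import math
import time
from typing import Any, Callable, Dict, List


def time_calls(operation: Callable[[], Any], samples: int) -> List[float]:
    """Call `operation` `samples` times; return each call's latency in milliseconds."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()  # monotonic, high-resolution clock
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Report the median and the tail; a mean alone can hide slow outliers."""
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "max": max(latencies),
    }


if __name__ == "__main__":
    # Hypothetical operation: replace with the real call you want to characterise.
    samples = time_calls(lambda: time.sleep(0.001), samples=50)
    print(summarize(samples))
```

Reporting percentiles rather than a single average is a design choice: the tail is often what determines timeouts, buffer sizes and retry policies.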
Finally, adding monitoring to your software and system from day one guarantees that monitoring will be part of your system when it launches in production, and that what you monitor is actually relevant.
Monitor early and often. And remember, it’s not just for production.