Meaningful Uptime Measurements for the Cloud

Another interesting paper came my way, thanks to the Morning Paper mailing list. Nines are Not Enough: Meaningful Metrics for Clouds discusses a topic that I deal with regularly in my role at Google.

SLIs, SLOs, and SLAs are easy to discuss in a general sense, but surprisingly subtle to put into practice. This paper, authored by Google engineers, explores why this is so, and offers a new framework for thinking about them.

How I handle my Gmail load at Google

As an Engineering Manager at Google, I get a lot of email — everyone does. Google — at least my group — doesn’t make heavy use of IM-like tools internally, and I’m happy about that. Combined with traffic from the internal system, it all adds up to a lot in my Inbox.

So I was forced to really think about how I handle it all — and not miss anything important.


Go race detection and failing fast

Go remains one of the languages I’m most productive in. It combines the rigour of static typing with the fluidity of Python, which makes it both robust and easy to code in.

It’s also got some innovative features that help you catch those tough-to-find issues, particularly when they only occur in production. An example is the Go Race Detector.
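
As a rough illustration of the kind of bug the race detector is built to flag, here is a minimal sketch. The function names racyCount and safeCount are hypothetical, chosen for this example only. The first increments a shared counter from many goroutines without synchronization, and the second guards the same counter with a mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount (hypothetical name) increments a shared counter from n
// goroutines with no synchronization. This is a data race, so the
// result may be less than n.
func racyCount(n int) int {
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // unsynchronized read-modify-write
		}()
	}
	wg.Wait()
	return counter
}

// safeCount (hypothetical name) does the same work, but every
// increment happens while holding a mutex, so there is no race.
func safeCount(n int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println("racy:", racyCount(1000)) // result is not guaranteed
	fmt.Println("safe:", safeCount(1000))
}
```

Built normally, the racy version may appear to work. Run with the -race flag (for example, go run -race main.go, assuming the file is saved as main.go), the detector should print a WARNING: DATA RACE report pointing at the unsynchronized increment, while the mutex-protected version stays clean.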


Philip O'Toole