Another interesting paper came my way, thanks to the Morning Paper mailing list. Nines are Not Enough:Meaningful Metrics for Clouds discusses a topic that I deal with regularly in my role at Google.
SLIs, SLOs, and SLA are easy to discuss in a general sense, but surprisingly subtle to put into practise. This paper, authored by Google engineers, explores why this is so, and offers a new framework for thinking about them.
Since I recently joined Google Cloud Platform (GCP), I thought it’s time to get some practical experience with the platform. As a result I’m going to migrate this blog from Rackspace to GCP — specifically I’ll use GCE for WordPress, and Cloud SQL for the persistent database storage.
Continue reading Deploying Vallified on GCP
Real-time — or near real-time — data pipelines are all the rage these days. I’ve built one myself, and they are becoming key components of many SaaS platforms. SaaS Analytics, Operations, and Business Intelligence systems often involve moving large amounts of data, received over the public Internet, into complex backend systems. And managing the incoming flow of data to these pipelines is key.
Continue reading Drop, Throttle, or Buffer
I’ve been thinking a lot recently about what makes computer services and products sticky — what makes users and customers come back again and again to what you’ve built. There are lots of ways to summarize it, but when it comes to systems that help technical people run their own systems, they come for the features, but they stay for the uptime.
Continue reading Come for the Features, Stay for the Uptime
Over 16 years, I’ve written software up-and-down the entire stack. Earliest in my career I wrote boot ROM software for specialized embedded devices. This kind of programming taught me so much about how computers really work.
Continue reading A Prayer for Distributed Systems Developers
This blog describes working with InfluxDB 0.8. InfluxDB 0.8 is no longer supported, and has been superseded by the 1.0 release.
I recently came across InfluxDB — it’s a time-series database built on LevelDB. It’s designed to support horizontal as well as vertical scaling and, best of all, it’s not written in Java — it’s written in Go. I was intrigued to say the least.
Continue reading InfluxDB and Grafana HOWTO
AWS have posted the video online of Jim Nisbet’s and my talk at AWS:reinvent 2013. In it, Jim and I describe the system we built at Loggly, which uses Apache Kafka, Twitter Storm, and elasticseach, to build a high-performance log aggregation and analytics SaaS solution, running on AWS EC2.
Continue reading Infrastructure at Scale: Apache Kafka, Twitter Storm and elasticsearch
Loggly recently held an elasticsearch meetup, which was a great success. One question that was repeatedly asked was how to ensure elasticsearch does not suffer a partition — known as a split-brain. This can be a particular problem in AWS EC2, where the network is subject to interruptions. It can also happen if the elasticsearch master node performs long garbage collection cycles.
One configuration that is very effective at preventing this problem is described in this post.
Continue reading Avoiding elasticsearch split-brain
Another post for the Riverbed Technology Blog, on the value of looking back.
You can read it here.
I recently wrote a entry for the Riverbed Technology blog, describing an interesting collaborative development experience I had with the AWS EC2 Cloud.
You can read it here.