Tag Archives: distributed systems

Call me Definitely

Kyle Kingsbury, the creator of the network monitoring system Riemann, has put together a comprehensive series of blog posts on the fault tolerance, high availability, and general correctness of a number of database and storage technologies. Of the technologies discussed, I am most familiar with elasticsearch and Apache Kafka, and I found the posts to be a great read.

If you haven’t read them yet, you should check them out on his site.

InfluxDB and Grafana HOWTO

This post describes working with InfluxDB 0.8. InfluxDB 0.8 is no longer supported, and has been superseded by the 1.0 release.

I recently came across InfluxDB, a time-series database built on LevelDB. It’s designed to support horizontal as well as vertical scaling and, best of all, it’s not written in Java: it’s written in Go. I was intrigued, to say the least.


Infrastructure at Scale: Apache Kafka, Twitter Storm and elasticsearch

AWS have posted online the video of the talk Jim Nisbet and I gave at AWS re:Invent 2013. In it, Jim and I describe the system we built at Loggly, which uses Apache Kafka, Twitter Storm, and elasticsearch to build a high-performance log aggregation and analytics SaaS solution running on AWS EC2.


Speaking at AWS re:Invent 2013

This past week I had the opportunity to speak, with my colleague Jim Nisbet, at AWS re:Invent 2013. In our talk, titled “Unmeltable Infrastructure at Scale: Using Apache Kafka, Twitter Storm, and Elastic Search on AWS”, Jim and I described the architecture of Loggly’s next-generation log aggregation and analytics infrastructure, which went live 3 months ago and runs on AWS EC2.


Avoiding elasticsearch split-brain

Loggly recently held an elasticsearch meetup, which was a great success. One question that was repeatedly asked was how to ensure elasticsearch does not suffer a partition, a condition known as split-brain. This can be a particular problem in AWS EC2, where the network is subject to interruptions. It can also happen if the elasticsearch master node performs long garbage collection cycles.

One configuration that is very effective at preventing this problem is described in this post.
