rqlite is a replicated relational database built on SQLite, with distributed consensus provided by the Raft consensus protocol. It handles leader election gracefully and can tolerate the failure of a minority of its nodes.
I made a presentation on rqlite tonight at the San Francisco Go Meetup. It was an enjoyable evening, and I had a chance to discuss why I built rqlite, how it works, and where it might go in the future.
rqlite provides robust replication for SQLite databases using the Raft consensus protocol. Written in Go, it ensures that every change made to the leader’s SQLite database is replicated to all other nodes in the cluster, providing fault tolerance and reliability.
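The idea behind this is state machine replication: the leader appends each write to a Raft log, and every node applies committed entries in the same order, so every copy of the database ends up identical. The Go sketch below is a simplified illustration under stated assumptions, not rqlite’s code. There is no real Raft or SQLite here: the hypothetical `statement` type stands in for a SQL write, and a map plays the role of a single table.

```go
package main

import (
	"fmt"
	"reflect"
)

// statement is a hypothetical stand-in for a SQL write such as an
// INSERT or DELETE; a real system would carry the SQL text instead.
type statement struct {
	Op    string // "insert" or "delete"
	Key   int
	Value string
}

// replica models one node's database as a single in-memory table.
type replica struct {
	table   map[int]string
	applied int // number of log entries applied so far
}

func newReplica() *replica {
	return &replica{table: make(map[int]string)}
}

// apply executes, in order, every committed entry this replica has not
// yet applied. Because each entry is deterministic, replicas that apply
// the same log reach the same state.
func (r *replica) apply(entries []statement, commitIndex int) {
	for r.applied < commitIndex {
		s := entries[r.applied]
		switch s.Op {
		case "insert":
			r.table[s.Key] = s.Value
		case "delete":
			delete(r.table, s.Key)
		}
		r.applied++
	}
}

func main() {
	// The leader's log after three writes, all of them committed.
	entries := []statement{
		{Op: "insert", Key: 1, Value: "fiona"},
		{Op: "insert", Key: 2, Value: "declan"},
		{Op: "delete", Key: 1},
	}

	nodes := []*replica{newReplica(), newReplica(), newReplica()}
	for _, n := range nodes {
		n.apply(entries, len(entries))
	}

	fmt.Println(nodes[0].table) // map[2:declan]
	fmt.Println(reflect.DeepEqual(nodes[0].table, nodes[1].table) &&
		reflect.DeepEqual(nodes[1].table, nodes[2].table)) // true
}
```

Replicating the statements rather than copying the database file keeps the data sent per write small, at the cost of requiring every statement to be deterministic on every node.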
It’s been 18 months since development of rqlite first started and it’s time for version 2.
I’ve started replacing go-raft within rqlite with the Raft implementation from Hashicorp. go-raft is no longer maintained, and I’ve had good experience with the Hashicorp code through my work on InfluxDB and hraftd. I’m also going to change the API to make it more useful. The existing implementation and API have been tagged as v1.0, so they remain available.
You can follow the work on this branch, and I hope to merge it to master in the near future.
I recently presented at the InfluxDB San Francisco Meetup on InfluxDB and the Raft consensus protocol. My talk covered the fundamental problems of distributed systems and how InfluxDB uses Raft to solve them.
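A central question Raft answers is when a write is safe: a leader treats a log entry as committed once it is stored on a majority of the cluster. The Go sketch below is a simplified illustration, not InfluxDB’s or Hashicorp’s code. It computes the highest index replicated on a majority from a hypothetical slice of per-server match indexes (the leader’s own last index included), and it deliberately omits Raft’s additional rule that a leader only advances its commit index this way for entries from its current term.

```go
package main

import (
	"fmt"
	"sort"
)

// majorityIndex returns the highest log index stored on a majority of
// servers. matchIndex holds, for each server in the cluster (leader
// included), the highest index known to be replicated there.
// Simplification: full Raft also checks the entry's term before
// advancing the commit index; that check is omitted here.
func majorityIndex(matchIndex []uint64) uint64 {
	if len(matchIndex) == 0 {
		return 0
	}
	// Copy so the caller's slice is not reordered.
	sorted := append([]uint64(nil), matchIndex...)
	// Sort in descending order.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	quorum := len(sorted)/2 + 1
	// The quorum-th largest value is present on at least quorum servers.
	return sorted[quorum-1]
}

func main() {
	// Five servers; only index 5 and below is on three of them.
	fmt.Println(majorityIndex([]uint64{7, 7, 5, 3, 2})) // 5
	// Three servers; two already hold index 4.
	fmt.Println(majorityIndex([]uint64{4, 4, 1})) // 4
}
```

Taking the quorum-th largest value works for even cluster sizes too: with four servers the quorum is three, so a single lagging follower never holds back the commit index.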
Hashicorp provide a nice implementation of the Raft consensus protocol, and it’s at the heart of InfluxDB (amongst other systems). I wanted to experiment with a simple system built on this particular Raft implementation, so, inspired by raftd, I built hraftd.
Packt recently asked me to review their new publication Cassandra High Availability, written by Robbie Strickland. I’ve worked with Cassandra in the past: early designs of Loggly’s second-generation log analytics platform used Cassandra as the authoritative store for log data, but we ended up pulling it and using elasticsearch as both the store and the search engine.
SQLite is a “self-contained, serverless, zero-configuration, transactional SQL database engine”. However, it doesn’t come with replication built in, so if you want to store mission-critical data in it, you had better back it up. The usual approach is to continually copy the SQLite file on every change.
I wanted SQLite, I wanted it distributed, and I really wanted a more elegant solution for replication. So rqlite was born.
The creator of the network monitoring system Riemann, Kyle Kingsbury, has put together a comprehensive series of blog posts on the fault tolerance, high availability, and general correctness of a number of database and storage technologies. I found the posts on the technologies I know best, elasticsearch and Apache Kafka, to be a great read.
If you haven’t read them yet, you should check them out on his site.
Over the past 16 years, I’ve written software up and down the entire stack. Early in my career I wrote boot ROM software for specialized embedded devices. This kind of programming taught me so much about how computers really work.
This blog describes working with InfluxDB 0.8. InfluxDB 0.8 is no longer supported, and has been superseded by the 1.0 release.
I recently came across InfluxDB, a time-series database built on LevelDB. It’s designed to support horizontal as well as vertical scaling and, best of all, it’s not written in Java: it’s written in Go. I was intrigued, to say the least.
I came across a very readable paper on distributed systems: Distributed systems for fun and profit. I recommend it to anyone interested in learning more about distributed systems and the challenges of designing, building, and operating them.
AWS have posted online the video of the talk Jim Nisbet and I gave at AWS re:Invent 2013. In it, Jim and I describe the system we built at Loggly, which uses Apache Kafka, Twitter Storm, and elasticsearch to provide a high-performance log aggregation and analytics SaaS solution running on AWS EC2.
This past week I had the opportunity to speak, with my colleague Jim Nisbet, at AWS re:Invent 2013. In our talk, titled “Unmeltable Infrastructure at Scale: Using Apache Kafka, Twitter Storm, and Elastic Search on AWS”, Jim and I described the architecture of Loggly’s next-generation log aggregation and analytics infrastructure, which went live 3 months ago and runs on AWS EC2.
Loggly recently held an elasticsearch meetup, which was a great success. One question that came up repeatedly was how to prevent elasticsearch from suffering split-brain, where a cluster divides and each side elects its own master. This can be a particular problem on AWS EC2, where the network is subject to interruptions. It can also happen if the elasticsearch master node performs long garbage collection cycles.
One configuration that is very effective at preventing this problem is described in this post.
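A widely used safeguard in elasticsearch releases before 7.0, which may or may not be exactly the configuration this post describes, is to set `discovery.zen.minimum_master_nodes` to a quorum of the master-eligible nodes: (N / 2) + 1, using integer division. Because two disjoint sides of a partition cannot both contain a quorum, at most one side can elect a master. The Go helper below just makes that arithmetic concrete; `minimumMasterNodes` is a hypothetical name, not part of any elasticsearch API.

```go
package main

import "fmt"

// minimumMasterNodes returns a quorum of master-eligible nodes,
// (n / 2) + 1 with integer division. Requiring this many nodes to
// elect a master means two disjoint sides of a partition cannot both
// elect one, which is what split-brain would require.
func minimumMasterNodes(masterEligible int) int {
	return masterEligible/2 + 1
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("%d master-eligible nodes -> discovery.zen.minimum_master_nodes: %d\n",
			n, minimumMasterNodes(n))
	}
}
```

Note the two-node case: the quorum is 2, so losing either node leaves the cluster unable to elect a master. That is why three master-eligible nodes is the usual minimum.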