I’ve been developing rqlite since 2014, and its design and implementation have evolved substantially during that time. The design docs tell the story of what worked and what didn’t. So what can we learn about distributed database design from watching rqlite change over the years?
As time has passed the distributed consensus system has changed, the API has improved enormously, security has improved, and support for automatic clustering and node-discovery was introduced along the way. But none of this happened all at once, and many of the design changes were in response to previous wrong turns.
What worked? And what didn’t?
Let’s take a look through time, letting the design choices tell the story.
Introduction to replicating SQLite with Raft – this was the post that started it all. Why not combine Raft with SQLite? Would it work? It turned out it would work very well, and it got the attention of Hacker News, but the initial version was far from production-ready. Years of development lay ahead.
Moving to an upgraded Raft consensus system – with 2 years of practical experience it became pretty clear that rqlite needed to run Hashicorp’s more modern Raft code, and needed a much better HTTP API. The API put in place by version 2.0 remains largely the same to this day — 7 years later — showing that the API choices were good ones.
Leader-redirection added – an important change which made it much easier for clients to communicate with the cluster. With redirection clients could contact any node in the cluster, and if that node wasn’t the Leader, the node would let the client know where on the network the Leader was. Leader redirection would eventually be superseded years later by transparent forwarding to the Leader.
Building an rqlite discovery service using AWS Lambda – this was the first attempt at building auto-clustering and node-discovery for rqlite. It worked, but the implementation wasn’t elegant, nor was the Discovery mechanism reliable. At the time development with AWS Lambda was pretty crude, which didn’t help. This Discovery system would be turned down within 5 years, but the code remains available on GitHub.
Node-to-node encryption added – this was one of the early features aimed at making it easier to run rqlite in production environments, where networks had to be shared with other systems. Getting the configuration of node-to-node encryption right was tricky — in fact, it wasn’t fully correct until 2023.
Scaling read performance – read-only nodes were added, allowing users to horizontally scale read performance. This worked well, because it was mostly built on preexisting functionality in the Hashicorp Raft library.
Moving from JSON to Protocol Buffers for internal data structures – 6 years into development JSON-based data modeling wasn’t robust enough, and I was writing a lot of custom marshaling code since rqlite was interacting with more and more systems outside of itself. It was time for something more sophisticated. Moving to Protobufs was a big win and Protobufs have become the core data modeling mechanism throughout most of the code.
Comparing disk usage across database releases – rqlite is very sensitive to disk performance, so monitoring disk usage became more and more important to understanding performance. What testing showed was that performance did improve across versions, but not in the way I had hoped. It turned out the number of fsync calls didn’t change, and that was the real performance bottleneck. But disk usage has dropped substantially over the years, particularly with the move to compression when writing to disk.
7 years of open-source database development – lessons learned – By now I had been developing open-source databases — both InfluxDB and rqlite — for some years. Patterns in my database development experience were emerging.
The evolution of a distributed database design – After 7 years of development my understanding of how rqlite should cluster was becoming much more sophisticated (if only it hadn’t taken so long!). But the design changes introduced by rqlite 6.0 resulted in much more robust clusters, and the approach to inter-node communications introduced by 6.0 remains the same to this day.
Designing node discovery and automatic clustering – This was one of the most important design changes in the history of rqlite. rqlite 7.0 finally fixed auto-clustering, and introduced proper node discovery. As a result rqlite worked really well on Kubernetes, with official deployment configurations now available.
Evaluating rqlite consistency with Jepsen-style testing – by this point people in academia were testing rqlite — and it was performing well. This was a testament to the excellent Raft implementation that ran at the center of rqlite.
Trading durability for write performance – end-users were now approaching me with a wider and wider set of potential use cases. It was finally time to loosen the durability guarantees rqlite offered, while offering very large increases in write performance in return.
How rqlite exposed a bug in SQLite – the icing on the cake for 2022: rqlite load testing brought out a bug in SQLite, helping improve one of the most important pieces of software in the world in the process.
Mutual TLS support added, offering even more security – in response to some concerns about the inter-node communications being too exposed, mutual TLS support was added in release 7.14.0 — which was also the first release to be created with the help of GitHub Copilot! Mutual TLS allows users to ensure that only authorized nodes can ever communicate with the cluster. This could already be achieved using network-level security, but adding Mutual TLS allows users to bring a whole new level of security to their deployments.
What does the future hold?
Many of the goals I have had for rqlite have been met — with one notable exception. rqlite still doesn’t offer comprehensive support for distributed transactions (though it does support a form of transactions). I have a pretty good idea of how to add it, but it will require major changes to the core of rqlite.
But one of rqlite’s goals is simplicity of operation, and once transactions are introduced, it may no longer be as simple to use. Only time will tell how the next phase of rqlite design will proceed.