Replicating SQLite using Raft Consensus

September 2, 2014 Philip O'Toole 27 Comments

SQLite is a “self-contained, serverless, zero-configuration, transactional SQL database engine”. However, it doesn’t come with replication built in, so if you want to store mission-critical data in it, you better back it up. The usual approach is to continually copy the SQLite file on every change.

I wanted SQLite, I wanted it distributed, and I really wanted a more elegant solution for replication. So rqlite was born.

Why replicate SQLite?

SQLite is very convenient to work with — the entire database is contained within a single file on disk, making working with it very straightforward. Many people have experience with it, and it’s been a natural choice for adding relational-database functionality to many systems. It’s also rock-solid. However, since it isn’t replicated it can become a single point of failure in a system design. While it is possible to continually copy the SQLite file to a backup server every time it is changed, this file-copy must not take place while the database is being accessed.

I decided to build a distributed replication layer using the Raft consensus protocol, which gives me effective replication without the hassle of running a much heavier solution like MySQL. It provides all the advantages of replication, with the data modelling functionality of a relational database, but with the convenience of a single-file database. The entire system is written in Go, and the source is available on github.

An rqlite cluster

The diagram below shows an example rqlite cluster of 3 nodes, which continually work together to ensure that the SQLite file under each node is identical. With 3 nodes running, 1 node can fail, the cluster will remain up, and the data is still safe. In this example a leader has been elected and is coloured red. The Raft protocol dictates that all reads and writes should go through this node.

A 3-node rqlite cluster, replicating SQLite using Raft Consensus

For a write operation, only when a majority of nodes (including the leader) acknowledge that write, is that change actually committed to the Raft log and then to the actual SQLite databases underneath each node — it is the leader’s job to ensure this consensus is reached. If the leader fails, or there is a network partition such that the leader is cut off from the other two nodes, one of the other nodes will be elected leader shortly afterwards.

rqlite is a CP system. When faced with a network partition it chooses consistency over availability — reads and writes in the partition with a quorum of servers will remain available. But the servers on the other side of the partition will refuse to accept any changes. When the partition is healed however, these nodes will receive any changes made to the nodes on the other side of partition, and all copies of SQLite database will be in consensus again.

Choosing a Distributed Consensus algorithm

Raft is used as the consensus protocol for multiple projects including InfluxDB and etcd. They both use the goraft implementation, and since I want to write more Go, it was a natural choice to use for rqlite.

Deploying rqlite

You can find the source code for rqlite, and instructions on how to build and deploy it, here on github. I hope to continue developing this software, as distributed consensus systems are immensely interesting.

27 thoughts on “Replicating SQLite using Raft Consensus”

jmzc says:

November 16, 2014 at 7:29 pm

Hi:
So, only operations by HTTP API are replicated , right ?
What happens if all nodes reply with an ACK ( raft consensus ) and , before committing to raft log , one node fails ? When is synchronized again ?
Regards

Reply
Philip O'Toole says:

November 16, 2014 at 8:51 pm

Yes, only operations committed through the HTTP API are replicated.
As for the failure you outlined, when the node is brought back online it will contact the leader and find that the leader has entries in its log that it is missing. It will then write those entries to its log.
Check the Raft paper for full details.

Reply
Jens Rantil says:

February 23, 2015 at 12:59 pm

Just curious, what was the use-case for building this? Or was it mostly for fun?
Cheers,
Jens

Reply
1. Philip O'Toole says:
  
  February 23, 2015 at 4:35 pm
  
  Hi Jens — it started as a real need. A system I was working on professionally needed a small amount of (ideally relational) database storage. The system was clustered, so the question was which node should host this storage, which I didn’t like. So this was about showing we could run SQLite in a replicated manner, such that the loss of any single node would mean not mean the data would be lost. We ended up going a different route however, and rqlite was not used in production.
  
  Reply
  1. Stanislav says:
    
    April 2, 2015 at 6:51 pm
    
    And what was the route you has chosen, if it is not a secret?
    
    Reply
    1. Philip O'Toole says:
      
      April 2, 2015 at 7:53 pm
      
      Hi Stanislav — we also had access to a clustered Java-based key-value store (open source, but can’t provide the name), so just decided to use that. It did mean we had to build the relational layer ourselves though.
      
      Reply
Mike says:

August 4, 2015 at 10:50 am

Nice work! Very useful piece of software, but i have question. Philip is this possible to use https to secure database access? I want to synchronize many devices but i want to secure database against outside access? Is this possible or do you plan this functionality in near future? Regards.

Reply
1. Philip O'Toole says:
  
  August 10, 2015 at 1:10 pm
  
  Mike — thanks. There is no support for HTTPS access to the system, though you might be able to do it yourself by putting something like nginx in front of each node, and have it do the HTTPS to HTTP conversion.
  
  Reply
Jörg F. Wittenberger says:

November 14, 2015 at 12:03 pm

This is interesting!
Would it be feasible to modify this to tolerate byzantine failures? I.e, could I configure it to commit transactions only when 2/3+1 peers have voted for the same transaction?
Background: @ ball.askemos.org we have a programming platform sitting atop of replicated state machines. Now I’m looking for alternative implementations of replicated state machines, which may fit the bill. Our platform is a bit Erlang-inspired having persistent agents which communicate by message passing. We replicate those agents in consensus with byzantine failure resistance. Often we store the state of our agents in a sqlite3 database. Pretty much like you’re doing here. That’s how I found rsqlite in the first place.

Reply
1. Philip O'Toole says:
  
  November 16, 2015 at 12:03 am
  
  Hi Jörg — if this system was changed to support byzantine failures, the code base would be very different. Since it’s built on the Raft consensus protocol, it only supports the failure modes that Raft tolerates. So if the code was modified as you would like, it would be a very different system — it wouldn’t be a modification so much as a new system. I hope this helps.
  
  Reply
  1. Jörg F. Wittenberger says:
    
    November 20, 2015 at 5:36 pm
    
    I see. When I posted, I assumed that Raft was actually a *consensus* protocol. Which implies that it would normally tolerate byzantine faults. But Raft is in fact a *coherence* protocol, where you must trust the leader (making the latter a SPOF).
    Sure it helps: we better stick with the code we’ve been using so long as this protects us against byzantine faults. The slowdown is not as bad as a backdoor would be.
    
    Reply
    1. Philip O'Toole says:
      
      November 21, 2015 at 3:03 am
      
      I am not sure the guys at Stanford would agree with you. They clearly consider Raft a consensus protocol. I am unfamiliar with the term “coherence protocol”.
      Also the leader is clearly not a single point of failure when you consider the system as a whole (which is the point). The system remains fully functional if the leader fails — another node simply becomes leader.
      
      Reply
alex says:

February 8, 2016 at 2:02 pm

can this setup prevent db locks?

Reply
1. Philip O'Toole says:
  
  February 8, 2016 at 10:40 pm
  
  Alex — I don’t really understand the question. I know what you might mean by a database lock, but I don’t see exactly how that is specific to rqlite.
  
  Reply
alex says:

February 10, 2016 at 10:43 am

sorry about that . i was wondering if by having more nodes it will prevent less dblocks?
going to try it & let you know.

Reply
1. alex says:
  
  February 10, 2016 at 3:30 pm
  
  it works like a charm.
  i’m really impressed Philip.
  one question
  if i want to change location of the db
  can i do localhost:4001/dev/shm/db ?
  
  Reply
Philip O'Toole says:

February 10, 2016 at 8:23 pm

The path to the database file on each node is passed to the rqlite on startup.
https://github.com/otoolep/rqlite/blob/master/README.md
You can’t change once set, without moving the database file to a new location.

Reply
alex says:

February 21, 2016 at 8:45 am

how do you determine which one is the leader ?
if i start the node 1, i’m seeing the following:
is there a way to do a clean start?
2016/02/21 10:40:28 [ERROR] attempted leader redirection, but no leader available
2016/02/21 10:40:29 [ERROR] attempted leader redirection, but no leader available
2016/02/21 10:40:42 [ERROR] attempted leader redirection, but no leader available

Reply
1. Philip O'Toole says:
  
  February 24, 2016 at 11:43 pm
  
  Alex — rqlite has a new consensus module, and those errors will no longer occur. You should run the latest code instead.
  
  Reply
  1. alex says:
    
    March 2, 2016 at 7:52 am
    
    i will grab the new version.
    thank you
    
    Reply
Joel David Payares Hernández says:

March 9, 2016 at 6:04 pm

Buenas noches. Amigos me gustaría saber en que sistema operativo fue montado el cluster o como hizo?

Reply
Ankit says:

May 4, 2016 at 1:15 am

Does it support multiple SQLite DBs? We plan to use multiple SQLite DBs and want to replicate all of them. Also the writes and reads can be done on any of the DBs.

Reply
1. Philip O'Toole says:
  
  May 4, 2016 at 8:22 am
  
  No, you must run a rqlite cluster per SQLite database. And all writes must go through the leader, but reads can go through any node, depending on what read-consistency guarantees you can accept.
  
  Reply
tekbird says:

January 9, 2018 at 3:30 am

How do I use SqliteStudio to browse data in rqlite?

Reply
1. Philip O'Toole says:
  
  January 9, 2018 at 7:41 am
  
  You could use it to browse the SQLite file directly, but it’s not officially supported.
  https://github.com/rqlite/rqlite/blob/master/README.md#limitations
  
  Reply
Arnold Myers says:

August 27, 2020 at 9:47 am

Let’s assume there are 3 nodes and one of the nodes got disconnected from the network and that was the Master, in this case there will be a new leader and it is great, but how can I still be able to write data to the disconnected node and when the network connectivity comes back up, we want it to push the changes back to the new master and merge the data if the same tables were used in the DB.

Reply
1. Philip O'Toole says:
  
  September 24, 2020 at 11:59 pm
  
  No, rqlite doesn’t support that because’s a CP-system. You will only be able to write to the side with the quorum of nodes, and the disconnected node will respond with “no leader”. You could perform some queries though, if you accepted “weak” read-consistency.
  
  Reply

Vallified