Packt recently asked me to review their new publication Cassandra High Availability, written by Robbie Strickland. I’ve worked with Cassandra in the past: early designs of Loggly’s second-generation log analytics platform used Cassandra as the authoritative store for log data, but we ended up pulling it and using Elasticsearch as both the store and the search engine.
After checking out the list of technical reviewers for the book, I was hopeful it would be a worthwhile read, as the reviewers include contributors to the core of Cassandra. I wasn’t disappointed and found the book to be very good.
At a high level I found the content choice to be excellent, and the quality of writing to be very high. The book adopts a pragmatic approach, describing the most important features, design decisions, and deployment techniques involved in making your Cassandra deployment highly available. What I particularly liked was that, for the content he chose to write about, the author goes into the proper level of detail. I especially enjoyed the introductory sections on why traditional relational databases don’t scale, as it’s always nice to read a clear, concise description of something you already understand, but sometimes find difficult to explain to others.
Since I have built systems that stored log data in Cassandra, and work for a company building a time-series database, I also enjoyed the chapter on data modelling. The chapter on anti-patterns was also very good, and having once tried to build a queue on Cassandra, I smiled when that particular anti-pattern was discussed.
I was pleased to see a section on monitoring. While not discussed in huge detail, monitoring a distributed database is a critical part of keeping the system highly available. Getting the design and code right is obviously important, but that’s not the full story. Effective and correct deployments are the other requirement for reliable production systems. That said, running large Java-based distributed systems can be problematic at the best of times, and often involves many dependencies. It’s one of the reasons I’m so excited about building InfluxDB in Go, as I believe it will produce a better and easier-to-deploy system.
The book is not long, and I read it completely in a few hours. As someone with experience with these kinds of systems, I really enjoyed reading this book, but I do think that someone with no experience in distributed databases might be better off starting with a different text, and then returning to this book.
In the interests of full disclosure: Packt provided me with a free e-copy of the book for the purposes of the review.