I recently acted as one of the official technical reviewers for ElasticSearch Cookbook – Second Edition by Alberto Paro. Published by Packt Publishing, the book contains a large number of “recipes” for elasticsearch.
I recently came across a talk on YouTube titled History of Software Engineering, given by Paolo Perrotta. Normally I find online videos to have a low information-to-time ratio, but this one was excellent. It’s not too long, with plenty of humour, and makes many serious points that resonated with me.
Packt recently asked me to review their new publication Cassandra High Availability, written by Robbie Strickland. I’ve worked with Cassandra in the past — early designs of Loggly‘s 2nd generation Log analytics platform used Cassandra as its authoritative store for log data, but we ended up pulling it and using elasticsearch as both the store and search engine.
Bjarne Stroustrup has another very interesting paper on his website. Titled Software Development for Infrastructure, it discusses some key ideas for building software that has “…more stringent correctness, reliability, efficiency, and maintainability requirements than non-essential applications.” It is not a long paper, but offers useful observations and guidelines for building such software systems.
The Eudyptula Challenge is a series of programming tasks, with the goal of getting one up-to-speed on Linux kernel programming. When I first heard about it, it immediately intrigued me. I’ve written a few production Linux kernel modules in my time — mostly device drivers — so I started the challenge today.
Real-time — or near real-time — data pipelines are all the rage these days. I’ve built one myself, and they are becoming key components of many SaaS platforms. SaaS Analytics, Operations, and Business Intelligence systems often involve moving large amounts of data, received over the public Internet, into complex backend systems. And managing the incoming flow of data to these pipelines is key.
I’ve been thinking a lot recently about what makes computer services and products sticky — what makes users and customers come back again and again to what you’ve built. There are lots of ways to summarize it, but when it comes to systems that help technical people run their own systems, they come for the features, but they stay for the uptime.
Bjarne Stroustrup has great paper on his website titled Evolving a language in and for the real world: C++ 1991-2006. It provides fascinating insights on the development of the language, the challenges involved, and discusses interesting design ideas. If you have even a basic understanding of C++, it’s a such a worthwhile read.
SQLite is a “self-contained, serverless, zero-configuration, transactional SQL database engine”. However, it doesn’t come with replication built in, so if you want to store mission-critical data in it, you better back it up. The usual approach is to continually copy the SQLite file on every change.
I wanted SQLite, I wanted it distributed, and I really wanted a more elegant solution for replication. So rqlite was born.
The creator of the network monitoring system Riemann, Kyle Kingsbury, has put together a comprehensive series of blog posts, on the fault-tolerance, high-availability, and general correctness of number of database and storage technologies. Of the technologies discussed I am most familiar with — elasticsearch and Apache Kafka — I found the posts to be a great read.
If you haven’t read them yet, you should check them out on his site.
Java is the predominant language of Big Data technologies. HBase, Lucene, elasticsearch, Cassandra – all are written in Java and, of course, run inside a Java Virtual Machine (JVM). There are some other important Big Data technologies, while not written in Java, also run inside a JVM. Examples include Apache Storm, which is written in Clojure, and Apache Kafka, which is written in Scala. This makes basic knowledge of the JVM quite important when it comes to deploying and operating Big Data technologies.
In my last blog post I explained why writing design documents is such a powerful approach to building well-engineered systems. But what should one document? When it comes to software, if one documents too much, the content of the documentation can become inaccurate very quickly, and inaccurate documentation is quickly ignored.
Many software engineers never write design documents. Design documentation takes time, and implementations often proceed so far without any documentation that if it happens, it’s an act of recording what has been done — a tedious task at the best times. Many software engineers argue “the code exists, it’s running, it’s working, let’s move on and build the next thing.”