Bootstrapping Cassandra

Cassandra is an open-source, distributed database, informally known as a NoSQL database. It is designed to store large amounts of data, offer high write performance, and provide fault tolerance. I recently needed some hands-on experience with Cassandra and, being relatively new to Java programming, wanted a simple setup to experiment with.

Setup

To make it easy to restart my work at any time, I decided to use an AWS EC2 Instance. This would allow me to terminate the Instance and start again if needed. It also meant I'd develop a fairly complete picture of Cassandra's requirements, whereas my own machine, with all its extra packages installed, wouldn't be a clean setup.
All work was carried out on an m1.large EC2 Instance, launched using the AMI ami-35c31a5c. After this Instance booted, I needed to install some tools:

sudo apt-get update
sudo apt-get install maven2
sudo apt-get install git
sudo apt-get install openjdk-6-jre-headless
sudo apt-get install openjdk-6-jdk

Astyanax

I decided to use Astyanax, the Java client for Cassandra recently open-sourced by Netflix. It comes with some example code, which I aimed to follow. I started by cloning the Astyanax repository from GitHub:

 
git clone https://github.com/Netflix/astyanax.git

Although I'm new to Java, it didn't take me long to realise I was missing a load of other JAR files. Not wanting to set up a full Maven and Eclipse development environment, I looked for another way. Fortunately, there was one. Executing the following command from inside the cloned astyanax directory:

 
mvn clean install

resulted in Maven fetching all the required JAR files, and its output also showed the URL from which each one was downloaded. I piped this output to a file and edited it down to just the URLs, removing a few that were obviously not needed for a simple setup. I then prepended ‘wget’ to each URL, copied the file off the Instance, and terminated the Instance. You can find the final version here.
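The URL-trimming step above was done by hand. As a sketch of how it could be automated, the hypothetical Java program below (MavenUrlsToWget is my own name, not part of Maven or Astyanax) reads saved Maven output on standard input, keeps each distinct URL ending in .jar, and prints a wget command for it. It assumes Maven's download lines contain the full URL followed by whitespace or the end of the line, which is how "Downloading:" lines typically look. It avoids Java 7 syntax, so it should compile with the OpenJDK 6 installed above. It does not know which JARs a simple setup can do without (Maven's own plugin JARs will show up too), so that pruning stays manual.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Maven or Astyanax): turns saved Maven
// output into one "wget <url>" line per distinct JAR URL.
public class MavenUrlsToWget {
    // An http(s) URL ending in ".jar", followed by whitespace or end of line.
    // The lookahead skips checksum URLs such as "foo.jar.sha1"; ".pom" URLs never match.
    private static final Pattern JAR_URL = Pattern.compile("https?://\\S+\\.jar(?=\\s|$)");

    static List<String> toWgetCommands(Iterable<String> lines) {
        // LinkedHashSet keeps first-seen order and drops repeats
        // (the same URL can appear on both "Downloading" and "Downloaded" lines).
        Set<String> urls = new LinkedHashSet<String>();
        for (String line : lines) {
            Matcher m = JAR_URL.matcher(line);
            while (m.find()) {
                urls.add(m.group());
            }
        }
        List<String> commands = new ArrayList<String>();
        for (String url : urls) {
            commands.add("wget " + url);
        }
        return commands;
    }

    public static void main(String[] args) throws IOException {
        // Read the saved Maven output from standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<String>();
        String line;
        while ((line = in.readLine()) != null) {
            lines.add(line);
        }
        for (String command : toWgetCommands(lines)) {
            System.out.println(command);
        }
    }
}
```

For example, mvn clean install > maven.log followed by java MavenUrlsToWget < maven.log > get-jars.sh would produce a script to run from $HOME/jars (the file names here are illustrative).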

 

Cassandra

I now had everything I needed except Cassandra itself. I started a new EC2 Instance (don't forget to re-install the tools listed above!), then downloaded and decompressed version 1.1.2 using these commands:

 
cd $HOME
wget http://mirror.cc.columbia.edu/pub/software/apache/cassandra/1.1.2/apache-cassandra-1.1.2-bin.tar.gz
tar xvfz apache-cassandra-1.1.2-bin.tar.gz

I started it by executing this command:

 
cd $HOME/apache-cassandra-1.1.2/bin
sudo ./cassandra

This runs in the foreground, so I logged in to the EC2 Instance a second time and created the keyspace and column family using the Cassandra CLI:

 
$HOME/apache-cassandra-1.1.2/bin/cassandra-cli
create keyspace MyKeyspace;
use MyKeyspace;
create column family User with comparator = UTF8Type;
assume User keys as utf8;
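Before moving on to the client, it can help to confirm that Cassandra is actually accepting connections. Astyanax talks to Cassandra over Thrift, which in Cassandra 1.1 listens on port 9160 by default (the rpc_port setting in conf/cassandra.yaml). The hypothetical PortCheck class below is a minimal sketch, not part of Astyanax or Cassandra: it retries a plain TCP connect to 127.0.0.1:9160 once a second for up to 30 seconds. A successful connect only shows that something is listening; it does not prove that MyKeyspace or the User column family exist.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether anything accepts TCP connections on host:port.
public class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumes the Cassandra 1.1 defaults: Thrift on localhost, port 9160.
        String host = "127.0.0.1";
        int port = 9160;
        for (int attempt = 1; attempt <= 30; attempt++) {
            if (isListening(host, port, 1000)) {
                System.out.println("Something is accepting connections on " + host + ":" + port);
                return;
            }
            Thread.sleep(1000);
        }
        System.out.println("Nothing listening on " + host + ":" + port + " after 30 attempts");
        System.exit(1);
    }
}
```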

Putting it all together

Once Cassandra was up and running, I re-ran the script to download the required JARs, which put them in $HOME/jars. I then placed my test Astyanax client program in $HOME, and compiled and executed it as follows:

 

cd $HOME
javac -classpath "$HOME/jars/*:." sample.java
java -classpath "$HOME/jars/*:." sample

You should then see something like the following:

 
log4j:WARN No appenders could be found for logger (com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
It took 2402 milliseconds
41631 events per second.
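The two figures are consistent with roughly 100,000 writes: 100,000 × 1000 / 2402 = 41,631.97..., which truncates to 41,631. The post doesn't state the count, so treat that as an inference. The hypothetical Throughput class below sketches how such a pair of lines can be computed; the empty loop is a placeholder for the real Cassandra writes, and none of this is the author's sample.java.

```java
// Hypothetical timing harness, not the author's sample.java: it shows how an
// "It took N milliseconds" / "M events per second." pair can be produced.
public class Throughput {
    // Whole events per second; integer division truncates any fraction.
    static long eventsPerSecond(long events, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return events * 1000L / elapsedMillis;
    }

    public static void main(String[] args) {
        int events = 100000; // assumed count; the post does not state it
        long start = System.currentTimeMillis();
        for (int i = 0; i < events; i++) {
            // Placeholder: a real client would send one write to Cassandra here.
        }
        // Guard against a zero reading when the placeholder loop finishes instantly.
        long elapsed = Math.max(1L, System.currentTimeMillis() - start);
        System.out.println("It took " + elapsed + " milliseconds");
        System.out.println(eventsPerSecond(events, elapsed) + " events per second.");
    }
}
```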

You can verify that the data is in Cassandra by using the CLI again:

 
$HOME/apache-cassandra-1.1.2/bin/cassandra-cli
use MyKeyspace;
list User;

You can remove the test data by dropping the keyspace from the CLI:

drop keyspace MyKeyspace;

After recreating the keyspace and column family as shown earlier, you can re-run the test program as often as needed.