Cassandra is an open-source, distributed database, informally known as a NoSQL database. It is designed to store large amounts of data, offer high write performance, and provide fault tolerance. I recently needed some hands-on experience with Cassandra and, being relatively new to Java programming, needed a simple setup to experiment with.
Setup
To make it easy to restart my work at any time, I decided to use an AWS EC2 Instance. This would allow me to terminate the instance and start again if needed. It also meant I’d develop a pretty complete picture of the requirements for Cassandra. Using my own machine, with all the extra packages installed, wouldn’t be a clean setup.
All work was carried out on an m1.large EC2 Instance, launched using the AMI ami-35c31a5c. After this Instance booted, I needed to install some tools:
sudo apt-get update
sudo apt-get install maven2
sudo apt-get install git
sudo apt-get install openjdk-6-jre-headless
sudo apt-get install openjdk-6-jdk
Astyanax
I decided to use Astyanax, the Cassandra client framework recently open-sourced by Netflix. It comes with some example code, which I would aim to follow. I started by cloning the Astyanax repo from GitHub:
git clone git://github.com/Netflix/astyanax.git
Even though I am new to Java, it didn’t take long for me to realise I was missing a load of other JAR files. Not wanting to set up Maven and Eclipse, I decided to see if there was another way. Fortunately, there was. Executing the following command:
mvn clean install
resulted in Maven fetching all the required JAR files, and it also printed the URL from which each could be downloaded. I redirected all this output to a file and edited it down to just the URLs, removing some that were obviously not needed for a simple setup. I then prepended ‘wget’ to each line, copied the file off the Instance, and terminated the Instance. You can find the final version here.
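The manual editing described above could also be scripted. The following is only a rough sketch, not the procedure actually used: it assumes the captured Maven output contains lines of the form "Downloading: <url>", keeps only URLs ending in .jar, and uses hypothetical function names. Deciding which JARs are really needed would still be a manual step.

```shell
# Sketch: turn captured Maven output into a list of wget commands.
# Assumption: the log contains "Downloading: <url>" lines; adjust the
# pattern if your Maven version prints something different.

extract_jar_urls() {
    # Print each distinct http(s) URL ending in ".jar" found in log file $1.
    grep -o 'https\{0,1\}://[^ ]*\.jar' "$1" | sort -u
}

make_download_script() {
    # Prefix every URL with a wget command that saves into $HOME/jars.
    # $HOME is left unexpanded here so it is resolved when the generated
    # script is run, not when it is created.
    extract_jar_urls "$1" | sed 's|^|wget -P "$HOME/jars" |'
}
```

For example, after saving the output of mvn clean install to a file such as mvn-output.log (a made-up name), running make_download_script mvn-output.log > get-jars.sh would produce a script with one wget line per JAR.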
Cassandra
I now had almost everything I needed — everything except Cassandra. I started a new EC2 Instance (don’t forget to re-install the tools listed above!), and downloaded and decompressed version 1.1.2 using these commands:
cd $HOME
wget http://mirror.cc.columbia.edu/pub/software/apache/cassandra/1.1.2/apache-cassandra-1.1.2-bin.tar.gz
tar xvfz apache-cassandra-1.1.2-bin.tar.gz
I started it by executing these commands:
cd $HOME/apache-cassandra-1.1.2/bin
sudo ./cassandra
This runs in the foreground, so I logged in to the EC2 Instance a second time, and created the keyspace and column family using the Cassandra CLI:
$HOME/apache-cassandra-1.1.2/bin/cassandra-cli
create keyspace MyKeyspace;
use MyKeyspace;
create column family User with comparator = UTF8Type;
assume User keys as utf8;
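Since the Instance is meant to be thrown away and rebuilt, it may be convenient to keep these statements in a file instead of typing them each time. The sketch below writes them to a file; the function name and file name are hypothetical. As far as I know, cassandra-cli accepts a -f option that loads statements from a file, but treat that invocation as an assumption to check against your version. The assume statement is left out because it only affects how the current CLI session displays keys.

```shell
# Sketch: save the schema statements so they can be replayed on a fresh Instance.
# The function name and target path are hypothetical.

write_schema() {
    # Write the keyspace and column family definitions to the file named by $1.
    cat > "$1" <<'EOF'
create keyspace MyKeyspace;
use MyKeyspace;
create column family User with comparator = UTF8Type;
EOF
}

# To apply the file against a running node (assumed invocation, not run here):
#   write_schema "$HOME/user-schema.txt"
#   $HOME/apache-cassandra-1.1.2/bin/cassandra-cli -f "$HOME/user-schema.txt"
```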
Putting it all together
Once Cassandra was up and running, I re-ran the script to download the required JARs, which dumped them in $HOME/jars. Then I placed my test Astyanax client program in $HOME. Finally I compiled and executed the program as follows:
cd $HOME
javac -classpath "$HOME/jars/*:." sample.java
java -classpath "$HOME/jars/*:." sample
You should then see something like the following:
log4j:WARN No appenders could be found for logger (com.netflix.astyanax.connectionpool.impl.ConnectionPoolMBeanManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
It took 2402 milliseconds
41631 events per second.
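As a quick check on how the last two lines relate: events per second is the event count divided by the elapsed time in seconds. The event count is not shown in the output, so the 100000 below is an inference from the two printed figures (2402 ms at 41631 events per second comes to just under 100,000), not something the program reported. The function name is hypothetical.

```shell
# Sketch: reproduce the throughput figure from an event count and elapsed time.
# Assumption: the run inserted 100000 events; this is inferred, not printed.

events_per_second() {
    # $1 = number of events, $2 = elapsed milliseconds.
    # Shell arithmetic is integer-only, so the result is truncated.
    echo $(( $1 * 1000 / $2 ))
}

events_per_second 100000 2402   # prints 41631
```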
You can verify the data is in Cassandra by again using the CLI:
$HOME/apache-cassandra-1.1.2/bin/cassandra-cli
use MyKeyspace;
list User;
You can remove all the test data from Cassandra by dropping the keyspace from within the CLI:
drop keyspace MyKeyspace;
This allows you to re-run the test program as needed.