There is a popular image out there, among the general public, that small startups — particularly small software startups — are a hotbed of technical innovation, constantly creating new technology. But is it true?
Continue reading Where is technical innovation actually happening? →
Today sees the launch of Analytics 2.0 on the Percolate platform. After 12 months of hard work by my team, I am very proud of the new platform.
1 year ago the San Francisco team was tasked with rebuilding the Analytics system at Percolate. In place of our legacy MySQL-based system, we now have a brand new architecture, based on Apache Kafka and Elasticsearch. It’s more responsive, more flexible, and offers much richer functionality.
You can learn all about the new system on the Percolate blog.
It’s been 18 months since the first commit to my first significant Go project — syslog-gollector. After an initial burst of activity to create a functional Syslog Collector that streamed to Apache Kafka, the source code hadn’t been updated much since. But today I received a report that it no longer built, so I spent some time porting the code to the latest Shopify Sarama framework.
It was amusing to see how naive much of my early Go code was.
Continue reading Revisiting syslog-gollector →
Real-time — or near real-time — data pipelines are all the rage these days. I’ve built one myself, and they are becoming key components of many SaaS platforms. SaaS Analytics, Operations, and Business Intelligence systems often involve moving large amounts of data, received over the public Internet, into complex backend systems. And managing the incoming flow of data to these pipelines is key.
Continue reading Drop, Throttle, or Buffer →
I’ve started coding in Go (golang), and I received some advice recently from Robert Griesemer, whom I was fortunate enough to sit beside at a recent Go Meetup. To learn Go, Robert suggested that I code a solution in Go for a problem I had previously solved in a different language.
Continue reading Writing a Syslog Collector in Go →
The creator of the network monitoring system Riemann, Kyle Kingsbury, has put together a comprehensive series of blog posts, on the fault-tolerance, high-availability, and general correctness of number of database and storage technologies. Of the technologies discussed I am most familiar with — elasticsearch and Apache Kafka — I found the posts to be a great read.
If you haven’t read them yet, you should check them out on his site.
AWS have posted the video online of Jim Nisbet’s and my talk at AWS:reinvent 2013. In it, Jim and I describe the system we built at Loggly, which uses Apache Kafka, Twitter Storm, and elasticseach, to build a high-performance log aggregation and analytics SaaS solution, running on AWS EC2.
Continue reading Infrastructure at Scale: Apache Kafka, Twitter Storm and elasticsearch →
When running a large real-time processing system, monitoring is critical. But it does more than allow you to keep an eye on your system. During development it allows you test hypotheses about how it works, how it performs when certain parameters are changed, and takes the guessing out of working with dynamic systems.
Storm, a real-time computational framework open-sourced by Twitter, is such a system and comes with a Spout, allowing messages to be streamed from a Kafka Broker.
Continue reading Monitoring Storm Kafka Spouts using Python →