Designing a search system for log data — part 3

This is the last part of a 3-part series “Designing and building a search system for log data”. Be sure to check out part 1 and part 2.

In the last post we examined the design and implementation of Ekanite, a system for indexing log data and making that data available for search in near-real-time. In this final post, let’s see Ekanite in action.

Downloading and running

Ekanite is written in Go, which makes getting started easy. You need Go installed, but nothing else. To get up and running, follow the instructions in the Building section of the README file.

Starting Ekanite

Assuming Ekanite has been built and is on your path, start the system like so:

ekanited -datadir $HOME/ekanite

Once launched, Ekanite listens on three TCP ports:

  • At http://localhost:9951/debug/vars it makes simple statistics and diagnostic information available.
  • On TCP port 5514 it accepts log data.
  • On TCP port 9950 it listens for queries.

All of these ports are configurable.
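
If you would rather inspect the diagnostics from code than from a browser, something like the following Go sketch could work. It assumes the endpoint returns a single JSON object, which is what Go's standard expvar package serves at /debug/vars; the helper name varNames is made up for this example, and the variable names Ekanite actually publishes are not assumed here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"sort"
)

// varNames returns the sorted top-level keys of an expvar-style JSON document.
// (Hypothetical helper for this example, not part of Ekanite.)
func varNames(body []byte) ([]string, error) {
	var vars map[string]json.RawMessage
	if err := json.Unmarshal(body, &vars); err != nil {
		return nil, err
	}
	names := make([]string, 0, len(vars))
	for name := range vars {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	// Address taken from the post; change it if you reconfigured the port.
	resp, err := http.Get("http://localhost:9951/debug/vars")
	if err != nil {
		fmt.Println("could not reach Ekanite:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	names, err := varNames(body)
	if err != nil {
		fmt.Println("response was not a JSON object:", err)
		return
	}
	// Print the name of every published variable.
	for _, name := range names {
		fmt.Println(name)
	}
}
```

Running this while Ekanite is up lists whatever statistics it publishes, which is a quick way to find the counters worth watching.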

Index some log lines

Run the following command to have Ekanite index about 1,000 Apache access log lines, each prefixed with an RFC5424 header. The command uses netcat to send the data to Ekanite, simulating a real syslog client somewhere on the network.

curl https://raw.githubusercontent.com/ekanite/ekanite/master/test_resources/logs1k.txt.bz2 | bunzip2 | netcat localhost 5514

Depending on your link, and your hardware, this can take a few seconds to complete. If you watch the Ekanite statistics you’ll see the number of parsed lines increase and then stop when all lines have been indexed.

Search

Once indexing has completed, we can run some searches.
$ telnet 127.0.0.1 9950
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
login
<134>0 2015-05-05T23:50:17.025568+00:00 fisher apache-access - - 65.98.59.154 - - [05/May/2015:23:50:12 +0000] "GET /wp-login.php HTTP/1.0" 200 206 "-" "-"
<134>0 2015-05-06T01:24:41.232890+00:00 fisher apache-access - - 104.140.83.221 - - [06/May/2015:01:24:40 +0000] "GET /wp-login.php?action=register HTTP/1.0" 200 206 "https://www.philipotoole.com/" "Opera/9.80 (Windows NT 6.2; Win64; x64) Presto/2.12.388 Version/12.17"
<134>0 2015-05-06T04:20:49.008609+00:00 fisher apache-access - - 193.104.41.186 - - [06/May/2015:04:20:46 +0000] "POST /wp-login.php HTTP/1.1" 200 206 "-" "Opera 10.00"
login -GET
<134>0 2015-05-06T04:20:49.008609+00:00 fisher apache-access - - 193.104.41.186 - - [06/May/2015:04:20:46 +0000] "POST /wp-login.php HTTP/1.1" 200 206 "-" "Opera 10.00"
In the example above, the first search returns all log lines containing the word “login”. The second returns the lines containing “login” but excludes any that also contain “GET”. There is more detail in the README.
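
To make the include and exclude behaviour concrete, here is a small Go sketch that applies the same rules to lines held in memory. It is only an illustration of the semantics described above, not how Ekanite or bleve evaluate queries: it uses a case-insensitive substring test instead of an index, it treats every plain term as required, and the function name matches is invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether line satisfies query. Plain terms must appear in
// the line; terms prefixed with "-" must not appear. This is a simplified
// stand-in for index-based search, used only to illustrate the semantics.
func matches(line, query string) bool {
	lower := strings.ToLower(line)
	for _, term := range strings.Fields(query) {
		negate := strings.HasPrefix(term, "-")
		term = strings.ToLower(strings.TrimPrefix(term, "-"))
		if term == "" {
			continue // a bare "-" carries no term
		}
		found := strings.Contains(lower, term)
		// Fail when a required term is missing or an excluded term is present.
		if found == negate {
			return false
		}
	}
	return true
}

func main() {
	// Request portions of two of the sample lines from the session above.
	lines := []string{
		`"GET /wp-login.php HTTP/1.0" 200 206 "-" "-"`,
		`"POST /wp-login.php HTTP/1.1" 200 206 "-" "Opera 10.00"`,
	}
	for _, line := range lines {
		if matches(line, "login -GET") {
			fmt.Println(line)
		}
	}
}
```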

Enhancing Ekanite

Ekanite could be enhanced and improved in many, many ways. Major improvements would include:

  • More sophisticated query syntax would allow, for example, search on specific fields of the log data. This would involve building a parser, which would run in the query server.
  • A richer query syntax would also let the system accept time-bounded queries, such as a search of only the last hour. Such searches would be correspondingly faster.
  • Ekanite has not been tuned for performance. Go comes with an extensive set of performance and profiling tools. Since Ekanite is a demonstration system, its main goal is functionality, but there are significant performance improvements available. For example, a state machine could parse the log lines instead of regular expressions.
  • Reducing the number of memory allocations the code makes would minimize the impact of garbage collection.
  • Ekanite uses bleve with its default storage engine, which is BoltDB. Other engines, such as LevelDB, may offer better storage efficiency and indexing throughput. This would complicate the build process, however, since LevelDB is written in C++.
  • Fix the bugs!
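
As a rough idea of what the state-machine suggestion above might look like, here is a Go sketch that walks a log line once, byte by byte, with no regular expressions. It is not Ekanite's parser. The field layout is inferred from the sample lines in this post, and the type and function names (header, parseLine) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// header holds the leading fields of a line shaped like the samples in this post.
type header struct {
	Priority  int
	Version   int
	Timestamp string
	Host      string
	App       string
	Message   string
}

var errMalformed = errors.New("malformed log line")

// state names the field the parser is currently inside.
type state int

const (
	inPriority state = iota
	inVersion
	inTimestamp
	inHost
	inApp
	inProcID
	inMsgID
)

// parseLine scans the line in a single pass, switching state at each delimiter.
func parseLine(line string) (header, error) {
	var h header
	if len(line) == 0 || line[0] != '<' {
		return h, errMalformed
	}
	st := inPriority
	start := 1 // index where the current field begins
	for i := 1; i < len(line); i++ {
		c := line[i]
		switch st {
		case inPriority:
			switch {
			case c >= '0' && c <= '9':
				h.Priority = h.Priority*10 + int(c-'0')
			case c == '>' && i > start:
				st, start = inVersion, i+1
			default:
				return h, errMalformed
			}
		case inVersion:
			switch {
			case c >= '0' && c <= '9':
				h.Version = h.Version*10 + int(c-'0')
			case c == ' ' && i > start:
				st, start = inTimestamp, i+1
			default:
				return h, errMalformed
			}
		default: // space-delimited text fields
			if c != ' ' {
				continue
			}
			field := line[start:i]
			if field == "" {
				return h, errMalformed
			}
			switch st {
			case inTimestamp:
				h.Timestamp = field
			case inHost:
				h.Host = field
			case inApp:
				h.App = field
			case inMsgID:
				// Everything after the message ID is the message itself.
				h.Message = line[i+1:]
				return h, nil
			}
			st, start = st+1, i+1
		}
	}
	return h, errMalformed // ran out of input before reaching the message
}

func main() {
	h, err := parseLine(`<134>0 2015-05-06T04:20:49.008609+00:00 fisher apache-access - - "POST /wp-login.php HTTP/1.1" 200 206`)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```

A parser like this avoids backtracking and keeps the fields as substrings of the input, which also ties in with the point about reducing allocations. Whether it would actually be faster than the existing regular expressions is something to measure with Go's profiling tools rather than assume.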

FIN

Indexing and search systems are fascinating, and I encourage you to check out the bleve and Ekanite source code.

Hopefully this series of posts has been helpful in understanding the various requirements, trade-offs, design, and implementation of one particular type of these systems.
