Some time ago I was asked where to begin to learn data engineering. It was a broad question, and it took some time to understand what exactly I was being asked.
The batching of data or computation, which amortizes a fixed cost over multiple units, is a very common pattern in many computer systems. It’s particularly prevalent in networking and CPU memory accesses.
But the implementation of batching includes many subtleties — in particular when to wait for more data, and when to transmit what you have.
I recently had a chance to speak about rqlite, the distributed, lightweight database built on SQLite, at the University of Pittsburgh Computer Science Club. It was a good evening as I spoke about distributed systems, the problems they solve, and how rqlite uses Raft to replicate SQLite.
You can find the presentation here.
Go remains one of the languages I’m most productive in. It combines the rigour of static typing with the fluidity of Python, which makes it both robust and easy to code in.
It’s also got some innovative features that help you catch those tough-to-find issues, particularly when they only occur in production. An example is the Go Race Detector.
Monitoring — the measurement of your system, the gathering of telemetry, and alerting when it behaves anomalously — is key to running large-scale, modern computer systems. But what many developers today don’t realise is that monitoring can be a key part of your design cycle too.