Software Development for Infrastructure

Bjarne Stroustrup has another very interesting paper on his website.  Titled Software Development for Infrastructure, it discusses some key ideas for building software that has “…more stringent correctness, reliability, efficiency, and maintainability requirements than non-essential applications.”  It is not a long paper, but offers useful observations and guidelines for building such software systems.

For me, some of the observations particularly stood out. I list them below, along with some of my own thoughts.

  • We compensate for lack of design-time knowledge by postponing decisions until runtime and adding runtime tests to catch any errors.

Stroustrup states that many of the tools that make programming easier and less error-prone today do so by moving decisions from design-time to runtime, which is becoming increasingly costly in terms of hardware. I think dynamic typing might fall into this category.
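To make the design-time versus runtime distinction concrete, here is a small sketch of my own (it is not from the paper, and the function names are made up for illustration). The first function decides whether an index is valid while the program runs; the second moves that decision to compile time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Runtime decision: whether i is valid is only discovered when the call runs.
inline int get_checked(const std::array<int, 4>& a, std::size_t i) {
    return a.at(i);  // throws std::out_of_range if i >= 4
}

// Design-time decision: the index is a template argument, so an
// out-of-range index is rejected by the compiler and needs no runtime test.
template <std::size_t I>
int get_static(const std::array<int, 4>& a) {
    static_assert(I < 4, "index out of range");
    return std::get<I>(a);
}

// Small usage example (hypothetical helper, for illustration only).
inline void example_indexing() {
    const std::array<int, 4> a = {{10, 20, 30, 40}};
    assert(get_checked(a, 2) == 30);
    assert(get_static<2>(a) == 30);
    // get_static<7>(a);  // would fail to compile rather than fail at runtime
}
```

The trade-off is that the static version only works when the index really is known at design time, which is exactly the knowledge the quote says we often lack.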

  • We must look at high-reliability systems as our model, rather than the techniques used to produce “Internet-time Web applications”.

This point particularly resonated with me. There have been big advances in languages, tools, and frameworks over the past 10 years; it’s a golden age for programming. But many contemporary software development models result in terrible systems. This is one of the reasons I am not impressed by node.js: in my experience, programming in node (and JavaScript, for that matter) does not promote a particularly high standard of software engineering. I have found that languages like C and C++ (and, most recently for me, Go) encourage much better engineering practices.

  • Types

A significant proportion of the paper is dedicated to this subject. Stroustrup presents the compelling example of the Mars Climate Orbiter, which was lost in 1999 due to a navigation error. As many developers know, the cause was a mix-up of units: one piece of software reported thruster impulse in pound-force seconds while another expected newton-seconds, and the Orbiter is believed to have burned up in the Martian atmosphere as a result. Stroustrup argues that by making full use of the type system, including user-defined types, costly errors can be reduced and code quality improved without adding runtime overhead. His arguments certainly made me consider that functions that take plain ol’ integers as arguments can be a significant source of errors in a program.
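As a rough sketch of the idea (the type and function names below are my own invention, not code from the paper), two impulse units can be given distinct types so that mixing them up becomes a compile error rather than a navigation error. Pound-force seconds and newton-seconds are the units that were confused in the Orbiter case.

```cpp
#include <cassert>

// Hypothetical unit types for illustration. Each wraps a double, but the
// compiler treats them as unrelated types, so one cannot stand in for the other.
struct NewtonSeconds {
    double value;
};

struct PoundForceSeconds {
    double value;
};

// The only way across is an explicit, named conversion.
// 1 lbf = 4.4482216152605 N by definition of the pound-force.
constexpr NewtonSeconds to_newton_seconds(PoundForceSeconds p) {
    return NewtonSeconds{p.value * 4.4482216152605};
}

// The interface states its unit in the signature instead of in a comment.
constexpr NewtonSeconds total_impulse(NewtonSeconds a, NewtonSeconds b) {
    return NewtonSeconds{a.value + b.value};
}

// Checked entirely at compile time; the wrappers add no runtime cost.
static_assert(total_impulse(NewtonSeconds{1.0}, NewtonSeconds{2.0}).value == 3.0,
              "impulses in the same unit add directly");

// total_impulse(NewtonSeconds{1.0}, PoundForceSeconds{2.0});  // does not compile
```

A caller holding a value in pound-force seconds has to write `to_newton_seconds(...)` at the call site, which makes the conversion visible in code review. Note that a bare braced argument such as `{2.0}` would still be accepted, so this sketch relies on callers naming the unit type.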

  • Many developers equate “low-level” with “fast” out of naivete or from experience with complicated bloatware.
  • Expressing code in terms of algorithms rather than hand-crafted, special-purpose code can lead to more readable code that’s more likely to be correct, often more general, and often more efficient.

Try not to re-invent the wheel. The STL, and its associated algorithms, are often a better choice than writing something yourself.
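A small comparison of my own, with made-up function names, shows what this can look like in practice. Both functions below add up the positive values in a vector; the second states that intent through `std::accumulate` instead of through index arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hand-crafted loop: the intent is buried in index handling and control flow.
inline int sum_positive_loop(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 0) {
            total += v[i];
        }
    }
    return total;
}

// Algorithm-based version: std::accumulate folds over the range and the
// lambda carries the only problem-specific logic.
inline int sum_positive_algo(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0,
                           [](int acc, int x) { return x > 0 ? acc + x : acc; });
}

// Small usage example (hypothetical helper, for illustration only).
inline void example_sum() {
    const std::vector<int> v = {3, -1, 4, 0, -5, 9};
    assert(sum_positive_loop(v) == sum_positive_algo(v));
}
```

For something this small the difference is modest, but the algorithm version has no index to get wrong and works unchanged on any range of iterators.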

  • Hardware improvements make the problems and costs resulting from isolating software from hardware far worse than they used to be.

According to the paper “…a single-threaded, nonvectorized, non-GPU-utilizing application has access to roughly 0.4 percent of the compute power [on a typical desktop machine]…”. This actually seems to be a point made by more and more people nowadays — that contemporary increases in computing power are parallel in nature and unless we structure our software to work in this manner, we will “…face massive waste.”

  • We need to be able to reason about code without knowledge of the complete system. Shared resources are poison to such reasoning, making local resource management essential.

This is always a challenge when building larger software systems. There must be a relentless drive towards isolation of software components if systems are to be reliable — and understandable.
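One way to read "local resource management" in C++ terms is to have each component own its state and its lock, and release them automatically. The class below is a hypothetical sketch of that idea, not something taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical component that owns both its data and the mutex guarding it.
// Nothing outside the class can touch either one, so its behaviour can be
// reasoned about without knowing the rest of the system.
class EventLog {
public:
    void record(int code) {
        std::lock_guard<std::mutex> lock(mutex_);  // released on scope exit
        events_.push_back(code);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so that size() can stay const
    std::vector<int> events_;   // memory is freed when the EventLog is destroyed
};

// Small usage example (hypothetical helper, for illustration only).
inline void example_log() {
    EventLog log;
    log.record(1);
    log.record(2);
    assert(log.size() == 2);
}
```

There is no global lock and no manual cleanup path to forget: the lock is tied to a scope and the storage is tied to the object's lifetime.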

In summary, an interesting paper. You can check it out here.
