Iain Schmitt


Reinventing the wheel to go back in time

Back in the 90s, you had to deal with any number of problems that are foreign to the software engineers of today. The idea that a bug in the Linux TCP/IP stack could explain some undesired behaviour in your code would have been far more credible in 1994 than it is in 2024: in the early 90s, ICMP ping packets larger than the maximum IPv4 packet size - 'ping of death' packets - would routinely crash devices connected to IP networks. While it may be a common refrain that the quality of consumer-facing software has worsened over time, the engineers of 1994 did not have access to PostgreSQL, the Apache Web Server, or a Linux kernel that could run on more than one processor. 1 For open source projects as widely used as those three, the low-hanging fruit of common bugs gets picked early, with the medium and higher-hanging fruit not far behind. Today, if you wanted to find a novel bug in the Linux TCP/IP stack, you would have your work cut out for you, and it would probably require some pretty interesting torture testing. But if you did, you'd have something to hang your hat on. The barriers to entry for becoming a kernel contributor aren't zero - you have to be a skilled systems programmer and know a lot about operating systems - but that's a lot closer to what an economist would describe as a perfectly competitive marketplace for talent than most examples you'll come across. While I don't understand PostgreSQL and Apache development as well as Linux, I'd be surprised if things looked any different for those projects.
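
As an aside, the arithmetic behind the ping of death is simple enough to sketch. The IPv4 total-length field is 16 bits, capping a packet at 65,535 bytes, but the 13-bit fragment offset field counts in 8-byte units, so a final fragment can sit at an offset of up to 65,528 bytes. A reassembled payload can therefore exceed anything the total-length field could describe, overflowing a fixed-size reassembly buffer. A few lines of Python make the overflow concrete:

    # Why an oversized ping could crash a 90s-era IP stack: the
    # reassembled packet can be larger than any value the 16-bit
    # total-length field could ever describe.
    MAX_IPV4_PACKET = 2**16 - 1             # 65,535 bytes
    MAX_FRAGMENT_OFFSET = (2**13 - 1) * 8   # 13-bit field, 8-byte units

    final_fragment_payload = 1_000          # a modest last fragment
    reassembled_size = MAX_FRAGMENT_OFFSET + final_fragment_payload

    print(f"Maximum legal IPv4 packet: {MAX_IPV4_PACKET} bytes")
    print(f"Reassembled size:          {reassembled_size} bytes")
    print(f"Bytes past the buffer:     {reassembled_size - MAX_IPV4_PACKET}")
    # A kernel that sized its reassembly buffer to 65,535 bytes wrote
    # past the end of it, and the machine fell over.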

Linux, PostgreSQL, and Apache have given us a great alternative to re-inventing the wheel: relying on the decades of development and bug fixing that have made those tools modern-day miracles. The CPU running your Linux public cloud workloads cycles through hundreds of concurrently running processes without skipping a beat. Years of work have gone into making the interplay between page stealing and the write-ahead log of PostgreSQL work correctly on the databases of planet-scale applications. Apache (or, for that matter, Nginx) lets you take it practically as a given that what reaches your application is a well-formed HTTP request. Any novice who starts tinkering with operating systems, databases, or web servers gains an appreciation for how anyone making a living in computing depends on the efforts of an invisible army of engineers who cared about their craft and put decades of work and expertise into these tools. If you think that you should roll your own operating system, database, or HTTP server for use in production, then you are almost certainly wrong.
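
The write-ahead principle itself fits in a few lines; making it fast and correct under concurrency is where the decades go. Here is a minimal sketch, assuming a toy key-value store (the file names and helpers are mine, not PostgreSQL's): record the change durably in the log before touching the data file, so a crash between the two steps can be repaired by replaying the log.

    import json
    import os

    LOG_PATH = "wal.log"      # toy file names, invented for this sketch
    DATA_PATH = "data.json"

    def _load():
        if os.path.exists(DATA_PATH):
            with open(DATA_PATH) as f:
                return json.load(f)
        return {}

    def _store(data):
        with open(DATA_PATH, "w") as f:
            json.dump(data, f)

    def put(key, value):
        # 1. Append the intent to the log and force it to disk first.
        with open(LOG_PATH, "a") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then mutate the data file. A crash between steps 1
        #    and 2 loses nothing: recover() redoes the write.
        data = _load()
        data[key] = value
        _store(data)

    def recover():
        # Replay the log on startup. Re-applying a write is harmless
        # here because setting a key is idempotent.
        if os.path.exists(LOG_PATH):
            data = _load()
            with open(LOG_PATH) as log:
                for line in log:
                    entry = json.loads(line)
                    data[entry["key"]] = entry["value"]
            _store(data)

A real database also checkpoints and truncates the log, handles torn writes, and does all of this under heavy concurrency; leaving that out is precisely what makes this a toy.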

A generation of software engineers owes its start in programming, in part, to people asking the wrong question when an exception is thrown (not to single out any one StackOverflow contributor in particular). Learning programming in the internet age is a process of gradually accepting that it is nearly always your code that is wrong, at which point you check StackOverflow, see a post from someone who both saw and misinterpreted the same error message as you, then read a hopefully not too condescending answer. Almost all the tools you need are in front of you, and you simply need to use them correctly; it's a humbling exercise to figure out just how little of 'your' code is really your code once you account for operating system libraries, managed runtimes, and third-party libraries.

The farther you go in software engineering, the more likely it is that you'll face problems that the software engineers of the 90s and 00s would recognise. I could be misremembering some specifics, but at a previous company that ran a considerable amount of its compute in a private cloud, I recall spending several weeks tracing intermittent bad gateway errors; it took several packet captures to determine that the root cause was a hardware failure on a network switch. As another example, take the Cloudflare engineers who figured out that in high-volume egress workloads, TCP connections from odd-numbered source ports had higher latency than those from even-numbered ports. 2 In these situations, "everything else is fine, I just made an error in my program" stops being a useful heuristic.
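
You don't need Cloudflare's traffic to start poking at questions like that. Here is a minimal sketch (my own, not Cloudflare's methodology) that times connect() calls, asks the kernel which ephemeral source port it chose via getsockname(), and buckets the latencies by port parity. The target address is a placeholder; point it at a host you control, ideally on loopback or a LAN so round-trip time doesn't swamp the measurement.

    import socket
    import time
    from statistics import median

    TARGET = ("192.0.2.10", 8080)  # placeholder: use a host you control

    def timed_connect():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            start = time.perf_counter_ns()
            s.connect(TARGET)
            elapsed = time.perf_counter_ns() - start
            local_port = s.getsockname()[1]  # kernel-chosen source port
            return local_port, elapsed
        finally:
            s.close()

    odd, even = [], []
    for _ in range(1000):
        port, ns = timed_connect()
        (odd if port % 2 else even).append(ns)

    for name, samples in (("even", even), ("odd", odd)):
        if samples:
            print(f"{name}: n={len(samples)}, median={median(samples)} ns")

The point isn't that this reproduces Cloudflare's result; it's that the question is testable with thirty lines of Python.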

To better prepare for situations like these, it is a good idea to get some experience re-inventing the wheel. By building, breaking, and fixing toy examples of the technologies you use in production, you are re-enacting part of the gradual process that made those technologies as good as they are today. By understanding how the rough edges of the tools were sanded off, you will be more prepared when an HTTP server, database, or operating system behaves in an unintuitive way. When running your Worldle clone on a database you wrote yourself, you can't take a shortcut and assume the database is infallible; you'll have to troubleshoot it alongside your application. It's of course a terrible idea to subject your company or customers to erroneous 403 responses because of a bug in your nftables clone, but for your hobby project? You'll want to make sure your servers are otherwise hardened, but fixing your broken software will teach you more than you'll learn by reading and following tutorials. 3
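
In that spirit, here is about the smallest wheel worth re-inventing: an HTTP 'server' on raw sockets. This is a sketch, not anyone's production code, and it cheats everywhere a real server can't (one connection at a time, no timeouts, barely any parsing), which is exactly what makes the rough edges show up fast.

    import socket

    # A toy HTTP server. Every simplification below is a lesson in
    # why Apache and Nginx took decades to get right.

    def handle(conn):
        # Real servers loop until the whole header block has arrived;
        # assuming one recv() returns it all is the first rough edge.
        raw = conn.recv(65536).decode("latin-1", errors="replace")
        request_line = raw.split("\r\n", 1)[0]
        parts = request_line.split(" ")
        if len(parts) != 3 or not parts[2].startswith("HTTP/"):
            conn.sendall(b"HTTP/1.1 400 Bad Request\r\n\r\n")
            return
        method, path, _ = parts
        body = f"{method} {path}\n".encode()
        headers = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        ).encode()
        conn.sendall(headers + body)

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("127.0.0.1", 8080))
    server.listen()

    while True:
        conn, _ = server.accept()
        try:
            with conn:
                handle(conn)
        except OSError:
            # A client hanging up mid-exchange shouldn't kill us.
            pass

Point curl at it, then point a misbehaving client at it, and you'll re-discover for yourself why real servers read headers in a loop and enforce timeouts.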


  1. Michael Johnson. 1996. "Linux Version 2.0". Symmetric multi-processing was not supported until version 2.0 of the kernel, released in 1996.↩︎

  2. Frederick Lawler. 2023. "connect(), why you so slow?!". In Linux Plumbers Conference 2023, November 13-15, 2023, Richmond, Virginia.↩︎

  3. Red Hat Enterprise Linux 9: Configuring firewalls and packet filters, Chapter 2, "Getting started with nftables".↩︎
