9 Modularity for ML: Writing testable and legible code


This chapter covers

  • Demonstrating why monolithic script-coding patterns make ML projects more complex
  • Understanding the complexity of troubleshooting non-abstracted code
  • Applying basic abstraction to ML projects
  • Implementing testable designs in ML code bases

Precious few emotions are more soul-crushing than those forced upon you when you’re handed a complex code base that someone else wrote. Reading through a mountain of unintelligible code after being told that you are responsible for fixing, updating, and supporting it is demoralizing. The only situation worse than inheriting a fundamentally broken code base to maintain is discovering that the name in the commit history is your own.

This isn’t to say that the code doesn’t work. It may run perfectly fine. The fact that code runs isn’t the issue. It’s that a human can’t easily figure out how (or, more disastrously, why) it works. I believe this problem was most eloquently described by Martin Fowler in Refactoring (1999):

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

A large portion of ML code is not aligned with good software engineering practices. With our focus on algorithms, vectors, indexers, models, loss functions, optimization solvers, hyperparameters, and performance metrics, we, as a profession, generally don’t spend much time adhering to strict coding standards. At least, most of us don’t.
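To make the contrast concrete before the sections that follow, here is a minimal, purely hypothetical sketch: the same missing-value imputation logic written inline in a script, and then extracted into a small function that can be tested in isolation. The function name impute_missing and the sample data are illustrative assumptions, not code from any particular project.

from statistics import mean

# Monolithic style: the logic is welded into the script flow. The only
# way to validate this imputation is to run the entire script.
raw = [3.0, None, 4.5, None, 6.0]
observed = [v for v in raw if v is not None]
fill = mean(observed)
cleaned = [fill if v is None else v for v in raw]

# Modular style: the same logic as a small, testable unit.
def impute_missing(values: list[float | None]) -> list[float]:
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("Cannot impute a series with no observed values.")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# A direct unit test exercises the logic without touching anything else.
assert impute_missing([3.0, None, 4.5, None, 6.0]) == [3.0, 4.5, 4.5, 4.5, 6.0]

The modular version can be exercised by a test framework, reused across projects, and reasoned about in isolation, which is precisely the property that the monolithic scripts examined in section 9.1 lack.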

9.1 Understanding monolithic scripts and why they are bad

9.1.1 How monoliths come into being

9.1.2 Walls of text

9.1.3 Considerations for monolithic scripts

9.2 Debugging walls of text

9.3 Designing modular ML code

9.4 Using test-driven development for ML

Summary