Part 1 An introduction to machine learning engineering

 

I’m sure you’ve seen, like most people in the data science field, the statistics on project failures. Based on my experience, the numbers thrown around for a project getting into production (namely, by vendors promising that their tooling stack will improve your chances if you just pay them!) are ridiculously grim. However, some element of truth exists in the hyperbolic numbers that are referenced in the rates of project failure.

Using machine learning (ML) to solve real-world problems is complex. The sheer volume of tooling, algorithms, and activities involved in building a useful model are daunting for many organizations. In my time working as a data scientist and subsequently helping many dozens of companies build useful ML projects, I’ve never seen the tooling or the algorithms be the reason a project fails to provide value to a company.

The vast majority of the time, a project that fails to make its way to production for sustained utility has issues that are rooted in the very early phases. Before even a single line of code is written, before a serving architecture is selected and built out, and long before a decision on scalable training is made, a project is doomed to either cancellation or unused obscurity if planning, scoping, and experimentation are not done properly.