chapter one

1 Introduction

 

Machine learning (ML) and artificial intelligence were born in academia in the 1950s. Applied statistics, the precursor of modern data science, has an even longer history. For decades, these techniques have been used in a myriad of business applications, from financial forecasting to chemical engineering.

In the past, when ML and data science were considered advanced techniques used by PhD-level scientists in specialized applications, no one expected supporting infrastructure to exist. Building such applications required an exceptional level of effort, knowledge, and patience. Today, the world is a different place. You don’t need a PhD to develop a jaw-dropping computer vision demo or a robust model for predicting sales. It is reasonable to expect that integrating such models into the surrounding business shouldn’t require a PhD in systems engineering either.

1.1    Why Data Science Infrastructure

1.1.1   Lifecycle of a Data Science Project

1.2    What is Data Science Infrastructure

1.2.1   The Infrastructure Stack for Data Science

1.2.2   Taming Complexity

1.2.3   Leveraging Existing Platforms

1.3    Human-Centric Infrastructure

1.3.1   Data Scientist Autonomy

1.4    Summary