chapter one

1 Introduction

 

Machine learning and artificial intelligence were born in academia in the 1950s. Applied statistics, the precursor of modern data science, has an even longer history. For decades, these techniques have been used in a myriad of business applications, from financial forecasting to chemical engineering.

In the past, when ML and data science were considered advanced techniques used by PhD-level scientists in specialized applications, no one expected supporting infrastructure to exist. Building specialized applications required a special level of effort, knowledge, and patience. Today, the world is a different place. You don’t need a PhD to develop a jaw-dropping computer vision demo or a robust model for predicting sales. It is reasonable to expect that integrating such models into the surrounding business shouldn’t require a PhD in systems engineering either.

1.1 Why Data Science Infrastructure

1.1.1 Lifecycle of a Data Science Project

1.2 What is Data Science Infrastructure

1.2.1 The Infrastructure Stack for Data Science

1.2.2 Taming Complexity

1.2.3 Leveraging Existing Platforms

1.3 Human-Centric Infrastructure

1.3.1 Data Scientist Autonomy

1.4 Summary