1 Introducing data science infrastructure


This chapter covers

  • Why companies need data science infrastructure in the first place
  • Introducing the infrastructure stack for data science and machine learning
  • Elements of successful data science infrastructure

Machine learning and artificial intelligence were born in academia in the 1950s. Technically, everything presented in this book could have been implemented decades ago, had time and cost not been a concern. However, for the past seven decades, nothing in this problem domain has been easy.

As many companies have experienced, building applications powered by machine learning has required large teams of engineers with specialized knowledge, often working for years to deliver a well-tuned solution. If you look back on the history of computing, most society-wide shifts have happened not when impossible things have become possible but when possible things have become easy. Bridging the gap between possible and easy requires effective infrastructure, which is the topic of this book.

1.1 Why data science infrastructure?

1.1.1 The life cycle of a data science project

1.2 What is data science infrastructure?

1.2.1 The infrastructure stack for data science

1.2.2 Supporting the full life cycle of a data science project

1.2.3 One size doesn’t fit all

1.3 Why good infrastructure matters

1.3.1 Managing complexity

1.3.2 Leveraging existing platforms

1.4 Human-centric infrastructure

1.4.1 Freedom and responsibility

1.4.2 Data scientist autonomy