1 Introduction


This chapter covers

  • Defining data engineering
  • Anatomy of a data platform
  • Benefits of the cloud
  • Getting started with Azure
  • Overview of an Azure data platform

With the advent of cloud computing, the amount of data generated every moment has reached an unprecedented scale. The discipline of data science flourishes in this environment, deriving knowledge and insights from massive amounts of data. As data science becomes critical to business, its processes must be treated with the same rigor as other components of business IT. For example, software engineering teams today embrace DevOps to develop and operate services with 99.99999% availability guarantees. Data engineering brings a similar rigor to data science, so data-centric processes run reliably, smoothly, and in a compliant way.

For the past few years, I’ve had the privilege of being a software architect for Microsoft’s Customer Growth and Analytics team. Our team’s motto is “Using Azure to understand Azure.” We connect many data points across the Microsoft business to better understand our customers and to empower teams across the company. Privacy is important to us, so we never look at our customers’ data, but we do have access to telemetry from Azure, commercial transactions, and other operational pipelines. This gives us a unique perspective on Azure and on how customers can get the most value from our offerings.

1.1 What is data engineering?

1.2 Who this book is for

1.3 What is a data platform?

1.3.1 Anatomy of a data platform

1.3.2 Infrastructure as code, codeless infrastructure

1.4 Building in the cloud

1.4.1 IaaS, PaaS, SaaS

1.4.2 Network, storage, compute

1.4.3 Getting started with Azure

1.4.4 Interacting with Azure

1.5 Implementing an Azure data platform