Welcome
Thanks for purchasing the MEAP for Software Engineering for Data Scientists. This book is written for readers looking to learn how to apply software engineering concepts to data science.
The book is split into four parts:
- Part 1 – Getting started
- This part will cover topics such as source control, exception handling, structuring your code more effectively, object-oriented programming (OOP) for data science, and monitoring the progress of your code (such as model training or data extraction).
- Part 2 – Scaling
- Part 2 covers scaling your code effectively. For example, how do you deal with larger datasets? We’ll cover both the computational and memory components of scaling.
- Part 3 – Scheduling, testing, and deployment into production
- Part 3 details how to rigorously test your code, protect your credentials (for example, when connecting to a database to query data), schedule models and data pipelines to run automatically, and package data analytics code into a portable library that can be shared with and downloaded by others.
- Part 4 – Monitoring your data processing and modeling code
- Lastly, Part 4 will teach you how to effectively monitor your code in production. This is especially relevant when you deploy a machine learning model to make predictions on a recurring or automated basis. We’ll cover logging, automated reporting, and how to build dashboards with Python.