1 Software engineering principles
This chapter covers
- What data scientists need to know about software engineering
- Components of a data pipeline
- Deploying models with machine learning pipelines
Suppose you’re collaborating on a software project with a team that includes data scientists, software engineers, and other technical and non-technical roles. How do you handle several people modifying the same code files? What about testing new features or modeling techniques? What’s the best way to track these experiments or to revert changes? Software engineers and data scientists may have very different answers to these questions.
Data scientists frequently use tools like Jupyter Notebook, which allows you to write code and view its results in a single integrated environment. This popularity generally stems from the fact that part of being a data scientist is experimentation and exploration: trying out various ideas, creating visualizations, and searching for answers in data. Jupyter notebooks are easy to create, use, and share, in particular because they allow you to show charts or other visuals alongside the related code, data, and text descriptions. However, as projects become more complex and involve more contributors, these notebook files can get messy very quickly.