2 Source control for data scientists
This chapter covers
- What source control is
- A tool for source control: Git
- Git workflow from scratch
- Handling conflicts and merges with Git
- Comparing Jupyter Notebook files with nbdime
In the last chapter, we introduced several key software engineering concepts that will improve your life as a data scientist. One of these concepts is source control, which is the focus of this chapter. Source control (also called version control) is a way of tracking the changes made to a codebase over time. As codebases have grown in number and size over the years, it has become crucial to monitor code changes and to make it easier for many developers to collaborate. Because software engineering has existed longer than modern data science, source control has a longer history as a software engineering practice than as a data science one. However, as we’ll demonstrate in this chapter, source control is an important tool for any data scientist to learn.