You are probably eager to roll up your sleeves and start writing actual code, now that we have a development environment set up. In this chapter, you will learn the basics of developing data science applications using Metaflow, a framework that shows how different layers of the infrastructure stack can work together seamlessly.
The development environment, which we discussed in the previous chapter, determines how the data scientist develops applications: by writing code in an editor, evaluating it in a terminal, and analyzing results in a notebook. On top of this toolchain, the data scientist uses Metaflow to determine what code gets written and why, which is the topic of this chapter. The next chapters will then cover the infrastructure that determines where and when the workflows are executed.
We will introduce Metaflow from the ground up. You will first learn the syntax and the core concepts needed to define simple workflows in Metaflow. After this, we will introduce branches, a straightforward way to embed concurrency in workflows, which often leads to higher performance through parallel computation.
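To make the idea of a branch concrete before any Metaflow syntax appears, here is a minimal sketch in plain Python. It does not use Metaflow's API. The function names (`start`, `branch_a`, `branch_b`, `join`, `run_workflow`) and the toy data are hypothetical, chosen only to show the shape of a workflow whose two independent steps can run concurrently before their results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def start():
    # Hypothetical first step: produce the data that both branches consume.
    return [1, 2, 3, 4]


def branch_a(data):
    # First branch: depends only on the output of start().
    return sum(data)


def branch_b(data):
    # Second branch: also depends only on the output of start().
    return max(data)


def join(total, largest):
    # Join step: combine the results of both branches.
    return {"total": total, "largest": largest}


def run_workflow():
    data = start()
    # Neither branch depends on the other, so both can be submitted
    # at once and allowed to run concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(branch_a, data)
        future_b = pool.submit(branch_b, data)
        # The join waits until both branches have finished.
        return join(future_a.result(), future_b.result())


if __name__ == "__main__":
    print(run_workflow())
```

Threads are used here only to keep the sketch short. For CPU-bound work in Python, separate processes or machines are usually needed to gain real speed, and the point of the sketch is just that independent steps are what make such parallelism possible.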