6 Going to production


This chapter covers

  • Deploying workflows to a highly scalable and highly available production scheduler
  • Setting up a centralized metadata service to track experiments company-wide
  • Defining stable execution environments with various software dependencies
  • Leveraging versioning to allow multiple people to develop multiple versions of a project safely

Thus far we have been starting all workflows on a personal workstation, maybe a laptop. However, it is not a good idea to run business-critical applications in a prototyping environment. The reasons are many: laptops get lost, they are hard to control and manage centrally, and, more fundamentally, the needs of rapid, human-in-the-loop prototyping are very different from the needs of production deployments.

What does “deploying to production” mean exactly? The word production is used frequently but seldom defined precisely. Although particular use cases may have their own definitions, two characteristics are common to most production deployments:

  • Automation—Production workflows should run without any human involvement.
  • High availability—Production workflows should not fail: the system must tolerate errors and keep producing results.

The main characteristic of production workflows is that they should run without a human operator: they should start, execute, and output results automatically. Note that automation doesn’t imply that they work in isolation. They can start as a result of some external event, such as new data becoming available.

6.1 Stable workflow scheduling

6.1.1 Centralized metadata

6.1.2 Using AWS Step Functions with Metaflow

6.1.3 Scheduling runs with @schedule

6.2 Stable execution environments

6.2.1 How Metaflow packages flows

6.2.2 Why dependency management matters

6.2.3 Using the @conda decorator

6.3 Stable operations

6.3.1 Namespaces during prototyping

6.3.2 Production namespaces