
2 An end-to-end rearchitecting project


This chapter covers

  • Establishing baseline capabilities and inference metrics
  • Applying depth pruning to remove layers
  • Measuring structural modification impacts
  • Recovering knowledge through distillation
  • Creating smaller, faster models

In the previous chapter, we introduced the model-tailoring pipeline and explained that the key differentiator in our approach to creating efficient solutions is rearchitecting the models.

In this chapter, you’ll get hands-on by adapting a model. We’ll treat this as a professional assignment: we have a base model that works very well, and we need to reduce its size while preserving as many of its capabilities as possible.

NOTE

This is very common in the industry. For example, NVIDIA combines structural optimization (through pruning techniques) and knowledge recovery (using knowledge distillation) to create its model families. That way, it only needs to fully train the largest model in the family.
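The knowledge-distillation step mentioned above typically minimizes a divergence between the teacher's and student's output distributions. Here is a minimal sketch of a Hinton-style soft-target distillation loss in PyTorch; the function name, temperature value, and toy logits are illustrative, not part of NVIDIA's actual pipeline:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KD loss: KL divergence between temperature-scaled
    teacher and student output distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * t * t

# Sanity check with toy logits: a student that matches the teacher
# exactly incurs (near-)zero loss.
teacher_logits = torch.randn(4, 10)
loss_self = distillation_loss(teacher_logits, teacher_logits)
```

The temperature softens both distributions so the student also learns from the teacher's relative rankings of unlikely tokens, not just its top prediction.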

To create the smaller model, we'll run two phases of the model-tailoring pipeline that we introduced in the previous chapter (see figure 1.2): Structural Optimization, where we'll modify the model's architecture to make it lighter and faster, and Knowledge Recovery, where we transfer knowledge from the base model to the new pruned model. Our focus here is general-purpose optimization: we're not tailoring the model to a specific domain or task, which is the third phase of the pipeline and the subject of later chapters.
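The structural-optimization phase we'll apply is depth pruning: deleting whole layers from the model's stack. The essence can be sketched on a toy PyTorch model; the class, function, and layer counts below are illustrative stand-ins for a real transformer's layer list (e.g. a Hugging Face model's `ModuleList` of decoder blocks), not code from this book's pipeline:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Stand-in for a base model: a stack of identical blocks."""
    def __init__(self, num_layers=8, dim=16):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim)
                                    for _ in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x

def depth_prune(model, layers_to_drop):
    """Remove the given layer indices, shrinking the model's depth
    (and hence its parameter count and per-token latency)."""
    drop = set(layers_to_drop)
    kept = [layer for i, layer in enumerate(model.layers) if i not in drop]
    model.layers = nn.ModuleList(kept)
    return model

base = ToyModel(num_layers=8)
pruned = depth_prune(base, layers_to_drop=[3, 4, 5])  # drop three middle blocks
out = pruned(torch.randn(2, 16))  # forward pass still works with 5 layers
```

Because the remaining layers were trained to cooperate with the deleted ones, the pruned model's quality drops, which is exactly why the Knowledge Recovery phase follows.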

2.1 The rearchitecting workflow

2.2 Establishing the baseline

2.3 Applying depth pruning

2.4 Evaluating the impact of pruning

2.5 Recovering knowledge

2.6 Analyzing the final result

2.7 Hands-on lab

2.8 Summary