
2 An end-to-end architectural tailoring project


This chapter covers

  • Establishing baseline capabilities and inference metrics
  • Applying depth pruning to remove layers
  • Measuring structural modification impacts
  • Recovering knowledge through distillation
  • Creating smaller, faster models

In the previous chapter, we introduced the model tailoring pipeline and explained that the most distinctive aspect of our approach to creating efficient solutions is re-architecting the models themselves.

In this chapter, you'll get hands-on and adapt a model yourself. We'll treat it as a professional assignment: we have a base model that performs well, and we need to reduce its size while preserving as many of its capabilities as possible.

NOTE

This is a very common scenario in industry. For example, NVIDIA combines Structural Optimization (through pruning techniques) with Knowledge Recovery (using knowledge distillation) to create its model families. This way, it only needs to fully train the largest model in each family.
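To make the idea of depth pruning concrete before we work through it properly, here is a minimal NumPy sketch. The toy "model" (a stack of weight matrices), the choice of which layers to keep, and the `forward` helper are all illustrative assumptions, not NVIDIA's actual method; in a real project, you would score each layer's importance before deciding which ones to drop.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Toy "model": a stack of 8 weight matrices applied in sequence,
# standing in for the layers of a transformer.
layers = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(8)]

def forward(layers, x):
    # Run the input through every remaining layer in order.
    for w in layers:
        x = np.tanh(x @ w)
    return x

# Depth pruning: keep only a subset of the layers (here, drop the middle two).
# The indices below are a hypothetical choice for illustration; real pruning
# uses an importance metric to pick which layers to remove.
keep = [0, 1, 2, 5, 6, 7]
pruned = [layers[i] for i in keep]

x = rng.standard_normal((1, dim))
print(len(layers), len(pruned))  # 8 6
print(forward(pruned, x).shape)  # (1, 16)
```

The pruned model still produces outputs of the same shape, it just computes them with fewer layers; the capability lost by removing those layers is what knowledge distillation later recovers.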

2.1 The rearchitecting workflow

2.2 Establishing the baseline

2.3 Applying depth pruning

2.4 Evaluating the impact of pruning

2.5 Recovering knowledge

2.6 Analyzing the final result

2.7 Hands-on lab

2.8 Summary