Appendix C. Knowledge Distillation: Shrinking Models for Efficient, Hierarchical Molecular Generation
This appendix covers
- The Hierarchical Variational Autoencoder (HierVAE) for generating molecules by assembling chemically valid substructures.
- Core concepts of knowledge distillation, showing how a compact "student" model can learn from a larger "teacher" model (sketched in code at the end of this opener).
- How to apply knowledge distillation to compress a large, pre-trained HierVAE model into a smaller, faster version.
- A complete implementation pipeline, including student model design, a multi-component loss function, and training strategies.
- Key metrics for analyzing compression trade-offs, including generation speed, model size, validity, and uniqueness.
“Given a pre-existing model, we can rebuild it. We have the technology. We can make it smaller than it ever was. Smaller, cheaper, faster!”
--- The Six Million Dollar Man (paraphrased)
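As a preview of the central idea, the sketch below shows the classic soft-target distillation loss in PyTorch: the student matches the teacher's temperature-softened output distribution while still fitting the ground-truth labels. This is a minimal illustration, not the book's exact implementation; the function name `distillation_loss` and the default values for `temperature` and `alpha` are assumptions chosen for clarity.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=2.0, alpha=0.5):
    """Hinton-style knowledge distillation loss (illustrative sketch).

    Combines a soft-target term (KL divergence between the teacher's and
    student's temperature-softened distributions) with the usual
    hard-target cross-entropy. `temperature` and `alpha` are assumed
    defaults, not values from this appendix.
    """
    # Soften both output distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)

    # The KL term is scaled by T^2 so its gradients stay comparable
    # in magnitude to the hard-target term as T changes.
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    # Standard supervised loss against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, targets)

    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example usage with random logits for a batch of 4 items, 10 classes.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, targets)
```

The appendix extends this basic recipe to HierVAE's generative setting, where the loss gains additional components beyond a single classification head.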