welcome
Thank you for purchasing the MEAP of Rearchitecting LLMs. I'm glad to have you on board and look forward to your feedback in this early stage of the project.
This book is for engineers who use LLMs, whether open source or accessed via APIs, and who want to move beyond fine-tuning to optimizing model architectures themselves. If you know PyTorch and are curious about what happens inside a transformer, this book is for you. We'll start from concepts you already know and guide you through the surgical optimization techniques currently used by research teams.
My first optimization project was for a startup that detected violent language and analyzed political discourse on the Internet. Their DistilGPT2-based model worked well, but the team needed to analyze more text in less time. Applying pruning and knowledge distillation took weeks, because the information was scattered across academic papers, many of them without reproducible code.
That first optimization opened up a fascinating world, and I began actively seeking out more optimization projects. As the industry adopted more modern open source models, the challenge grew: techniques such as width pruning in GLU architectures were poorly documented, forcing me to develop my own methods by combining papers like "ShortGPT" with broader approaches like "The Minitron Approach". This book exists to solve that fragmentation: it unifies academic techniques into a coherent, practical pipeline.