Appendix E Parameter-efficient Finetuning with LoRA
This appendix introduces low-rank adaptation (LoRA), one of the most widely used techniques for parameter-efficient finetuning. After explaining the main idea behind LoRA, this appendix builds on the spam classification finetuning example from chapter 6 and applies LoRA to finetune the LLM. It's important to note, however, that LoRA finetuning is also applicable to the supervised instruction finetuning discussed in chapter 7.
E.1 Introduction to LoRA
LoRA, or low-rank adaptation, is a technique that adapts a pretrained model to better suit a specific, often smaller, dataset by adjusting only a small subset of the model's weight parameters. The "low-rank" aspect refers to the mathematical idea of restricting the weight updates to a lower-dimensional subspace of the full weight parameter space; this subspace effectively captures the most influential directions of the weight changes during training.
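To preview this idea in symbols (the notation here is illustrative; the appendix develops the details step by step): if W is a pretrained weight matrix with d rows and k columns, LoRA keeps W frozen and learns an update of the form

W_updated = W + ΔW ≈ W + A B,

where A is a d×r matrix, B is an r×k matrix, and the rank r is chosen to be much smaller than d and k. Because only A and B are trained, the number of trainable parameters for this layer drops from d·k to r·(d + k).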
The LoRA method is widely used because it enables efficient finetuning of large models on task-specific data, significantly reducing the computational resources usually required for finetuning.
To explain how LoRA works, suppose there is a large weight matrix W associated with a specific layer. LoRA can be applied to all linear layers in an LLM, as we will see later, but we focus on a single layer for illustration purposes in this section.
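To make this concrete, the following is a minimal PyTorch sketch of how such a low-rank update can be attached to an existing linear layer. The class names LoRALayer and LinearWithLoRA and the rank and alpha hyperparameters are illustrative choices for this sketch, not a prescribed interface; the appendix develops the actual implementation later.

import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    # Learns the low-rank update delta-W = A @ B for a frozen weight matrix
    # (illustrative sketch; names and initialization are assumptions)
    def __init__(self, in_dim, out_dim, rank, alpha):
        super().__init__()
        # A is initialized randomly; B starts at zero so the update
        # A @ B is zero at the beginning of training, and the wrapped
        # layer initially behaves exactly like the pretrained layer
        self.A = nn.Parameter(torch.randn(in_dim, rank) / rank**0.5)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))
        self.alpha = alpha  # scales the contribution of the update

    def forward(self, x):
        # Only the low-rank path is computed here; the frozen layer's
        # own output is added by the wrapper below
        return self.alpha * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    # Wraps a frozen nn.Linear layer and adds the LoRA update to its output
    def __init__(self, linear, rank, alpha):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features,
                              rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)

Replacing a model's nn.Linear layers with LinearWithLoRA and freezing the original weights means that only the small A and B matrices receive gradient updates during finetuning, which is where the parameter savings come from.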