appendix B References and further reading
Chapter 1
Custom-built LLMs can outperform general-purpose LLMs, as a team at Bloomberg showed with a GPT-style model pretrained from scratch on finance data. The custom LLM outperformed ChatGPT on financial tasks while maintaining good performance on general LLM benchmarks:
- “BloombergGPT: A Large Language Model for Finance” (2023) by Wu et al., https://arxiv.org/abs/2303.17564
Existing LLMs can also be adapted and fine-tuned to outperform general-purpose LLMs, as teams from Google Research and Google DeepMind showed in a medical context:
- “Towards Expert-Level Medical Question Answering with Large Language Models” (2023) by Singhal et al., https://arxiv.org/abs/2305.09617
The following paper proposed the original transformer architecture:
- “Attention Is All You Need” (2017) by Vaswani et al., https://arxiv.org/abs/1706.03762
On the original encoder-style transformer, called BERT, see
- “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (2018) by Devlin et al., https://arxiv.org/abs/1810.04805
The paper describing the decoder-style GPT-3 model, which inspired modern LLMs and will be used as a template for implementing an LLM from scratch in this book, is
- “Language Models are Few-Shot Learners” (2020) by Brown et al., https://arxiv.org/abs/2005.14165
The following paper covers the original vision transformer for classifying images, illustrating that transformer architectures are not restricted to text inputs: