Appendix B. References and Further Reading
Chapter 1
Custom-built LLMs can outperform general-purpose LLMs, as a team at Bloomberg showed with a version of GPT pretrained from scratch on finance data. The custom LLM outperformed ChatGPT on financial tasks while maintaining good performance on general LLM benchmarks:
- BloombergGPT: A Large Language Model for Finance (2023) by Wu et al., https://arxiv.org/abs/2303.17564
Existing LLMs can also be adapted and finetuned to outperform general-purpose LLMs, as teams from Google Research and Google DeepMind showed in a medical context:
- Towards Expert-Level Medical Question Answering with Large Language Models (2023) by Singhal et al., https://arxiv.org/abs/2305.09617
The paper that proposed the original transformer architecture:
- Attention Is All You Need (2017) by Vaswani et al., https://arxiv.org/abs/1706.03762
The original encoder-style transformer, called BERT:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) by Devlin et al., https://arxiv.org/abs/1810.04805
The paper describing the decoder-style GPT-3 model, which inspired modern LLMs and will be used as a template for implementing an LLM from scratch in this book:
- Language Models are Few-Shot Learners (2020) by Brown et al., https://arxiv.org/abs/2005.14165
The original vision transformer for classifying images, which illustrates that transformer architectures are not only restricted to text inputs: