Appendix B. References and Further Reading
Chapter 1
Custom-built LLMs can outperform general-purpose LLMs, as a team at Bloomberg showed with a version of GPT pretrained from scratch on finance data. The custom LLM outperformed ChatGPT on financial tasks while maintaining good performance on general LLM benchmarks:
- BloombergGPT: A Large Language Model for Finance (2023) by Wu et al., https://arxiv.org/abs/2303.17564
Existing LLMs can also be adapted and finetuned to outperform general-purpose LLMs, as teams from Google Research and Google DeepMind showed in a medical context:
- Towards Expert-Level Medical Question Answering with Large Language Models (2023) by Singhal et al., https://arxiv.org/abs/2305.09617
The paper that proposed the original transformer architecture:
- Attention Is All You Need (2017) by Vaswani et al., https://arxiv.org/abs/1706.03762
The original encoder-style transformer, called BERT:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018) by Devlin et al., https://arxiv.org/abs/1810.04805
The paper describing the decoder-style GPT-3 model, which inspired modern LLMs and will be used as a template for implementing an LLM from scratch in this book:
- Language Models are Few-Shot Learners (2020) by Brown et al., https://arxiv.org/abs/2005.14165
The original vision transformer for classifying images, which illustrates that transformer architectures are not only restricted to text inputs: