References

Chapter 1

[1] Launchbury, John. 2020. “A DARPA Perspective on Artificial Intelligence.” https://www.youtube.com/watch?v=-O01G3tSYpU&ab_channel=DARPAtv.

[2] Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. “How Transferable Are Features in Deep Neural Networks?” In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), 3320–3328. MIT Press. https://arxiv.org/abs/1411.1792.

[3] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al. 2017. “Attention Is All You Need.” In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), 6000–6010. Curran Associates Inc. https://arxiv.org/abs/1706.03762.

[4] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv preprint arXiv:1810.04805. https://arxiv.org/abs/1810.04805.

[5] Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.” https://api.semanticscholar.org/CorpusID:160025533.

[6] OpenAI. 2023. “GPT-4 Technical Report.” arXiv preprint arXiv:2303.08774. https://arxiv.org/abs/2303.08774.

[7] Rae, Jack W., Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, et al. 2021. “Scaling Language Models: Methods, Analysis & Insights from Training Gopher.” arXiv preprint arXiv:2112.11446. https://arxiv.org/abs/2112.11446.

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 7

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

Chapter 14

Chapter 15

Appendix A

Appendix B

Appendix C