References
Chapter 1
[1] Launchbury, John. 2020. “A DARPA Perspective on Artificial Intelligence.” https://www.youtube.com/watch?v=-O01G3tSYpU&ab_channel=DARPAtv.
[2] Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. “How Transferable Are Features in Deep Neural Networks?” In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), 3320–3328. MIT Press. https://arxiv.org/abs/1411.1792.
[3] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, et al. 2017. “Attention Is All You Need.” In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), 6000–6010. Curran Associates Inc. https://arxiv.org/abs/1706.03762.
[4] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” arXiv preprint arXiv:1810.04805. https://arxiv.org/abs/1810.04805.
[5] Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners.” https://api.semanticscholar.org/CorpusID:160025533.
[6] OpenAI. 2023. “GPT-4 Technical Report.” arXiv preprint arXiv:2303.08774. https://arxiv.org/abs/2303.08774.
[7] Rae, Jack W., Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, et al. 2021. “Scaling Language Models: Methods, Analysis & Insights from Training Gopher.” arXiv preprint arXiv:2112.11446. https://arxiv.org/abs/2112.11446.