References

Chapter 1

  1. Young, B. (2023). AI expert speculates on GPT-4 architecture. Weights & Biases. https://api.wandb.ai/links/byyoung3/8zxbl12q
  2. Micikevicius, P. (2017). Mixed-precision training of deep neural networks. NVIDIA Developer. https://mng.bz/6eaA
  3. Google Cloud. Accelerate AI development with Google Cloud TPUs. https://cloud.google.com/tpu
  4. Metz, C. (2023, July 23). Researchers poke holes in safety controls of ChatGPT and other chatbots. The New York Times.
  5. Hu, K. (2023, February 2). ChatGPT sets record for fastest-growing user base—analyst note. Reuters. https://mng.bz/XxKv

Chapter 2

  1. Friederici, A. D. (2011). The brain basis of language processing: From structure to function. Physiological Reviews, 91, 1357-1392. https://doi.org/10.1152/physrev.00006.2011
  2. Nation, P., and Waring, R. (1997). Vocabulary size, text coverage, and word lists. In: N. Schmitt and M. McCarthy, eds., Vocabulary: Description, Acquisition, and Pedagogy (pp. 6-19). Cambridge University Press.
  3. Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. https://arxiv.org/abs/2005.14165
  4. Google. SentencePiece (GitHub repository). https://github.com/google/sentencepiece
  5. Petrov, A., La Malfa, E., Torr, P. H. S., and Bibi, A. (2023). Language model tokenizers introduce unfairness between languages. https://arxiv.org/abs/2305.15425

Chapter 3

  1. Denk, T. (2019). Linear relationships in the transformer’s positional encoding. https://mng.bz/oKxd
  2. Raff, E. (2022). Inside Deep Learning. Manning.
