References
Chapter 1
- Young, B. (2023). AI expert speculates on GPT-4 architecture. Weights & Biases. https://api.wandb.ai/links/byyoung3/8zxbl12q
- Micikevicius, P. (2017). Mixed-precision training of deep neural networks. NVIDIA Developer. https://mng.bz/6eaA
- Google Cloud. Accelerate AI development with Google Cloud TPUs. https://cloud.google.com/tpu
- Metz, C. (2023, July 23). Researchers poke holes in safety controls of ChatGPT and other chatbots. The New York Times.
- Hu, K. (2023, February 2). ChatGPT sets record for fastest-growing user base - analyst note. Reuters. https://mng.bz/XxKv
 
Chapter 2
- Friederici, A. D. (2011). The brain basis of language processing: From structure to function. Physiological Reviews, 91, 1357-1392. https://doi.org/10.1152/physrev.00006.2011
- Nation, P., and Waring, R. (1997). Vocabulary size, text coverage, and word lists. In: N. Schmitt and M. McCarthy, eds., Vocabulary: Description, Acquisition, and Pedagogy (pp. 6-19). Cambridge University Press.
- Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. https://arxiv.org/abs/2005.14165
- Google. SentencePiece. https://github.com/google/sentencepiece
- Petrov, A., La Malfa, E., Torr, P. H. S., and Bibi, A. (2023). Language model tokenizers introduce unfairness between languages. https://arxiv.org/abs/2305.15425
 
Chapter 3
- Denk, T. (2019). Linear relationships in the transformer’s positional encoding. https://mng.bz/oKxd
- Raff, E. (2022). Inside Deep Learning. Manning.