References
Chapter 1
- Young, B. (2023). AI expert speculates on GPT-4 architecture. Weights & Biases. https://api.wandb.ai/links/byyoung3/8zxbl12q
- Micikevicius, P. (2017). Mixed-precision training of deep neural networks. NVIDIA Developer. https://mng.bz/6eaA
- Google Cloud. Accelerate AI development with Google Cloud TPUs. https://cloud.google.com/tpu
- Metz, C. (2023, July 23). Researchers poke holes in safety controls of ChatGPT and other chatbots. The New York Times.
- Hu, K. (2023, February 2). ChatGPT sets record for fastest-growing user base - analyst note. Reuters. https://mng.bz/XxKv
 
Chapter 2
- Friederici, A. D. (2011). The brain basis of language processing: From structure to function. Physiological Reviews, 91, 1357-1392. https://doi.org/10.1152/physrev.00006.2011
- Nation, P., and Waring, R. (1997). Vocabulary size, text coverage, and word lists. In: N. Schmitt and M. McCarthy, eds., Vocabulary: Description, Acquisition, and Pedagogy (pp. 6-19). Cambridge University Press.
- Brown, T. B., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. https://arxiv.org/abs/2005.14165
- Google. SentencePiece. https://github.com/google/sentencepiece
- Petrov, A., La Malfa, E., Torr, P. H. S., and Bibi, A. (2023). Language model tokenizers introduce unfairness between languages. https://arxiv.org/abs/2305.15425
 
Chapter 3
- Denk, T. (2019). Linear relationships in the transformer’s positional encoding. https://mng.bz/oKxd
- Raff, E. (2022). Inside Deep Learning. Manning.