References

Chapter 1

[1] Vaswani, Ashish, et al. (2017). Attention is all you need. arXiv. http://arxiv.org/abs/1706.03762.

Chapter 2

[1] Vaswani, Ashish, et al. (2017). Attention is all you need. arXiv. http://arxiv.org/abs/1706.03762.

Chapter 3

[1] Devlin, Jacob, et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://arxiv.org/abs/1810.04805.

[2] Liu, Yinhan, et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv. https://arxiv.org/abs/1907.11692.

[3] Dosovitskiy, Alexey, et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv. https://arxiv.org/abs/2010.11929.

[4] Warner, Benjamin, et al. (2024). Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference. arXiv. https://arxiv.org/abs/2412.13663.

Chapter 4

[1] Wei, Jason, et al. (2023). Chain-of-thought prompting elicits reasoning in large language models. Version 6. arXiv. https://arxiv.org/abs/2201.11903.

[2] Chia, Yew Ken, et al. (2023). Contrastive chain-of-thought prompting. arXiv. https://arxiv.org/abs/2311.09277.

[3] Dhuliawala, Shehzaad, et al. (2023). Chain-of-verification reduces hallucination in large language models. Version 2. arXiv. https://arxiv.org/abs/2309.11495.

[4] Yao, Shunyu, et al. (2023). Tree of thoughts: Deliberate problem solving with large language models. Version 2. arXiv. https://arxiv.org/abs/2305.10601.

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10