6 Beyond natural language processing
This chapter covers
- How transformer layers work on data other than text
- Helping LLMs write working software
- Tweaking LLMs so they understand mathematical notation
- How transformers replace the input and output steps to work with images
While modeling natural language was the transformer's original purpose, machine learning researchers quickly discovered that the same architecture can predict almost anything that arrives as a sequence of data. A transformer views a sentence as a sequence of tokens and either produces a related sequence of tokens, such as a translation from one language to another, or predicts the tokens that follow, such as when answering questions or acting as a chatbot. Sequence modeling and prediction are potent tools for interpreting and generating natural language, but natural language is not the only domain where LLMs can be helpful.
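To make the next-token view concrete, here is a minimal sketch of autoregressive generation, assuming the Hugging Face transformers library is installed; GPT-2 is used purely as a small, convenient example model, and any causal language model would behave the same way.

```python
# A minimal sketch of next-token prediction with a small pretrained LLM.
# Assumes the Hugging Face transformers library; GPT-2 is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The model sees the prompt as a sequence of integer token IDs...
prompt = "Transformers can model any sequence, not just"
inputs = tokenizer(prompt, return_tensors="pt")

# ...and repeatedly predicts the most likely next token, appending each
# prediction to the sequence before predicting the one after it.
output_ids = model.generate(**inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Nothing in this loop depends on the tokens representing words; the same predict-and-append machinery works for any data that can be encoded as a sequence of discrete tokens, which is the idea this chapter builds on.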