4 Text generation strategies and prompting techniques
Text generation lies at the core of large language model (LLM) applications, from chatbots to story generation and beyond. The quality of the generated output depends not only on the model architecture but also on how we guide the model's predictions, both through decoding and sampling strategies and through prompting techniques.
In this chapter, we’ll explore key generation techniques: from deterministic decoding methods like greedy search and beam search to probabilistic methods such as top-k, top-p, and temperature sampling. We’ll then turn to prompting strategies, showing how zero-shot, few-shot, and more advanced techniques like chain-of-thought (CoT) and tree-of-thought (ToT) prompting enhance reasoning and task performance.
4.1 Decoding and sampling methods for text generation
To produce human-like text, modern transformer models rely on a diverse set of methods. Two foundational concepts are decoding and sampling. Decoding refers to the process of generating an output sequence, such as a translated sentence or a continuation of text, from an input sequence. Sampling is the process of selecting the next word (or token) at each step of generation. We’ll start by considering two decoding methods: greedy search and beam search. Then we’ll look at three common approaches to sampling: top-k sampling, nucleus (top-p) sampling, and temperature sampling.
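To make the distinction concrete, here is a minimal sketch of the single next-token step that all of these methods build on. It uses NumPy with an invented five-word vocabulary and made-up logits (in a real LLM, the logits come from the model's final layer): a deterministic method always picks the most probable token, while a probabilistic method draws the token from the distribution.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy vocabulary and raw model scores (logits) for the next token.
# In a real LLM, these logits are produced by the model's final layer.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])

# Convert logits to a probability distribution with softmax.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Deterministic choice: always pick the most probable token (greedy).
greedy_token = vocab[int(np.argmax(probs))]

# Probabilistic choice: sample the next token from the distribution.
sampled_token = vocab[rng.choice(len(vocab), p=probs)]

print(f"greedy: {greedy_token}, sampled: {sampled_token}")
```

Every method in this section is a variation on this one step, repeated token by token: greedy and beam search make deterministic choices, while top-k, nucleus, and temperature sampling reshape the distribution before drawing from it.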