
4 Text generation strategies and prompting techniques

 

This chapter covers

  • Decoding methods
  • Sampling methods
  • Prompting techniques
  • Advanced prompting

Text generation lies at the core of large language model (LLM) applications, from chatbots to story generation and beyond. The quality of the generated output depends not only on the model architecture but also on how we guide the model's predictions through decoding and sampling strategies, as well as on the prompting techniques we use.

In this chapter, we’ll explore key generation techniques: from deterministic decoding methods such as greedy search and beam search to probabilistic methods such as top-k, top-p (nucleus), and temperature sampling. We’ll then turn to prompting strategies, showing how zero-shot, few-shot, and more advanced techniques like chain-of-thought (CoT) and tree-of-thought (ToT) prompting enhance reasoning and task performance.

4.1 Decoding and sampling methods for text generation

To produce human-like text, modern transformer models rely on a set of complementary methods. Two foundational ones are decoding and sampling. Decoding refers to the process of generating an output sequence, such as a translated sentence or a continuation of text, from an input sequence. Sampling refers to how the next word (or token) is chosen at each step of generation, typically by drawing from the model's probability distribution rather than always taking the most likely candidate. We’ll start by considering two decoding methods: greedy search and beam search. Then we’ll look at three common approaches to sampling: top-k sampling, nucleus sampling, and temperature sampling.
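Before examining each method in detail, the following minimal sketch contrasts the two families using the Hugging Face transformers library. The checkpoint ("gpt2"), the prompt, and the parameter values are placeholder assumptions chosen only to illustrate the API; the subsections that follow explain what each setting actually does.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The future of artificial intelligence is"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy search: always pick the single most probable next token (deterministic).
greedy_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# Beam search: keep the num_beams most probable partial sequences at each step.
beam_ids = model.generate(**inputs, max_new_tokens=30, num_beams=5, do_sample=False)

# Sampling: draw the next token from the distribution, reshaped by temperature
# and truncated by top-k / top-p (nucleus) filtering. Values here are illustrative.
sampled_ids = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.9,
)

for name, ids in [("greedy", greedy_ids), ("beam", beam_ids), ("sampled", sampled_ids)]:
    print(f"{name}: {tokenizer.decode(ids[0], skip_special_tokens=True)}")

Running the sketch twice will produce identical greedy and beam outputs but (in general) different sampled outputs, which is exactly the deterministic-versus-probabilistic distinction the rest of this section unpacks.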

4.1.1 Greedy search decoding for text generation

4.1.2 Beam search decoding for text generation

4.1.3 Top-k sampling for text generation

4.1.4 Nucleus sampling for text generation

4.1.5 Temperature sampling for text generation

4.2 The art of prompting

4.2.1 Zero-shot prompting

4.2.2 One- and few-shot prompting

4.2.3 CoT prompting

4.2.4 Structured CoT with Instructor

4.2.5 Contrastive CoT prompting

4.2.6 CoVe prompting

4.2.7 ToT prompting

4.2.8 ThoT prompting