10 Large Language Models in the real world

This chapter covers

  • Understanding how conversational LLMs work
  • Recognizing errors, misinformation, and biases in LLM output
  • Fine-tuning LLMs on your own data
  • Finding meaningful search results for your queries (semantic search)
  • Speeding up your vector search with approximate nearest neighbor (ANN) algorithms
  • Generating fact-based, well-formed text with LLMs

If you increase the number of parameters of transformer-based language models to obscene sizes, you can achieve some surprisingly impressive results. Researchers call these surprises emergent properties, but they may be a mirage.[368] Since the general public became aware of what really large transformers can do, these models have increasingly been referred to as Large Language Models (LLMs). The most sensational of these surprises is that chatbots built on LLMs generate intelligent-sounding text. You’ve probably spent some time using LLM chatbots such as ChatGPT, Bard, Bing Chat, or Perplexity AI, and they may be helping you get ahead in your career by letting you craft words, code, or ideas faster. Like most people, you are probably relieved to finally have a search engine and virtual assistant that gives you direct, smart-sounding answers to your questions. This chapter will help you use LLMs smartly, so you can do more than merely sound intelligent.

10.1 Large Language Models (LLMs)

10.1.1 Smarter smaller LLMs

10.1.2 Generating warm words

10.1.3 Creating your own Generative LLM

10.1.4 Fine-tuning your generative model

10.1.5 Nonsense (hallucination)

10.2.1 Web-scale reverse indices

10.2.2 Improving your full-text search with trigram indices

10.3.2 Choose your index

10.3.3 Quantizing the math

10.3.4 Pulling it all together with haystack

10.3.5 Getting real