Elasticsearch does a lot of ground (and grunt) work behind the scenes on incoming textual data. It prepares the data so it can be stored and searched efficiently. In a nutshell, Elasticsearch cleans text fields, breaks the text into individual tokens, and enriches those tokens before storing them in inverted indexes. When a search query is executed, the query string is matched against the stored tokens, and any matches are retrieved and scored. This process of breaking text into individual tokens and storing them in internal memory structures is called text analysis.
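To make the idea concrete, here is a toy sketch in Python of what such a pipeline does conceptually: a simple analyzer that lowercases and tokenizes text, feeding an inverted index that maps tokens to the documents containing them. This is an illustration of the concept, not Elasticsearch's actual implementation; all function names and the sample documents are invented for the example.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text, then split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

docs = {
    1: "Kubernetes in Action",
    2: "Elasticsearch in Action",
}
index = build_inverted_index(docs)
print(index["action"])       # both documents contain the token "action"
print(index["kubernetes"])   # only document 1 contains "kubernetes"
```

At query time, the same analyzer is applied to the query string, and the resulting tokens are looked up in the index; the real engine additionally scores each match for relevance.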
The aim of text analysis is not just to return search results quickly and efficiently but also to return relevant results. The work is carried out by analyzers: prebuilt software components that inspect the input text according to configurable rules. If a user searches for "K8s", for example, we should be able to fetch books on Kubernetes. Similarly, if a query includes an emoji such as ☕ (coffee), the search engine should be able to return coffee-related results. These and many more search criteria are honored by the engine because of the way we configure its analyzers.
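One way to see how such matches become possible is token-level synonym expansion, which is roughly what a synonym token filter achieves inside an analyzer. The sketch below is a toy approximation in Python, not Elasticsearch's filter; the synonym mappings are assumptions chosen to mirror the examples above.

```python
import re

# Assumed synonym mappings for illustration: "K8s" and the coffee emoji
# are rewritten to canonical tokens at analysis time.
SYNONYMS = {"k8s": "kubernetes", "☕": "coffee"}

def analyze(text):
    """Lowercase, split into tokens (keeping the emoji), then expand synonyms."""
    tokens = re.findall(r"[a-z0-9]+|☕", text.lower())
    return [SYNONYMS.get(token, token) for token in tokens]

print(analyze("Best K8s books"))   # ['best', 'kubernetes', 'books']
print(analyze("I need ☕"))        # ['i', 'need', 'coffee']
```

Because both documents and queries pass through the same analysis chain, a query for "K8s" and a document mentioning "Kubernetes" end up sharing the same stored token and therefore match.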