chapter six

6 Postgres for full-text search

This chapter covers

Understanding how Postgres supports full-text search
Converting textual data into lexemes
Storing lexemes in the database
Performing full-text search using built-in functions and operators
Optimizing full-text search performance with GIN and GiST indexes

Postgres has supported various text data types and search operators for years. For instance, we can store text values in a column of the TEXT type and then use the equality operator (=), pattern-matching operators (LIKE and ILIKE), regular-expression-matching operators (~ and ~*), and other operators to query the text data. Although these capabilities work well for simple text data and queries, they are less effective when searching through large text documents that require understanding linguistic features like word variations or relevance ranking.

Full-text search is another core feature of Postgres that lets us perform advanced searches over textual data of varying size and complexity. Unlike basic text comparison or regex matching, it uses linguistic techniques to tokenize, normalize, rank, and index text data, making searches faster and more relevant.

6.1 Basics of full-text search in Postgres

6.1.1 Tokenization and normalization

6.1.2 Full-text search configurations

6.2 Preparing data for text search

6.2.1 Generating lexemes with the to_tsvector function

6.2.2 Storing tsvector lexemes in the database

6.3 Performing full-text search

6.3.1 Using plainto_tsquery for simple queries

6.3.2 Using to_tsquery for advanced filtering

6.4 Ranking search results

6.5 Highlighting search results

6.6 Indexing lexemes

6.6.1 Using GIN indexes

6.6.2 Using GiST indexes

Summary