1 Meeting Postgres

MEAP v3

This chapter covers

  • Understanding what Postgres is, why it’s so popular these days, and the meaning of “Just Use Postgres”
  • Starting Postgres on your own laptop and connecting from a command-line tool
  • Generating mock data using built-in database capabilities and running a few basic queries

If you’re holding this book, then you’ve at least heard about PostgreSQL (or simply Postgres, as most call it). Whether you’re a student who completed a general-purpose database course or an experienced application developer who has used Postgres or another database on one of your projects, this book will open up the breadth and depth of Postgres capabilities that go far beyond what relational databases usually offer.

In this chapter, you start by learning why Postgres became so popular among developers and what “Just Use Postgres” means. Then, you’ll explore how to get started with Postgres on your own laptop and run your first experiments in under ten minutes. Finally, you’ll use built-in database capabilities to generate mock data and run a few basic SQL queries of the kind most developers have in their application logic.


1.1 What is Postgres?

Postgres is one of the most well-known and fastest-growing relational databases. As a relational database, it natively supports SQL (Structured Query Language) and is usually chosen by developers as the backbone for transactional applications.

Transactional applications need to process user requests quickly without compromising data consistency. You deal with those applications while booking a flight to your next vacation destination, paying for groceries at a local supermarket, or sharing a new post on your favorite social media platform. Developers in charge of those applications use Postgres because it can process a high volume of hotel reservations, payments, and social media posts with low latency, guaranteeing data consistency even if a database server crashes during a business operation execution.

Transactional applications are among the main Postgres use cases, which makes Postgres comparable to other relational databases. What sets it apart, though, is its extendibility.

Over the years, Postgres evolved beyond traditional transactional workloads. Its broad ecosystem of extensions and other derived solutions has turned it into a general-purpose database capable of handling time-series, generative AI, geospatial, analytical, and other workloads.


1.2 Why is Postgres so popular?

There are hundreds of databases nowadays. As of July 2024, the DB-Engines website ranked and categorized over 420 database management systems, with Postgres ranked as the fourth most popular option.

The DB-Engines website uses various metrics to calculate the scores of ranked databases. It measures the number of mentions of a database on websites across the Internet and social networks, counts the number of discussions on Stack Overflow and DBA Stack Exchange, and does more to compute the most accurate ranking possible.

With scores in place for each listed database, the website allows you to build trend charts comparing the popularity of databases you’re interested in. Figure 1.1 shows a chart comparing the ranks of Oracle, MySQL, SQL Server, Postgres, and MongoDB over the last twelve years. Although Postgres’s total score places it in fourth place in late 2024, the trend shows that Postgres has been gaining momentum, while the popularity of the top three leaders has either stagnated or declined.

Figure 1.1 DB-Engines ranking showing Postgres trending up in popularity.

Apart from DB-Engines, Stack Overflow surveys thousands of developers annually to measure the popularity of programming languages, databases, tools, and other technologies. Developers voted Postgres the most popular, admired, and desired database in both the 2023 and 2024 Stack Overflow Developer Surveys.

What can explain the growing popularity of Postgres among developers and other technical experts? The following three factors seem to be contributing the most:

  • Postgres is open source and governed by the community. Postgres started as a research project at the University of California, Berkeley, led by Michael Stonebraker, a computer scientist and Turing Award winner specializing in database systems. The Berkeley team released the database to open source in 1994 under a permissive license. A few years later, in 1996, the PostgreSQL Global Development Group was established to oversee and lead the project’s development. Since then, Postgres has remained fully open and governed by the global community. No single vendor has full control of the project; Postgres is built and led by the open-source community.
  • Postgres is enterprise-ready. Postgres has earned a reputation as one of the most reliable and robust databases over its more than thirty years of development. Even though the community releases a new major Postgres version annually, the core database engine remains unchanged or receives only incremental improvements. This conservative approach to core engine development pays off in fewer regressions, instabilities, and surprises for users. For instance, Mark Callaghan, a recognized independent database expert, regularly publishes performance benchmarks of various databases on his blog (http://smalldatum.blogspot.com). In one of his performance regression tests covering Postgres versions 9 through 16, he demonstrated that the newer versions usually get faster and exhibit only a small number of performance regressions. In addition to the quality of the database itself, there is a rich network of vendors, cloud providers, and consultants providing services and paid offerings, which is important for companies seeking enterprise-grade support and solutions.
  • Postgres is extendible by design. While the community takes a relatively conservative approach to the Postgres core, most innovation happens in the ecosystem of Postgres extensions and other derived solutions. Michael Stonebraker defined extendibility as one of the main design goals in his original “The Design of Postgres” paper, published during the early days of the project at Berkeley: “Provide user extendibility for data types, operators and access methods.” Since then, the rich ecosystem of extensions has served Postgres well. Hundreds of extensions simplify its operations, bring in new capabilities, and let Postgres easily deal with generative AI, time-series, geospatial, and other workloads.

Considering the growing popularity of Postgres, which relies on its robustness, extendibility, and open-source nature, does this mean that Postgres can tackle all types of workloads and use cases? Let’s understand why the book is titled “Just Use Postgres” and what that saying implies.


1.3 “Just Use Postgres” Explained

“Just Use Postgres” has become the motto of the Postgres community. If you’re searching for a database to handle JSON documents, perform full-text or vector similarity searches, or manage other workloads beyond typical relational database capabilities, you’ll likely hear that advice from a community member.

Does this mean Postgres is a Swiss Army knife and the only database every developer needs? Certainly not. The saying implies that Postgres offers a depth and breadth of capabilities that let you use it for use cases far beyond traditional transactional workloads.

Treat “Just Use Postgres” as a reasonable hint from those who have been building or using Postgres for decades. If you or your team already use Postgres (or are planning to do so) and another use case needs to be supported (such as JSON documents, geospatial, time series, generative AI, or others), then before adding another database to your software stack, check whether Postgres can solve the use case for you.

If you discover that Postgres works perfectly well for the new use case, you will avoid learning and managing two or more database systems in production. If you conclude otherwise, it is not a problem; bring in another database system that solves the use case better. The choice is always yours, and the goal of this book is to guide you through the depth and breadth of Postgres capabilities, helping you make better decisions and see the full potential of Postgres!

Now that we have a good understanding of Postgres and the reasons behind its growing popularity, let’s see it in action as developers. We’ll start by deploying, connecting to, and working with a Postgres instance on our laptops in under ten minutes.


1.4 Starting Postgres in Docker

According to the stats for the Postgres repo on GitHub, over 85% of the database code is written in the good old C programming language. The Postgres development group chose to support both Unix-based and Windows operating systems, producing database binaries for a broad range of systems and CPU architectures. Whether your laptop is powered by Windows, Linux, macOS, or another operating system, you can go to https://www.postgresql.org/download/ and pick an installation option that works best for you.

In this book, you’ll learn how to deploy Postgres in a cross-platform way using Docker. With Docker, a single command spins up a database container in under a minute. This installation option is ideal if you want an easy-to-manage development environment without making Postgres a permanent part of your laptop. Docker lets you start, stop, and remove the database container, bypassing OS-specific installers, package managers, and other services.

The only requirement is to have Docker installed. If the following command fails in your terminal, go to the Docker installation page and get the engine running on your machine first.

$ docker version

Once Docker is installed and ready, refer to listing 1.1 to start the latest version of the database container on a Unix operating system such as Linux or macOS.

Listing 1.1 Starting Postgres container on Unix
mkdir ~/postgres-volume                                          # create a host directory for the data volume

docker run --name postgres \                                     # name the container "postgres"
    -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \    # set the superuser name and password
    -p 5432:5432 \                                               # map the default Postgres port to the host
    -v ~/postgres-volume/:/var/lib/postgresql/data \             # persist the data in the host directory
    -d postgres:latest                                           # start the latest image in detached mode

If you’re a Windows user, use the command from listing 1.2 in PowerShell instead.

Listing 1.2 Starting Postgres container on Windows
docker run --name postgres `                                     # name the container "postgres"
    -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password `    # set the superuser name and password
    -p 5432:5432 `                                               # map the default Postgres port to the host
    -v ${PWD}/postgres-volume:/var/lib/postgresql/data `         # persist the data under the current directory
    -d postgres:latest                                           # start the latest image in detached mode
TIP

Mapping the volume to the host directory ensures you can start, stop, and recreate the Postgres container without losing application data, configuration settings, or logs.

Usually, it can take up to a minute to pull the image from Docker Hub and perform the initial configuration of the database container. You can always check the status by running the following command:

$ docker container ls -f name=postgres

If the STATUS field is similar to this one, it means the container has started successfully:

IMAGE             STATUS              PORTS                     NAMES
postgres:latest   Up About a minute   0.0.0.0:5432->5432/tcp    postgres

Finally, you can do one last check by viewing the database logs using the following Docker command:

$ docker logs postgres

If you see this message in the logs, then you’ve successfully initialized Postgres and it’s ready to serve your requests:

listening on IPv4 address "0.0.0.0", port 5432
listening on IPv6 address "::", port 5432
...
database system is ready to accept connections

1.5 Connecting with psql

With the Postgres container running, you can roll up your sleeves and start experimenting. You can always connect to the database using your preferred SQL tool or create a code snippet in your favorite programming language. But what if you don't have any SQL tools on your laptop and you're not ready to build a sample application using Postgres yet? In this case, the psql tool comes to the rescue.

The psql tool is a powerful and lightweight command-line utility that comes bundled with the database binaries. The source code for psql is located in the main Postgres repository and is maintained by the PostgreSQL Global Development Group. This means that the Postgres Docker container also includes psql, allowing you to connect to the database without needing to install any additional tools on your host operating system.

To connect to Postgres with psql from within the container, use the command from listing 1.3.

Listing 1.3 Connecting with psql
docker exec -it postgres psql -U postgres

The docker exec command connects to the postgres container and then executes the psql -U postgres instruction inside it. The latter uses psql to establish a connection with the Postgres instance running in the container, and the -U postgres flag tells psql to connect as the postgres user.

Upon successful connection, psql welcomes you with a prompt similar to the following:

psql (17.2 (Debian 17.2-1.pgdg120+1))
Type "help" for help.   
postgres=#
TIP

Use the \q command to terminate the psql connection to Postgres.

Apart from supporting standard SQL statements, psql also comes with various meta-commands that let you see information about the current connection, list existing database objects, copy data from external files, and much more. All these commands start with the backslash symbol (\).
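A few frequently used meta-commands are listed below as a small sample for illustration; the complete list is available via \?:

\l           list the databases on the server
\dt          list the tables in the current database
\timing on   report the execution time of every subsequent statement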

TIP

Run the \? command to view a complete list of meta-commands supported by psql. To close the generated list and return control to the psql prompt, press q.

As an example, the \conninfo command displays information about the current database connection.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres" via socket in "/var/run/postgresql" at port "5432".

Note that the message generated by the command indicates you’re connected to a database named “postgres.” In this context, the term database refers to a named collection of database objects such as tables, views, and indexes, not the Postgres database container/server itself. The “postgres” database (named collection) is created by default during initialization, and you connect to it unless another database is specified. You can create additional databases using the CREATE DATABASE statement, which is discussed in greater detail in Chapter 2.
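For instance (the database name trading here is just an illustrative choice, not one used by the chapter’s examples), you could create a new database and switch the psql session to it:

CREATE DATABASE trading;
\c trading

The \c meta-command closes the current connection and opens a new one to the named database.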

\d is another frequently used meta-command; it lists the tables, views, and sequences belonging to the database you’re connected to (which is “postgres” in our case).

postgres=# \d
Did not find any relations.

Since you've just started with a fresh Postgres database container, the command reports that you haven’t created any tables or other relations yet. This is easy to address. You just need to load a dataset, which we do in the next section by generating mock data using built-in Postgres capabilities.


1.6 Generating mock data

Once you have the database running and know how to connect to it, the next logical step is to find a sample dataset, load it into the database, and run a few experiments to help you learn the database's capabilities.

Plenty of Postgres-ready datasets are available online. In this section, however, you’ll learn how to generate mock data using built-in Postgres capabilities. Note that this data generation approach isn’t considered a best practice in Postgres; it’s a handy technique you can use in certain situations without leaving the boundaries of your command-line tool.

First, let’s use the CREATE TABLE command from listing 1.4 to create a simplified table storing information about stocks traded on a stock exchange.

Listing 1.4 Creating sample table
CREATE TABLE trade(
    id bigint,
    buyer_id integer,
    symbol text,
    order_quantity integer,
    bid_price numeric(5,2),
    order_time timestamp
);

Now, if you execute the \d meta-command, Postgres shows the table in the list of existing relations:

postgres=# \d
         List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | trade | table | postgres

However, no trades are in the table yet:

SELECT count(*) FROM trade;
 count
-------
     0
(1 row)

To fill the table with sample data, we’ll use Postgres’s generate_series function. As the name suggests, the function generates a series of values within a predefined range and can produce numeric values or timestamps. The following form creates a set of integers from start to stop (inclusive); the step argument is optional and defaults to 1.

generate_series (start integer, stop integer [, step integer ]) → setof integer
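There is also a timestamp form that takes a step interval. For example (an illustrative query, not used in the rest of the chapter), the following call produces one row per day for five consecutive days:

SELECT generate_series(
    '2024-08-01'::timestamp,
    '2024-08-05'::timestamp,
    interval '1 day');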

As an example, if you’d like to generate a series of integers from 1 to 5, then make a call to the function as follows:

SELECT generate_series(1,5);

The function produced five rows and assigned the values to a column with the same name, generate_series.

 generate_series
-----------------
               1
               2
               3
               4
               5
(5 rows)

The name of that column can be easily changed. Let’s say you want the function to generate unique IDs for the trades. Then, you can ask the function to assign the generated values to the id column:

SELECT generate_series(1,5) as id;
 
The function generates the same five integer values, but this time stores them in the id column:
 
 id
----
  1
  2
  3
  4
  5
(5 rows)

Even though generate_series produces integer values, it’s more than enough for mock data generation of varying complexity. For instance, the trade table has the buyer_id column. Assuming that you want to generate ten random buyers, a call to the function can be as shown in listing 1.5.

Listing 1.5 Generating random buyers
SELECT id, random(1,10) AS buyer_id 
FROM generate_series(1,5) AS id;

The query returns the values for the id and buyer_id columns.

id | buyer_id
----+----------
  1 |        5
  2 |        6
  3 |        1
  4 |        6
  5 |       10
(5 rows)

The id is generated by the generate_series function, while the buyer_id is set to a random value in the range from 1 to 10 (inclusive).

NOTE

The random(min_value, max_value) function, which is used in listing 1.5, was added to Postgres in version 17. If you are using an earlier version of the database, use floor(random()*(max_value - min_value + 1) + min_value) instead to generate a random integer in the [min_value, max_value] range.
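For example, on Postgres 16 or earlier, the query from listing 1.5 could be rewritten as follows (an equivalent sketch; random() returns a double, so the result is cast to an integer):

SELECT id, floor(random() * (10 - 1 + 1) + 1)::int AS buyer_id
FROM generate_series(1,5) AS id;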

What if we want to generate values for the stock symbol column, which is of the text type? A symbol is the registered name of a stock. Let’s say the stock exchange lets you trade stocks of Apple, Ford, and DoorDash, which trade under the symbols AAPL, F, and DASH. Using the SQL statement in listing 1.6, you can generate five random trades for those stocks.

Listing 1.6 Generating random stock symbols
SELECT id, 
(array['AAPL','F','DASH'])[random(1,3)] AS symbol 
FROM generate_series(1,5) AS id;
 
The output might be as follows:
 
 id | symbol
----+--------
  1 | AAPL
  2 | DASH
  3 | F
  4 | DASH
  5 | F
(5 rows)

The generate_series function produces five rows (trades), using the generated integer values as trade IDs. The value of the symbol column is randomly selected from the array of text values: ['AAPL', 'F', 'DASH'].

If we follow the same principle, we can generate mock data for the remaining columns of the trade table. Overall, listing 1.7 shows the final query that both generates and inserts 1000 sample trades.

Listing 1.7 Inserting 1000 sample trades
INSERT INTO trade (id, buyer_id, symbol, order_quantity, bid_price, order_time)  -- insert the generated rows
    SELECT
        id,                                                 -- unique trade ID
        random(1,10) AS buyer_id,                           -- random buyer in the range 1..10
        (array['AAPL','F','DASH'])[random(1,3)] AS symbol,  -- random stock symbol
        random(1,20) AS order_quantity,                     -- random quantity in the range 1..20
        round(random(10.00,20.00), 2) AS bid_price,         -- random price between 10.00 and 20.00
        now() AS order_time                                 -- current timestamp
    FROM generate_series(1,1000) AS id;                     -- generate 1000 rows

This is the power of SQL: with a single query, you can generate mock data using Postgres’s built-in capabilities.
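As a variation (a sketch, not one of the chapter’s listings), you could avoid giving every trade the same timestamp by subtracting a random number of seconds from now(); in Postgres, an integer multiplied by an interval yields an interval:

SELECT now() - random(0, 86400) * interval '1 second' AS order_time;

This spreads the generated order_time values over the preceding 24 hours.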

As a final check, take a look at the first five generated rows:

SELECT * FROM trade LIMIT 5;

The query produces the following output:

 id | buyer_id | symbol | order_quantity | bid_price |         order_time
----+----------+--------+----------------+-----------+----------------------------
  1 |        3 | F      |             18 |     12.92 | 2024-08-02 02:13:36.266052
  2 |        9 | AAPL   |             19 |     19.91 | 2024-08-02 02:13:36.266052
  3 |        7 | F      |              3 |     17.61 | 2024-08-02 02:13:36.266052
  4 |       10 | F      |              1 |     19.55 | 2024-08-02 02:13:36.266052
  5 |        9 | AAPL   |             15 |     12.25 | 2024-08-02 02:13:36.266052
(5 rows)
NOTE

Because we used the random and now functions to generate the table data, the output in the book will differ from what you see when executing the SELECT queries in this chapter.


1.7 Running basic queries

SQL (Structured Query Language) is the language that Postgres “speaks” natively. Throughout this book, we’ll continue exploring the database’s essential and extended capabilities by “talking” to Postgres in its native language.

Even if you decide to use an object-relational mapping (ORM) framework on top of Postgres, the ORM will still generate and execute raw SQL against the database. Therefore, it’s always beneficial for you, as a developer, to know how to read and write SQL queries, as there may be times when you need to debug or optimize SQL requests generated by the ORM.

To conclude this chapter, let’s run a few basic SQL queries that many developers frequently use in their application logic.

One of the most common queries is getting the number of records that satisfy a specific condition. A combination of the count function with a filter condition in the WHERE clause usually does the job. For example, you can get the total number of trades for Apple stock as follows.

SELECT count(*) FROM trade WHERE symbol = 'AAPL';
 
There are 305 trades that satisfy the search condition:
 
count  
-------
   305
(1 row)

Note the * symbol passed to the count function. In a regular SELECT, * instructs the database to return the data for all the columns of the selected rows. Generally, you should avoid the * operator unless your application logic truly needs every column. The best practice is to specify the names of the columns your application really needs; this habit helps you use resources like memory, CPU, and network bandwidth more prudently.
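For example, if your application logic only needs the price and time of Apple trades, select just those columns (a small illustration of the practice; any five rows will do):

SELECT bid_price, order_time
FROM trade
WHERE symbol = 'AAPL'
LIMIT 5;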

Does it mean that by running the count(*) we’re not following the best practices? Let’s take a look at the execution plan:

EXPLAIN ANALYZE SELECT count(*) FROM trade WHERE symbol = 'AAPL';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=22.26..22.27 rows=1 width=8) (actual time=1.920..1.922 rows=1 loops=1)
   ->  Seq Scan on trade  (cost=0.00..21.50 rows=305 width=0) (actual time=0.190..1.862 rows=305 loops=1)
         Filter: (symbol = 'AAPL'::text)
         Rows Removed by Filter: 695
 Planning Time: 0.298 ms
 Execution Time: 2.033 ms

First, don’t be discouraged if you find the execution plan difficult to understand. One of the goals of this book is to make execution plans a part of your developer toolbox. You’ll learn how to read the plans and take advantage of them in practice.

Second, look for the line that says Seq Scan on trade and note the statistics that say rows=305 width=0. This line implies that Postgres scanned through the entire table and found 305 rows that met the search condition. However, because the row size is reported as zero (width=0), it means the database didn’t actually read all the columns, even though the * operator was passed to the count function. In fact, count(*) is treated specially in Postgres, as it simply counts the number of rows without retrieving the actual data stored in the columns.

What else can we do with our sample dataset? As developers, we're often tasked with creating logic to find the most popular items, those in high demand. For instance, the simple SQL query in listing 1.8 finds the most actively traded stocks, measured by the number of trades.

Listing 1.8 Most traded stocks by volume
SELECT symbol, count(order_quantity) AS total_volume   -- number of trades per symbol
FROM trade
GROUP BY symbol                                        -- one row per stock symbol
ORDER BY total_volume DESC;                            -- most actively traded first

The output might be as follows:

 symbol | total_volume
--------+--------------
 DASH   |          351
 F      |          344
 AAPL   |          305
(3 rows)

Another common task is to find the most valuable clients (or frequent buyers) who spend the most on our products and services. In the context of our stock exchange, let’s find the top three buyers who have spent the most. The query in listing 1.9 is as simple and elegant as the previous one.

Listing 1.9 Top three buyers
SELECT buyer_id, sum(bid_price * order_quantity) AS total_value   -- total spend per buyer
FROM trade
GROUP BY buyer_id                                                 -- one row per buyer
ORDER BY total_value DESC                                         -- biggest spenders first
LIMIT 3;                                                          -- keep only the top three

Postgres returns the result that might be as follows:

 buyer_id | total_value 
----------+-------------
        8 |    18260.73
        9 |    17295.22
        2 |    17133.51
(3 rows)

You can continue by calculating the average price per stock, finding the most recent trades, identifying the highest price trade for each stock, and much more. You can do a lot by crafting simple SQL queries and running them against the dataset generated using built-in Postgres capabilities just a few minutes ago. By now, you’ve had a taste of how to get started with Postgres, and we’re ready to begin exploring its capabilities in more depth. Let’s go.
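The first two of those exercises could be sketched as follows (your values will differ because the data is random):

-- Average bid price per stock
SELECT symbol, round(avg(bid_price), 2) AS avg_price
FROM trade
GROUP BY symbol;

-- Most recent trades
SELECT id, symbol, order_time
FROM trade
ORDER BY order_time DESC
LIMIT 5;

The third exercise, the highest-priced trade per stock, is a good fit for Postgres’s DISTINCT ON clause, which keeps only the first row per symbol according to the ORDER BY:

SELECT DISTINCT ON (symbol) symbol, id, bid_price
FROM trade
ORDER BY symbol, bid_price DESC;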


1.8 Summary

  • Postgres is one of the most popular and fastest-growing databases.
  • Its open-source nature, enterprise readiness, and extendibility are the key factors contributing to its popularity and growth.
  • The phrase “Just Use Postgres” implies that Postgres offers a wide range of capabilities, allowing it to handle use cases far beyond traditional transactional workloads.
  • Postgres is written in C and can be installed on Windows and a wide range of Unix-based operating systems.
  • The database can be started as a container in under a minute on any operating system that supports Docker.
  • Postgres comes with the generate_series function, which can be used to generate mock data of varying complexity.
  • Postgres “speaks” SQL natively, allowing you to solve various business tasks by crafting simple and elegant SQL queries.