Most folks experience their first spatially enabled application when they see pushpins tacked onto points of interest on an interactive map. This provides a glimpse into the vast and varied field of geographic information systems (GIS).
We’ll begin this chapter with a pushpin model. As we demonstrate its limited usefulness, we’ll introduce the need for a spatial database—not just any database, but PostGIS. PostGIS is a spatial database extender for the PostgreSQL database management system. We’ll provide a brief introduction to the entire PostGIS suite and whet your appetite with an example that goes far beyond what you can accomplish with pushpins.
The data and code used in this chapter can be found at www.postgis.us/chapter_01_edition_3.
highlight, annotate, and bookmark
You can automatically highlight by performing the text selection while keeping the alt/ key pressed.

Popular mapping sites such as OpenStreetMap, Mapbox, Google Maps, Bing Maps, and MapQuest have empowered people in many walks of life to answer the question “Where is something?” by displaying teardrop shapes on a gorgeously detailed, interactive map. No longer are we restricted to textual descriptions of “where,” like “Turn right at the supermarket, and it’ll be the third house on the right with a mangy dog out front.” Nor are we faced with the frustrating problem of not being able to figure out our current location on a paper map.
Going beyond getting directions, organizations large and small have discovered that mapping can be a great resource for analyzing patterns in data. By plotting the addresses of pizza lovers, a national pizza chain can assess where to locate the next grand opening. Political organizations planning grassroots campaigns can easily see on a map where the undecided or unregistered voters are located and target their route walks accordingly. Even though the pushpin model offers unprecedented geographical insight, the reasoning that germinates from it is entirely visual.
In the pizza example, the chain might be able to see the concentration of pizza lovers in a city by means of adding pushpins, but what if they needed to differentiate pizza lovers by income level? If the chain has a gourmet offering, it would be a good idea to locate new restaurants in the midst of mid- to high-income pizza lovers. The pizza planners could use pushpins of different colors on an interactive map to indicate various income tiers, but the heuristic visual reasoning will now be much more complicated, as shown in figure 1.1. Not only do the planners need to look at the concentration of pushpins, they must also keep the varying colors or icons of the pin in mind. Add another variable to the map, like households with lactose-intolerant adults, and the problem overwhelms our feeble minds. Spatial databases come to the rescue.
A spatial database is a database with column data types specifically designed to store objects in space—these data types can be added to database tables. The information stored is usually geographic in nature, such as a point location or the boundary of a lake. A spatial database also provides functions and indexes for querying and manipulating the spatial data, which can be called from a query language such as Structured Query Language (SQL). A spatial database is often just used as a storage container for spatial data, but it can do much more than that. Although a spatial database need not be relational in nature, most are. A spatial database gives you a storage tool, an analysis tool, and an organizing tool all in one.
Presenting data visually isn’t a spatial database’s only goal. The pizza shop planners can store an infinite number of attributes of the pizza-loving household, including income level, number of children in the household, pizza-ordering history, and even religious preferences and cultural upbringing (as they relate to topping choices on a pizza). More important, the analysis need not be limited to the number of variables that can be juggled in the brain. The planners can make very specific requests, like “Give me a list of neighborhoods ranked by the number of high-income pizza lovers who have more than two children.” Furthermore, they can easily incorporate additional data from varied sources, such as the location and rating of existing pizzerias from restaurant review sites or the health-consciousness level of various neighborhoods as identified by the local health commission. Their questions of the database could be as complicated as “Show me the region with the highest number of households where the average closest distance to any pizza parlor with a star-ranking below 5 is greater than 16 kilometers (10 miles). Oh, and toss out the health-conscious neighborhoods.”
Table 1.1 Result of a spatial query (view table figure)
Suppose you aren’t a mapping user but are more of a data user. You work with data day in and day out, never needing to plot anything on a map. You’re familiar with questions like “Give me all the employees who live in Chicago” or “Count up the number of customers in each postal code.” Suppose you have the latitude and longitude of all the employees’ addresses; you could ask questions like “Give me the average distance that each employee must travel to work.” This is the extent of the kind of spatial queries that you can formulate with conventional databases, where data types consist mainly of text, numbers, and dates.
But suppose the question posed is “Give me the number of houses within two miles of the coastline requiring evacuation in the event of a hurricane” or “How many households would be affected by the noise of a newly proposed runway?” Without spatial support, these questions would require you to collect or derive additional values for each data point. For the coastline question, you’d need to determine the distance from the beach, house by house. This could involve algorithms to find the shortest distance to fixed intervals along the coastline or require a series of SQL queries to order all the houses by proximity to the beach and then make a cut. With spatial support, all you need to do is reformulate the question slightly as “Find all houses within a two-mile radius of the coastline.” A spatially enabled database can intrinsically work with data types like coastlines (modeled as linestrings), buffer zones (modeled as polygons), and beach houses (modeled as points).
As with most things in life worth pursuing, nothing comes without some effort. You’ll need to climb a gentle learning curve to tap into the power of spatial analysis. The good news is that unlike other good things in life, the database that we’ll introduce you to is completely free—moneywise.
If you’re able to figure out how to get data into your Google map, you’ll have no problem taking the next step. If you can write queries in non-spatially enabled databases, we’ll open your eyes and mind to something beyond the mundane world of numbers, dates, and strings. Let’s get started.
discuss

PostGIS is a free and open source library that spatially enables the free and open source PostgreSQL object-relational database management system (ORDBMS). We want you to choose PostgreSQL as your relational database and PostGIS as your spatial database extender for PostgreSQL.
PostGIS started as a project of Refractions Research (http://refractions.net), a geospatial consulting company located in Victoria, Canada, and has since been adopted and improved on by governments, universities, public organizations, and other companies.
- Proj—Provides projection support, now in its seventh generation
- Geometry Engine Open Source (GEOS)—Advanced geometry processing support
- Geospatial Data Abstraction Library (GDAL)—Provides many advanced raster-processing features
- Computational Geometry Algorithms Library (CGAL/SFCGAL)—Enables advanced 3D analysis
Most of these projects, including PostGIS, now fall under the umbrella of the Open Source Geospatial Foundation (OSGeo).
The foundation of PostGIS is the PostgreSQL object-relational database management system (ORDBMS), which provides transactional support, gist index support for spatial objects, and a query planner out of the box. It’s a great testament to the power and flexibility of PostgreSQL that Refractions Research chose to build on top of PostgreSQL rather than on any other open source database.
PostGIS and PostgreSQL conform to industry standards more closely than most products. PostgreSQL supports many of the newer ANSI SQL features. PostGIS supports OGC standards and the SQL Multimedia spec (SQL/MM) spatial standard. This means that you aren’t simply learning how to use a set of products; you’re garnering knowledge about industry standards that will help you understand other commercial and open source geospatial databases and mapping tools.
If your data and your APIs implement standards supported by many kinds of software—Cadcorp, Safe FME, AutoCAD, Manifold, MapInfo, Esri ArcGIS, ogr2ogr/GDAL, OpenJUMP, QGIS, Deegree, MapGuide, UMN MapServer, GeoServer, or even standard programming tools like SQL, JavaScript, PHP, Python, Ruby, Java, Perl, ASP.NET, SQL, or new emerging tools—then everyone can use the tools that they feel most comfortable with, or that fit their work processes, or that they can afford, and share information with one another. OSGeo tries to ensure that regardless of how small your pocketbook is, you can still afford to view and analyze GIS data. OGC and ANSI/ISO SQL try to enforce standards across all products so that regardless of how expensive your GIS platform is, you can still make your hard work available to everyone. This is especially important for government agencies whose salaries and tools are paid for with tax dollars; for students who have a lot of will and the intelligence to learn advanced technology, but have small pockets; and even for smaller vendors who have a compelling offering for specific kinds of users but who are often snubbed by larger vendors because they can’t support (or lack access to) the private API standards of the big-name vendors.
PostGIS is supported by a vast number of GIS proprietary desktop and server tools. PostGIS is also the preferred spatial relational database of most open source geospatial desktop and web mapping server tools and the preferred spatial relational database platform for most government and start-ups.
PostGIS provides many spatial operators, spatial functions, spatial data types, and spatial indexing enhancements to PostgreSQL. If you add to the mix the complementary features that PostgreSQL and other related projects provide, you have a jam-packed powerhouse at your disposal that’s well suited for sophisticated GIS analysis and that is a valuable tool for learning GIS.
- Functions to work with GeoJSON, Keyhole Markup Language (KML), Mapbox Vector Tiles (MVT) allowing web applications to talk directly to PostGIS without the need for additional serializing schemes or translations
- Comprehensive geometry processing functions that go far beyond basic geometric operations, including functions for fixing invalid geometries and for simplifying and deconstructing geometries
- Built-in 3D and topology support
- Over 300 seamless operations for working with vectors and rasters in tandem, as well as for converting between the two families
- GeoJSON is an extension of JSON that’s used for representing JavaScript objects. It adds to the JSON standard support for geographic objects.
- KML is an XML format developed by Keyhole (which was purchased by Google), first used in Google’s mapping products and later supported by various mapping APIs.
- Mapbox Vector Tiles (MVT) is a binary vector format popularized by Mapbox that dishes out data in tiles of binary vector data, allowing client-side styling of vector data, often lighter than standard raster tiles, and for scaling resolution.
The major reason PostGIS was built on the PostgreSQL platform was the ease of extensibility that PostgreSQL provided for building new types and operators and for controlling the index operators. PostgreSQL was designed to be extensible from the ground up.
PostgreSQL has a regal lineage that dates back almost to the dawn of relational databases. It’s a cousin of the Sybase and Microsoft SQL Server databases, because the people who started Sybase came from UC Berkeley and worked on the Ingres or PostgreSQL projects with Michael Stonebraker. Michael Stonebraker is considered by many to be the father of Ingres and PostgreSQL and to be one of the founding fathers of object-relational database management systems. The source code of Sybase SQL Server was later licensed to Microsoft to produce Microsoft SQL Server.
PostgreSQL’s claim to fame is that it’s the most advanced open source database in existence. It has the speed and functionality to compete with the popular commercial enterprise offerings, and it’s used to power databases petabytes in size. As time has moved on, new usability features have been added, making it not only the most advanced, but perhaps the most flexible and best relational database out there. For more details about the features of PostgreSQL and the key enhancements in newer versions that are lacking in most other databases (including expensive proprietary ones), please refer to appendix D.
PostgreSQL is becoming a one-size-fits-all database that doesn’t sacrifice the needs and wants of any database users. Most OS distributions carry a fairly new version that provides a quick and painless install process. Since the last edition of this book, cloud offerings have come on board that provide PostgreSQL with PostGIS out of the box. Some popular cloud versions of PostgreSQL that PostGIS users use are CartoDB, Heroku PostgreSQL, Microsoft Azure database for PostgreSQL, and Amazon RDS and Aurora for PostgreSQL. Google BigQuery, a data warehouse service provided by Google, though not PostgreSQL, has adopted PostgreSQL constructs and PostGIS function names and spatial types for querying their spatial data (https://cloud.google.com/bigquery/docs/gis-data).
Licenses for SQL Server Standard start at $5,000 and can easily cost you $20,000 for a modest server. The free version of SQL Server, while it has the same spatial functionality as the paid version, is crippled by its memory and processor limits.
Oracle Standard prior to Oracle 19c shipped only with Oracle Locator, which had only elementary functionality. Oracle spatial prior to Oracle 19c required Oracle Spatial purchase to get the advanced spatial features. Starting with Oracle 19c, all editions include the Oracle spatial support.
PostGIS and PostgreSQL are open source. PostGIS is under a GPLv2+ license; PostgreSQL is under a BSD-style license, which means you can both see and modify the source code. If you find a feature missing, you can contribute a patch or pay a developer to add the feature. Adding features to PostGIS and PostgreSQL generally costs much less than the licensing costs for proprietary counterparts. If you discover a bug in PostGIS or PostgreSQL, you’ll find the PostGIS and PostgreSQL teams very responsive in addressing bugs—more so than most proprietary database vendors.
You have more freedom to control your destiny with PostGIS and PostgreSQL than you do with comparable proprietary offerings. You can install PostGIS on as many servers as you want, and you aren’t limited by artificial restrictions on how many cores you can use.
The openness of PostGIS has spawned an explosion of user-contributed add-ons and community-funded features. These are the most notable ones to date: raster support, geodetic support, topology support, improved 3D support, faster spatial indexes, TIGER geocoder enhancements, and a PostGIS spatial viewer in the pgAdmin4 database management tool commonly shipped with PostgreSQL.
The release cycles for PostGIS and PostgreSQL are radically shorter than those of commercial offerings. With contributions from users, PostgreSQL evolves at a rate of one major version per year and one patch release version every two or three months, with bugs getting immediate attention. You don’t have to wait years in anticipation of features promised in subsequent releases. If you choose to live on the bleeding edge, you can even download a new build every other week.
Admittedly, PostGIS isn’t the only spatial database in use today. Early entrants were dominated by proprietary offerings, and PostGIS broke this mold. Successors to PostGIS are gravitating towards installations with lightweight footprints for use on mobile devices. We’re also beginning to see spatial features in NoSQL databases like MongoDB, CouchDB, Elastic Search, and Solr.
Oracle was the one that started it all. In Oracle 7, joint development efforts with Canadian scientists gave birth to SDO (Spatial Data Option). In later releases, Oracle redubbed this lovechild as Oracle Spatial.
Oracle Spatial isn’t available with lower-priced editions of Oracle. Only when you fork out the money for Oracle Enterprise Edition will you have the luxury of being able to buy the Oracle Spatial option.
Standard Oracle installations do come with something called Oracle Locator, which offers the basic geometry types, proximity functions, some spatial aggregates, and limited spatial processing. Oracle has been pressured by users to provide more spatial support in Oracle Locator, so newer versions of Oracle Locator do provide basic functions like union and intersection but leave out union aggregate options and many other functions you’ll find in PostGIS, SQL Server, and Oracle Spatial.
Microsoft introduced spatial support in their SQL Server 2008 offering, with its built-in Geometry and Geodetic Geography types and companion spatial functions. To Microsoft’s credit, you’ll get the same feature set with their Express, Standard, Enterprise, and Datacenter offerings. You may just be limited regarding database size, how many processors you can use, and what query plan features you’re allowed.
Microsoft’s spatial feature, except their curved and geodetic support, pales in comparison to PostGIS. Admittedly, Microsoft SQL Server has probably got the best curve and geodetic support of any database—it’s the only one to support curved geometries in geodetic space. But don’t expect to find numerous output/input functions, such as input/output for KML, GeoJSON, and MVT, or raster support, or the numerous processing functions that PostGIS has.
Our favorite kids on the block are SpatiaLite and GeoPackage, which are both add-ons to the open source SQLite portable database. These are especially interesting because they can be used as low-end companions to PostGIS and other high-end spatially enabled databases.
GeoPackage is an OGC standard storage and transport mechanism that can store both vector and raster data. Internally, it is a relational database just like PostGIS, and it’s growing in popularity with tools such as QGIS, making it a default standard for exporting data.
GeoPackage is touted more as a data storage than a querying tool, and it leaves the query functionality to tools that use it. SpatiaLite, on the other hand, includes much of the same functionality you’ll find in PostGIS and builds using the same libraries that PostGIS uses: GEOS, PROJ, and GDAL. This makes it an even more fitting companion to PostGIS because many of the conventions are the same and much of the ecosystem around PostGIS also supports or is starting to support SpatiaLite/RasterLite.
What SpatiaLite lacks is a strong enterprise database behind it for writing advanced functions and spatial aggregate functions. That’s why some spatial queries possible in PostGIS are harder to write or are not even possible in SpatiaLite.
SpatiaLite, SQLite, and GeoPackage store data as a single file that’s easily transportable. This makes it less threatening to deploy for users new to databases or GIS and easier to deploy as a lightweight offline database companion to a server-side database like PostGIS/PostgreSQL.
MySQL has had elementary spatial support since version 4, but MySQL, as a database, is handicapped by its lack of a powerful SQL engine. Its primary audience is still developers who are looking for a database that will store something, rather than do something. Earlier MySQL spatial support made the fatal mistake of not providing indexing capabilities except on MyISAM tables—spatial queries rely heavily on indexing for speedy performance. In version 5.6, MySQL extended geometric operations to work beyond bounding boxes and also allowed spatial indexes on its InnoDB storage engine. Newer versions of MySQL and MariaDb offer even more functions, such as GeoJSON and other output functions.
Oracle MySQL and other MySQL forks like MariaDB have made strides in the 5.6 variants by improving the performance of subqueries, but the query planner and SQL feature set in the MySQL family is still a kid when compared to the likes of PostgreSQL, SQL Server, and Oracle, so MySQL is not suitable for doing anything as complex as most spatial analysis. The spatial support has vastly improved in MySQL 8 and MariaDb 10, but it’s still no competition for PostGIS.
Although Oracle MySQL and MariaDb have mostly the same functionality, their spatial offerings are not exactly the same. For a comparison of the differences, see the MariaDB website (https://mariadb.com/kb/en/library/mysqlmariadb-spatial-support-matrix/).
We must give a nod to Esri, which has long packaged its spatial database engine (SDE) with its ArcGIS for Server product. The SDE engine is integrated into the ArcGIS line of products and is often used to spatially enable or augment legacy or weak database products, such as Microsoft SQL Server 2005 and Oracle Locator.
Older versions of ArcGIS desktop required going through an SDE middle tier to get at the native offerings of your spatial database. Newer versions, starting around ArcGIS 10.0, allow for direct access to PostGIS and other databases. By sidestepping the middleware, you’re free to use any version of PostGIS with ArcGIS desktop.
Be careful when using ArcGIS as it installs its own flavor of geometry in PostgreSQL. This often causes users of PostGIS confusion as they sometimes pick the sde.st _geometry database type instead of the PostGIS geometry type and are further locked into Esri middleware. The sde.st_geometry type is needed to use Esri versioning tools, but for most other uses it’s a hinderance.
Although the Esri proprietary model doesn’t sit well with us, we must give them credit—a lot of credit, in fact—for being one of the first major companies to introduce GIS analysis to commercial and government organizations. They paved the way for, but still stand in the way of, the rise of free and open source GIS.
settings

We encourage you to install the latest versions of PostgreSQL and PostGIS—PostgreSQL 13 and PostGIS 3.1 at the time of writing. The introduction of the extension model in PostgreSQL 9.1 greatly simplified the installation of add-ons (such as PostGIS) to two steps:
- Locate and install the binaries for your particular OS into your PostgreSQL directories.
- Individually enable the extensions for each database as needed. For instance, if you have 10 databases on your server, but only 2 require PostGIS, you’d only enable PostGIS for the 2.
Many of the popular Linux/Unix distributions include PostGIS 3.1 in their repositories. Use yum or apt to install the binaries. For Mac users, there are a couple of popular distributions, all itemized on the PostGIS install page (http://postgis.net/install). For MS Windows, we recommend using the EnterpriseDB (EDB) application Stack Builder, if you are uncomfortable with command lines. We also are the package maintainers for the “Spatial Extensions” category in the EDB Windows application Stack Builder. We try to pack the “Spatial Extensions” category with all the PostGIS extensions and many related PostGIS extensions, such as pgRouting and pgPointcloud. Please refer to appendix B for more details on where to obtain binaries for your OS.
Two popular tools come packaged with PostgreSQL: psql and pgAdmin. You use these tools to create databases, users, and compose queries.
If you have the luxury of a graphical interface, we encourage you to use the more newbie-friendly pgAdmin. PgAdmin can be installed separately from PostgreSQL. You can find source code as well as precompiled binaries at the pgAdmin site (www.pgadmin.org).
Once you’ve successfully installed the binaries, you can create a database with a command such as this, using the psql or pgAdmin query tool:
CREATE DATABASE postgis_in_action;
After creating the database, you should connect to it. You can do this in psql with \connect postgis_in_action and in pgAdmin by refreshing the database tree and selecting the new database.
You should next enable PostGIS in your database by connecting to the database and running the code in the following listing. Enabling the extension rarely fails, but you may encounter dependency errors, especially if you have earlier versions of PostGIS floating around.
Listing 1.1 Enabling PostGIS in a database
CREATE SCHEMA postgis; #1 GRANT USAGE ON schema postgis to public; #2 CREATE EXTENSION postgis SCHEMA postgis; #3 ALTER DATABASE postgis_in_action SET search_path=public,postgis,contrib; #4
Tip
You can also enable extensions in pgAdmin, using the Extensions install section pictured in figure 1.2.
Warning
If postgis isn’t listed, you can install it by right-clicking the Extensions branch, choosing New Extension, and picking postgis from the menu.
You should see postgis listed in the Add Extension menu if you installed the binaries and don’t have it already installed in your selected database.
After a PostGIS install, disconnect from your database and reconnect. Then quickly verify the versions to make sure the installation succeeded. Execute the following query:
SELECT postgis_full_version();
If all is well, you should see the version of PostGIS, as well as the versions of the supporting GEOS, GDAL, PROJ, LIBXML, and LIBJSON libraries, as shown here:
POSTGIS="3.1.1 3.1.1" [EXTENSION] PGSQL="130" GEOS="3.9.1-CAPI-1.14.1" PROJ="7.1.1" GDAL="GDAL 3.2.1, released 2020/12/29" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTER
highlight, annotate, and bookmark
You can automatically highlight by performing the text selection while keeping the alt/ key pressed.

Four key spatial types are offered by PostGIS: geometry, geography, raster, and topology. PostGIS has always supported the geometry type from its inception. It introduced support for geography in PostGIS 1.5. PostGIS 2.0 raised the bar further by incorporating raster, introducing areal types in the geometry type and network topology support. Although PostGIS 2.1 introduced many more functions, perhaps the most important feature it provided was faster speed, particularly for raster and geography operations. Newer versions of PostGIS have introduced newer spatial index types such as spgist, BRIN, and support for parallelizing queries:
- Geometry—The planar type. This is the very first model, and it’s still the most popular type that PostGIS supports. It’s the foundation of the other types. It uses the Cartesian math you learned about in high school geometry.
- Geography—The spheroidal geodetic type. Lines and polygons are drawn on the earth’s curved surface, so they’re curved rather than straight lines. PostGIS 2.2 introduced support for any geodetic spatial reference systems, which means you can use geography for other planets, such as Mars or your own made-up world.
- Raster—The multi-band cell type. Rasters model space as a grid of rectangular cells, each containing a numeric array of values.
- Topology—The relational model type. Topology models the world as a network of connected nodes, edges, and faces. Objects are composed of these elements and may share these with other objects. There are really two related concepts in topology—the network, which defines what elements each thing is composed of, and routing. PostGIS 2+ packages the network topology model, which is often just referred to as topology.
Network topology ensures that when you change the edge of an object, other objects sharing that edge will change accordingly. Routing is commonly used with PostGIS via a long-supported add-on called pgRouting. Routing not only cares about connectedness but also how costly that connectedness is. pgRouting is mostly used for building trip navigation applications (taking into account the cost of tolls or delays due to construction), but it can be used for any application where costs along a path are important. We’ll cover pgRouting in later chapters of this book.
All these four types can coexist in the same database and even as separate columns in the same table. For example, you can have a geometry that defines the boundaries of a plant, and you can have a raster that defines the concentration of toxic waste along each part of the boundary.
In two dimensions, you can represent all geographical entities with three building blocks: points, linestrings, and polygons (see figure 1.3). For example, an interstate highway crossing the salt flats of Utah clearly jumps out as linestrings cutting through a polygon. A desolate gas station located somewhere along the interstate can be a point.
But you need not limit yourself to the macro dimensions of road atlases. Look around your home. Use rectangular polygons to represent rooms. The wiring and the piping running behind the walls would be linestrings. You can use either a point or a polygon to stand in for the dog house, depending on its size. Just by abstracting the landscape to 2D points, linestrings, and polygons, you have enough to model everything that could crop up on a map or a blueprint.
Don’t be overly concerned with the rigorous definition of the geometries. Questions such as “how many angels will fit into a point,” and “what is the width of a linestring” are best left for mathematicians and philosophers. To us, points, linestrings, and polygons are simplified models of reality. As such, they’ll never perfectly mimic the real thing. Also, don’t worry if you feel that we’re leaving out other geometries. Two good examples are beltways around a metropolis and hippodromes. The former could be well represented by circles, the latter by ellipses. You’ll do fine by approximating them using linestrings with many segments and polygons with many edges.
The geometry type treats the world as a flat Cartesian grid. The mathematics behind the model requires nothing more than the analytic geometry you learned in high school. The geometry model is intuitively appealing and computationally speedy, but it suffers from one major shortcoming—the flat earth.
The curvature of the earth comes into play when you’re modeling anything that extends beyond the visual horizon. Although geometry works for architectural floor plans, city blocks, and runway diagrams, it comes up short when you model shipping lanes, airways, or continents, or whenever you consider two locations that are far apart. You can still perform distance computations without abandoning the Cartesian underpinnings by sprinkling a few sines and cosines into your formulas, but the minute you need to compute areas, the math becomes intractable.
A better solution is to use a family of data types based on geodetic coordinates—geography. Geography shields the complexity of the math from the PostGIS user. As a trade-off, geography offers fewer functions, and it trails geometry in speed. You’ll find the same point, linestring, and polygon data types in geography; just keep in mind that the linestrings and polygons conform to the curves of a globe.
Geometry and geography are vector-based data types. Loosely speaking, anything you can sketch with an ultra-fine pen without running short on ink lends itself to vector representation. Vectors are well suited to modeling designed or constructed features, but suppose you snap a colored photo of the coral-rich Coral Sea. With its motley colors and fractal patterns, you’re going to have a hard time constructing lines and polygons out of the photo. Your best hope is to quantize the photo into microscopic rectangles and assign a color value to each. Raster data is exactly this—a mosaic of pixels.
Perhaps the best example of a raster is the television you stare into every day, for hours on end. A TV screen is nothing more than a giant raster with some two million pixels. Each pixel stores three different color values: the intensity of red, green, and blue (hence the term RGB). In raster-speak, each color is called a band. The pixel represents some area of geographic space, which can vary based on the dimensions of the film you are watching and the number of pixels on your TV set.
If you’re buying a TV, the physical number of pixels will matter greatly to you: the larger the number of pixels, the bigger the viewing area and the more money it’ll cost you. A pixel represents a certain unit of area in reality, and raster data is stored in those pixels.
Raster data almost always originates from instrumental data collection and often serves as the raw material for generating vector data. As such, you’ll encounter a lot more sources of raster data than vector data. PostGIS will let you overlay vector data atop raster data and vice versa. The satellite view you often see in maps is a perfect example of such an overlay. You see roads (vector data) superimposed on top of the satellite imagery (raster).
- Land coverage or land use.
- Temperature and elevation variations. This is a single-band raster where each square holds a measured temperature or elevation value.
- Color aerial and satellite photos. These have four bands—one for each of the colors of the RGB, and A for alpha intensity color space.
When you gaze down at the terrain from your private jet, what you witness is not distinct geometries on a barren terrestrial plane, but an interwoven network of points, linestrings, and polygons. A cornfield abuts a wheat field, which abuts a pasture, which abuts a large expanse of prairie. Roads, rivers, fences, or other artificial boundaries divide them all. The surface of the earth (at least the parts that host humanity) resembles a completed jigsaw puzzle. Topology models take on this jigsaw perspective of the world. Topology recognizes the inherent interconnection of geographic features and exploits it to help you better manage data.
Consider a historical example where you want to model the United States and Mexico as two large polygons. Prior to the Gadsden Purchase, the northern boundary of Mexico extended well into present day Arizona and parts of New Mexico. For 33 cents per acre, the US “purchased” 30 million acres from Mexico. The US polygon grew as the Mexico polygon shrank. If you were using the geometry family to model the two polygons, you’d have to perform two operations to get your record-keeping straight: enlarge the US and shrink Mexico. Using the topology model, you only need to perform one operation—either the enlargement or the shrinkage—because topology tracks the fact that the US abuts Mexico. If the US grows on its southern border, Mexico must shrink on its northern border. One operation implies the other.
Topology isn’t concerned with the exact shape and location of geographic features, but with how they’re connected to each other.
- Parcel (land lot) data, where you want to ensure that the change of one parcel boundary adjusts all other parcels that share that boundary change as well.
- Road management, water boundaries, and jurisdiction divisions. U.S. Census MAF/Topologically Integrated Geographic Encoding and Referencing system (TIGER) data is a perfect example (www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html).
- Architecture.
discuss

In this section, we’ll walk you through a full example from start to finish. Unfortunately, PostGIS is not a programming language where a few lines of code will print a “Hello World” message on your screen. Instead, to provide you with a true taste of PostGIS, we’re going to guide you through the following steps:
- Digesting a problem and formulating a solution
- Modeling
- Gathering and loading data
- Writing a query
- Viewing the result
If you’re completely new to PostGIS, just perform the tasks we ask of you for now. You won’t understand most of what you’re typing, but you’ll have the rest of this book for that. Right now, we want to give you an overview of the steps involved in writing a spatial query.
Before going further, you’ll need to have working copies of PostGIS and PostgreSQL, as well as ancillary tools such as pgAdmin to compose and execute your queries. Information about acquiring and installing these can be found in appendix B. As always, if you’re starting from scratch, we recommend you install the latest versions.
Here’s the scenario you’re faced with: you need to find the number of fast-food restaurants within one mile of a highway. As for why someone might want to do this, any of the following reasons could apply:
- A fast-food chain is trying to locate a new store where supply falls short.
- A highway commissioner wants to satisfy the needs of motorists, who will be paying tolls.
- A health-conscious parent is trying to cut down the availability of fast food in the neighborhood.
- Hungry travelers are looking for their next meal.
First, you need to realize that you’re not going to be able to answer this question quickly or accurately with your usual arsenal of Google Maps, Bing, or MapQuest, or even with the latest paper map you picked up from the auto association. Learning PostGIS may not be any quicker, but you’ll have at your disposal the tools and skills to solve any and all problems of this kind in the future. Replace the highway with a lake, and you can determine how many homes surrounding the lake can be considered waterfront property. On a geodetic scale, replace the highway with the continent of Australia, and you can determine the number of islands within territorial waters. From there, you can even go on to a planetary scale and ask how many moons are within 10 million kilometers at perigee.
Once you have an initial understanding of the problem, we recommend that you immediately perform a feasibility study, even if it’s just in your mind. You don’t want to devote time toward a solution if the problem itself is impossible to solve, lacking specificity, or, worse, you have no available data source.
You need to translate the real world to a model that is composed of database objects. For this example, you’ll represent the highway as a geometric linestring and the locations of fast-food restaurants as points. You’ll then create two tables: highways and restaurants.
First you need to create a schema to hold your data for this chapter. A schema is a container, similar to a directory, that you’ll find in most high-end databases. It logically segments objects (tables, views, functions, and so on) for easier management:
CREATE SCHEMA ch01;
In PostgreSQL it’s very easy to back up selected schemas and also to set up permissions based on schemas. You could, for example, have a big schema of fairly static data that you exclude from your daily backups, and you could divide schemas along user groups so that you can allow each group to manage their own schema set of data. The postgis_in_action database schemas are chapter-themed so that it’s easy to download just the set of data you need for a specific chapter. Refer to appendix D for more details about schemas and security management.
Next you need to create a lookup table to map franchise codes to meaningful names, as in the following listing. You can then add all the franchises you’ll be dealing with.
Listing 1.2 Create a franchise lookup table
CREATE TABLE ch01.lu_franchises (id char(3) PRIMARY KEY , franchise varchar(30)); #1 INSERT INTO ch01.lu_franchises(id, franchise) #2 VALUES ('BKG', 'Burger King'), ('CJR', 'Carl''s Jr'), ('HDE', 'Hardee'), ('INO', 'In-N-Out'), ('JIB', 'Jack in the Box'), ('KFC', 'Kentucky Fried Chicken'), ('MCD', 'McDonald'), ('PZH', 'Pizza Hut'), ('TCB', 'Taco Bell'), ('WDY', 'Wendys');
Listing 1.3 Create a restaurants table
CREATE TABLE ch01.restaurants ( id serial primary key, #1 franchise char(3) NOT NULL, geom geometry(point,2163) #2 );
For your later analysis, you’ll need to uniquely identify restaurants so that you don’t double-count them. Also, certain mapping servers and viewers, such as MapServer and QGIS, balk at tables without integer primary keys or unique indexes. The restaurant data has no primary key, and nothing in the data file lends itself to a good natural primary key, so you create an autonumber primary key ❶.
Next, you need to place a spatial index on your geometry column. This step can be done before or after the data load.
CREATE INDEX ix_code_restaurants_geom ON ch01.restaurants USING gist(geom);
If you are planning to load a lot of data into the table, it is more efficient to create the spatial index and any other indexes after the data load is complete so the indexing of each record doesn’t impact the load performance.
As part of the definition of an index in PostgreSQL, you must specify the type of index, as we did in the preceding CREATE INDEX. PostGIS spatial indexes are of the gist, spgist, or brin index types. For most use cases, you’ll want to stick with gist. We’ll go over when to use each index type later in this book.
Although it’s not necessary for this particular data set, because it won’t be updated, you’ll create a foreign key relationship between the franchise column in the restaurants table and the lookup table. This helps prevent people from mistyping franchises in the restaurants table. Adding CASCADE UPDATE DELETE rules when you add foreign key relationships will allow you to change the franchise ID for your franchises if you want, and to have those changes update the restaurants table automatically:
ALTER TABLE ch01.restaurants ADD CONSTRAINT fk_restaurants_lu_franchises FOREIGN KEY (franchise) REFERENCES ch01.lu_franchises (id) ON UPDATE CASCADE ON DELETE RESTRICT;
By restricting deletes, you prevent inadvertent removal of franchises with extant records in the restaurants table. (One added benefit of foreign keys is that relational designers, such as those you’ll find in OpenOffice Base and other ERD tools, will automatically draw lines between the two tables to visually alert you to the relationships.)
CREATE INDEX fi_restaurants_franchises ON ch01.restaurants (franchise);
Listing 1.4 Create a highways table
CREATE TABLE ch01.highways #1 ( gid integer NOT NULL, feature character varying(80), name character varying(120), state character varying(2), geom geometry(multilinestring,2163), #2 CONSTRAINT pk_highways PRIMARY KEY (gid) ); CREATE INDEX ix_highways ON ch01.highways USING gist(geom); #3
In this case, you’re creating the spatial index before loading the data, but for large tables that are loaded only once, it’s more efficient to create the indexes after you have loaded the data.
In this chapter, you first created the data tables and are now chasing after data to populate them. Ideally, these are the steps you’d want to take. In reality, though, you’ll sometimes find yourself subservient to the available data and begrudgingly have to alter your ideal table structure to fit what’s available.
But don’t surrender to the availability of real data too easily. You can often create SQL scripts that will translate the less-than-perfect data from your source into your perfected data structure. Always give primacy to your model. A well-thought-out model can often ride out the vagaries of a data source. We’ll follow this mantra as we continue.
Fastfoodmaps.com graciously provided us with a comma-delimited file of all fast-food restaurants circa 2005. To import a CSV file, you need to create a table beforehand. After quickly studying the CSV file, you can create a staging table:
CREATE TABLE ch01.restaurants_staging ( franchise text, lat double precision, lon double precision);
\copy ch01.restaurants_staging FROM '/data/restaurants.csv' DELIMITER as ',';
Note
Your purpose here is to get the CSV data into a table so you can scrutinize it more carefully and write any additional queries to sanitize the data before you insert it into the production table. In this case, the data passes the quality check, so you can proceed with the insert:
INSERT INTO ch01.restaurants (franchise, geom) SELECT franchise , ST_Transform( ST_SetSRID(ST_Point(lon , lat), 4326) , 2163) As geom FROM ch01.restaurants_staging;
Next, you use a point geometry column to store your restaurant locations. The second argument to the geometry function indicates the spatial reference ID (SRID) that you’ve selected for the restaurant data. The SRID denotes the coordinate range and how the spherical space is projected on a flat surface. In this example we use SRID 4326 (which corresponds to WGS 84 lon/lat), but then transform all the data to our desired planar projection for faster analysis. We’ll get into more detail about spatial reference systems in chapter 3.
If you’re coming from a GIS background, you’ll know that you must have common projections before you can compare two data sets. This example uses EPSG:2163, which is an equal-area projection covering the continental United States.
You’ll find Esri shapefiles to be a common storage format for spatial data, mostly due Esri’s early predominance in GIS. To load data from shapefiles into a PostGIS database, use the shp2pgsql command-line utility that comes with all PostGIS installations. If you’re on Windows or Linux/Unix with a graphical desktop, you can also use the DbManager tool within a desktop tool called QGIS, which we’ll cover later in this book. Both shp2pgsql and QGIS can load DBF files in addition to the Esri shapefile format.
We know our projection to be NAD 83 lon/lat, so we indicate this by changing the SRID to 4269, but be careful here! You’re simply telling the importer what the SRID is for the data coming in. You’re not transforming it! In this example, we also changed the name of the imported table to highways_staging. Click the Import button once you’re ready.
Once the import finishes, you should see the new highways_staging table in your database. You may have to refresh the browse tree in pgAdmin. Both shp2pgsql-gui and its command-line sibling automatically add a column named geom during the import and set its data type by reading information contained in the shapefile. If you’re unfamiliar with the raw data, this is the time to study it. Perform general sanity checks, such as checking the total record count, inspecting columns that came in without data, and so on.
To load the highway data into a staging table using the shp2pgsql command-line, you would do the following:
shp2pgsql -D -s 4269 -g geom -I /data/roadtrl020.shp ch01.highways_staging | psql -h localhost -U postgres -p 5432 -d postgis_in_action
After you’re satisfied that the importer did its job without dropping any information, you can write an INSERT query to move the data from your staging table to the production table. In the query, you want to transform the SRID from 4269 to 2163 and only select columns that you defined in your production table. You can also filter the data to only the needed rows. The highway data has approximately 47,000 rows and includes every major and state highway in the U.S., and you’re only going to be looking at major highways, so you can add a filter that will bring the row count down to about 14,000.
Listing 1.5 Populating the highways table
INSERT INTO ch01.highways (gid, feature, name, state, geom) SELECT gid, feature, name, state, ST_Transform(geom, 2163) FROM ch01.highways_staging WHERE feature LIKE 'Principal Highway%';
The shp2pgsql command line lets you transform the SRID with an additional :<to_ srid>, so you could skip the ST_Transform step in your code by replacing the 4269 with -s 4269:2163 as follows:
After you’ve finished loading the data, it’s good to follow up with a vacuum analyze so the statistics are up to date:
vacuum analyze ch01.highways;
It’s now time to write the query. Remember the question we set out to answer: “How many fast-food restaurants are within one mile of a highway?” The query that will answer this question is shown in the following listing.
Listing 1.6 Restaurants within one mile of a highway
SELECT f.franchise , COUNT(DISTINCT r.id) As total #1 FROM ch01.restaurants As r INNER JOIN ch01.lu_franchises As f ON r.franchise = f.id INNER JOIN ch01.highways As h ON ST_DWithin(r.geom, h.geom, 1609) #2 GROUP BY f.franchise ORDER BY total DESC;
The crux of this example is where you join the restaurants table with the highways table using the ST_DWithin function. This commonly used function accepts two geometries and returns TRUE if the minimum distance between the two geometries is within the specified distance. In this case, you pass in a point for the restaurant, a multilinestring for the highway, and 1609 meters as the distance. All restaurant-highway pairs matching the join condition will filter through.
The join condition does allow for duplicate restaurants. For example, a McDonald’s located at the intersection of two major highways would show up twice. To only count each restaurant once, you use the COUNT(DISTINCT) construct.
The rest of the code is elementary SQL. If you’re a little rusty on SQL, please see appendix C for a refresher. As fair warning, the SQL we use in this book will get harder.
franchise_name | total ------------------------+------ McDonald's | 5343 Burger King | 3049 Pizza Hut | 2920 Wendy's | 2446 Taco Bell | 2428 Kentucky Fried Chicken | 2371 :
What’s more gratifying than to see your query output displayed on a map? You don’t want to display some 20,000 dots on a map of the US—you can find that on each chain’s restaurant locator. Instead, you’re going to draw a buffer zone around highway segments and see how many dots fall within them.
For this you’ll use the ST_Buffer function. This function will take any geometry and radially expand it by a specified number of units. The post-expansion polygonal geometry is called a buffer zone or corridor.
Note
For this example, we’ll locate Hardee’s restaurants within a 20-mile buffer of US Route 1 in the state of Maryland. Here’s the query to get the count:
SELECT COUNT(DISTINCT r.id) As total FROM ch01.restaurants As r INNER JOIN ch01.highways As h ON ST_DWithin(r.geom, h.geom, 1609*20) WHERE r.franchise = 'HDE' AND h.name = 'US Route 1' AND h.state = 'MD';
Let’s see where the three Hardee’s restaurants are located. Fire up OpenJump and connect to your PostgreSQL database. You can first draw US Route 1 using the following query:
SELECT gid, name, geom FROM ch01.highways WHERE name = 'US Route 1' AND state = 'MD';
SELECT ST_Union(ST_Buffer(geom, 1609*20)) FROM ch01.highways WHERE name = 'US Route 1' AND state = 'MD';
SELECT r.geom FROM ch01.restaurants r WHERE EXISTS (SELECT gid FROM ch01.highways WHERE ST_DWithin(r.geom, geom, 1609*20) AND name = 'US Route 1' AND state = 'MD' AND r.franchise = 'HDE');
Figure 1.4 US Route 1 in Maryland, with three Hardee’s restaurants in the 20-mile buffer, and the 20-mile buffer around the route

Play around with this example. Use your home state and your favorite chain to see how far you have to go to grab your next nutritious meal.
Some of the SQL examples we demonstrated were at an intermediate level. If you’re new to SQL or spatial databases, these examples may have seemed daunting. In the chapters that follow, we’ll explain the functions we used here and the SQL constructs in greater detail. For now, we hope that you focused on the general steps we followed and the strategies that we chose.
Although spatial modeling is an integral part of any spatial analysis, there’s no right or wrong answer in modeling. Modeling is inherently a balance between simplicity and adequacy. You want to make your model as simple as possible so you can focus on the problem you’re trying to solve, but you must retain enough complexity to simulate the world you’re trying to model. Therein lies the challenge.
- PostGIS spatially enables PostgreSQL, allowing you to model real-world objects in a database and answer questions of where and how far.
- PostGIS and PostgreSQL provide tools to load data from common data sources.
- There are freely available tools such as OpenJump, QGIS, and pgAdmin that allow you to experience your spatial data visually.
- PostGIS adds functions that can be used in SQL to answer questions about where and how far quickly and succinctly.
- Sometimes a table of statistics is more digestible than a figure of dots, colors, and shapes.