2 Pulsar concepts and architecture

This chapter covers

Pulsar’s physical architecture
Pulsar’s logical architecture
Message consumption and the subscription types provided by Pulsar
Pulsar’s message retention, expiration, and backlog policies

Now that you have been introduced to the Pulsar messaging platform and how it compares to other messaging systems, we will drill down into the low-level architectural details and cover some of the unique terminology used by the platform. If you are unfamiliar with messaging systems and distributed systems, some of Pulsar’s concepts and terminology may be difficult to grasp at first. We will start with an overview of Pulsar’s physical architecture before diving into how Pulsar logically structures messages.

2.1 Pulsar’s physical architecture

Other messaging systems treat the cluster as the highest level from an administrative and deployment perspective, which means each cluster must be managed and configured as an independent system. Pulsar provides a higher level of abstraction known as a Pulsar instance, which consists of one or more Pulsar clusters that act together as a single unit and can be administered from a single location, as shown in figure 2.1.

Figure 2.1 A Pulsar instance can consist of multiple geographically dispersed clusters.
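
To make the instance-to-cluster relationship concrete, here is a minimal Python sketch of the idea. It is a toy model only, not Pulsar's API: the class names (Cluster, PulsarInstance), the method names, and the cluster names and regions are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """One Pulsar cluster; the name and region values are illustrative."""
    name: str
    region: str


@dataclass
class PulsarInstance:
    """Toy model of an instance: a group of clusters managed as one unit."""
    clusters: Dict[str, Cluster] = field(default_factory=dict)

    def add_cluster(self, cluster: Cluster) -> None:
        # Cluster names must be unique within the instance in this model.
        if cluster.name in self.clusters:
            raise ValueError(f"cluster {cluster.name!r} already registered")
        self.clusters[cluster.name] = cluster

    def cluster_names(self) -> List[str]:
        # One call sees every cluster, mirroring the single point of administration.
        return sorted(self.clusters)


instance = PulsarInstance()
instance.add_cluster(Cluster("us-west", "California"))
instance.add_cluster(Cluster("us-east", "Virginia"))
print(instance.cluster_names())
```

The point of the sketch is the shape of the hierarchy: the instance, not the individual cluster, is the unit you query and administer, while the clusters themselves can sit in different geographic regions.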

2.1.1 Pulsar’s layered architecture

2.1.2 Stateless serving layer

2.1.3 Stream storage layer

2.1.4 Metadata storage

2.2 Pulsar’s logical architecture

2.2.1 Tenants, namespaces, and topics

2.2.2 Addressing topics in Pulsar

2.2.3 Producers, consumers, and subscriptions

2.2.4 Subscription types

2.3 Message retention and expiration

2.3.1 Data retention

2.3.2 Backlog quotas

2.3.3 Message expiration

2.3.4 Message backlog vs. message expiration

2.4 Tiered storage

Summary