1 Introduction to Apache Pulsar


This chapter covers

  • The evolution of the enterprise messaging system
  • A comparison of Apache Pulsar to existing enterprise messaging systems
  • How Pulsar’s segment-centric storage differs from the partition-centric storage model used in Apache Kafka
  • Real-world use cases of Pulsar for stream processing, and why you should consider using it

Developed at Yahoo! in 2013, Pulsar was first open sourced in 2016 and graduated to top-level project status only 15 months after joining the Apache Software Foundation’s incubation program. Apache Pulsar was designed from the ground up to address gaps in existing open source messaging systems, such as the lack of multi-tenancy, geo-replication, and strong durability guarantees.

The Apache Pulsar site describes it as a distributed pub-sub messaging system that provides very low publish and end-to-end latency, guaranteed message delivery, zero data loss, and a serverless, lightweight computing framework for stream data processing. Apache Pulsar provides three key capabilities for processing large data sets:

1.1 Enterprise messaging systems

1.1.1 Key capabilities

1.2 Message consumption patterns

1.2.1 Publish-subscribe messaging

1.2.2 Message queuing

1.3 The evolution of messaging systems

1.3.1 Generic messaging systems

1.3.2 Message-oriented middleware

1.3.3 Enterprise service bus

1.3.4 Distributed messaging systems

1.4 Comparison to Apache Kafka

1.4.1 Multilayered architecture

1.4.2 Message consumption

1.4.3 Data durability

1.4.4 Message acknowledgment

1.4.5 Message retention

1.5 Why do I need Pulsar?

1.5.1 Guaranteed message delivery

1.5.2 Infinite scalability

1.5.3 Resilient to failure

1.5.4 Support for millions of topics