1 Introduction to Apache Pulsar
This chapter covers
- Introduction to messaging and streaming data concepts
- How Apache Pulsar can handle all of your streaming data needs
- A comparison of Apache Pulsar with other messaging systems, such as Apache Kafka
- Real-world use cases where Pulsar is currently being used for stream processing
Developed at Yahoo in 2013, Pulsar was first open sourced in 2016 and, only two years later, graduated from the Apache Software Foundation's incubation program to Top-Level Project status. Apache Pulsar was designed from the ground up to address gaps in existing open source messaging systems, such as the lack of multi-tenancy, geo-replication, and strong durability guarantees.
The Apache Pulsar site describes it as a distributed pub-sub messaging system that provides very low publish and end-to-end latency, guaranteed message delivery, zero data loss, and a lightweight serverless computing framework for stream-native data processing. Apache Pulsar provides three key capabilities for processing large data sets in real time: