1 Introduction to Kafka


This chapter covers

  • Why you might want to use Kafka
  • Common myths of big data and message systems
  • Real-world use cases to help power messaging, streaming, and IoT data processing

As developers face a world full of data produced from every angle, they often find that legacy systems might not be the best option moving forward. One of the foundational pieces of new data infrastructures that has taken over the IT landscape is Apache Kafka®. Kafka is changing the standards for data platforms. It is leading the move from extract, transform, load (ETL) and batch workflows (in which work was often held and processed in bulk at one predefined time) to near-real-time data feeds [1]. Batch processing, once the standard workhorse of enterprise data processing, might not be something to turn back to after seeing the powerful feature set that Kafka provides. In fact, you might not be able to handle the growing snowball of data rolling toward enterprises of all sizes unless you try a new approach.

With so much data, systems can easily become overloaded. Legacy systems might face nightly processing windows that run into the next day. To keep up with this constant stream of evolving data, processing information as it happens is a way to keep the system’s state up to date.

1.1 What is Kafka?

1.2 Kafka usage

1.2.1 Kafka for the developer

1.2.2 Explaining Kafka to your manager

1.3 Kafka myths

1.3.1 Kafka only works with Hadoop®

1.3.2 Kafka is the same as other message brokers

1.4 Kafka in the real world

1.4.1 Early examples

1.4.2 Later examples

1.4.3 When Kafka might not be the right fit

1.5 Online resources to get started