7 Topics and partitions

 

This chapter covers

  • Creation parameters and configuration options
  • How partitions exist as log files
  • How segments impact data inside partitions
  • Testing with EmbeddedKafkaCluster
  • Topic compaction and how data can be retained

In this chapter, we look further into how we might store our data across topics, as well as how to create and maintain topics. This includes how partitions fit into our design considerations and how we can view our data on the brokers. All of this information will also help us when we look at how a topic can update data rather than only append it to a log.

7.1 Topics

As a quick refresher, a topic is an abstract concept rather than a physical structure, and it does not usually exist on only one broker. Most applications consuming Kafka data treat that data as living in a single topic; the topic name is all they need in order to subscribe. Behind that name, however, are one or more partitions that actually hold the data [1]. Kafka writes the data that makes up a topic in the cluster to logs on the broker filesystems.
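To make that point concrete, the following sketch shows a consumer that needs nothing more than the topic name to subscribe; the partitions behind that name are resolved by the client and the brokers. This is only an illustration, assuming a broker at localhost:9092; the class name and group id are made up for the example.

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class HelloWorldSubscriber {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed local broker
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "kinaction_helloconsumer");   // illustrative group id
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      // Only the topic name is needed; partition assignment happens behind the scenes.
      consumer.subscribe(List.of("kinaction_helloworld"));
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
      for (ConsumerRecord<String, String> record : records) {
        // The record does expose which partition it came from, even though we never asked for one.
        System.out.printf("partition=%d offset=%d value=%s%n",
            record.partition(), record.offset(), record.value());
      }
    }
  }
}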

Figure 7.1 Example topic with partitions

Figure 7.1 shows the partitions that make up one topic named kinaction_helloworld. A single copy of a partition is never split between brokers; it has a physical footprint on the disk of the broker that holds it. Figure 7.1 also shows how those partitions are made up of messages that are sent to the topic.
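As a rough sketch of how such a topic could be declared programmatically, the snippet below uses the AdminClient API to create a topic with multiple partitions. The broker address, partition count, and replication factor here are illustrative choices for the example, not values prescribed by the chapter.

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateHelloWorldTopic {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed local broker

    try (AdminClient admin = AdminClient.create(props)) {
      // Three partitions spread the topic's data across brokers; a replication
      // factor of two keeps a copy of each partition on two different brokers.
      NewTopic topic = new NewTopic("kinaction_helloworld", 3, (short) 2);
      admin.createTopics(List.of(topic)).all().get();  // block until the topic exists
    }
  }
}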

7.1.1 Topic-creation options

7.1.2 Replication factors

7.2 Partitions

7.2.1 Partition location

7.2.2 Viewing our logs

7.3 Testing with EmbeddedKafkaCluster

7.3.1 Using Kafka Testcontainers

7.4 Topic compaction

Summary

References