10 Handling errors in production

 

This chapter covers

  • Transient and non-transient errors
  • Retries and exponential backoff
  • Handling non-transient errors with dead-letter queues
  • Best practices for implementing dead-letter queues
  • Implementing exactly-once semantics with transactions

Streaming data pipelines continuously process and replicate data from source systems to sink systems in real time. Ideally, your pipeline runs without errors most of the time. But what if an external data source returns data in an unexpected format and breaks the pipeline? When errors occur, you must handle them. Because streaming data pipelines promise continuous processing, automated error handling is preferable: it reacts to issues immediately and is likely to reduce the pipeline's downtime. Depending on the type of error, however, automated recovery is not always possible. It's therefore important to know which types of errors can occur in streaming data pipelines and how best to react to each.

10.1 Transient vs. non-transient errors

10.2 Handling transient network issues in Kafka Streams applications

10.3 Handling non-transient errors with dead-letter queues

10.3.1 Reacting to records in dead-letter queues

10.3.2 Structuring dead-letter queues

10.4 Exactly-once delivery semantics

10.4.1 Transactions in Kafka

10.4.2 Enabling exactly-once semantics in Kafka Streams

10.5 Summary