10 Handling errors in production
This chapter covers
- Transient and non-transient errors
- Retries and exponential backoffs
- Handling non-transient errors with dead-letter queues
- Best practices for implementing dead-letter queues
- Implementing exactly-once semantics with transactions
Streaming data pipelines continuously process and replicate data from source to sink systems in real time. Ideally, your pipeline runs without errors most of the time. But what if an external data source returns data in an unexpected format and breaks the pipeline? When your pipeline runs into errors, you must handle them. Because streaming data pipelines promise continuous processing, automated error handling is preferred: it can react to issues immediately and typically reduces downtime. Depending on the type of error, however, automated recovery is not always possible. Consequently, it's important to know the types of errors that can occur in streaming data pipelines and how best to react to each.