chapter seven

7 Schema registry

This chapter covers

Using the Pulsar schema registry to simplify your microservice development
Understanding the different schema compatibility types
Using the LocalRunner class to run and debug your functions inside your IDE
Evolving a schema without impacting existing consumers

Traditional databases employ a process referred to as schema-on-write, where a table’s columns and their data types are all defined before any data can be written into the table. This ensures that the data conforms to a predetermined specification. Consuming clients can also access the schema information directly from the database itself, which enables them to determine the basic structure of the records they are processing.

Apache Pulsar messages are stored as unstructured byte arrays, and the structure is applied to this data only when it’s read. This approach is referred to as schema-on-read and was first popularized by Hadoop and NoSQL databases. While the schema-on-read approach makes it easier to ingest and process new and dynamic data sources on the fly, it does have some drawbacks, including the lack of a metastore that clients can access to determine the schema for the Pulsar topic they are consuming from.
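To make the schema-on-read idea concrete, here is a minimal sketch in plain Java that deliberately avoids the Pulsar client library. The FoodOrder type, its two fields, and the byte layout are all assumptions made up for illustration, not Pulsar’s actual wire format. The producer side flattens an event into an opaque byte array, and structure reappears only when a consumer decodes it. A second consumer that assumes a different layout reads a wrong value without any error, which is the kind of problem a shared metastore is meant to prevent.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SchemaOnReadSketch {

    /** Hypothetical event type; the fields are assumptions for illustration only. */
    public static final class FoodOrder {
        public final String orderId;
        public final int quantity;

        public FoodOrder(String orderId, int quantity) {
            this.orderId = orderId;
            this.quantity = quantity;
        }
    }

    /**
     * Producer side: flatten the event into an opaque byte array.
     * Assumed layout: 4-byte id length, UTF-8 id bytes, 4-byte quantity.
     */
    public static byte[] encode(String orderId, int quantity) {
        byte[] id = orderId.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + id.length + 4);
        buf.putInt(id.length).put(id).putInt(quantity);
        return buf.array();
    }

    /** Consumer side: structure is applied only now, at read time. */
    public static FoodOrder decode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        byte[] id = new byte[buf.getInt()];
        buf.get(id);
        return new FoodOrder(new String(id, StandardCharsets.UTF_8), buf.getInt());
    }

    /**
     * A consumer that wrongly assumes the quantity comes first. Nothing in the
     * bytes contradicts it, so it silently returns the id length instead.
     */
    public static int decodeQuantityWithWrongLayout(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    public static void main(String[] args) {
        byte[] stored = encode("order-42", 3);

        FoodOrder order = decode(stored);
        System.out.println(order.orderId + " x" + order.quantity); // order-42 x3

        // Same bytes, different assumed structure: prints 8, the id length.
        System.out.println(decodeQuantityWithWrongLayout(stored));
    }
}
```

Both consumers receive identical bytes, and only an agreement made outside the message itself tells them which interpretation is correct.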

7.1 Microservice communication

7.1.1 Microservice APIs

7.1.2 The need for a schema registry

7.2 The Pulsar schema registry

7.2.1 Architecture

7.2.2 Schema versioning

7.2.3 Schema compatibility

7.2.4 Schema compatibility check strategies

7.3 Using the schema registry

7.3.1 Modelling the food order event in Avro

7.3.2 Producing food order events

7.3.3 Consuming the food order events

7.3.4 Complete example

7.4 Evolving the schema

Summary