Traditional databases use an approach known as schema-on-write, in which a table's columns and their data types are defined before any data can be written to it. This guarantees that the data conforms to a predetermined specification. It also means that consuming clients can retrieve the schema directly from the database and use it to determine the basic structure of the records they are processing.
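As a minimal sketch of schema-on-write, the following uses Python's built-in `sqlite3` module with an in-memory database. The `users` table and its columns are hypothetical names chosen for illustration. SQLite stands in here for a traditional database, and note that it enforces column types more loosely than most relational systems do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Writing before the table is defined fails: there is no schema to write against.
try:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    write_without_schema_failed = False
except sqlite3.OperationalError:
    write_without_schema_failed = True

# The structure must be declared first; only then are writes accepted.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))

# A consuming client can ask the database itself for the schema.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(users)")]
print(columns)
```

The final query is the point of interest: the client learns the column names and declared types from the database, without any out-of-band agreement with whoever wrote the data.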
Apache Pulsar messages are stored as unstructured byte arrays, and the structure is applied to this data only when it’s read. This approach is referred to as schema-on-read and was first popularized by Hadoop and NoSQL databases. While the schema-on-read approach makes it easier to ingest and process new and dynamic data sources on the fly, it does have some drawbacks, including the lack of a metastore that clients can access to determine the schema for the Pulsar topic they are consuming from.
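The schema-on-read drawback described above can be sketched in plain Python. This is not the Pulsar client API. The payloads, the `read_as_json` and `read_as_binary` helpers, and the record layout are all assumptions made for illustration. The byte strings stand in for message payloads as a broker would store them: opaque, with nothing attached that says how to interpret them.

```python
import json
import struct

# Two producers write the "same" record using different encodings.
# Either way, what gets stored is just a byte array.
json_payload = json.dumps({"id": 1, "name": "alice"}).encode("utf-8")
binary_payload = struct.pack(">i5s", 1, b"alice")  # 4-byte int + 5-byte string


def read_as_json(payload: bytes) -> dict:
    """Apply a JSON structure to the raw bytes at read time."""
    return json.loads(payload.decode("utf-8"))


def read_as_binary(payload: bytes) -> dict:
    """Apply a fixed binary layout to the raw bytes at read time."""
    user_id, name = struct.unpack(">i5s", payload)
    return {"id": user_id, "name": name.decode("utf-8")}


# Each reader works only if the consumer already knows which encoding was used.
record_from_json = read_as_json(json_payload)
record_from_binary = read_as_binary(binary_payload)

# With no metastore to consult, a consumer that guesses wrong fails at read time.
try:
    read_as_json(binary_payload)
    wrong_guess_failed = False
except ValueError:
    wrong_guess_failed = True
```

Both readers recover the same record, but only because the consumer was told in advance which structure to apply. Nothing in the stored bytes carries that information, which is the gap a schema metastore would fill.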