8 Schema Management

 

This chapter covers:

  • Managing schema changes, the bane of data warehousing, in a cloud data platform
  • Understanding the differences between a “schema-on-read” approach and an active schema management approach
  • Evaluating when to use a “schema-as-a-contract” approach versus a “smart pipeline” approach
  • Using Spark to infer schemas in batch mode
  • Implementing a schema registry as part of a metadata layer
  • Using the operational metadata introduced in Chapter 7 to manage schema changes more easily
  • Building resilient data pipelines that can manage schema changes automatically
  • Managing common schema changes as they relate to backward and forward compatibility
  • Managing schema changes through to the data warehouse consumption layer

In this chapter we tackle the age-old problem of managing the schema changes that occur in a data system when source data changes. We explore how the growing use of third-party data sources, such as SaaS applications, and the growing use of streaming data add to the challenge.

We will discuss how our cloud data platform design can address these challenges, starting with the Schema Registry domain in the metadata layer introduced in Chapter 7. We then examine different approaches to updating schemas in the registry, ranging from “do nothing and wait until something breaks” to “schema-as-a-contract” and “smart pipelines.”
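To make the problem concrete, here is a minimal sketch of the kind of check a pipeline could run before updating a schema registry: it compares the schema currently registered for a source with the schema of an incoming batch and classifies the differences. The plain-dictionary schema representation and the `diff_schemas` function are hypothetical illustrations, not the implementation used later in this chapter or the API of any schema registry product.

```python
from typing import Dict, List

# Hypothetical representation: a schema is a mapping of column name -> type name.
Schema = Dict[str, str]


def diff_schemas(registered: Schema, incoming: Schema) -> Dict[str, List[str]]:
    """Classify differences between a registered schema and an incoming batch's schema."""
    # Columns present in the new data but unknown to the registry.
    added = sorted(set(incoming) - set(registered))
    # Columns the registry expects but the new data no longer contains.
    removed = sorted(set(registered) - set(incoming))
    # Columns present in both schemas whose declared type differs.
    type_changed = sorted(
        col for col in set(registered) & set(incoming)
        if registered[col] != incoming[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


if __name__ == "__main__":
    registered = {"id": "int", "name": "string", "created_at": "timestamp"}
    incoming = {"id": "long", "name": "string", "email": "string"}
    print(diff_schemas(registered, incoming))
```

In a real pipeline the incoming schema could come from a schema inferred by Spark (for example, `df.schema` on a PySpark DataFrame) and the registered schema from the Schema Registry; here both are hard-coded dictionaries for illustration. Which of these differences a pipeline tolerates automatically, and which it rejects, is exactly the choice that separates the approaches listed above.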

8.1      Why Schema Management

 
 

8.1.1   Schema changes in a traditional data warehouse architecture

 
 

8.1.2   Schema-on-read approach

 
 

8.2      Schema Management Approaches

 
 
 
 

8.2.1   Schema as a contract

 

8.2.2   Schema management in the data platform

 
 

8.2.3   Monitoring schema changes

 
 
 

8.3      Schema Registry Implementation

 
 

8.3.1   Apache Avro schemas

 

8.3.2   Existing Schema Registry implementations

 
 
 

8.3.3   Schema Registry as a part of a Metadata layer

 
 
 

8.4      Schema Evolution Scenarios

 
 
 

8.4.1   Schema compatibility rules

 
 
 
 

8.7      Exercise Answers

 
 
 
 