chapter three

3 Data modeling

 

This chapter covers

  • Modeling data as a fundamental analytical activity
  • Defining business entities from raw data
  • Structuring a data model in a way best suited to the analytical question

As an analyst, you will find yourself applying the same logic to raw data over and over again. For example, when counting revenue, you might need to remember to remove internal money transfers between departments every time. Or when you look at customer spend, you might need to exclude a certain customer because they operate differently. Whenever business rules like these must be applied repeatedly to keep the data accurate, you have a good opportunity to build a data model.

A data model is a dataset created from raw data that has been cleaned and has specific business rules built into it. Creating reusable data models will save you time and maintenance headaches in the future. Data modeling also forces you to think deeply about the question you or your stakeholders are asking, which leads to a more valuable answer.
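To make the idea concrete, here is a minimal sketch in Python of the two business rules mentioned above: removing internal transfers and excluding a customer who operates differently. The field names, customer IDs, and amounts are all hypothetical and chosen only for illustration. A real data model would more likely live in a database table or a DataFrame, but the principle is the same.

```python
# Hypothetical raw transactions; field names and values are illustrative only.
RAW_TRANSACTIONS = [
    {"customer_id": "C001", "amount": 120.0, "type": "sale"},
    {"customer_id": "C002", "amount": 80.0, "type": "sale"},
    {"customer_id": "DEPT-A", "amount": 500.0, "type": "internal_transfer"},
    {"customer_id": "C999", "amount": 1000.0, "type": "sale"},
    {"customer_id": "C001", "amount": 30.0, "type": "sale"},
]

# Assumed business rules, defined once instead of being remembered each time.
EXCLUDED_TYPES = {"internal_transfer"}  # money moved between departments
EXCLUDED_CUSTOMERS = {"C999"}           # a customer that operates differently


def build_revenue_model(transactions):
    """Return only the rows that count as revenue under the rules above."""
    return [
        row
        for row in transactions
        if row["type"] not in EXCLUDED_TYPES
        and row["customer_id"] not in EXCLUDED_CUSTOMERS
    ]


def spend_per_customer(model):
    """Sum the amount spent by each customer in the cleaned model."""
    totals = {}
    for row in model:
        customer = row["customer_id"]
        totals[customer] = totals.get(customer, 0.0) + row["amount"]
    return totals


revenue_model = build_revenue_model(RAW_TRANSACTIONS)
total_revenue = sum(row["amount"] for row in revenue_model)

print(total_revenue)                      # 230.0
print(spend_per_customer(revenue_model))  # {'C001': 150.0, 'C002': 80.0}
```

Because the rules are applied once, inside build_revenue_model, every later analysis that starts from revenue_model inherits them automatically. If a rule changes, it only needs to be updated in one place.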

3.1 The importance of data modeling

3.1.1 Common data modeling tasks

3.2 Project 2: Who are your customers?

3.2.1 Problem statement

3.2.2 Data dictionary

3.2.3 Desired outcomes

3.2.4 Required tools

3.3 Planning our approach to customer data modeling

3.3.1 Applying the results-driven process to data modeling

3.3.2 Questions to consider

3.4 An example solution: Identifying customers from transactional data

3.4.1 Exploring, extracting, and combining multiple sources of data

3.4.2 Applying entity resolution to deduplicate records

3.4.3 Conclusions and recommendations

3.5 Closing thoughts on data modeling

3.5.1 Data modeling skills for any project

3.6 Summary