3 Data modeling

 

This chapter covers

  • Modeling data as a fundamental analytical activity
  • Defining business entities from raw data
  • Structuring a data model to best suit the analytical question

As an analyst, you will find yourself applying the same logic to raw data over and over again. For example, every time you calculate revenue, you might need to remember to remove internal money transfers between departments. Or, when you look at customer spending, you might need to exclude a certain customer because they operate differently. Whenever business rules like these must be applied every time to keep the data accurate, you have a good opportunity to build a data model.

A data model is a dataset derived from raw data that has been cleaned and has specific business rules built into it. Creating reusable data models will save you time and maintenance headaches in the future. Data modeling also forces you to think deeply about the question you or your stakeholders are asking, which leads to a more valuable answer.
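
To make the idea concrete, here is a minimal sketch of a data model built once and then reused. It applies the two example rules mentioned earlier: removing internal transfers and excluding a customer who operates differently. The record layout, the customer ID `C999`, the `kind` values, and the function names are all assumptions invented for illustration, not part of any real dataset.

```python
from dataclasses import dataclass


# Hypothetical raw record; the field names and values are invented for illustration.
@dataclass(frozen=True)
class Transaction:
    customer_id: str
    amount: float
    kind: str  # assumed to be either "sale" or "internal_transfer"


# Assumed business rules, mirroring the two examples in the text.
EXCLUDED_CUSTOMERS = {"C999"}           # a customer that operates differently
INTERNAL_KINDS = {"internal_transfer"}  # money moved between departments


def build_revenue_model(raw):
    """Return the cleaned dataset: raw rows with the business rules applied once."""
    return [
        t for t in raw
        if t.kind not in INTERNAL_KINDS and t.customer_id not in EXCLUDED_CUSTOMERS
    ]


def total_revenue(model):
    """Sum the amounts in a dataset that has already been modeled."""
    return sum(t.amount for t in model)


raw_transactions = [
    Transaction("C001", 120.0, "sale"),
    Transaction("C002", 80.0, "sale"),
    Transaction("DEPT", 500.0, "internal_transfer"),  # dropped: internal transfer
    Transaction("C999", 1000.0, "sale"),              # dropped: excluded customer
]

revenue_model = build_revenue_model(raw_transactions)
print(total_revenue(revenue_model))  # 200.0
```

Because the rules live in `build_revenue_model` rather than in each calculation, any later analysis can start from `revenue_model` without having to remember them. The same pattern carries over to a DataFrame or a SQL view.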

3.1 The importance of data modeling

3.1.1 Common data modeling tasks

3.2 Project 2: Who are your customers?

3.2.1 Problem statement

3.2.2 Data dictionary

3.2.3 Desired outcomes

3.2.4 Required tools

3.3 Planning our approach to customer data modeling

3.3.1 Applying the results-driven process to data modeling

3.3.2 Questions to consider

3.4 An example solution: Identifying customers from transactional data

3.4.1 Developing an action plan

3.4.2 Exploring, extracting, and combining multiple sources of data

3.4.3 Applying entity resolution to deduplicate records

3.4.4 Conclusions and recommendations

3.5 Closing thoughts on data modeling

3.5.1 Data modeling skills for any project

Summary