chapter twenty

20 Data masking

 

One of the duties of a DBA is to keep the data within a database away from prying eyes. As a best practice, we, as data keepers, should provide guidance to data owners so that real production data won’t end up in the wrong hands. Data privacy should always be at the forefront of your mind.

On one hand, we have security measures like logins and users to define who has access to the data. We talked about these in chapter 9. On the other hand, it’s quite common to receive requests for a copy of the production database. For instance, the development team might tell us, "We need the production database to find a bug." Or the test team might say, "We want to test this new feature but we need more data, and real data would be great."

Note

SQL Server has a feature called "Dynamic Data Masking," which is not what dbatools offers, so it won’t be covered here. To be clear, data masking in the dbatools context refers to the anonymization of the data.

Would you like to have your home address, phone number, and other personal data exposed or accessible by others when it’s not needed?

How do you deal with these kinds of requests for personal data when exposing that data is not absolutely necessary? Do you have a process in place? When talking about third-party databases, do you ask your clients if they have a process to anonymize the data? If not, do you suggest a way of doing it?

20.1 A common approach

20.2 The dbatools approach

20.2.1 Generating random data with dbatools

20.3 The process

20.3.1 Finding potential PII data

20.3.2 Generating a Masking config file

20.3.3 Anonymize our table’s data

20.3.4 Validate data masking configuration file

20.4 Commands we’ve learnt

20.5 Hands-On Lab