So far we’ve looked at the concepts of differential privacy, in both its centralized (DP) and local (LDP) versions, and at their application in developing privacy-preserving query-processing and machine learning (ML) algorithms. As you saw, the idea of DP is to add carefully calibrated noise to query results, without distorting their overall statistical properties too much, so that the results protect the privacy of individuals while remaining useful for the application.
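As a quick reminder of what adding noise to a query result looks like in practice, here is a minimal sketch of a counting query answered with Laplace noise. The function name `laplace_count`, the toy `ages` list, and the choice of `epsilon=1.0` are assumptions made for illustration only; a counting query is used because its sensitivity is 1.

```python
import numpy as np

def laplace_count(values, predicate, epsilon, rng):
    """Return a noisy count of the items in `values` that satisfy `predicate`.

    Adding or removing one individual changes a count by at most 1,
    so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical toy data set: ages of eight individuals
ages = [23, 35, 41, 29, 52, 67, 38, 45]

rng = np.random.default_rng(0)
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
print(f"Noisy count of people aged 40 or over: {noisy:.2f}")
```

The true count here is 4. Each call returns a different noisy answer centered on that value, and a smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy.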
But sometimes data users may request the original data so they can use it locally and directly, perhaps to develop new queries and analysis procedures. Privacy-preserving data-sharing methods can be used for such purposes. This chapter looks at synthetic data generation, a promising approach to data sharing that produces synthetic yet representative data that can be shared among multiple parties safely and securely. The idea is to artificially generate data whose distribution and properties are similar to those of the original data. And because the records are artificially produced rather than copied from real individuals, privacy concerns are greatly reduced.
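To make the idea of matching the distribution of the original data concrete, here is a deliberately naive sketch: it estimates the mean and covariance of a numeric data set and then samples new records from a Gaussian with those parameters. The data set, the helper name `fit_and_sample`, and the Gaussian assumption are all hypothetical choices for illustration. The sketch shows only the distribution-matching step and is not one of the privacy-preserving techniques themselves.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "original" data: 1,000 records with two correlated
# numeric attributes (made up for this illustration)
original = rng.multivariate_normal(
    mean=[50.0, 30.0],
    cov=[[100.0, 40.0], [40.0, 64.0]],
    size=1000,
)

def fit_and_sample(data, n_samples, rng):
    """Fit a multivariate Gaussian to `data` and draw `n_samples` new records."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # columns are attributes
    return rng.multivariate_normal(mean, cov, size=n_samples)

synthetic = fit_and_sample(original, 1000, rng)

# Compare summary statistics of the original and synthetic data
print("Original mean: ", original.mean(axis=0))
print("Synthetic mean:", synthetic.mean(axis=0))
print("Original correlation: ", np.corrcoef(original, rowvar=False)[0, 1])
print("Synthetic correlation:", np.corrcoef(synthetic, rowvar=False)[0, 1])
```

None of the synthetic rows is copied from the original records, yet the means and the correlation between the two attributes should come out close to those of the original data.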