chapter three

3 Data privacy and safety: Technical and legal controls

This chapter covers

Sources of bias in training data
Improving the safety of outputs from LLMs
Mitigating privacy risks with user inputs to LLMs
Data protection laws and their application to generative AI systems

In the previous chapter, we discussed how large language models (LLMs) are trained on massive datasets from the internet. In practice, that data is likely to contain personal information, bias, and other undesirable content. We also introduced the concept of post-training and the primary post-training techniques. While some LLM developers use the unrestricted nature of their models as a selling point, most major LLM providers maintain policies on the kinds of content they don’t want their models to produce, and they dedicate a great deal of effort to ensuring that their models follow those policies as closely as possible, through post-training and other methods. For example, commercial LLM providers don’t want LLMs to generate hate speech or discriminatory content because it could reflect poorly on the company in the eyes of consumers. Although these policies vary depending on an organization’s values and external pressures, improving an LLM’s safety involves exercising control over the model’s generations, which requires technical interventions.

3.1 What’s in the training data?

3.1.1 Encoding bias

3.1.2 Linguistic diversity

3.1.3 Sensitive information

3.2 Safety-focused improvements for LLM generations

3.2.1 Post-processing detection algorithms

3.2.2 Content filtering or conditional pre-training

3.2.3 Safety post-training

3.2.4 Machine unlearning

3.3 Navigating user privacy and commercial risks

3.3.1 Inadvertent data leakage

3.3.2 Best practices when interacting with LLMs

3.4 Data protection and privacy in the age of AI

3.4.1 International standards and data protection laws

3.4.2 Are generative AI systems GDPR-compliant?

3.4.3 Privacy regulations in academia

3.4.4 Corporate policies

3.4.5 Governing data in an AI-driven world

3.5 Conclusion

3.6 Summary