4 Securing GenAI
This chapter covers
- Why LLMs cannot tell data from instructions, and what breaks because of it
- Real attacks at every capability stage: injection, exfiltration, poisoning, credential theft
- How each new capability (data access, write access, agency) expands the attack surface
- Matching controls to exposure: policy engines, tiered approval, sandboxing
In traditional software, the boundary between “what to do” and “what to work with” is enforced through technical rules: specific characters, structured formats, or clearly separated fields. When attackers blur that line (as in SQL injection), engineers can fix it by enforcing those boundaries more strictly: validating inputs, using separate channels for commands and data, or blocking dangerous characters. The fix works because the system follows rigid rules about what counts as a command versus what counts as data.
LLMs don’t follow those rules. They interpret natural language for meaning, not structure. When you send an LLM the instruction “Summarize this email” followed by the email text, both the instruction and the content arrive as plain text. The model decides which part is a command and which part is data based on context and interpretation; not because one is in a special field or uses different characters. There’s no reliable technical boundary to reinforce, which is why the injection defenses that work for databases and web forms don’t translate to language models.