
1 Evaluations and alignments for AI


This chapter covers

  • Understanding evaluation and alignment
  • AI evaluation methods and challenges
  • Three pillars of alignment—personality, policy, principles

Evaluation and alignment are two complementary approaches for determining whether AI models and systems function as intended, and for ensuring that they do. Evaluation measures what a model or system actually does: its accuracy, its reliability, and its behavior across edge cases. Alignment ensures that the model or system performs as intended: following instructions, respecting constraints, and behaving consistently with human values. We apply evaluation and alignment at every level of AI system development, from base language models to post-trained models to complete AI systems that combine models with memory, planning, tools, and orchestration logic. Together, they form the feedback loop that measures behavior and applies corrections to make AI systems production-ready.
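To make the "measure behavior" half of this feedback loop concrete, here is a minimal sketch of an evaluation harness for a verifiable task: run a model over a small test set and score its outputs against reference answers by exact match. All names here (`evaluate`, `model_fn`, `cases`, and the lookup-table stand-in for a real model call) are illustrative, not part of any particular library.

```python
# A minimal sketch of the evaluation half of the feedback loop:
# score a model's outputs against reference answers on a test set.
# Names (evaluate, model_fn, cases) are illustrative assumptions.

def evaluate(model_fn, cases):
    """Return the fraction of cases where the model's output
    exactly matches the expected answer (a verifiable task)."""
    correct = sum(1 for prompt, expected in cases
                  if model_fn(prompt) == expected)
    return correct / len(cases)

# A stand-in "model": a lookup table playing the role of an LLM call.
toy_model = {"2 + 2": "4", "capital of France": "Paris"}.get

cases = [("2 + 2", "4"), ("capital of France", "Berlin")]
print(evaluate(toy_model, cases))  # one of two cases matches -> 0.5
```

Real evaluations swap the lookup table for an actual model call and exact match for a task-appropriate scorer (for open-ended tasks, often another model acting as a judge), but the loop's shape stays the same: generate, score, aggregate, then feed the results back into alignment work.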

1.1 AI evaluations

1.1.1 Verifiable tasks

1.1.2 Open-ended tasks

1.1.3 Hallucinations

1.2 AI alignments

1.2.1 Three pillars of alignment

1.3 Practical evaluation and alignment for AI engineering

1.3.1 The iterative mental model

1.3.2 Learning from seminal papers

1.4 What you need to follow along

1.5 Summary