chapter seventeen

17 Crafting Model Character and Products

This chapter covers

How character training shapes model personality
Mechanistic tools for personality control
What model specifications are and why they matter
The intersection of RLHF with product development

The frontiers of RLHF and post-training show how these techniques are used within companies to build leading products. As RLHF becomes more established, the problems it addresses are moving beyond the traditional realm of research and the optimization of clear, public benchmarks. In this chapter, we discuss a series of use cases for RLHF and post-training that are essential at leading AI laboratories but not yet well established in the academic literature, with a primary focus on the process that teaches language models their personality.

17.1 Character Training
