chapter seventeen

17 Crafting Model Character and Products

 

This chapter covers

  • How character training shapes model personality
  • Mechanistic tools for personality control
  • What model specifications are and why they matter
  • The intersection of RLHF with product development

The frontiers of RLHF and post-training show how these techniques are used within companies to build leading products. As RLHF becomes more established, the problems it is used to address are moving beyond the traditional realm of research and the optimization of clear, public benchmarks. In this chapter, we discuss a series of use cases for RLHF and post-training that are not well established in the academic literature yet are essential at leading AI laboratories, with a primary focus on character training, the process that teaches language models their personality.

17.1 Character Training

17.1.1 Persona Vectors

17.1.2 The Assistant Axis

17.1.3 Persona Subnetworks

17.2 Model Specifications

17.3 Product Cycles and What’s Next for RLHF

Summary