18 Product, UX, and Model Character
This chapter covers
- How character training shapes model personality
- What model specifications are and why they matter
- The intersection of RLHF with product development
The frontiers of RLHF and post-training show how these techniques are used within companies to build leading products. As RLHF becomes more established, the problems it is used to address are moving beyond the traditional realm of research and the optimization of clear, public benchmarks. In this chapter, we discuss a series of use cases for RLHF and post-training that are not well established in the academic literature but are essential at leading AI laboratories.
18.1 Character Training
Character training is the subset of post-training designed around crafting traits within a model to tweak the personality or manner of its responses, rather than their content [1]. Although character training is important to the user experience of language model chatbots, it is largely unexplored in the public domain. The default way for users to change a model’s behavior is to write a prompt describing the change, but character training with fine-tuning has been shown to be more robust than prompting [1]. This training also outperforms activation steering [2], a newer method for manipulating models without taking gradient updates or passing in input context.
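To make the distinction between these three ways of changing behavior concrete, here is a minimal toy sketch. It is not the method of [1] or [2]: the one-layer linear "model", the `trait` direction, and every function name are hypothetical, chosen only to show where each intervention acts. Prompting changes the input, activation steering adds a vector to the hidden activations at inference time, and fine-tuning changes the weights with a gradient update.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def hidden(weights: List[Vector], x: Vector) -> Vector:
    """Toy one-layer 'model': hidden activations are a linear map of the input."""
    return [dot(row, x) for row in weights]


def trait_score(h: Vector, trait: Vector) -> float:
    """Hypothetical readout: how strongly the activations express a trait."""
    return dot(h, trait)


def steer(h: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation steering: shift activations along a direction, no weight change."""
    return [hi + alpha * di for hi, di in zip(h, direction)]


# Assumed toy values, for illustration only.
weights = [[1.0, 0.0], [0.0, 1.0]]
trait = [0.0, 1.0]
user_input = [1.0, 0.0]

baseline = trait_score(hidden(weights, user_input), trait)

# 1. Prompting: the weights are untouched; extra context changes the input.
prompted_input = [1.0, 0.5]
prompted = trait_score(hidden(weights, prompted_input), trait)

# 2. Activation steering: weights and input are untouched; the hidden
#    activations are shifted along the trait direction at inference time.
steered = trait_score(steer(hidden(weights, user_input), trait, 0.5), trait)

# 3. Fine-tuning: one gradient-ascent step on the trait score.
#    d(score)/d(W[i][j]) = trait[i] * x[j] for this linear model.
lr = 0.5
tuned_weights = [
    [w + lr * t * xj for w, xj in zip(row, user_input)]
    for row, t in zip(weights, trait)
]
tuned = trait_score(hidden(tuned_weights, user_input), trait)

print(baseline, prompted, steered, tuned)
```

In this toy all three interventions raise the trait score by the same amount, so it says nothing about their relative robustness. It only shows where each one acts. Only fine-tuning produces a new set of weights, so the change persists without any extra prompt or inference-time modification.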