17 Evaluation

Evaluation is an ever-evolving practice. The key to understanding language model evaluation, particularly for post-training, is that the popular evaluation regimes of any period reflect the training best practices and goals of that period. While challenging evaluations drive progress in language models toward new capabilities, the majority of evaluation is designed to provide useful signals for training new models.

In many ways, this chapter is designed to present vignettes of popular evaluation regimes from the early history of RLHF, so that readers can understand the common themes, details, and failure modes.

Evaluation for RLHF and post-training has gone through a few distinct phases in its early history:

17.1 Prompt Formatting: From Few-shot to Zero-shot to CoT

17.2 Using Evaluations vs. Observing Evaluations

17.3 Contamination

17.4 Tooling
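
To make the first of these phases concrete, the snippet below sketches how the same question could be rendered as a few-shot, zero-shot, or zero-shot chain-of-thought (CoT) prompt. It is a minimal illustration: the question and the worked exemplars are invented for this sketch, and real benchmarks define their own templates.

```python
# A minimal sketch of the three prompt formats named in 17.1.
# The question and few-shot exemplars are hypothetical, chosen for illustration.

QUESTION = "What is 17 * 3?"

# Few-shot: prepend worked examples so the model can infer the answer format
# from context, the dominant style for evaluating early base models.
few_shot_prompt = (
    "Q: What is 2 * 4?\nA: 8\n\n"
    "Q: What is 5 * 6?\nA: 30\n\n"
    f"Q: {QUESTION}\nA:"
)

# Zero-shot: the bare question, relying on instruction-tuned behavior
# rather than in-context examples.
zero_shot_prompt = f"Q: {QUESTION}\nA:"

# Zero-shot CoT: elicit intermediate reasoning before the final answer,
# e.g. with a trigger phrase such as "Let's think step by step."
cot_prompt = f"Q: {QUESTION}\nA: Let's think step by step."

for name, prompt in [
    ("few-shot", few_shot_prompt),
    ("zero-shot", zero_shot_prompt),
    ("zero-shot CoT", cot_prompt),
]:
    print(f"--- {name} ---\n{prompt}\n")
```

Which of these formats an evaluation assumes changes what is being measured, which is why shifts in prompt formatting mark distinct phases in evaluation practice.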