
Appendix C. Practical Issues

This appendix covers practical considerations for running post-training experiments at scale. It is organized as a list of lessons rather than a coherent narrative.

C.1 Compute Costs of Post-Training

There are two ways to scope the cost of post-training runs. The larger cost is developing the recipe, which can easily take 10 to 100x the compute of the final few training runs. The secondary cost, which is easier to measure, is that of applying a recipe thoroughly, which entails multiple seeds, careful evaluation, potential engineering headaches, etc.
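As a rough illustration of how these two scopes relate, the arithmetic can be sketched in a few lines of Python. Everything here is a hypothetical helper rather than anything from the cited reports: the function name, the `eval_overhead` fraction, and the choice to apply the 10 to 100x multiplier to the training compute of the final runs (excluding evaluation) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    """GPU-hour estimates for the two cost scopes (hypothetical structure)."""
    final_runs_gpu_hours: float       # applying the recipe: seeds plus evaluation
    recipe_dev_low_gpu_hours: float   # recipe development, low end of the range
    recipe_dev_high_gpu_hours: float  # recipe development, high end of the range


def scope_post_training_cost(
    gpu_hours_per_run: float,
    num_seeds: int,
    eval_overhead: float = 0.0,
    dev_multiplier_low: float = 10.0,
    dev_multiplier_high: float = 100.0,
) -> CostEstimate:
    """Back-of-the-envelope estimate of both cost scopes.

    Assumptions (not from the source): evaluation is modeled as a flat
    fraction of training compute, and the development multiplier is applied
    to the training compute of the final runs only.
    """
    if gpu_hours_per_run < 0 or eval_overhead < 0:
        raise ValueError("gpu_hours_per_run and eval_overhead must be non-negative")
    if num_seeds < 1:
        raise ValueError("num_seeds must be at least 1")

    # Training compute of the final few runs (one run per seed).
    training = gpu_hours_per_run * num_seeds
    # Cost of applying the recipe thoroughly: training plus evaluation overhead.
    final_total = training * (1.0 + eval_overhead)

    return CostEstimate(
        final_runs_gpu_hours=final_total,
        recipe_dev_low_gpu_hours=training * dev_multiplier_low,
        recipe_dev_high_gpu_hours=training * dev_multiplier_high,
    )


if __name__ == "__main__":
    # Made-up numbers: 3 seeds at 1,000 GPU-hours each, 50% evaluation overhead.
    print(scope_post_training_cost(1000.0, num_seeds=3, eval_overhead=0.5))
```

The inputs in the usage line are invented for the example. The point is only that the easily measured number (the final runs) is a small fraction of the total once recipe development is counted.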

As an example of the first cost, developing the Tülu 3 post-training recipe [1] took on the order of thousands of experiments and evaluations at the 7B scale before the team arrived at the final model.

For final runs, the Olmo 3 report has a detailed accounting of what is involved in training the final 32B Think model [2]:
