chapter seven

7 Generating Python code

 

This chapter covers

  • The role specialized open source LLMs play in code generation
  • Optimizing and quantizing these LLMs to improve inference performance
  • Running these LLMs on a laptop after 4-bit quantization

In this chapter, we’ll apply the concepts covered so far, including optimization and quantization, to real-world, domain-specific LLMs. We’ll focus on models tuned for Python code generation and programming assistance. Although most open source models in this category support multiple programming languages, we’ll use Python so you can readily judge the quality of the outputs.

7.1 Using Transformers to generate code

Alongside closed source, proprietary LLM-based code assistants, several popular open source LLMs designed for coding tasks have been released. In this chapter, we’ll experiment hands-on with some of these options, look at their pros and cons, and see how to optimize and, if needed, quantize them so that inference runs on commodity hardware (ideally on our laptops) while maintaining good performance. First, though, we’ll consider the present and future of human programmers in this era of coding assistants.
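As a rough illustration of why quantization matters for laptop-class hardware, the following sketch estimates the memory needed to hold a model's weights at different precisions. It is back-of-the-envelope arithmetic (parameters × bits per weight ÷ 8), not a measurement. The 7-billion-parameter size and the helper name `weight_memory_bytes` are assumptions chosen purely for illustration. Real quantized formats also store scaling metadata, and inference needs extra memory for activations, so actual usage will be somewhat higher.

```python
def weight_memory_bytes(num_params: int, bits_per_weight: int) -> float:
    """Approximate bytes needed to store the weights alone (no metadata, no activations)."""
    return num_params * bits_per_weight / 8


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 7_000_000_000

for bits in (32, 16, 8, 4):
    gigabytes = weight_memory_bytes(NUM_PARAMS, bits) / 1e9
    print(f"{bits:>2}-bit weights: ~{gigabytes:.1f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts the weight footprint from roughly 14 GB to roughly 3.5 GB, which is the difference between exceeding and fitting within the memory of many laptops.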

7.2 Generating Python code with a Transformer architecture

7.2.1 Python code generation with CodeGen

7.2.2 Using ONNX with models not supported by Optimum

7.2.3 Model evaluation

7.2.4 Python code generation with better models

7.3 Coding assistance on commodity hardware

Summary