7 Controlling LLMs via the Ollama Python API
This chapter covers
- Writing your first Python script that talks to a local LLM
- Understanding message roles: system, user, and assistant
- Streaming responses for real-time token-by-token output
- Building a multi-turn chatbot that maintains conversation history
Everything you have done so far (installing Ollama, downloading models, setting up VS Code, creating a virtual environment) was preparation. In this chapter, you cross the line from AI user to AI programmer. You will write Python scripts that send instructions to an LLM and receive its responses, giving you programmatic control over a locally running AI model.
Make sure your virtual environment is activated (you should see (venv) in your terminal prompt) and that the Ollama service is running before you begin. This chapter uses the model gemma3:4b. If you have not pulled it yet, run ollama pull gemma3:4b in a terminal. If ollama serve reports that port 11434 is already in use, Ollama is already running and you can continue.
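If you would like to confirm these prerequisites from Python rather than by eye, the following optional sketch may help. It uses only the standard library and asks the Ollama server for its list of installed models through the /api/tags endpoint. It assumes Ollama is listening at its default address, http://localhost:11434. The file name check_ollama.py and the helper names model_is_available and check_ollama are made up for this illustration and are not part of Ollama.

```python
# check_ollama.py (hypothetical file name): optional prerequisite check.
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address
MODEL = "gemma3:4b"                    # the model used in this chapter


def model_is_available(tags_payload: dict, model: str) -> bool:
    """Return True if `model` is listed in the JSON returned by /api/tags."""
    return any(m.get("name") == model for m in tags_payload.get("models", []))


def check_ollama(base_url: str = OLLAMA_URL, model: str = MODEL) -> str:
    """Return a one-line status message about the server and the model."""
    try:
        # /api/tags lists the models that have been pulled locally.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return "Ollama is not reachable. Start it with: ollama serve"
    if model_is_available(payload, model):
        return f"Ready: {model} is installed."
    return f"Ollama is running, but {model} is missing. Run: ollama pull {model}"


if __name__ == "__main__":
    print(check_ollama())
```

Run it with python check_ollama.py. If it reports that the model is ready, you can move on. Otherwise, the message suggests which command to run first.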
7.1 Your First Ollama Python Script
In this section, you will write a short Python script that sends a question to the Gemma 3 model (gemma3:4b) and prints its answer. Do not worry if the code looks unfamiliar; we will break down every line afterward. The goal is to type the code exactly as shown, run it, and see the result. Understanding will follow.