4 Downloading an LLM and Your First Conversation
This chapter covers
- Downloading your first AI model with ollama pull
- Having an interactive conversation with a local LLM
- Understanding model sizes (2B, 7B, 13B, 70B) and their RAM requirements
- Comparing different model families (Gemma, Llama, Qwen, Mistral)
- Managing models with essential Ollama commands
At this point, Ollama is installed and the ollama serve process is running in the background. You have the player, but no song loaded yet. In this chapter, you will download an actual AI model, talk to it, and learn how to manage multiple models on your machine.
4.1 Downloading Your First Model
You will start with one small, reliable model before comparing alternatives. This keeps the first download manageable and gives you a known-good baseline for later chapters.
4.1.1 Choosing a starting model
There are dozens of open-source models available through Ollama. For your first download, you will use Gemma 3 with 4 billion parameters, a lightweight model created by Google DeepMind.
What does "4 billion parameters" mean? Parameters are the numerical values inside the neural network that the model learned during training. Think of them as the strengths of the connections between the model's "brain cells." More parameters generally mean more capable responses, but also higher RAM usage and slower generation, because every parameter must be held in memory and used in the computation for each word the model produces. Four billion is a good starting point because a model of that size runs comfortably on most modern laptops.
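To make the link between parameter count and RAM concrete, here is a rough back-of-the-envelope sketch. It assumes that memory for the weights is simply the number of parameters multiplied by the bits used to store each one. The function name `estimate_weight_memory_gb` and the two precisions shown (16-bit and 4-bit) are illustrative choices, not something Ollama exposes. Real usage will be somewhat higher, because the runtime also needs memory for the conversation context and its own overhead.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Hypothetical helper for illustration: it ignores context memory and
    runtime overhead, so treat the result as a lower bound.
    """
    total_bytes = num_params * bits_per_param / 8  # 8 bits per byte
    return total_bytes / 1e9


# Sizes mentioned in this chapter, plus the 4B model used for the first download.
model_sizes = [("2B", 2e9), ("4B", 4e9), ("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]

for label, num_params in model_sizes:
    at_16_bit = estimate_weight_memory_gb(num_params, 16)  # unquantized half precision
    at_4_bit = estimate_weight_memory_gb(num_params, 4)    # assumed 4-bit quantization
    print(f"{label:>4}: ~{at_16_bit:.1f} GB at 16-bit, ~{at_4_bit:.1f} GB at 4-bit")
```

Under these assumptions, a 4-billion-parameter model needs roughly 2 GB for its weights at 4-bit precision and about 8 GB at 16-bit, which is why a small model fits on an ordinary laptop while a 70B model does not.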