4 Zero-shot probabilistic forecasting with Lag-Llama
This chapter covers
- Exploring the architecture of Lag-Llama
- Forecasting with Lag-Llama
- Fine-tuning Lag-Llama
In chapter 3, we explored TimeGPT, a proprietary foundation model developed by Nixtla. Although it comes with an easy, intuitive API, it will eventually be a paid solution, which may deter some practitioners from using it.
Thus, this chapter explores Lag-Llama, an open source foundation model published around the time TimeGPT was released. Besides being open source, it differs from TimeGPT in two key ways:
- At the time of writing, using Lag-Llama requires cloning its code base, so it is used mostly for quick proofs of concept or research projects. No Python package or API is available for interacting with the model.
- Lag-Llama supports only univariate forecasting: only one series can be predicted at a time, and no exogenous features can be included. Although anomaly detection is technically possible with Lag-Llama, I don’t cover it here because the model is not meant to be used in production.
Now that we have a general idea of what Lag-Llama can do, let’s explore its architecture in detail. This step is crucial: if we understand a model’s architecture, we can understand its hyperparameters and tune them for our scenario, leading to better results.