4 Zero-shot probabilistic forecasting with Lag-Llama

This chapter covers

  • Exploring the architecture of Lag-Llama
  • Forecasting with Lag-Llama
  • Fine-tuning Lag-Llama

In chapter 3, we explored TimeGPT, a proprietary foundation model developed by Nixtla. Although it comes with an API that is easy and intuitive to use, it will eventually be a paid solution, which may deter some practitioners from adopting it.

Thus, this chapter explores Lag-Llama, an open source foundation model published around the time TimeGPT was released. Besides being open source, it differs from TimeGPT in key ways:

  • At the time of writing, using Lag-Llama requires cloning its code base; no Python package or API is available for interacting with the model. As a result, it is used mostly for quick proof-of-concept or research projects.
  • Lag-Llama supports only univariate forecasting: only one series at a time can be predicted, and no exogenous features can be included. Although anomaly detection is technically possible with Lag-Llama, I don’t cover it here because this model is not meant to be used in production.
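Because there is no Python package to install, getting started means cloning the project's repository and installing its dependencies manually. A minimal setup sketch might look like the following; the repository URL and requirements file name are assumptions here, so check the project's README for the current instructions:

```shell
# Clone the Lag-Llama code base (URL assumed; verify against the official README)
git clone https://github.com/time-series-foundation-models/lag-llama.git
cd lag-llama

# Install the project's dependencies into the current environment
pip install -r requirements.txt
```

We walk through the actual setup step by step in section 4.2.1.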

Now that we have a general idea of what Lag-Llama can do, let’s explore its architecture in detail. This step is crucial: understanding a model’s architecture lets us understand its hyperparameters and tune them for our scenario, leading to better results.

4.1 Exploring Lag-Llama

4.1.1 Viewing the architecture of Lag-Llama

4.1.2 Pretraining Lag-Llama

4.2 Forecasting with Lag-Llama

4.2.1 Setting up Lag-Llama

4.2.2 Zero-shot forecasting with Lag-Llama

4.2.3 Changing the context length in Lag-Llama

4.3 Fine-tuning Lag-Llama

4.3.1 Handling initial setup

4.3.2 Reading and splitting the data in Colab

4.3.3 Launching the fine-tuning procedure

4.3.4 Forecasting with a fine-tuned model

4.3.5 Evaluating the fine-tuned model

4.4 Model comparison table

4.5 Next steps