
4 Zero-shot probabilistic forecasting with Lag-Llama


This chapter covers

  • Exploring the architecture of Lag-Llama
  • Forecasting with Lag-Llama
  • Fine-tuning Lag-Llama

In the previous chapter, we explored TimeGPT, a proprietary foundation model developed by Nixtla. While it comes with an easy and intuitive API, it will eventually be a paid solution, which might deter some practitioners from using it.

Thus, we now explore Lag-Llama, an open-source foundation model that was released shortly after TimeGPT. Beyond being open source, Lag-Llama differs from TimeGPT in several key ways.

First, at the time of writing, Lag-Llama can only be used by cloning its code base: there is no Python package or hosted API to interact with the model. Second, Lag-Llama supports only univariate forecasting, so it predicts one series at a time and cannot include exogenous features. Finally, anomaly detection is not an explicit functionality of Lag-Llama.
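The setup described above can be sketched as follows. This is a minimal sketch, assuming the repository and checkpoint locations (the official Lag-Llama GitHub repository and its Hugging Face model repo) as they exist at the time of writing; both may change.

```shell
# Clone the official Lag-Llama code base (there is no pip package).
git clone https://github.com/time-series-foundation-models/lag-llama.git
cd lag-llama

# Install the project's dependencies.
pip install -r requirements.txt

# Download the pretrained checkpoint from Hugging Face into the working directory.
huggingface-cli download time-series-foundation-models/Lag-Llama \
    lag-llama.ckpt --local-dir .
```

We will walk through this setup in detail in section 4.2.1.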

Now that we have a general idea of the capabilities of Lag-Llama, let’s explore it in more detail and discover its architecture.

4.1 Exploring Lag-Llama

As the chapter title suggests, Lag-Llama is a probabilistic forecasting model, meaning that instead of outputting point forecasts, it outputs a distribution of possible future values [1].
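To make the distinction concrete, the toy sketch below (plain NumPy, not Lag-Llama itself) shows what a probabilistic forecast looks like in practice: the model yields many sampled future trajectories, from which we derive a median forecast and a prediction interval. The trend and noise scale here are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

horizon = 12     # number of future time steps to forecast
n_samples = 100  # sampled trajectories drawn from the forecast distribution

# Hypothetical forecast distribution: an upward trend plus Gaussian noise
# on each sampled trajectory.
trend = np.linspace(100, 110, horizon)
samples = trend + rng.normal(scale=2.0, size=(n_samples, horizon))

# A point forecast is just one summary of the distribution (here, the median),
# while the 10th/90th percentiles give an 80% prediction interval.
median = np.median(samples, axis=0)
lower, upper = np.percentile(samples, [10, 90], axis=0)
```

Reporting `lower` and `upper` alongside `median` conveys the model's uncertainty, which a single point forecast cannot do.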

4.1.1 Architecture of Lag-Llama

4.1.2 Pretraining Lag-Llama

4.2 Forecasting with Lag-Llama

4.2.1 Setting up Lag-Llama

4.2.2 Zero-shot forecasting with Lag-Llama

4.2.3 Changing the context length in Lag-Llama

4.3 Fine-tuning Lag-Llama

4.4 Next steps

4.5 Summary

4.6 References