1 Understanding foundation models
This chapter covers
- Defining foundation models
- Exploring the Transformer architecture
- Advantages and drawbacks of using foundation models
- Overview of foundation models for time series forecasting
Foundation models represent a major paradigm shift in machine learning. Traditionally, we build data-specific models: each model is trained on a dataset tied to a particular scenario, so it specializes in a single use case. For a different situation, a new model must be trained on data specific to that situation.
At the time of writing, we are finding ever more areas in which to apply and interact with foundation models. Video meeting applications, such as Microsoft Teams, now use foundation models to summarize the key points of a presentation. Canva, a company building web-based design tools, lets users create an image from a text prompt using the DALL-E model developed by OpenAI. Millions of people have interacted with ChatGPT, which, in its free version, uses the GPT-5 model to generate text and code. And the company Toys R Us created a video ad using Sora, a foundation model that generates video from text [1]. In this book, we focus entirely on foundation models applied to time series forecasting, which itself spans a wide range of applications, such as weather forecasting, demand planning, and more.