4 Exploring ONNX
This chapter covers
- The ONNX standard format
- The ONNX runtime
- How ONNX can be useful for LLMs, with or without hardware acceleration
This chapter introduces the ONNX framework, which plays an important role in model optimization, quantization, and portability across frameworks and hardware vendors. Unless you are already familiar with ONNX, take the time you need to absorb the concepts explained in this chapter, as they are used heavily in the chapters that follow.
4.1 The ONNX format
ONNX (which stands for Open Neural Network Exchange, https://onnx.ai/) is an open standard for ML interoperability. First released in 2017, it is now a graduate project of the LF AI & Data Foundation (https://lfaidata.foundation/). The initiative has two aims: to make interoperability easier across diverse ML/DL frameworks (all the major ones are supported, including Keras, TensorFlow, PyTorch, scikit-learn, XGBoost, and many others) and to maximize performance across diverse hardware accelerators (not only NVIDIA's: integrations exist for Intel's OpenVINO and Habana, Qualcomm, Apache TVM, Hugging Face's Optimum, and others). Figure 4.1 shows a very high-level overview of ONNX.
Figure 4.1 ONNX overview
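To make the interoperability idea concrete, the following listing is a minimal sketch of the most common workflow: exporting a model from one framework (here PyTorch) to the ONNX format, then validating the resulting graph with the onnx Python package. The TinyNet model and the tiny_net.onnx file name are illustrative placeholders, not part of any particular project; the calls themselves (torch.onnx.export, onnx.load, onnx.checker.check_model) are the standard entry points of the two libraries.

Listing 4.1 Exporting a PyTorch model to the ONNX format

import torch
import torch.nn as nn
import onnx

# A tiny feed-forward network standing in for any PyTorch model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
dummy_input = torch.randn(1, 10)  # an example input traces the graph

# Serialize the model to an .onnx file. Marking the batch dimension as
# dynamic lets the exported graph accept any batch size at inference time.
torch.onnx.export(
    model,
    dummy_input,
    "tiny_net.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)

# Reload the file and verify that it is a well-formed ONNX graph.
onnx_model = onnx.load("tiny_net.onnx")
onnx.checker.check_model(onnx_model)
print(onnx.helper.printable_graph(onnx_model.graph))

Once a model is serialized this way, it is no longer tied to PyTorch: the same .onnx file can be loaded by the ONNX runtime (covered next) or by tooling from any of the vendors mentioned above.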
