When dealing with big data, persistence is of paramount importance. We want to be able to access—to read and write—data as fast as possible, preferably from many parallel processes. We also want persistent representations that are compact because storing large amounts of data can be expensive.
In this chapter, we will consider several approaches to make persistent storage of data more efficient. We will start with a short discussion of fsspec, a library that abstracts access to file systems, both local and remote. While fsspec isn’t directly concerned with performance, it is a modern library that many applications use to deal with storage systems, and it appears frequently in efficient storage implementations.