Appendix C. Binary data and GridFS
For storing images, thumbnails, audio, and other binary files, many applications rely solely on the filesystem. Although filesystems provide fast access to files, filesystem storage can also lead to organizational chaos. Consider that most filesystems limit the number of files per directory. If you have millions of files to keep track of, you need to devise a strategy for organizing them into multiple directories. Another difficulty involves metadata. Because file metadata typically lives in a database while the files themselves sit on disk, taking a backup in which the files and their metadata agree with each other can be incredibly complicated.
For certain use cases, it may make sense to store files in the database itself because doing so simplifies file organization and backup. In MongoDB, you can use the BSON binary type to store any kind of binary data. This data type corresponds to the RDBMS BLOB (binary large object) type, and it’s the basis for two flavors of binary object storage provided by MongoDB.
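As a minimal sketch of what "storing the file in the database itself" can look like, the following builds a document that carries a small file's bytes alongside its metadata. The field names and the `build_thumbnail_doc` helper are hypothetical, chosen only for illustration. The sketch uses only the Python standard library and does not talk to a server; it assumes a driver such as PyMongo, which encodes Python `bytes` values as the BSON binary type when the document is inserted.

```python
import hashlib


def build_thumbnail_doc(name: str, data: bytes) -> dict:
    """Build one document holding both the metadata and the raw bytes.

    Field names here are illustrative assumptions, not a required schema.
    """
    return {
        "name": name,                       # human-readable file name
        "length": len(data),                # payload size in bytes
        "md5": hashlib.md5(data).digest(),  # 16 raw digest bytes, not hex text
        "data": data,                       # the binary payload itself
    }


# Stand-in payload; a real application would read the bytes from a file.
doc = build_thumbnail_doc("canyon-thumb.jpg", b"\xff\xd8\xff\xe0 fake jpeg bytes")
print(doc["name"], doc["length"], len(doc["md5"]))
```

Because the bytes and the metadata travel in the same document, a single database backup captures both together.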
The first uses one document per file and is best for smaller binary objects, because the whole object has to fit within MongoDB's maximum document size (16 MB in current releases). If you need to catalog a large number of thumbnails or MD5 digests stored in binary form, using single-document binary storage can make life much easier. On the other hand, you might want to store large images or audio files. In this case, GridFS, a MongoDB API for storing binary objects of any size, would be a better choice. In the next two sections, you’ll see complete examples of both storage techniques.
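The choice between the two techniques can be sketched as a simple size check. This is only an illustration under stated assumptions: the 16 MB figure is MongoDB's documented maximum BSON document size in current releases, while the `choose_storage` helper and the headroom reserved for metadata fields are hypothetical. An application may well prefer GridFS for files far smaller than this limit.

```python
# MongoDB's maximum BSON document size in current releases.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024

# Assumed allowance for the document's other fields (name, digest, and so on).
METADATA_HEADROOM_BYTES = 64 * 1024


def choose_storage(payload_size: int) -> str:
    """Return which storage technique fits a payload of the given size."""
    if payload_size < 0:
        raise ValueError("payload_size must be non-negative")
    if payload_size + METADATA_HEADROOM_BYTES <= MAX_BSON_DOCUMENT_BYTES:
        return "single-document"
    return "gridfs"


print(choose_storage(20 * 1024))          # a small thumbnail
print(choose_storage(50 * 1024 * 1024))   # a large audio file
```

A rule like this only says what is possible. Access patterns, such as needing to read part of a large file without loading all of it, can also favor GridFS.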