concept GridFS in category mongoDB

MongoDB in Action, Second Edition: Covers MongoDB version 3.0

This is an excerpt from Manning's book MongoDB in Action, Second Edition: Covers MongoDB version 3.0.

The first technique uses one document per file and is best for smaller binary objects. If you need to catalog a large number of thumbnails or binary MD5s, single-document binary storage can make life much easier. On the other hand, you might want to store large images or audio files. In that case, GridFS, a MongoDB API for storing binary objects of any size, is a better choice. In the next two sections, you’ll see complete examples of both storage techniques.

The term GridFS may lead to confusion, so two clarifications are worth making right off the bat. The first is that GridFS isn’t an intrinsic feature of MongoDB. As mentioned, it’s a convention that all the official drivers (and some tools) use to manage large binary objects in the database. Second, it’s important to clarify that GridFS doesn’t have the rich semantics of bona fide filesystems. For instance, there’s no protocol for locking and concurrency, and this limits the GridFS interface to simple put, get, and delete operations. This means that if you want to update a file, you need to delete it and then put the new version.
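The delete-then-put update pattern can be sketched without a server. The `ToyFileStore` class below is a hypothetical in-memory stand-in, not a driver class; it exists only to show the shape of the three operations and how an update is composed from them. Real code would call the driver's own upload and delete methods instead.

```ruby
# Hypothetical in-memory stand-in for a GridFS-like store (not a driver class).
# It exposes only the three operations the text describes: put, get, delete.
class ToyFileStore
  def initialize
    @files = {}    # id => { filename:, data: }
    @next_id = 0
  end

  # Stores a new file and returns its generated id.
  def put(filename, data)
    id = (@next_id += 1)
    @files[id] = { filename: filename, data: data.dup }
    id
  end

  # Returns the stored bytes, or nil if the id is unknown.
  def get(id)
    entry = @files[id]
    entry && entry[:data]
  end

  # Removes the file; returns true if something was deleted.
  def delete(id)
    !@files.delete(id).nil?
  end

  # There is no in-place update, so "updating" means delete, then put.
  # The new version is stored under a new id.
  def replace(id, filename, data)
    delete(id)
    put(filename, data)
  end
end

store = ToyFileStore.new
old_id = store.put('canyon.jpg', 'version 1')
new_id = store.replace(old_id, 'canyon.jpg', 'version 2')
puts store.get(old_id).inspect  # the old version is gone
puts store.get(new_id)          # the new version is stored under a new id
```

Nothing in this sequence is atomic. Since there's no locking protocol, another client could look for the file between the delete and the put and find nothing.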

GridFS works by dividing a large file into small 255 KB chunks and then storing each chunk as a separate document. Versions prior to MongoDB v2.4.10 use 256 KB chunks. By default, these chunks are stored in a collection called fs.chunks. Once the chunks are written, the file’s metadata is stored in a single document in another collection called fs.files. Figure C.1 contains a simplified illustration of this process applied to a theoretical 1 MB file called canyon.jpg. Note that the term chunks in the context of GridFS is unrelated to the term chunks in the context of sharding.

Figure C.1. Storing a file with GridFS using 256 KB chunks on a MongoDB server prior to v2.4.10
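The chunking arithmetic can be checked in plain Ruby without a server. What follows is an in-memory sketch, not the driver's implementation: the `split_into_chunks` and `join_chunks` helpers are hypothetical, the id is a random stand-in for a real ObjectId, and the hashes are simplified versions of the documents that would land in fs.files and fs.chunks.

```ruby
require 'securerandom'

CHUNK_SIZE_DEFAULT = 255 * 1024  # 261,120 bytes, the chunk size described above
CHUNK_SIZE_LEGACY  = 256 * 1024  # 262,144 bytes, used by older versions

# Hypothetical helper: splits a binary string into GridFS-style documents.
# Returns [file_doc, chunk_docs], standing in for fs.files and fs.chunks.
def split_into_chunks(filename, data, chunk_size = CHUNK_SIZE_DEFAULT)
  file_id = SecureRandom.hex(12)  # stand-in for a real ObjectId
  bytes = data.b                  # treat the input as raw bytes
  chunks = []
  offset = 0
  while offset < bytes.bytesize
    chunks << { files_id: file_id, n: chunks.size,
                data: bytes.byteslice(offset, chunk_size) }
    offset += chunk_size
  end
  file_doc = { _id: file_id, filename: filename,
               length: bytes.bytesize, chunkSize: chunk_size }
  [file_doc, chunks]
end

# Reassembles the original bytes by concatenating chunks in order of n.
def join_chunks(chunk_docs)
  chunk_docs.sort_by { |c| c[:n] }.map { |c| c[:data] }.join
end

one_mb = "\x00".b * (1024 * 1024)  # a theoretical 1 MB file
file_doc, chunks = split_into_chunks('canyon.jpg', one_mb)
_legacy_doc, legacy_chunks = split_into_chunks('canyon.jpg', one_mb, CHUNK_SIZE_LEGACY)

puts "255 KB chunks: #{chunks.size} (last chunk is #{chunks.last[:data].bytesize} bytes)"
puts "256 KB chunks: #{legacy_chunks.size}"
```

With 256 KB chunks, a 1 MB file divides evenly into four chunks, as in the figure. With 255 KB chunks, the same file needs five: four full chunks plus a 4,096-byte remainder.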

That should be enough theory to use GridFS.[2] Next we’ll see GridFS in practice through the Ruby GridFS API and the mongofiles utility.

$ mongofiles --help
Usage:
  mongofiles <options> <command> <filename or _id>

Manipulate gridfs files using the command line.

Possible commands include:
    list      - list all files; 'filename' is an optional prefix which listed
                filenames must begin with
    search    - search all files; 'filename' is a substring which listed
                filenames must contain
    put       - add a file with filename 'filename'
    get       - get a file with filename 'filename'
    get_id    - get a file with the given '_id'
    delete    - delete all files with filename 'filename'
    delete_id - delete a file with the given '_id'

See http://docs.mongodb.org/manual/reference/program/mongofiles/ for more information.

general options:
      --help                     print usage
      --version                  print the tool version and exit

verbosity options:
  -v, --verbose      more detailed log output (include multiple times for more
                     verbosity, e.g. -vvvvv)
      --quiet        hide all log output

connection options:
  -h, --host=         mongodb host to connect to (setname/host1,host2 for
                      replica sets)
      --port=         server port (can also use --host hostname:port)

authentication options:
  -u, --username=                username for authentication
  -p, --password=                password for authentication

      --authenticationDatabase=  database that holds the user's credentials
      --authenticationMechanism= authentication mechanism to use

storage options:
  -d, --db=                      database to use (default is 'test')
  -l, --local=                   local filename for put|get
  -t, --type=                    content/MIME type for put (optional)
  -r, --replace                  remove other files with same name after put
      --prefix=                  GridFS prefix to use (default is 'fs')
      --writeConcern=    write concern options e.g. --writeConcern majority,
                         --writeConcern '{w: 3, wtimeout: 500, fsync: true, j:
                         true}' (defaults to 'majority')
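
To show how these options combine, the following Ruby sketch assembles a mongofiles argument list using only flags that appear in the help output above. The `mongofiles_argv` helper, the `images` database name, and the file name are hypothetical. The sketch builds the command without running it; actually invoking it assumes mongofiles is installed and a server is reachable.

```ruby
# Hypothetical helper: builds an argument list for the mongofiles tool.
# Only flags that appear in the help output above are used, and the
# arguments follow its usage line: <options> <command> <filename or _id>.
def mongofiles_argv(command, filename = nil, db: nil, host: nil, port: nil, replace: false)
  argv = ['mongofiles']
  argv << "--host=#{host}" if host
  argv << "--port=#{port}" if port
  argv << "--db=#{db}" if db
  argv << '--replace' if replace
  argv << command
  argv << filename if filename
  argv
end

put_cmd  = mongofiles_argv('put', 'canyon.jpg', db: 'images', replace: true)
list_cmd = mongofiles_argv('list', db: 'images')

puts put_cmd.join(' ')
puts list_cmd.join(' ')

# To actually run a command (assumes mongofiles is on the PATH and a server is up):
#   system(*put_cmd)
```

Passing the arguments to `system` as an array, rather than as one interpolated string, avoids shell quoting problems with unusual file names.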