Chapter 3. Data serialization—working with text and beyond
This chapter covers
- Working with text, XML, and JSON
- Understanding SequenceFile, Avro, Protocol Buffers, and Parquet
- Working with custom data formats
MapReduce offers straightforward, well-documented support for working with simple data formats such as log files. But the use of MapReduce has evolved beyond log files to more sophisticated data-serialization formats, such as text, XML, and JSON, to the point where its documentation and built-in support run dry. The goal of this chapter is to document how you can work with these common data-serialization formats, and to examine more structured serialization formats and compare their fitness for use with MapReduce.