C Data formats

In this appendix:

The CSV format and how to read and write CSV
The JSON format and how to read and write JSON

There is no shortage of file formats you can use to represent data: Extensible Markup Language (.xml), shapefiles (.shp, .shx, and .dbf), OpenDocument spreadsheets (.ods), and Excel spreadsheets (.xls and .xlsx), to name a few. Some projects define their own formats, which can be either binary or text-based. Some data formats are machine-readable (we can tell computers how to read the data given the format), while others are treated as data formats even though they aren’t (for example, PDF).

In this book, we will mostly use two very popular data file formats: CSV files and JSON files. They are simple, readable, and widely supported. I believe these formats complement each other very well, and by knowing these two formats and the common pitfalls or tricks around them, you can accomplish a great deal.

C.1 CSV

The pure simplicity of a CSV file is at the same time its biggest strength and its biggest weakness. CSV stands for comma-separated values, and that’s all these files really are: values separated by commas. The following is a perfectly valid CSV file consisting of a header row followed by two rows of data, with three columns each:

first,second,third
one,two,three
a,b,c

Table C.1 is a tabular representation of the same CSV file.

Table C.1. Tabular representation of a CSV file

first	second	third
one	two	three
a	b	c
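
To make the mapping between the raw text and the table concrete, here is a minimal Python sketch that parses the same three lines and writes them back out. It assumes the first line holds the column names, which is a convention and not something the CSV text itself declares. The names CSV_TEXT, parse_csv, and write_csv are made up for this illustration; only the csv and io modules come from Python's standard library.

```python
import csv
import io

# The example file from above, held in a string for illustration.
CSV_TEXT = "first,second,third\none,two,three\na,b,c\n"


def parse_csv(text):
    """Split CSV text into (header, rows).

    Assumption: the first line contains the column names.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    rows = [row for row in reader]
    return header, rows


def write_csv(header, rows):
    """Serialize a header and rows back into CSV text."""
    buffer = io.StringIO(newline="")
    # Use "\n" line endings so the output matches CSV_TEXT exactly.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


header, rows = parse_csv(CSV_TEXT)
print(header)
for row in rows:
    print(row)
```

Every value comes back as a string; the CSV text carries no type information, so converting values to numbers or dates is left to the reader of the file.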

C.2 JSON