Lesson 26. Capstone: Processing binary files and book data

This capstone covers

  • Learning about a unique binary format used by libraries
  • Writing tools to bulk-process binary data by using ByteString
  • Working with Unicode data by using the Text type
  • Structuring a large program performing a complicated I/O task

In this capstone, you’re going to use book data compiled by libraries to make a simple HTML document. Libraries collectively spend a huge amount of time cataloging every book in existence. Thankfully, much of this data is freely available to anyone who wants to explore it. Harvard Library alone has released 12 million book records for free public use (http://library.harvard.edu/open-metadata). The Open Library project provides millions of additional records (https://archive.org/details/ol_data).

26.1. Working with book data

26.2. Working with MARC records

26.3. Putting it all together

Summary