List of Figures

 

Chapter 1. The case for the digital Babel fish

Figure 1.1. Computer programs usually specialize in reading and interpreting only one file format (or family of formats). To deal with .pdf files, .psd files, and the like, you’d purchase Adobe products. If you needed to deal with Microsoft Office files (.doc, .xls, and so on), you’d turn to Microsoft products or other office programs that support these Microsoft formats. Few programs can understand all of these formats.

Figure 1.2. The hierarchy of the seven top-level MIME types defined in RFC 2046 (an eighth top-level type, model, was added later in RFC 2077). Top-level types can have subtypes (children), and those can have further children, as new media types are defined over the years. Multiplicity markers denote that multiple children may be present at the same level of the hierarchy, and ellipses indicate that the remainder of the hierarchy has been elided for brevity.

Figure 1.3. A snippet of HTML (at bottom) for the ESPN.com home page. Note that the top-level category headings for sports (All Sports, Commentary, Page 2) are all enclosed in <li> HTML tags that are styled by a particular CSS class. This kind of structural information about a content type can be exploited and codified using the notion of structured text.