Appendix B. Automating the web with scraping
This appendix covers
- Creating structured data from web pages
- Performing basic web scraping with cheerio
- Handling dynamic content with jsdom
- Parsing and outputting structured data
In the preceding chapters, you learned some general Node programming techniques; now we're going to focus on web development. Scraping the web is an ideal way to learn it, because scraping requires a combination of server- and client-side programming skills. Scraping is all about using programming techniques to make sense of web pages and transform them into structured data. Imagine you're tasked with creating a new version of a book publisher's website that's currently just a set of old-fashioned, static HTML pages. You want to download the pages and analyze them to extract the titles, descriptions, authors, and prices of all the books. You don't want to do this by hand, so you write a Node program to do it instead. This is web scraping.
Node is great at scraping because it strikes a balance between browser-based technology and the power of general-purpose scripting languages. In this appendix, you'll learn how to use HTML parsing libraries to extract useful data based on CSS selectors, and even how to run dynamic web pages in a Node process.