Chapter 22. Data over the network

 

This chapter covers

  • Fetching files via FTP/SFTP, SSH/SCP, and HTTPS
  • Getting data via APIs
  • Structured data file formats: JSON and XML
  • Scraping data

You’ve seen how to deal with text-based data files. In this chapter, you use Python to move data files over the network. In some cases, those files might be text or spreadsheet files, as discussed in chapter 21, but in other cases, they might be in more structured formats and served from REST or SOAP application programming interfaces (APIs). Sometimes, getting the data may mean scraping it from a website. This chapter discusses all of these situations and shows some common use cases.

22.1. Fetching files

Before you can do anything with data files, you have to get them. Sometimes this is easy: you manually download a single zip archive, or the files have already been pushed to your machine from somewhere else. Quite often, however, the process is more involved. Maybe a large number of files must be retrieved from a remote server, files must be retrieved on a regular schedule, or the retrieval process is complex enough to be a pain to do manually. In any of those cases, you might well want to automate fetching the data files with Python.
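As a rough illustration of what such automation might look like, here is a minimal sketch that downloads a list of files over HTTPS using only the standard library. The function name fetch_files and its parameters are hypothetical, chosen for this example, and the sketch assumes each URL ends in a usable filename and needs no authentication.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def fetch_files(urls, dest_dir):
    """Download each URL into dest_dir and return the list of saved paths.

    Hypothetical helper for illustration: it assumes every URL ends in a
    filename and that no login or special headers are required.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last component of the URL path as the local filename.
        filename = Path(urlsplit(url).path).name
        target = dest / filename
        # Stream the response to disk instead of reading it all into memory.
        with urllib.request.urlopen(url, timeout=30) as response:
            with open(target, "wb") as out:
                shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

Calling fetch_files with a list of URLs and a directory name, for example fetch_files(urls, "downloads"), would save each file under that directory. A real script would likely also need error handling and retries, and servers that speak FTP, SFTP, or SCP require different tools than this HTTPS-oriented sketch.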

22.2. Fetching data via an API

22.3. Structured data formats

22.4. Scraping web data

Summary
