Chapter 20. Basic file wrangling

This chapter covers

  • Moving and renaming files
  • Compressing and encrypting files
  • Selectively deleting files

This chapter deals with the basic operations you can use when you have an ever-increasing collection of files to manage. Those files might be log files, or they might be from a regular data feed, but whatever their source, you can’t simply discard them immediately. How do you save them, manage them, and ultimately dispose of them according to a plan, but without manual intervention?

20.1. The problem: The never-ending flow of data files

Many systems generate a continuous series of data files. These files might be log files from an e-commerce server or a regularly running process, a nightly feed of product information from a server, automated feeds of items for online advertising, historical data of stock trades, or the output of a thousand other sources. They’re often flat text files, uncompressed, with raw data that’s either an input or a byproduct of other processes. In spite of their humble nature, however, the data they contain has some potential value, so the files can’t be discarded at the end of the day, which means that every day their numbers grow. Over time, files accumulate until dealing with them manually becomes unworkable and the amount of storage they consume becomes unacceptable.
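To make the scale of the problem concrete, it can help to measure how many files have piled up and how much space they occupy. The following is a minimal sketch, assuming Python and its standard pathlib module; the function name summarize_files and its pattern parameter are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def summarize_files(directory, pattern="*"):
    """Return (file_count, total_bytes) for files directly inside directory.

    Only regular files matching the glob pattern are counted;
    subdirectories are skipped rather than descended into.
    """
    count = 0
    total_bytes = 0
    for path in Path(directory).glob(pattern):
        if path.is_file():
            count += 1
            total_bytes += path.stat().st_size
    return count, total_bytes
```

Calling summarize_files(".", "*.txt") on a schedule and recording the result would show how quickly a feed directory grows, which is the kind of number that decides when compression or deletion becomes necessary.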

20.2. Scenario: The product feed from hell

20.3. More organization

20.4. Saving storage space: Compression and grooming

Summary
