4 Importing Data

 

This chapter covers

  • Reading data from text files
  • Reading data from tabular data files such as CSV and XLSX
  • Exporting data to text files and tabular data files

The first step of a data science or machine learning project is importing data into the coding environment; analysis, visualization, and feature engineering come next. Data is stored in many different formats, so your first task is to retrieve it from them. These file formats range from simple ones like CSV to complex ones like HDF5.

The scope of this chapter is limited to text files and tabular data files such as CSV and XLSX. You will, however, encounter many more file types in real-life projects. For the sake of completeness, other file types are covered separately in Appendix B. I strongly suggest going through Appendix B after studying this chapter.

Base Julia functions are useful for reading data from files, and there are also dedicated packages for reading and writing specific file formats. In this chapter and in Appendix B, we will use the most common of these packages to read data from and write data to files.
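As a first taste of what base Julia offers, the following minimal sketch writes two lines to a text file and reads them back. It uses only built-in functions (open, println, read, readlines, rm); the temporary file path and the variable names are assumptions made for illustration.

```julia
# Write a few lines to a scratch text file, then read them back with base Julia.
path = tempname() * ".txt"          # hypothetical temporary file

open(path, "w") do io               # the do-block closes the file automatically
    println(io, "first line")
    println(io, "second line")
end

whole_text = read(path, String)     # entire file as a single String
lines = readlines(path)             # Vector{String}, one element per line

rm(path)                            # remove the scratch file
```

Reading the whole file with read is convenient for small files, while readlines is usually handier when each line is a separate record.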

4.1 Flat Files

The most common files you will import data from are flat files (text files, delimited files, or spreadsheets), where data is stored in two dimensions as plain text or tables. Text files, as the name suggests, keep data as text and mostly contain string data.
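To make the two-dimensional layout concrete, here is a small sketch that writes a table to a comma-delimited file and reads it back. It assumes the DelimitedFiles standard library rather than any of the packages used later in the chapter, and the table contents and file path are made up for illustration.

```julia
using DelimitedFiles                # standard library for delimited text files

path = tempname() * ".csv"          # hypothetical temporary file

# A small table: one header row followed by two data rows.
table = ["name" "score"; "Ada" 90; "Grace" 85]
writedlm(path, table, ',')          # one row per line, cells separated by commas

# header=true splits the first row off from the data cells.
data, header = readdlm(path, ',', header=true)

rm(path)                            # remove the scratch file
```

Here data is a matrix holding the two data rows and header is a one-row matrix holding the column names, so the rows and columns of the file map directly onto the rows and columns of the matrix.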

4.1.1 Text Files

4.1.2 Delimited Files

4.1.3 Excel Files

4.2 Summary