Part 3. Exploring data
To choose and construct useful models in a data science project, you need to get to know the data. What bits of data do you have? What problems might there be? Given the nature of the data, what approaches might work best?
In the last chapter of The Quick Python Book, 3rd edition, I introduce a key Python tool for exploring data: the Jupyter notebook. Based on an earlier project called IPython, Jupyter notebooks are a daily staple for many data professionals.
It’s hard to list all of the features that make Jupyter notebooks so useful for data exploration. They execute Python code, and they make it easy to write chunks of Python, execute them on your data, see what the result is, and adapt them as needed. Jupyter works well with the excellent Python data handling framework, pandas, as well as various graphing tools. Because Jupyter notebooks are web-based, it’s easy to share them with others, and they can even be used as a presentation tool.
With all of these features in mind, I’ve selected this chapter to give an idea of ways to explore data as part of a data science project.