appendix Software installation
To run the code notebooks and follow along with the examples in Data Science with Python and Dask, you should have the following software installed on your system:
- Python 3.6.5 or later (strongly recommended), or Python 2.7.14 or later
- The following Python packages:
  - IPython
  - Jupyter
  - Dask (version 1.0.0 or higher)
  - Dask ML
  - NLTK
  - Holoviews
  - Geoviews
  - Graphviz
  - Pandas
  - NumPy
  - Matplotlib
  - Seaborn
  - Bokeh
  - PyArrow
  - SQLAlchemy
  - Dill
The easiest way to install and maintain all the necessary Python packages is to download Anaconda, a free Python distribution available at www.anaconda.com/download. Anaconda is available for Windows, macOS, and most major Linux distributions. If you’ve installed Anaconda, the installer includes all the required packages except graphviz and pyarrow; to install those two, follow the directions in section A.1. If you would rather install all packages from scratch, follow the directions in section A.2.
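Whichever route you take, it can help to confirm what is already present before installing anything. The following is a minimal sketch, not part of the book’s notebooks, that reports which of the packages listed above cannot be found by the current interpreter. The module names in `REQUIRED_MODULES` are assumed to be the usual import names for those packages (for example, `dask_ml` for Dask ML), and the helper names `find_missing` and `python_version_ok` are hypothetical. The script itself assumes it is run with Python 3.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above. An import name can
# differ from the package's display name (e.g. Dask ML -> dask_ml).
REQUIRED_MODULES = [
    "IPython", "jupyter", "dask", "dask_ml", "nltk", "holoviews",
    "geoviews", "graphviz", "pandas", "numpy", "matplotlib", "seaborn",
    "bokeh", "pyarrow", "sqlalchemy", "dill",
]


def find_missing(modules):
    """Return the module names that cannot be located, without importing them."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_version_ok(version_info=sys.version_info):
    """Check an interpreter version against the minimums stated above."""
    version = tuple(version_info[:3])
    if version[0] == 2:
        return version >= (2, 7, 14)
    return version >= (3, 6, 5)


if __name__ == "__main__":
    if not python_version_ok():
        print("Python version is older than the stated minimum:", sys.version.split()[0])

    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run the script from the same environment you plan to use for the notebooks. It only checks that each module can be located; it does not verify package versions, such as the Dask 1.0.0 minimum.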
A.1 Installing additional packages with Anaconda
If you’ve already installed the Anaconda distribution, you still need to install graphviz and pyarrow. Open a command prompt or terminal window. If you’ve set up a virtual environment specifically for working with the examples, activate it before running the installation commands. Then type the following commands.