Part 4. Modeling and prediction
The “science” part of “data science” is the modeling: the part that produces the predictions, correlations, or classifications we hope to get out of a data science project.
For the final chapter of this ebook, I’ve chosen a chapter from Real-World Machine Learning by Henrik Brink, Joseph W. Richards, and Mark Fetherolf on “Modeling and prediction.” This chapter lays out the basics of machine learning modeling and explains the difference between supervised and unsupervised learning. From there it uses the Python package scikit-learn to demonstrate several common machine learning approaches, including logistic regression, support vector machines, k-nearest neighbors, linear regression, and random forests. It applies them to problems such as the survival chances of passengers on the Titanic, number recognition, and mileage prediction.