1 The Traveling Diabetes Clinic: A first take on the problem
“Essentially, all models are wrong, but some are useful”
- George E. P. Box, British statistician
This chapter covers:
- The pandas library and how to use it to read and manipulate data
- The scikit-learn library and how to use it to train ML models
The best way to see why it pays to understand the inner workings of machine learning is to work through a concrete example ourselves. In part 0, we’ll work through the Traveling Diabetes Clinic problem, which is an example of a classification problem. We’ll start with a high-level, black-box solution and then gradually move toward a deeper one. This will show us how a deeper understanding leads to better solutions.
1.1 The Traveling Diabetes Clinic Problem
Diabetes is a serious chronic disease in which glucose, the main source of energy for the human body, accumulates in the bloodstream without being consumed, so it becomes harmful instead of supplying energy. It's estimated that around 30 million people in the United States alone have diabetes, and about 24% of those people are undiagnosed. To find and help those undiagnosed patients, a group of doctors decided to launch the Traveling Diabetes Clinic project.