Chapter 1. Data science in a big data world
This chapter covers
- Defining data science and big data
- Recognizing the different types of data
- Gaining insight into the data science process
- Introducing the fields of data science and big data
- Working through examples of Hadoop
Big data is a blanket term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data management techniques such as relational database management systems (RDBMSs). The widely adopted RDBMS has long been regarded as a one-size-fits-all solution, but the demands of handling big data have shown otherwise. Data science involves using methods to analyze massive amounts of data and extract the knowledge it contains. You can think of the relationship between big data and data science as being like the relationship between crude oil and an oil refinery: big data is the raw resource, and data science refines it into usable knowledge. Data science and big data evolved from statistics and traditional data management but are now considered to be distinct disciplines.
The characteristics of big data are often referred to as the three Vs:
- Volume: How much data is there?
- Variety: How diverse are different types of data?
- Velocity: At what speed is new data generated?
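To make the three Vs a little more concrete, here is a minimal sketch that summarizes a batch of records along each dimension. The `Record` type and the `three_vs` function are hypothetical names introduced only for illustration, and the measures are deliberately crude assumptions: volume is approximated by the size of each payload's text representation, variety by the number of distinct Python payload types, and velocity by records per second over the observed time window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    timestamp: float  # seconds since the start of observation (assumed)
    payload: object   # any kind of data: dict, str, list, ...


def three_vs(records: list) -> dict:
    """Hypothetical helper: rough volume/variety/velocity summary of a batch."""
    if not records:
        return {"volume_bytes": 0, "variety": 0, "velocity_per_s": None}
    # Volume: total size in bytes of each payload's text representation
    volume = sum(len(str(r.payload).encode("utf-8")) for r in records)
    # Variety: number of distinct payload types in the batch
    variety = len({type(r.payload).__name__ for r in records})
    # Velocity: records per second over the observed window;
    # undefined (None) when all records share one timestamp
    span = max(r.timestamp for r in records) - min(r.timestamp for r in records)
    velocity: Optional[float] = len(records) / span if span > 0 else None
    return {"volume_bytes": volume, "variety": variety, "velocity_per_s": velocity}


records = [
    Record(0.0, {"user": 1}),   # structured
    Record(0.5, "free text"),   # unstructured text
    Record(1.0, [3, 1, 4]),     # a numeric list
    Record(2.0, {"user": 2}),   # structured again
]
print(three_vs(records))
```

Real big data systems measure these properties very differently (for example, volume in terabytes on disk), so treat this only as an illustration of what each question asks, not as a standard metric.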