Chapter 6. Using NoSQL to manage big data
This chapter covers
- What is a big data NoSQL solution?
- Classifying big data problems
- The challenges of distributed computing for big data
- How NoSQL handles big data
By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges.
US Federal Government, “Big Data Research and Development Initiative”
Have you ever wanted to analyze a large amount of data gathered from log files or files you’ve found on the web? The need to quickly analyze large volumes of data is the number-one reason organizations leave the world of single-processor RDBMSs and move toward NoSQL solutions. You may recall our discussion in chapter 1 on the key business drivers: volume, velocity, variability, and agility. The first two, volume and velocity, are the most relevant to big data problems.
Twenty years ago, companies managed datasets of approximately a million internal sales transactions, stored in a relational database running on a single processor. As organizations generated more data from internal and external sources, datasets grew to billions and even trillions of items. This volume made it difficult for organizations to keep processing the data on a single system, so they had to learn how to distribute the tasks among many processors. A problem of this kind is known as a big data problem.
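To make the idea of distributing tasks among many processors concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular NoSQL system works: the function names, the sample log format (status code as the last field), and the use of a thread pool as a stand-in for separate processors are all hypothetical choices made for this example. The dataset is split into partitions, each partition is counted independently, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n_partitions):
    """Deal records round-robin into n_partitions lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_status_codes(partition):
    """Work done independently on one partition: count status codes."""
    # Assumption: the status code is the last whitespace-separated field.
    return Counter(line.split()[-1] for line in partition)


def distributed_count(records, n_workers=4):
    """Fan partitions out to workers, then merge the partial counts."""
    partitions = split_into_partitions(records, n_workers)
    # Threads stand in for separate processors or machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = pool.map(count_status_codes, partitions)
        total = Counter()
        for partial in partial_counts:
            total.update(partial)
    return total


if __name__ == "__main__":
    # Hypothetical log lines, invented for illustration.
    log_lines = [
        "GET /index.html 200",
        "GET /missing 404",
        "POST /login 200",
        "GET /index.html 200",
        "GET /admin 500",
    ]
    print(distributed_count(log_lines, n_workers=2))
```

The key property is that no worker needs to see another worker's partition, so adding workers lets the same job cover more data. Only the final merge step touches every partial result.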