Chapter 6. Using NoSQL to manage big data

This chapter covers

What is a big data NoSQL solution?
Classifying big data problems
The challenges of distributed computing for big data
How NoSQL handles big data

By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges.

US Federal Government, “Big Data Research and Development Initiative”

Have you ever wanted to analyze a large amount of data gathered from log files or files you’ve found on the web? The need to quickly analyze large volumes of data is the number-one reason organizations leave the world of single-processor RDBMSs and move toward NoSQL solutions. You may recall our discussion in chapter 1 on the key business drivers: volume, velocity, variability, and agility. The first two, volume and velocity, are the most relevant to big data problems.

Twenty years ago, companies managed datasets that contained approximately a million internal sales transactions, stored in a relational database running on a single processor. As organizations generated more data from internal and external sources, datasets expanded to billions and trillions of items. The sheer volume made it impractical to keep processing the data on a single system, so organizations had to learn how to distribute the work among many processors. This is what is known as a big data problem.

6.1. What is a big data NoSQL solution?

6.2. Getting linear scaling in your data center

6.3. Understanding linear scalability and expressivity

6.4. Understanding the types of big data problems

6.5. Analyzing big data with a shared-nothing architecture

6.6. Choosing distribution models: master-slave versus peer-to-peer

6.7. Using MapReduce to transform your data over distributed systems

6.8. Four ways that NoSQL systems handle big data problems

6.9. Case study: event log processing with Apache Flume

6.10. Case study: computer-aided discovery of health care fraud

6.11. Summary

6.12. Further reading
