concept large dataset in category algorithms

appears as: large dataset, large datasets
Algorithms and Data Structures for Massive Datasets MEAP V01

This is an excerpt from Manning's book Algorithms and Data Structures for Massive Datasets MEAP V01.

Figure 1.1: In this example, we build a (comment-id, frequency) hash table to help us eliminate duplicate comments. For example, the comment identified by comment-id 36457 occurs 6 times in the dataset. We also build “keyword” hash tables, where, for each keyword of interest, we count how many times the keyword is mentioned in the comments of a particular article. For example, the word “science” is mentioned 21 times in the comments of the article identified by article-id 8999. For a large dataset of 3 billion comments, storing all these data structures can easily require anywhere from dozens to a hundred gigabytes of RAM.
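
The two kinds of hash table described in the caption can be sketched in a few lines of Python. This is only an illustration, not the book's code: the sample records, the `build_tables` and `estimate_gb` names, the choice to skip keyword counting for repeated comment-ids, and the 16 bytes per entry used in the memory estimate are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: (comment_id, article_id, text).
comments = [
    (36457, 8999, "Science is fun"),
    (36457, 8999, "Science is fun"),          # duplicate of the first comment
    (36458, 8999, "I love science and more science"),
    (36459, 7001, "Politics again"),
]

def build_tables(comments, keywords):
    """Build a (comment-id -> frequency) table and one
    (article-id -> count) table per keyword of interest."""
    comment_freq = Counter()
    keyword_tables = {kw: defaultdict(int) for kw in keywords}
    for comment_id, article_id, text in comments:
        comment_freq[comment_id] += 1
        if comment_freq[comment_id] > 1:
            # Assumption: a repeated comment-id is counted as a duplicate
            # but its text is not scanned for keywords again.
            continue
        words = text.lower().split()
        for kw in keywords:
            hits = words.count(kw)
            if hits:
                keyword_tables[kw][article_id] += hits
    return comment_freq, keyword_tables

def estimate_gb(n_entries, bytes_per_entry):
    """Rough payload size in gigabytes, ignoring hash table overhead."""
    return n_entries * bytes_per_entry / 1e9

comment_freq, keyword_tables = build_tables(comments, ["science"])
print(comment_freq[36457])               # frequency of comment-id 36457
print(keyword_tables["science"][8999])   # mentions of "science" for article 8999

# Assumption: 3 billion distinct ids, 8-byte id plus 8-byte counter each.
print(estimate_gb(3_000_000_000, 16))
```

Under these assumptions the comment-id table alone needs about 48 GB of raw payload, before any per-entry overhead from the hash table itself and before the keyword tables are added, which is consistent with the range quoted in the caption.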