concept large dataset in category algorithms
appears as: large dataset, large datasets

This is an excerpt from Manning's book Algorithms and Data Structures for Massive Datasets MEAP V01.
Figure 1.1: In this example, we build a (comment-id, frequency) hash table to help us eliminate duplicate comments. For example, the comment identified by comment-id 36457 occurs 6 times in the dataset. We also build “keyword” hash tables, where, for each keyword of interest, we count how many times that keyword is mentioned in the comments of a particular article. For example, the word ‘science’ is mentioned 21 times in the comments of the article identified by article-id 8999. For a large dataset of 3 billion comments, storing all these data structures can easily require anywhere from tens of gigabytes to a hundred gigabytes of RAM.
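To make the bookkeeping in Figure 1.1 concrete, here is a minimal sketch of the two kinds of hash tables it describes, assuming comments arrive as (comment_id, article_id, text) records; the record layout, keyword list, and sample data are hypothetical and only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical keywords of interest.
KEYWORDS = {"science", "politics", "climate"}

# Hypothetical sample records: (comment_id, article_id, text).
comments = [
    (36457, 8999, "I love science and more science"),
    (36457, 8999, "I love science and more science"),  # duplicate comment
    (50211, 8999, "politics again"),
]

# (comment-id, frequency) hash table: any comment-id with a count > 1
# is a duplicate we can eliminate.
comment_frequency = Counter(comment_id for comment_id, _, _ in comments)

# "Keyword" hash tables: for each keyword of interest, count how many
# times it is mentioned in the comments of each article.
keyword_counts = {kw: defaultdict(int) for kw in KEYWORDS}
for _, article_id, text in comments:
    for word in text.lower().split():
        if word in KEYWORDS:
            keyword_counts[word][article_id] += 1

print(comment_frequency[36457])         # 2 -> this comment is duplicated
print(keyword_counts["science"][8999])  # 'science' mentions for article 8999
```

Holding exact counters like these for 3 billion comments is what pushes memory consumption into the tens to hundreds of gigabytes mentioned in the caption.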