33 The Movies Dataset


In this capstone, you will:

Define ordered sequences of elements
Transform and count the items of a list
Find the minimum and maximum elements according to specific features
Filter items based on their characteristics and select them based on their position
Sort lists and produce string representations of them
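Here is a quick preview of these operations as a minimal sketch in Python. The `Movie` class, its fields, and the sample titles, years, and ratings are hypothetical placeholders. They are not records from the dataset.

```python
from dataclasses import dataclass


# Hypothetical, simplified movie record used only for this preview.
@dataclass
class Movie:
    title: str
    year: int
    vote_average: float


# An ordered sequence of elements (made-up sample data, not from the dataset).
movies = [
    Movie("Alpha", 1995, 7.7),
    Movie("Bravo", 2003, 6.1),
    Movie("Charlie", 2011, 8.2),
    Movie("Delta", 1999, 5.4),
]

# Transform each item, and count the items of the list.
titles = [m.title for m in movies]
count = len(movies)

# Find the minimum and maximum according to a specific feature.
oldest = min(movies, key=lambda m: m.year)
best_rated = max(movies, key=lambda m: m.vote_average)

# Filter by a characteristic, then select by position.
recent = [m for m in movies if m.year >= 2000]
first_recent = recent[0]

# Sort, then produce a string representation of the sorted list.
by_rating = sorted(movies, key=lambda m: m.vote_average, reverse=True)
report = ", ".join(f"{m.title} ({m.vote_average})" for m in by_rating)
print(report)
```

Passing a `key` function to `min`, `max`, and `sorted` lets the same list be ordered by whichever feature you care about, without writing a separate comparison for each one.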

In this capstone, you’ll analyze data for more than 45,000 movies. The data is a subset of a popular, publicly accessible dataset called “The Movies Dataset” by Rounak Banik. On its website, you can find the latest version of the dataset along with an extensive description of its contents:

These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.

This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.
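Records like these usually arrive as rows of raw strings. The following minimal sketch shows one way to turn such a row into a typed movie record. It assumes a simplified, hypothetical CSV layout with only `title`, `release_date`, `budget`, and `vote_average` columns, and the sample rows are made up. The real files have many more columns and messier values, so treat this as an illustration of the idea rather than a parser for the actual dataset.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


# Hypothetical movie record holding a few of the fields the dataset describes.
@dataclass
class Movie:
    title: str
    release_date: Optional[date]
    budget: int
    vote_average: float


def parse_movie(row: Dict[str, str]) -> Movie:
    """Convert one CSV row (column name -> raw string) into a Movie."""
    raw_date = (row.get("release_date") or "").strip()
    return Movie(
        title=row["title"].strip(),
        # A missing date becomes None instead of raising an error.
        release_date=date.fromisoformat(raw_date) if raw_date else None,
        # Empty numeric fields default to zero.
        budget=int(row.get("budget") or 0),
        vote_average=float(row.get("vote_average") or 0.0),
    )


# Made-up sample rows in the assumed column layout.
SAMPLE_CSV = """title,release_date,budget,vote_average
Example Movie,2001-05-17,1500000,6.4
Untitled Draft,,,
"""

movies = [parse_movie(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for movie in movies:
    print(movie)
```

Defaulting empty fields instead of failing keeps one incomplete row from stopping the whole import. Whether zero is a sensible default for a missing budget depends on the queries you plan to run.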

33.1 Download the base project

33.2 Parsing a Movie entity

33.3 Printing Query Results

33.4 Querying the Movie Dataset

33.5 Summary