33 The Movies Dataset
In this capstone, you will:
- Define ordered sequences of elements
- Transform and count the items of a list
- Find the minimum and maximum elements according to specific features
- Filter items based on their characteristics and select them based on their position
- Sort lists and produce string representations of them
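As a rough preview of these operations, here is a minimal sketch using Scala's standard `List`. The `Movie` case class, its fields, and the three sample entries are made up for illustration; they are assumptions, not the dataset's actual structure or content.

```scala
// Hypothetical, simplified record: the real dataset has many more fields.
final case class Movie(title: String, year: Int, runtimeMinutes: Int)

object MovieSketch {
  // Define an ordered sequence of elements (made-up sample values).
  val movies: List[Movie] = List(
    Movie("Alpha", 1999, 120),
    Movie("Beta", 2005, 95),
    Movie("Gamma", 2010, 150)
  )

  // Transform and count the items of a list.
  val titles: List[String] = movies.map(_.title)
  val after2000: Int = movies.count(_.year > 2000)

  // Find the minimum and maximum elements according to a feature.
  val shortest: Movie = movies.minBy(_.runtimeMinutes)
  val longest: Movie = movies.maxBy(_.runtimeMinutes)

  // Filter by characteristic, then select by position.
  val longMovies: List[Movie] = movies.filter(_.runtimeMinutes >= 100)
  val firstTwo: List[Movie] = movies.take(2)

  // Sort the list and build a string representation of it.
  val newestFirst: List[Movie] = movies.sortBy(_.year).reverse
  val summary: String = newestFirst.map(_.title).mkString(", ")

  def main(args: Array[String]): Unit = println(summary)
}
```

Each value is computed from the same immutable list, so none of these operations changes `movies` itself.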
In this capstone, you’ll analyze data for more than 45,000 movies. The information is a subset of a popular, publicly accessible dataset called “The Movies Dataset” by Rounak Banik. On the dataset’s website, you can find its latest version as well as an extensive description of its content:
These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts and vote averages.
This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website.