Movie data analysis
Level: Intermediate (score: 3)
In this Bite we are going to parse a CSV movie dataset to identify the directors with the highest-rated movies.
1. Write get_movies_by_director: use csv.DictReader to convert movie_metadata.csv into a (default)dict of lists of Movie namedtuples. Convert/filter the data:
   - Only extract director_name, movie_title, title_year and imdb_score, ignoring movies without all of these fields.
   - Type conversions: title_year -> int / imdb_score -> float
   - Discard any movies older than 1960.

   Here is an extract:

       ....
       {
         'Woody Allen': [
           Movie(title='Midnight in Paris', year=2011, score=7.7),
           Movie(title='The Curse of the Jade Scorpion', year=2001, score=6.8),
           Movie(title='To Rome with Love', year=2012, score=6.3),
           ....
         ],
         ...
       }

2. Write the calc_mean_score helper that takes a list of Movie namedtuples and calculates the mean IMDb score, returning the score rounded to 1 decimal place.
3. Complete get_average_scores which takes the directors data structure returned by get_movies_by_director (see 1.) and returns a list of tuples (director, average_score) ordered by highest score in descending order. Only take directors into account with >= MIN_MOVIES.
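To make the three tasks concrete, here is a minimal sketch of how they could fit together. It is not the reference solution. It makes several assumptions: the data comes from a small in-memory sample rather than movie_metadata.csv, get_movies_by_director takes a hypothetical file-like argument, MIN_MOVIES is set to 2 purely for the demo (the real value is defined in the Bite), and "older than 1960" is read as a release year below 1960. Only the Woody Allen rows come from the extract above; the other rows are invented.

```python
import csv
import io
from collections import defaultdict, namedtuple

Movie = namedtuple("Movie", "title year score")

MIN_YEAR = 1960   # from the task text
MIN_MOVIES = 2    # assumed for this demo only; the Bite defines the real value

# In-memory stand-in for movie_metadata.csv. Column names come from the task
# text; the "Director A/B" and incomplete rows are made up for illustration.
SAMPLE_CSV = """\
director_name,movie_title,title_year,imdb_score
Woody Allen,Midnight in Paris,2011,7.7
Woody Allen,The Curse of the Jade Scorpion,2001,6.8
Woody Allen,To Rome with Love,2012,6.3
Director A,Old Film,1955,8.0
Director A,Newer Film,1999,7.0
Director B,Film One,2005,8.0
Director B,Film Two,2010,9.0
,No Director,2000,5.0
Director B,No Year,,6.0
"""


def get_movies_by_director(csv_file):
    """Map each director to a list of Movie namedtuples.

    csv_file is any iterable of CSV lines (hypothetical parameter).
    """
    directors = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            director = row["director_name"].strip()
            title = row["movie_title"].strip()
            year = int(row["title_year"])
            score = float(row["imdb_score"])
        except (KeyError, ValueError, TypeError, AttributeError):
            continue  # a required field is missing or not convertible
        if not director or not title or year < MIN_YEAR:
            continue  # empty text field, or released before MIN_YEAR
        directors[director].append(Movie(title, year, score))
    return directors


def calc_mean_score(movies):
    """Return the mean score of the given movies, rounded to 1 decimal."""
    return round(sum(movie.score for movie in movies) / len(movies), 1)


def get_average_scores(directors):
    """Return (director, average_score) tuples, highest average first."""
    scores = [
        (director, calc_mean_score(movies))
        for director, movies in directors.items()
        if len(movies) >= MIN_MOVIES
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


directors = get_movies_by_director(io.StringIO(SAMPLE_CSV))
print(get_average_scores(directors))
# prints [('Director B', 8.5), ('Woody Allen', 6.9)]
```

In this sample, "Director A" drops out because only one of its movies survives the year filter, which is below the assumed MIN_MOVIES. Using a defaultdict(list) avoids checking whether a director key already exists before appending.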
See the tests for more info. This could be a tough one, but we really hope you learn a thing or two. Good luck and keep calm and code in Python!