Use pandas to find most common genres in a movie excel sheet
Level: Advanced (score: 4)
Another pandas
Bite: we took some fake movie data from Mockaroo, an awesome fake data service, and dumped it in an Excel file.
Here is how it looks:
Complete group_by_genre
below doing the following:
- Load the data in a
pandas DataFrame
usingread_excel
. Useskiprows
andusecols
to get the table of genres and movies (header
did not work here). - Split the genres, which are now separated by pipe (|) into separate rows. This is not trivial so we provided the
explode
function mentioned in this article. - Next filter out the 5 rows without a genre = (no genres listed). At this point your
DataFrame
should have ashape
of (rows, columns):(2024, 2)
- Lastly group the
DataFrame
by genre, counting the number of movies for each. Sort the resultingDataFrame
on this count descending. This is how it should look: you should see Drama at the top and IMAX at the bottom.
Good luck, have fun and remember: keep calm and code in Python and pandas
!