Scrape best programming books
Level: Advanced (score: 4)
For this bite, you are going to scrape the books from 100 Best Programming Books of All Time. Only include the books whose titles contain the word "python" (case-insensitive match).
Since the page is generated via JavaScript, we provide the page source along with the code that loads it into a BeautifulSoup object for you. All you have to do is scrape the necessary data and create Book objects from it.
The Book class
Create a class for the books with the following attributes:
- title: string, as it appears on the page
- author: string, entered as lastname, firstname
- year: four-digit integer, the year the book was published
- rank: integer, updated once the books have been sorted
- rating: float, as indicated on the page
When you print a Book, it should be formatted as follows:
[001] Python Tricks (2017) Bader, Dan 4.74
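One way to get this format is a dataclass with a custom __str__. This is a minimal sketch, not the official solution: using @dataclass (with instance attributes) is an assumption, and the rank is zero-padded to three digits with the :03 format spec.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """A scraped book; rank is filled in after the list has been sorted."""

    title: str
    author: str  # stored as "Lastname, Firstname"
    year: int
    rank: int
    rating: float

    def __str__(self) -> str:
        # Zero-pad the rank to three digits, e.g. 1 -> "001"
        return f"[{self.rank:03}] {self.title} ({self.year}) {self.author} {self.rating}"


book = Book("Python Tricks", "Bader, Dan", 2017, 1, 4.74)
print(book)  # [001] Python Tricks (2017) Bader, Dan 4.74
```

Printing the float directly keeps 4.7 as "4.7" rather than padding it to "4.70", which matches the sample output for display_books().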
The load_data() function
With this function you load the data from the html_file. This is where you call the _get_soup() function that has been provided for you. It should:
- Load the soup object
- Extract the information required to create the Book instances from the soup object
- Return a sorted list of Book objects
NOTE: If any of the required attributes is missing from a book, drop that book and don't include it.
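The page's HTML structure isn't described here, so this sketch starts one step after the soup: it assumes you have already pulled each book into a plain dict with the hypothetical keys title, author, year, and rating (as strings). The helper name to_book is also hypothetical. It shows the "python" title filter, the rule for dropping incomplete books, and the author conversion, which assumes the page shows names as "Firstname Lastname".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    title: str
    author: str  # "Lastname, Firstname"
    year: int
    rank: int
    rating: float


def to_book(record: dict) -> Optional[Book]:
    """Turn one hypothetical scraped record into a Book, or return None to drop it."""
    title = record.get("title")
    author = record.get("author")
    year = record.get("year")
    rating = record.get("rating")
    # Drop the book if any required attribute is missing or empty
    if not all((title, author, year, rating)):
        return None
    # Keep only titles containing "python", matched case-insensitively
    if "python" not in title.lower():
        return None
    try:
        year_num, rating_num = int(year), float(rating)
    except ValueError:
        return None  # treat an unparseable year or rating as missing
    # Assumption: the page shows "Firstname Lastname"; the last word is the last name
    first, _, last = author.strip().rpartition(" ")
    author = f"{last}, {first}" if first else last
    # rank is a placeholder until the list has been sorted
    return Book(title=title.strip(), author=author, year=year_num, rank=0, rating=rating_num)


# Hypothetical records: only the first one survives the filters
records = [
    {"title": "Python Tricks", "author": "Dan Bader", "year": "2017", "rating": "4.74"},
    {"title": "Some Java Book", "author": "Jane Doe", "year": "2018", "rating": "4.5"},
    {"title": "Another PYTHON Book", "author": "John Doe", "rating": "4.2"},
]
books = [book for book in map(to_book, records) if book is not None]
```

Splitting on the last space keeps multi-word first names together, so "Antony Mc Fedden" becomes "Fedden, Antony Mc".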
SORTING: Books should be sorted in descending order by rating, then in ascending order by year, title, and the author's last name, in that order. When sorting the titles, compare them using .title(), .lower(), or .upper() so the comparison is case-insensitive, but take care not to change the original title.
RANKING: After the books have been sorted, update the rank of each book to reflect the new sort order. The Book with the highest rating should be first, and the ranks go down from there.
The display_books() function
With this function, you simply print the specified books to the console. It takes the following parameters:
- books: list of all the sorted Books
- limit: integer that indicates how many books to display, defaults to 10
- year: integer indicating the oldest year to include, e.g. 2017, defaults to None
If limit is larger than the number of books in the list, the function should display only the books that are available and not fail.
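A minimal sketch of display_books(), assuming Book objects print in the "[rank] title (year) author rating" format (the small Book class here is a stand-in). Slicing with books[:limit] never raises, so a limit larger than the list simply prints every book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    title: str
    author: str
    year: int
    rank: int
    rating: float

    def __str__(self) -> str:
        return f"[{self.rank:03}] {self.title} ({self.year}) {self.author} {self.rating}"


def display_books(books: list[Book], limit: int = 10, year: Optional[int] = None) -> None:
    """Print up to `limit` books, optionally only those published in `year` or later."""
    if year is not None:
        # Keep books from the given year onward; ranks stay as assigned by the sort
        books = [book for book in books if book.year >= year]
    # Slicing past the end of a list is safe, so an oversized limit doesn't fail
    for book in books[:limit]:
        print(book)
```

Filtering by year before slicing means limit counts only the books that pass the year filter, which is why the sample call for display_books() still prints five books even though some higher-ranked ones are skipped.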
Sample call to display_books():
display_books(books, limit=5, year=2017)
[001] Python Tricks (2017) Bader, Dan 4.74
[002] Mastering Deep Learning Fundamentals with Python (2019) Wilson, Richard 4.7
[006] Python Programming (2019) Fedden, Antony Mc 4.68
[007] Python Programming (2019) Mining, Joseph 4.68
[009] A Smarter Way to Learn Python (2017) Myers, Mark 4.66
NOTE: The books ranked 003, 004, and 005 are not listed because the oldest year to include was set to 2017, and those books are older than that. Also note that books 006 and 007 have the same rating, title, and release year, so they were sorted by the author's last name!
This is an advanced bite, so don't despair! Keep at it and you will emerge victorious! I look forward to seeing your submissions in the forum!