Codecademy Data Science Path Project: Learn Web Scraping with BeautifulSoup
After eating chocolate bars your whole life, you’ve decided to go on a quest to find the greatest chocolate bar in the world.
You’ve found a website (Cacao Ratings) that has over 1700 reviews of chocolate bars from all around the world. It’s displayed in the web browser on this page.
The data is displayed in a table, instead of in a csv
or json
.
Thankfully, we have the power of BeautifulSoup that will help us transform this webpage into a DataFrame that we can manipulate and analyze.
The rating scale is from 1-5, as described in this review guide.
A 1
is “unpleasant” chocolate, while a 5
is a bar that transcends “beyond the ordinary limits”.
Some questions we thought about when we found this dataset were: Where are the best cocoa beans grown? Which countries produce the highest-rated bars? What’s the relationship between cocoa solids percentage and rating?
Can we find a way to answer these questions, or uncover more questions, using BeautifulSoup and Pandas?