Welcome to this GitHub repository, a collection of real-world web scraping scenarios, complete with source code and Jupyter notebooks. It is designed to guide you through web scraping techniques that apply to a diverse range of use cases.
Currently, the repository features code and Jupyter notebooks for scraping job listings from two well-known job search websites, Indeed and LinkedIn. Both projects use the popular Python libraries BeautifulSoup and Selenium, and both help you build an understanding of HTML structure, a key ingredient for any web scraping endeavor.
The beauty of these examples lies in their simplicity and modifiability. Each script is paired with a Jupyter notebook that breaks the process into easily digestible portions, making it straightforward to follow and learn.
Here's a snapshot of what you can discover in this repository:
- Indeed Job Scraper: Explore the Python script and corresponding Jupyter notebook explaining how to scrape job postings from Indeed.
- LinkedIn Job Scraper: Discover the Python script and its Jupyter notebook, demonstrating the method to scrape job postings from LinkedIn.
- Web Scraping Basics: Delve into Jupyter notebooks covering fundamental concepts and techniques such as interpreting HTML structure and using BeautifulSoup and Selenium for web scraping.
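
To give a feel for what "interpreting HTML structure" means in practice, here is a minimal sketch. It is not taken from the repository's notebooks: those use BeautifulSoup and Selenium, whereas this sketch uses Python's standard-library `html.parser` so that it runs without extra installs. The sample markup, the class names (`job`, `title`, `company`), and the helper names `JobParser` and `parse_jobs` are all made up for illustration; real job boards use different markup that changes often.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration only; real sites use different,
# frequently changing tags and class names.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Data Analyst</h2><span class="company">Acme</span></div>
<div class="job"><h2 class="title">ML Engineer</h2><span class="company">Globex</span></div>
"""


class JobParser(HTMLParser):
    """Collect the text of 'title' and 'company' elements inside each div.job."""

    FIELDS = {"title", "company"}

    def __init__(self):
        super().__init__()
        self.jobs = []      # one dict per job card found
        self._field = None  # field whose text we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "job" in classes:
            # A new job card starts: open a fresh record.
            self.jobs.append({})
        else:
            # Remember which field (if any) this element represents.
            matches = self.FIELDS.intersection(classes)
            self._field = matches.pop() if matches and self.jobs else None

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.jobs[-1][self._field] = data.strip()


def parse_jobs(html):
    """Return a list of {'title': ..., 'company': ...} dicts found in html."""
    parser = JobParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


jobs = parse_jobs(SAMPLE_HTML)
for job in jobs:
    print(job["title"], "at", job["company"])
```

The idea carries over to the libraries used in the notebooks: find the element that wraps one listing, then read the child elements that hold each field.
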
What's more, every piece of code in this repository comes with a companion Medium article that offers a more detailed explanation and context. Simply follow the links provided in each of the sections above.
This repository is a work in progress, with plans for continual expansion. New scripts and notebooks will be added to cover more websites and use cases, with the goal of creating a comprehensive resource for anyone keen on mastering web scraping.
Please be mindful and respect the terms of service of the websites you scrape. Enjoy your journey into the world of web scraping and happy coding!