The Beautiful Soup library is used to extract data from the responses. It is a Python library for pulling data out of HTML and XML files: it builds a parse tree from the page source, which lets you extract data in a hierarchical and more readable manner.
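As a rough illustration of the parse tree idea, here is a minimal sketch that parses an inline HTML string. The markup, the "event" class name and the extract_events helper are all assumptions made for this example; the real pages on the target site may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet with placeholder events; real markup may differ.
SAMPLE_HTML = """
<html><body>
  <ul>
    <li class="event">1900 Example event A</li>
    <li class="event">1950 Example event B</li>
    <li class="advert">Not an event</li>
  </ul>
</body></html>
"""


def extract_events(html: str) -> list[str]:
    """Return the text of every <li class="event"> element in the page."""
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    items = soup.find_all("li", class_="event")
    return [item.get_text(strip=True) for item in items]


events = extract_events(SAMPLE_HTML)
print(events)
```

For a live page you would typically fetch the HTML first (for example with an HTTP client) and pass the response text to the same function.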
I scraped this website: "https://www.onthisday.com"
"On This Day" bills itself as the world's largest, most accurate and most popular site for "on this day in history". It lists the historical events that happened on each day of the year.
I scraped the events for every day of the year, grouped them by month, and stored them in a JSON file.
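The grouping and storage step could look roughly like the sketch below. The record fields ("month", "day", "event"), the placeholder events and the save_events helper are assumptions for illustration, not the actual layout used by collect_events.py.

```python
import json
from collections import defaultdict

# Hypothetical scraped records; the field names are assumptions.
scraped = [
    {"month": "January", "day": 1, "event": "Example event A"},
    {"month": "January", "day": 2, "event": "Example event B"},
    {"month": "February", "day": 1, "event": "Example event C"},
]


def group_by_month(records):
    """Build {month: {day: [events]}} from a flat list of records."""
    grouped = defaultdict(dict)
    for rec in records:
        # JSON object keys are strings, so store the day as a string up front.
        grouped[rec["month"]].setdefault(str(rec["day"]), []).append(rec["event"])
    return dict(grouped)


def save_events(data, path):
    """Write the grouped events to a JSON file (path is chosen by the caller)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)


events_by_month = group_by_month(scraped)
serialized = json.dumps(events_by_month, indent=2, ensure_ascii=False)
print(serialized)
```

Keeping the day keys as strings means the structure is unchanged after a round trip through JSON.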
Using FastAPI, I set up endpoints that return the historical events for today's date, for a given month, for a particular day and month, and more.
It's a basic demo, meant only for learning purposes.
You can adapt this code to scrape any website that doesn't require a login.
pip install beautifulsoup4
If you get a missing-module error, install that module, for example:
pip install module_name
python collect_events.py
uvicorn main:app --reload
Then open the server URL (e.g. http://127.0.0.1:8000/). For a better view, append "docs" or "redoc" to the URL (e.g. http://127.0.0.1:8000/docs or http://127.0.0.1:8000/redoc) and explore the API.
1). https://beautiful-soup-4.readthedocs.io/
2). https://www.onthisday.com/
3). https://fastapi.tiangolo.com/