A Booknode scraper and analyser (my playground for trying things out).
    git clone https://github.com/MiguelLaura/shelf_control.git
    make deps
To scrape the top 1000:

    python -m shelf_control.scraping
To use the dashboard:

    python -m shelf_control.dashboard

Then open localhost:8050 in a web browser.
Make changes in README.template.md, then generate the updated README.md with:

    make readme
To check if there are any unused imports:

    make lint
To format the code using black:

    make format
Once the changes are done, to lint, format, and generate the README.md all at once:

    make
- Scraping data from Booknode
  - Top 1000 ✓
    - Time: 17min51s
    - Memory: 1.8M
  - Specific book (✓)
  - Editor
  - Person
  - Author
  - Top 1000 ✓
- Analysing the data
  - Build a dashboard (work in progress)
  - Use machine learning to determine the themes of each book summary (both unsupervised and supervised learning are possible)
  - Build a recommendation system
  - Output a graph of the books using the themes
- Use the progress bar from minet
- Build an interface to use the scraping commands
- Add tests where useful
- Check how to properly stop Dash
- Dataframe transformations
  - Move them into functions (more generally, simplify the code by making smaller steps)
  - Add the corresponding tests + CI
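The "graph of the books using the themes" item above could start from something as simple as linking any two books that share a theme. A minimal sketch (the `theme_graph` function and the sample data are hypothetical, not part of the project):

```python
from itertools import combinations


def theme_graph(books):
    """Build an undirected graph linking books that share at least one theme.

    `books` maps a book title to its set of themes; the result maps each
    pair of linked titles to the themes they have in common.
    """
    edges = {}
    for (title_a, themes_a), (title_b, themes_b) in combinations(books.items(), 2):
        shared = themes_a & themes_b
        if shared:
            edges[(title_a, title_b)] = shared
    return edges


books = {
    "Book A": {"fantasy", "magic"},
    "Book B": {"fantasy", "romance"},
    "Book C": {"science-fiction"},
}
edges = theme_graph(books)
print(edges)  # {('Book A', 'Book B'): {'fantasy'}}
```

The resulting edge dict can be fed to a graph library (e.g. networkx) for layout and rendering.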
Function to scrape the info for a specific book on Booknode.

Arguments
- url_book *str* - the URL of the book on Booknode.

Returns
- *dict* - book data
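The extraction behind this function can be sketched with the standard-library `HTMLParser`; the markup below is hypothetical and the real scraper's selectors for Booknode pages will differ:

```python
from html.parser import HTMLParser


class BookParser(HTMLParser):
    """Collect the text of the first <h1> tag as the book title.

    The page structure assumed here is illustrative only.
    """

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only start capturing for the first <h1> encountered.
        if tag == "h1" and self.title is None:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title = data.strip()


def scrape_book_title(html):
    """Parse one book page's HTML and return a partial book-data dict."""
    parser = BookParser()
    parser.feed(html)
    return {"title": parser.title}


sample = "<html><body><h1>Some Book</h1></body></html>"
print(scrape_book_title(sample))  # {'title': 'Some Book'}
```

A real implementation would fetch the page from `url_book` first and extract many more fields into the returned dict.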
Generator yielding the info for each book in the top 1000 most-liked books on Booknode.

Arguments
- page_nb *int*, optional - the page number to start from.

Yields
- *dict* - data for one book
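The paginated generator can be sketched as follows; `fetch_page` stands in for the real HTTP request so the loop can be exercised without hitting Booknode (both names are illustrative, not the project's actual API):

```python
def iter_top_books(page_nb=1, fetch_page=None):
    """Yield book dicts page by page, starting from `page_nb`.

    `fetch_page` returns the list of books on a page, or an empty list
    when there are no more pages; it is injected here so the generator
    can be tested without network access.
    """
    while True:
        books = fetch_page(page_nb)
        if not books:
            return  # no more pages: stop the generator
        yield from books
        page_nb += 1


# Fake two-page result set standing in for the real HTTP fetch.
pages = {1: [{"title": "A"}, {"title": "B"}], 2: [{"title": "C"}]}
titles = [book["title"] for book in iter_top_books(fetch_page=lambda n: pages.get(n, []))]
print(titles)  # ['A', 'B', 'C']
```

Injecting the fetch function also makes it easy to plug in caching or a progress bar later.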