mcboarder289 / shelf-help

License: MIT License

Python 38.71% HTML 0.39% JavaScript 4.24% TypeScript 55.93% CSS 0.74%

shelf-help's Introduction

Shelf Help

Do you have a really large to-read shelf on Goodreads, and need some help deciding what to read next?

Do you also love the library, whether it be physical books or using Libby?

If you answered yes to either of those questions, then Shelf Help is a great app for you!

Background

The idea for this came about when I was browsing my Goodreads app while in the library trying to pick out what to read next. I was frustrated that I couldn't randomly sort in the app because things I recently put on there were at the top by default, but even then, I didn't have a way to know if it was available at my library.

I decided to put together a quick prototype project, originally in Plotly's Dash framework. In my work, I had built Dash apps before, but I recently discovered how you can host these webapps on Render, and it made me want to make something I could share more broadly.

So I figured I'd take a dive into React and make a bit more modern app, with a better look and feel, and here we are!

Contributing

If there is anything you'd like to see done here, throw up an issue here on GitHub!

Or if you want to help contribute directly, put up an issue and a branch.

BEFORE YOU PUT UP A PR please make sure you put one of these 4 terms in the title so that Render doesn't build the preview environment yet:

[skip render]
[skip preview]
[render skip]
[preview skip]

I am hosting everything myself right now, on the free plan. If it gets more usage, then I'll investigate spending more to keep it hosted.

If you feel so inclined and would like to support monetarily, feel free to use the "Buy me a coffee" link above!

Requirements:

Back End

Python 3.12.3
See /backend/requirements.txt

Front End

Yarn
Vite
See /frontend

shelf-help's Issues

PWA can't save bookmarked URL with query params

PR #21 introduced query parameters so that a user can bookmark their shelf entry. Unfortunately, the PWA manifest start_url is static, so the installed app always opens at the same entry point.

Doing some digging, and it might be possible to dynamically create the manifest.json. Here are some research articles that need to be dug into further:

Implement Redis Cache in Render

If I want to prepare for more load and horizontal scaling, I think it's best that I have a redis cache.

This way, if I horizontally scale, the memoization of the shelf data will work across all workers.

Whether to scale on Render also depends on whether we decide to host on AWS EC2 instead.
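
A minimal sketch of how the cache backend could be switched by environment, assuming the backend uses Flask-Caching. The REDIS_URL variable name and the one-hour timeout are assumptions for illustration, not values taken from the project.

```python
import os
from typing import Mapping


def cache_config(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Flask-Caching config dict from environment variables."""
    redis_url = env.get("REDIS_URL")  # assumed variable name
    if redis_url:
        # Shared cache: every worker and instance sees the same memoized shelf data.
        return {
            "CACHE_TYPE": "RedisCache",
            "CACHE_REDIS_URL": redis_url,
            "CACHE_DEFAULT_TIMEOUT": 3600,  # assumed timeout, in seconds
        }
    # Local fallback: per-process memory, fine for a single worker.
    return {"CACHE_TYPE": "SimpleCache", "CACHE_DEFAULT_TIMEOUT": 3600}
```

The resulting dict would be passed as the config argument when constructing flask_caching.Cache, so local development keeps working without a Redis instance.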

Refactor/Clean Up Library Parsers

When implementing some of the newest library parsers, I picked up on a few patterns that could be abstracted out.

For example: Phoenix was basically a copy-paste of Syracuse, because those likely have very similar backend systems.

Would like to be more DRY and make some of those classes more modular.
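
One way the shared pattern could be pulled out is a base class that holds the common logic, with each library overriding only what differs. This is a sketch under assumptions: the class names, the /search path, and the example.org hosts are placeholders, not the real catalog endpoints.

```python
import urllib.parse


class SharedCatalogParser:
    """Hypothetical base for libraries that appear to share a catalog backend."""

    base_url = ""  # each library overrides only this
    search_path = "/search"  # placeholder path, not a real endpoint

    def search_url(self, title: str, author: str) -> str:
        # Common query-building logic lives in one place.
        query = urllib.parse.quote_plus(f"{title} {author}")
        return f"{self.base_url}{self.search_path}?q={query}"


class PhoenixParser(SharedCatalogParser):
    base_url = "https://phoenix.example.org"  # placeholder host


class SyracuseParser(SharedCatalogParser):
    base_url = "https://syracuse.example.org"  # placeholder host
```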

Upgrade to latest Python (3.12)

Shouldn't be a huge deal based on a cursory look at the backend requirements. This would let us use the latest AWS Lambda runtimes for the potential analytics work in #29.

Good practice too, since I believe 3.8 will soon stop receiving security updates.

Need to double check if Render supports 3.12.

Cleanup Library Parsers (physical books)

When implementing some of the newest library parsers, I picked up on a few patterns that could be abstracted out.

For example: Phoenix was basically a copy-paste of Syracuse, because those likely have very similar backend systems.

Would like to be more DRY and make some of those classes more modular.

Improve Logic on free-text searches

Examples:
The line below returns the right result in Nashville for the gone away world haraway, nick, but not for the handmaiden's tale atwood, margaret, because there are more commentaries about the second book. If we wrap the titles in quotes, the second book comes back correctly, but the first no longer does, because the quotes make the search too specific.

Hard to tell if removing characters like hyphens is helpful or not, but that wouldn't be a 100% fix likely. Worth thinking through how robust this could be. We could make multiple ISBN calls, but that's more network traffic.

I'm OK with the current "works most of the time" solution, but open to better ones.

shelf-help/source/parsers/library.py

Line 64 in 8ea3834

 return f"https://catalog.library.nashville.org/Union/Search?view=list&showCovers=on&lookfor={urllib.parse.quote_plus(title)}+{urllib.parse.quote_plus(author)}&searchIndex=Keyword" 
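
To make the quoting trade-off concrete, here is a sketch that builds both the loose and the quoted variant of the same search. The function names and the exact_title flag are hypothetical; only the URL shape comes from the line above.

```python
import urllib.parse
from typing import List

NASHVILLE_SEARCH = "https://catalog.library.nashville.org/Union/Search"


def build_search_url(title: str, author: str, exact_title: bool = False) -> str:
    """Build a keyword search URL, optionally wrapping the title in quotes."""
    if exact_title:
        title = f'"{title}"'  # the quotes are percent-encoded as %22
    lookfor = f"{urllib.parse.quote_plus(title)}+{urllib.parse.quote_plus(author)}"
    return f"{NASHVILLE_SEARCH}?view=list&showCovers=on&lookfor={lookfor}&searchIndex=Keyword"


def candidate_urls(title: str, author: str) -> List[str]:
    """Return the loose query and the quoted query for the same book."""
    return [
        build_search_url(title, author),
        build_search_url(title, author, exact_title=True),
    ]
```

Trying both candidates doubles the requests per book, so the network traffic concern applies here as well, and the caller would still need some way to decide which result is the right one.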

Refactor to support other library checks

Would like to refactor the code to make it easier to drop in support for any new libraries for the availability check.

Making some kind of common interface, and storing various implementations in different modules would be one way others could contribute.

Having these modules might also include the ability to turn this into a dropdown, so we could support the selection of any arbitrary library within the application.
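
A rough sketch of what such a common interface could look like, with a registry that a dropdown could read from. Every name here (LibraryParser, register, PARSERS, ExampleLibrary) is hypothetical, and the example library is a stand-in, not a real implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type


class LibraryParser(ABC):
    """Contract each library module would implement."""

    display_name: str = ""

    @abstractmethod
    def check_availability(self, title: str, author: str) -> Tuple[str, str]:
        """Return (availability message, catalog URL)."""


PARSERS: Dict[str, Type[LibraryParser]] = {}


def register(cls: Type[LibraryParser]) -> Type[LibraryParser]:
    """Class decorator that adds a parser to the registry."""
    PARSERS[cls.display_name] = cls
    return cls


@register
class ExampleLibrary(LibraryParser):
    display_name = "Example Public Library"

    def check_availability(self, title: str, author: str) -> Tuple[str, str]:
        # Placeholder result; a real parser would query the catalog here.
        return ("Unknown", "https://library.example.org/search")


def dropdown_options() -> List[str]:
    """Names to show in a library selection dropdown."""
    return sorted(PARSERS)
```

With this shape, contributing a new library means adding one module with one decorated class, and the dropdown picks it up without further changes.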

Refactor to be React App w/ Flask Backend

Now that the app is functional as a prototype with Plotly's Dash platform, I'd like to try and refactor this to have a distinct FE and BE so that the visual experience is cleaner / more modern. With a React FE, there would be more flexibility for the interactivity, which also might solve some of the PWA issues (not leveraging the url params).

This will be a significant refactor which will take more time, but I'd like to grow this full-stack skillset. I think it will be worth the investment. This resource seems useful as a good place to start.

Overall Vision

Monorepo structure
- BE (Python - Flask)
- FE (React)
Still deploying via Render but this way:
- Static Site for the FE
- Python Web Service for the BE
- Likely still just the raw code vs. containerizing/docker, but we'll see
- Render Suggestion 1 Render Suggestion 2

FE specifics

Would like to use Mantine - will be good to have a simple but useful component library.
Will shift to more Client-Side rendering vs. Server-side rendering, which will impact some of the current BE API calls.

BE Specifics

Will stick with Python w/ Flask.
- That way I can reuse a lot of what I know (ex: Flask Caching, etc.)
- Also able to more easily repurpose the already working core components (Libby search, Library Searches, Goodreads parser, etc.)
With more client-side rendering, the BE API needs to not return components/children, but JSON data.

Refactor Concerns

Decoupling FE and BE might cause more Bandwidth usage on Render (need to monitor this more carefully)
- Use something like https://tools.pingdom.com/ to check page size and performance.
- Currently, page size is 3.2MB and load time was 555ms w/ 63 requests.

Persist Inputs for return users

It would be nice to be able to use persistence for users who return back often. I see a few ways we could do this:

Short Term - Try to get persistence working within the Input component
- Issue here is that we populate that input in a callback (see dash discussion)
- Maybe we can get by using local storage with a dcc.Store?
Longer Term - Maybe some kind of saved profile/preferences that can be accessed. This would require a database or some decoupled instance (perhaps a free Postgres tier on Render; to keep it free, I might need to back up and reimplement it manually every 30 days), or some other kind of offline storage that is free or cheap.

Add Analytics to measure usage

Preface

I'd like to be able to measure usage of the app, but after doing research, also not use any third party analytics because I want to respect privacy as much as possible.

Need to add in the help docs what we're tracking so we're transparent. For the app to be useful at all, your Goodreads data would need to be public anyway, so we are not tracking anything that isn't publicly available. Still want to be transparent though about what is stored.

The solution needs to be:

Cheap and scalable (unless it starts to get used a ton, I'd like to keep it free, but also be able to scale with more usage)
Privacy-first (care less about an individual, more about overall usage - shelf data may be helpful for ML recommendations as a future-state feature)

Implementation Ideas

Supabase Postgres DB
- Not finding any other great "free tier" databases
- This would need to be kept alive through usage once a week
- With postgres, easy to transfer to something I roll on my own if needed.
Single "Event" table which simply writes a record when "Get Data" form submission is sent. It should contain:
- Shelf URL
  - Proxy for users. Nothing prevents a user from picking from other public shelves, but it is a good anchor value
  - Could be used later to see what books are being searched (obviously a point in time as the shelves can change)
- Timestamp of the event
  - Could also maybe store the start and end time to get an idea for performance
- Number of books returned
  - Could be correlated to parsing time
- Stretch Maybe store the book data too? As well as what was randomly selected?
  - Might be too much data
Stretch Track Library Searches
- What libraries are being searched most?
- Are people searching Libby or Physical books more often?
Aggregate tables to be exposed to a new page on the app (30-60-90 day trends?)
- Could be a stored procedure? Or simply a view?
- If a stored proc, then maybe it saves daily counts in an aggregate table
- Exposed view shows some high level charts that simply pull from the database
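
The event table and a daily aggregate view could look roughly like this. The sketch uses the standard library's sqlite3 in memory as a stand-in for Postgres, and the table, column, and view names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE event (
    id INTEGER PRIMARY KEY,
    shelf_url TEXT NOT NULL,     -- proxy for a user
    started_at TEXT NOT NULL,    -- ISO 8601 timestamps
    finished_at TEXT NOT NULL,
    num_books INTEGER NOT NULL
);
CREATE VIEW daily_usage AS
SELECT date(started_at) AS day,
       COUNT(*) AS submissions,
       COUNT(DISTINCT shelf_url) AS distinct_shelves,
       SUM(num_books) AS books_returned
FROM event
GROUP BY date(started_at);
"""


def record_event(conn, shelf_url, started_at, finished_at, num_books):
    """Write one row per "Get Data" form submission."""
    conn.execute(
        "INSERT INTO event (shelf_url, started_at, finished_at, num_books) "
        "VALUES (?, ?, ?, ?)",
        (shelf_url, started_at, finished_at, num_books),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_event(conn, "shelf-a", "2024-05-01T10:00:00", "2024-05-01T10:00:04", 120)
record_event(conn, "shelf-a", "2024-05-01T18:30:00", "2024-05-01T18:30:03", 120)
record_event(conn, "shelf-b", "2024-05-02T09:15:00", "2024-05-02T09:15:09", 340)
rows = conn.execute("SELECT * FROM daily_usage ORDER BY day").fetchall()
```

A plain view keeps the aggregation logic in the database without a scheduled job; a stored procedure writing to an aggregate table would only become necessary if the event table grows large enough for the view to be slow.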

Implement fuzzy match on title for free text searches

I'd like to implement something like rapidfuzz for checking the results when we do a free-text search (vs. ISBN).

Parsers like Miami and others that rely on this could see better status representations especially.

Helpful Article
rapidfuzz github
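
To illustrate the idea without adding a dependency, this sketch uses difflib.SequenceMatcher from the standard library as a stand-in for rapidfuzz. The function name and the 0.8 threshold are assumptions that would need tuning against real parser results.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def best_title_match(
    wanted: str, found: Iterable[str], threshold: float = 0.8
) -> Optional[str]:
    """Return the closest title from the search results, or None if nothing is close."""
    best, best_score = None, threshold
    for candidate in found:
        # Case-insensitive similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, wanted.lower(), candidate.lower()).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best
```

For example, best_title_match("the gone away world", ["Gone Girl", "The Gone-Away World"]) picks the hyphenated title even though the strings are not identical. Swapping in rapidfuzz would mostly change the scoring call (its fuzz.ratio returns 0 to 100 rather than 0 to 1).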

Implement Testing Framework

Now that this initial version of the application is more fleshed out, I’d like to implement a reasonable testing structure.

It’s difficult to determine what granularity to test, since the main thing would be testing the parsers. Right now I manually do this, but would like to implement something more automated.
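
One possible granularity is to test each parser against saved HTML snippets, so the tests never hit a live catalog. The sketch below assumes a hypothetical parse_status function and made-up markup; the real parsers and page structures will differ.

```python
import re
import unittest


def parse_status(html: str) -> str:
    """Hypothetical parser: pull an availability label out of catalog HTML."""
    match = re.search(r'class="availability">([^<]+)<', html)
    return match.group(1).strip() if match else "Not found"


class ParseStatusTest(unittest.TestCase):
    # Saved fixtures stand in for real catalog responses.
    AVAILABLE = '<div class="availability"> On Shelf </div>'
    EMPTY = "<div>No results</div>"

    def test_available(self):
        self.assertEqual(parse_status(self.AVAILABLE), "On Shelf")

    def test_missing(self):
        self.assertEqual(parse_status(self.EMPTY), "Not found")
```

Fixture-based tests catch regressions in the parsing logic, but not changes on the library's side, so an occasional manual or scheduled check against the live sites would still be useful.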

Make this a PWA (Progressive Web App) for Mobile

It would be cool to make this a PWA and deploy it on phones for more app-like experience. Not sure how it would work with the navigation to other pages, but it's worth trying out.

Here is a helpful video and repo for doing this with Dash Apps:

Improve PWA Updates

Currently, we need to open and close the PWA when a new update is pushed/released.

Reading this stack overflow post gives me ideas on how to upgrade it.

Support ebook format

The library checks are built to support the physical book search only (my own personal preference) by default.

Others have expressed the desire to find eBook availability. I can see this being useful, so here are some preliminary thoughts on what we need to consider:

Updating all library parsers to search for ebooks
- contract returns some availability string and url
- Should we make the return string more structured? (Status codes so we can style the response, string that is a more detailed message, etc.) Would be a good time to refactor this.
Add some kind of input to denote which format a user wants to search
Are these the only two formats we want to support?
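
If the return value does become more structured, it could look something like this: a status code for styling, a detailed message, the URL, and the format searched. All names and the set of status values are assumptions for discussion.

```python
from dataclasses import dataclass
from enum import Enum


class BookFormat(Enum):
    PHYSICAL = "physical"
    EBOOK = "ebook"


class Status(Enum):
    AVAILABLE = "available"
    ON_HOLD = "on_hold"
    NOT_FOUND = "not_found"


@dataclass(frozen=True)
class Availability:
    status: Status  # machine-readable code the front end can style on
    message: str  # more detailed, human-readable text
    url: str
    book_format: BookFormat = BookFormat.PHYSICAL

    def to_json(self) -> dict:
        """Plain dict that can be serialized in an API response."""
        return {
            "status": self.status.value,
            "message": self.message,
            "url": self.url,
            "format": self.book_format.value,
        }
```

Keeping the format as an enum makes it cheap to add a third option later if physical books and ebooks turn out not to be the only two.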

Create Better Visual Experience with Text Inputs

This plotly post has a potential css + clientside callback solution.

The DMC component library also seems to have one that works. Might also be interesting to try out style wise vs. DBC?

Support Goodreads Shelves/Lists

Currently, the following URL patterns work (substituting the parts in <>):

https://www.goodreads.com/review/list/<numbers go here>?shelf=to-read
https://www.goodreads.com/review/list/<numbers here>-<user-name>?shelf=to-read
https://www.goodreads.com/user_shelves/<numbers here>

But lists/groups don't work:

Would be nice to add support for these other patterns as well.
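
The three supported patterns can be expressed as a small validator, which also gives one place to add list and group patterns later. This is a sketch; the function name is hypothetical and the current parser may recognize URLs differently.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Path shapes for the three URL patterns listed above.
_SUPPORTED_PATHS = (
    re.compile(r"^/review/list/(?P<user_id>\d+)(?:-[^/]+)?/?$"),
    re.compile(r"^/user_shelves/(?P<user_id>\d+)/?$"),
)


def extract_user_id(url: str) -> Optional[str]:
    """Return the numeric user id for a supported shelf URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc != "www.goodreads.com":
        return None
    for pattern in _SUPPORTED_PATHS:
        match = pattern.match(parsed.path)
        if match:
            return match.group("user_id")
    return None
```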

Implement Faster GoodReads Parsing

I didn't realize that Goodreads shelves had an RSS feed. Discovered this when reading more about how OverReader does their searches with eBooks.

If we can get the RSS link from the shelf page, then we can use something like feedparser to make this much more efficient than my current Goodreads parser.

Limitation on this right now is that you only get 100 records from the RSS feed... so that won't work for big shelves (whereas my solution currently does allow for at least randomly searching more than that)
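
As a rough illustration of the RSS approach, this sketch reads item titles from a feed using the standard library's ElementTree as a stand-in for feedparser. The sample feed is made up and only uses the standard RSS title element; the real Goodreads feed has more fields.

```python
import xml.etree.ElementTree as ET
from typing import List

RSS_ITEM_LIMIT = 100  # the feed cap described above


def titles_from_rss(xml_text: str) -> List[str]:
    """Extract the title of every item in an RSS document."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title", default="").strip() for item in root.iter("item")]


def may_be_truncated(titles: List[str]) -> bool:
    """True when the feed hit the cap, so the shelf may be larger than the feed."""
    return len(titles) >= RSS_ITEM_LIMIT


SAMPLE = """<rss version="2.0"><channel>
  <item><title>The Gone-Away World</title></item>
  <item><title>The Handmaid's Tale</title></item>
</channel></rss>"""
```

A hybrid could use the feed when may_be_truncated is false and fall back to the existing page-by-page parser for bigger shelves.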

Add Progress Bar instead of Loading component

I would like to implement a progress bar when retrieving the book data from multi-page shelves. A simple percentage that advances by 1 / num_pages per page would suffice, but it would require some refactoring of the cache, including implementing some kind of background queue (diskcache for now, but could be Celery with Redis to be more productionized).

Helpful stackoverflow on implementing this:
https://stackoverflow.com/a/77109011
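
The percentage logic itself is small; the sketch below shows it as a generator that reports progress after each page. The fetch_page callable is a hypothetical stand-in for the real page request, and the background queue is left out.

```python
from typing import Callable, Iterator, List, Tuple


def fetch_shelf(
    num_pages: int, fetch_page: Callable[[int], List[str]]
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (percent_complete, books_so_far) after each page is fetched."""
    books: List[str] = []
    for page in range(1, num_pages + 1):
        books.extend(fetch_page(page))
        # Each page advances the bar by 1 / num_pages.
        yield round(100 * page / num_pages), list(books)
```

A background worker could write each yielded percentage to the cache, and the front end would poll that value to draw the bar.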

Productionize Gunicorn config

In order to handle load properly, need to properly update the gunicorn command to something like:

gunicorn --workers=3 app:app -t 60 --keep-alive 60

Workers by default should be (2*CPU)+1, so 3 for a single CPU.

Helpful Medium Post
Helpful Stack Overflow #1
Helpful Stack Overflow #2
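
The same settings could live in a gunicorn.conf.py file instead of the command line, which lets the worker count follow the machine. This is a sketch; the helper function is only for illustration, and the values mirror the command above.

```python
# gunicorn.conf.py
import multiprocessing


def recommended_workers(cpus: int) -> int:
    """Rule of thumb from above: (2 * CPU) + 1."""
    return 2 * cpus + 1


workers = recommended_workers(multiprocessing.cpu_count())
timeout = 60  # same as -t 60
keepalive = 60  # same as --keep-alive 60
```

It would be started with gunicorn -c gunicorn.conf.py app:app. On a single-core machine this resolves to 3 workers, matching the command above.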

Assumptions

Still using Render - can probably get away with a single core machine for starters, then manually scale horizontally?
If using AWS EC2, then we can potentially get more CPU + Memory for less $$$, but the concept is still the same.