Comments (7)
Good idea, but unfortunately 300 reviews per book is a Goodreads limitation. There are star filters, text search and sort-order parameters; perhaps I could squeeze out more reviews this way, if they show reviews that are not included in the default 300. Text-search results are limited to 30 hits. Sort order ("newest"/"oldest") often shows fewer than 300 reviews.
Throttling is already in place from the very start (~1 sec per request), except for some AJAX URLs.
I should make this clearer in the docs.
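For illustration, a minimal Python sketch of a per-request throttle. It is not taken from the toolbox's code: the `Throttle` class, its parameters and the injectable `clock`/`sleep` hooks are hypothetical names, and only the ~1 sec interval comes from the comment above.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed ~1 sec)
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each page request would space requests out; URLs that should not be throttled would simply skip the call.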
I'm going to test this, thank you.
from goodreads-toolbox.
"unfortunately 300 reviews per book is a Goodreads limitation."
Hmm... So that's why you chose this number. My next question was: can you add a way for the user to override this number from the command line? And what really happens if I modify the code to make it 600 instead of 300?
Now it is clear.
Yes, please add the above info to the README. It will make it clear whom we should blame for the 300 limit :)
from goodreads-toolbox.
300 is the maximum. I read all available review pages, and after 300 reviews there are no more. It's not a constant or an argument passed from my code; it's just what you can get from Goodreads.
I can probably get additional reviews by varying the Goodreads query parameters 'sort order', 'search text' and 'rating', so it's 300 + x.
from goodreads-toolbox.
It seems that I can extract way more reviews by changing these parameters (but each query on its own always returns < 300).
By merging the results I end up with more reviews. The search for an individual book, however, takes even longer.
300 default (most popular?) + 300 newest + 300 oldest + the results from the text search using popular trigrams currently get me more than 4000 reviews for some books (500 of 2443 trigrams)...
I could randomize on the part of the trigrams to reduce the overall time.
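The merging and the trigram sampling described above could look roughly like the following Python sketch. It is not the toolbox's implementation: `merge_reviews`, `sample_trigrams` and the `"id"` key on each review are assumptions made for illustration.

```python
import random


def merge_reviews(*result_lists):
    """Merge several result lists, keeping the first copy of each review ID."""
    merged = {}
    for results in result_lists:
        for review in results:
            # Assumption: every review is a dict with a unique "id" field.
            merged.setdefault(review["id"], review)
    return merged


def sample_trigrams(trigrams, k, seed=None):
    """Pick up to k distinct trigrams at random to shorten the search."""
    pool = list(trigrams)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

With this shape, the default, newest and oldest result lists plus every text-search result list can all be passed to `merge_reviews`, and duplicates drop out by ID.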
from goodreads-toolbox.
Wow.. That's terrific!
I could randomize on the part of the trigrams to reduce the overall time.
If you want to impose some limit on the total number to reduce the overall time, could you please add a command-line option that lets the user raise that limit?
Personally, I am ready to wait a week or even a month to get the most accurate findings. I don't care for time, because I will run it no more than once a year, I guess.
from goodreads-toolbox.
I don't care for time
Good to know. I'll add an option.
from goodreads-toolbox.
With regard to the original question: unfortunately, sufficient randomization isn't possible. The fixed sample is the only one we get from Goodreads. As I wrote, I found a way to increase the size of this sample: the program now mixes all review-filter criteria and merges the results.
My mistake was to assume that these filters just load a subset of what you see when no filters are applied. But given enough ratings and reviews, every filter finds reviews, ratings, etc. not included in any other result. The theoretical limit is 5400 reviews: 6*3 filter combinations * max. 300 displayed reviews (Goodreads limit).
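The arithmetic behind the 5400 figure, as a small Python sketch. The thread only states "6*3 filter combinations"; that the 6 are "no rating filter plus 1 to 5 stars" and the 3 are "default, newest, oldest" is my assumption, and the names below are hypothetical.

```python
from itertools import product

# Assumed breakdown of the "6*3" combinations mentioned above.
RATING_FILTERS = [None, 1, 2, 3, 4, 5]         # None = no star filter
SORT_ORDERS = ["default", "newest", "oldest"]  # labels are illustrative
MAX_PER_QUERY = 300                            # Goodreads display limit


def filter_combinations():
    """Return every (rating filter, sort order) pair to query."""
    return list(product(RATING_FILTERS, SORT_ORDERS))


def theoretical_limit():
    """Upper bound if no two queries returned the same review."""
    return len(filter_combinations()) * MAX_PER_QUERY
```

In practice the merged total stays below this bound, since the result sets of different combinations overlap.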
If there are many ratings, the program also runs a dictionary of trigrams against the text-based review search provided by Goodreads (which shows max. 30 reviews per query and takes 2-4 seconds each).
The dict-search is still experimental: in tests it found 0 to ~7000 reviews for (popular) books. But there are many more reviews it does not find. Perhaps trigram combinations would find even more unique reviews.
There are program options to disable or adjust the dict-search (see --help). The dictionary is hardcoded into the module at the moment. It could be optimized too.
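A rough Python sketch of how such a dictionary search can accumulate unique reviews. It is not the module's actual code: `dict_search`, the `search` callable and the `"id"` key are hypothetical, and only the 30-hit cap comes from the description above.

```python
def dict_search(search, trigrams, max_hits=30):
    """Query `search` once per trigram and collect reviews by unique ID.

    `search` is any callable that takes a query string and returns a list
    of review dicts; at most `max_hits` results are used per query.
    """
    found = {}
    for trigram in trigrams:
        for review in search(trigram)[:max_hits]:
            found.setdefault(review["id"], review)
    return found


# Usage with an in-memory stand-in for the real text search:
corpus = {1: "the best book", 2: "boring and long", 3: "the ending was bad"}


def fake_search(query):
    return [{"id": i, "text": t} for i, t in corpus.items() if query in t]


unique_reviews = dict_search(fake_search, ["the", "ing"])
```

Because each query is capped at 30 hits, the number of new reviews per trigram tends to shrink as the collection grows, which fits the observation that many reviews remain unfound.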
from goodreads-toolbox.