
Comments (7)

andre-st commented on May 29, 2024

Good idea, but unfortunately 300 reviews per book is a Goodreads limitation. There are star filters, text search and sort-order parameters; perhaps I could squeeze out more reviews this way, if they show reviews that are not included in the default 300. Text-search results are limited to 30 hits. Sort order ("newest"/"oldest") often shows fewer than 300 reviews.

Throttling is already applied from the very first request (~1 sec per request), except for some AJAX URLs.
I should make this clearer in the docs.
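For readers who want to see what throttling at roughly one request per second can look like, here is a minimal Python sketch. It is not taken from goodreads-toolbox; the `Throttle` class and its parameters are hypothetical names chosen for illustration.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait().

    Hypothetical helper for illustration, not the toolbox's own code.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` right before each HTTP request would keep the request rate at or below one per `min_interval` seconds; the first call never sleeps.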

I'm going to test this, thank you.

from goodreads-toolbox.

hwasiti commented on May 29, 2024

unfortunately 300 reviews per book is a Goodreads limitation.

Hmmm... So that's why you chose this number. My next question was going to be: can you add a way for the user to override this number on the command line? And what actually happens if I modify the code to use 600 instead of 300?

Now it is clear.

Yes, please add the above info to the Readme. It will make clear who is to blame for the 300 limit :)


andre-st commented on May 29, 2024

300 is the maximum. I read all available review pages, and after 300 reviews there are no more. It's not a constant or an argument passed from my code; it's just what you can get from Goodreads.
I can probably get additional reviews by varying the Goodreads query parameters 'sort order', 'search text' and 'rating', so it's 300 + x.


andre-st commented on May 29, 2024

It seems that I can extract way more reviews by changing these parameters (but each query always returns < 300).
By merging the results I end up with more reviews. The search for an individual book, however, takes even longer.
300 default (most popular?) + 300 newest + 300 oldest + results from the text search using popular trigrams get me more than 4000 reviews for some books at the moment (500 of 2443 trigrams)...
I could randomize on the part of the trigrams to reduce the overall time.
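The merging step described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the toolbox's code: `merge_reviews` is a hypothetical name, each review is assumed to be a dict with a unique `"id"` key, and the sample data is made up.

```python
def merge_reviews(*batches):
    """Merge several result lists, keeping one copy of each review.

    Reviews are assumed to be dicts carrying a unique "id" (an assumption
    made for this sketch). The first occurrence of an id wins.
    """
    merged = {}
    for batch in batches:
        for review in batch:
            merged.setdefault(review["id"], review)
    return list(merged.values())


# Made-up sample data standing in for three differently sorted queries.
default = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
newest = [{"id": 2, "text": "b"}, {"id": 3, "text": "c"}]
oldest = [{"id": 1, "text": "a"}, {"id": 4, "text": "d"}]

merged = merge_reviews(default, newest, oldest)
print(len(merged))  # 4 unique reviews out of 6 fetched
```

Because the queries overlap, the merged total is usually smaller than the sum of the individual result sizes, which is why 300 + 300 + 300 does not simply yield 900.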


hwasiti commented on May 29, 2024

Wow.. That's terrific!

I could randomize on the part of the trigrams to reduce the overall time.

If you want to impose some limit on the total number to reduce the overall time, could you please add a command-line option that lets the user raise that limit?

As for me, I am ready to wait a week or even a month to get the most accurate findings. I don't care for time, because I will probably run it no more than once a year.


andre-st commented on May 29, 2024

I don't care for time

Good to know, I'll add an option


andre-st commented on May 29, 2024

With regard to the original question: unfortunately, sufficient randomization isn't possible. The fixed sample is the only one we get from Goodreads. As I wrote, I found a way to increase the size of this sample: the program now mixes all review-filter criteria and merges the results.
My mistake was to assume that these filters just load a subset of what you see when no filters are applied. But given enough ratings and reviews, every filter finds reviews, ratings, etc. not included in any other result. The theoretical limit is 5400 reviews: 6*3 filter combinations * max. 300 displayed reviews (Goodreads limit).
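The arithmetic behind the 5400 figure can be checked with a short Python sketch. The comment above only says "6*3 filter combinations"; reading this as six rating filters (no filter plus 1 to 5 stars) times three sort orders (default, newest, oldest) is an assumption made here for illustration.

```python
from itertools import product

# Assumed breakdown of the "6*3" combinations (not confirmed by the thread):
RATING_FILTERS = [None, 1, 2, 3, 4, 5]          # None = no star filter
SORT_ORDERS = ["default", "newest", "oldest"]
MAX_REVIEWS_PER_QUERY = 300                     # Goodreads display limit

combos = list(product(RATING_FILTERS, SORT_ORDERS))
theoretical_limit = len(combos) * MAX_REVIEWS_PER_QUERY

print(len(combos), theoretical_limit)  # 18 5400
```

The limit is only reached if every combination returns 300 reviews and none of them overlap, so real totals will be lower.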

If there are many ratings, the program also runs a dictionary of trigrams against the text-based review search provided by Goodreads (which shows max. 30 reviews per query, 2-4 seconds each).
The dict-search is still experimental: in tests it found 0 to ~7000 reviews for (popular) books. But there are many more reviews that it does not find. Perhaps trigram combinations would find even more unique reviews.

There are program options to disable or adjust the dict-search (--help). The dictionary is hardcoded into the module at the moment. It could be optimized too.
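As a rough illustration of the dict-search idea, the following Python sketch runs a list of trigrams through a search function and keeps each review once. Everything here is hypothetical: `dict_search`, its `limit` and `seed` parameters, and the `search` callable are stand-ins, not the toolbox's real options or the Goodreads API. The optional `limit` draws a random subset of the dictionary, which is one way to trade completeness for time.

```python
import random


def dict_search(trigrams, search, limit=None, seed=None):
    """Query each trigram and collect unique reviews keyed by id.

    trigrams: iterable of search terms (the dictionary)
    search:   callable returning a list of review dicts for one term;
              a stand-in for the real text search (assumption)
    limit:    if set, only a random sample of this many trigrams is used
    seed:     optional seed to make the sample reproducible
    """
    queue = list(trigrams)
    if limit is not None and limit < len(queue):
        queue = random.Random(seed).sample(queue, limit)

    found = {}
    for trigram in queue:
        for review in search(trigram):
            found.setdefault(review["id"], review)
    return found


# Made-up index standing in for the remote search results.
fake_index = {
    "the": [{"id": 1}, {"id": 2}],
    "and": [{"id": 2}, {"id": 3}],
    "ing": [],
}
found = dict_search(fake_index.keys(), lambda term: fake_index[term])
print(sorted(found))  # [1, 2, 3]
```

With 2-4 seconds per query, the number of trigrams in `queue` dominates the running time, so a user who does not mind waiting would leave `limit` unset.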

