If I run the search again, will it give me the same results? I think

Can you make the search random for each run? (goodreads-toolbox issue, closed)

andre-st commented on May 29, 2024

Can you make the search random for each run?

from goodreads-toolbox.

Comments (7)

andre-st commented on May 29, 2024

Good idea. Unfortunately, 300 reviews per book is a Goodreads limitation. There are star-rating filters, text-search and sort-order parameters; perhaps I could squeeze out more reviews this way if they show reviews that are not included in the default 300. Text-search results are limited to 30 hits. Sort order ("newest"/"oldest") often shows fewer than 300 reviews.

Throttling already happens from the very first request (~1 sec per request), except for some AJAX URLs.
I should make this clearer in the docs.
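
For readers unfamiliar with the idea, here is a minimal sketch of per-request throttling in Python. It is not the toolbox's actual code: the `Throttle` class, its parameters and the injectable `clock`/`sleep` hooks are hypothetical names chosen for illustration, and the 1-second interval only mirrors the "~1 sec per request" figure mentioned above.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float = 1.0,  # assumed value, taken from "~1 sec per request"
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each HTTP request would space requests at least one second apart; the first call returns without sleeping.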

I'm going to test this, thank you.

hwasiti commented on May 29, 2024

> unfortunately 300 reviews per book is a Goodreads limitation.

Hmmm... So that's why you chose this number. My next question was going to be whether you could add a way for the user to override this number on the command line, and what would really happen if I modified the code to use 600 instead of 300.

Now it is clear.

Yes, please add the above info to the Readme. It will make it clear who we should blame for the 300 limit :)

andre-st commented on May 29, 2024

300 is the maximum. I read all available review pages, and after 300 reviews there are no more. It's not a constant or an argument passed from my code; it's just what you can get from Goodreads.
I can probably get additional reviews by varying the Goodreads query parameters 'sort order', 'search text' and 'rating', so it's 300 + x.
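
The "300 + x" idea amounts to taking the union of several capped result lists and dropping duplicates. A minimal sketch in Python, assuming each review carries a unique identifier (the `id` field, the `Review` shape and the `merge_reviews` helper are hypothetical and not taken from the toolbox):

```python
from typing import Dict, Iterable, List

Review = Dict[str, str]  # hypothetical shape: {"id": ..., "text": ...}


def merge_reviews(*result_sets: Iterable[Review]) -> List[Review]:
    """Union several result sets, keeping the first copy of each review ID."""
    merged: Dict[str, Review] = {}
    for result_set in result_sets:
        for review in result_set:
            # setdefault leaves an already-seen review untouched
            merged.setdefault(review["id"], review)
    return list(merged.values())


# Made-up sample data standing in for two differently sorted queries
default_sort = [{"id": "r1", "text": "great"}, {"id": "r2", "text": "meh"}]
newest_first = [{"id": "r2", "text": "meh"}, {"id": "r3", "text": "loved it"}]

merged = merge_reviews(default_sort, newest_first)
print(len(merged))  # 3 unique reviews out of 4 fetched
```

The "x" is then simply the number of reviews in the extra queries that the default listing did not already contain.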

andre-st commented on May 29, 2024

It seems that I can extract far more reviews by changing these parameters (though each query is still capped at 300).
By merging the results I end up with more reviews. The search for an individual book, however, takes even longer.
300 default (most popular?) + 300 newest + 300 oldest + results from the text search using popular trigrams gets me more than 4000 reviews for some books at the moment (500 of 2443 trigrams)...
I could randomize on the part of the trigrams to reduce the overall time.
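
One way to picture that trade-off is drawing a random subset of the dictionary up to a fixed query budget. This is only a sketch under assumptions: `pick_trigrams`, `budget` and `seed` are hypothetical names, and the short list stands in for the real 2443-entry dictionary.

```python
import random
from typing import List, Optional, Sequence


def pick_trigrams(
    dictionary: Sequence[str], budget: int, seed: Optional[int] = None
) -> List[str]:
    """Return up to `budget` distinct trigrams chosen at random."""
    rng = random.Random(seed)  # a fixed seed makes a run repeatable
    return rng.sample(list(dictionary), min(budget, len(dictionary)))


trigrams = ["the", "and", "ing", "her", "was", "ion"]  # stand-in dictionary
chosen = pick_trigrams(trigrams, budget=3, seed=42)
print(chosen)
```

With no seed, each run queries a different subset, so repeated runs can surface different reviews; a larger budget trades run time for coverage.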

hwasiti commented on May 29, 2024

Wow.. That's terrific!

> I could randomize on the part of the trigrams to reduce the overall time.

If you want to impose a limit on the total number to reduce the overall time, could you please add a command-line option that lets the user raise that limit?

For me, I am ready to wait even 1 week or even 1 month to get the most accurate findings. I don't care for time, because I will not do it more than once a year, I guess.

andre-st commented on May 29, 2024

> I don't care for time

Good to know. I'll add an option.

andre-st commented on May 29, 2024

With regard to the original question: unfortunately, sufficient randomization isn't possible. The fixed sample is the only one we get from Goodreads. As I wrote, I found a way to increase the size of this sample: the program now combines all review-filter criteria and merges the results.
My mistake was to assume that these filters just load a subset of what you see when no filters are applied. But given enough ratings and reviews, every filter finds reviews, ratings, etc. not included in any other result. The theoretical limit is 5400 reviews: 6*3 filter combinations * max. 300 displayed reviews (Goodreads limit).
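
The arithmetic behind that ceiling can be spelled out in a few lines of Python. The breakdown of "6*3" is an assumption on my part: six rating filters (no filter plus 1 to 5 stars) and three sort orders (default, newest, oldest). The comment only gives the product, and the constant names are hypothetical.

```python
from itertools import product

# Assumed breakdown of the "6*3" above; not confirmed in this thread.
RATING_FILTERS = [None, 1, 2, 3, 4, 5]  # None = no star filter
SORT_ORDERS = ["default", "newest", "oldest"]
MAX_REVIEWS_PER_QUERY = 300  # Goodreads display limit stated in the thread

combinations = list(product(RATING_FILTERS, SORT_ORDERS))
ceiling = len(combinations) * MAX_REVIEWS_PER_QUERY
print(len(combinations), ceiling)  # 18 5400
```

The ceiling is only reached if no review appears in more than one filter combination; overlapping results shrink the merged total.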

If there are many ratings, the program also runs a dictionary with trigrams against the text-based reviews-search provided by Goodreads (shows max. 30 reviews per query, 2-4 seconds each).
Dict-search is still experimental: in tests it found 0 to ~7000 reviews for (popular) books. But there are many more reviews that it does not find. Perhaps trigram combinations would find even more unique reviews.

There are program options to disable or adjust the dict-search (--help). The dictionary is hardcoded into the module at the moment. Could be optimized too.
