andre-st / goodreads-toolbox

9 tools for Goodreads.com, for finding people based on the books they’ve read, finding books popular among the people you follow, following new book reviews, etc

License: GNU General Public License v3.0

Perl 78.47% Shell 1.30% Makefile 2.30% Dockerfile 0.87% HTML 17.05%

goodreads goodreads-api goodreads-shelves statistics rating recommender reviews monitoring notification similarity

goodreads-toolbox's Issues

Q: Goodreads website redesign. Will this (goodreads-toolbox) still work, or "what's the future?"

See your nearest www.goodreads.com

This is a question: which of the tools here still work?

OK: it's being rolled out, I sometimes get the new version, sometimes the old. But it's here.

savreviews.pl: Filter by stars

Example: Only save the negative reviews

Alternative to filter: create and save to multiple files by default:

savreviews-book123-5stars.txt
savreviews-book123-4stars.txt
savreviews-book123-3stars.txt
savreviews-book123-2stars.txt
savreviews-book123-1stars.txt
savreviews-book123-0stars.txt   ??
savreviews-book123-all.txt
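
A minimal sketch of the multiple-files idea, in Python rather than the toolbox's Perl. The function name and the `(stars, text)` input shape are made up for illustration and are not part of savreviews.pl; only the file-name pattern is taken from the list above.

```python
from collections import defaultdict

def group_reviews_by_stars(book_id, reviews):
    """Map output file names to review texts: one file per star rating plus an 'all' file.

    `reviews` is an iterable of (stars, text) pairs; this shape is an assumption.
    """
    files = defaultdict(list)
    for stars, text in reviews:
        files[f"savreviews-book{book_id}-{stars}stars.txt"].append(text)
        files[f"savreviews-book{book_id}-all.txt"].append(text)
    return dict(files)

# Hypothetical sample data
reviews = [(5, "Loved it"), (1, "Dull"), (1, "Gave up"), (0, "No rating, text only")]
grouped = group_reviews_by_stars(123, reviews)
negative = grouped["savreviews-book123-1stars.txt"]
```

With the files split like this, "only the negative reviews" is just a matter of reading the 1-star and 2-star files, so a separate filter option might not be needed.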

Can you make the search random for each run?

If I run the search again, will it give me the same results?

I think you mentioned that this code reads a maximum of 300 members per book, because of the risk of Goodreads throttling our bandwidth.

But could you make those 300 members per book different in each run, so we can dig deeper and reduce the impact of this limitation? Perhaps we could run it every couple of days to avoid throttling by Goodreads, or even run it from a different IP, etc.
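
A rough sketch of what "different members in each run" could look like, in Python. It assumes the full list of member IDs for a book is available to sample from, which may not hold if Goodreads itself only serves a limited number of members. `pick_members` is a hypothetical name; the limit of 300 is the figure mentioned above.

```python
import random

MAX_MEMBERS_PER_BOOK = 300  # limit mentioned in this issue

def pick_members(member_ids, limit=MAX_MEMBERS_PER_BOOK, seed=None):
    """Return at most `limit` member IDs, chosen at random.

    With seed=None every run draws a different sample; a fixed seed
    makes a run reproducible.
    """
    ids = list(member_ids)
    if len(ids) <= limit:
        return ids
    return random.Random(seed).sample(ids, limit)
```

Merging the samples of several runs (for example via the file cache) would then cover more than 300 members per book over time.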

Your work by the way is AWESOME!!!!
I felt very bad when Kindle discontinued this feature; for years the Goodreads community has been begging for it with no response or hope.

Thanks Andre!

New program: Export books of other members as CSV

Export books of other members as CSV-file so you can use their data with any stats-generating software on GitHub or the Web which expects you to upload "your" CSV-export file.

The background of this feature request is this and the following posts:
https://www.goodreads.com/topic/show/19691062-select-lists-being-retired?page=4&ref=nav_bar_discussions_pane_discussion#comment_188846552

Unshelved books of favorite authors

A possible script could output all books for your favorite authors (those rated 4 stars or more) that you have not added to your library. This could be refined to only output recent books (those that have been recently published, or are going to be published in the future). A publication date column would be especially useful so you easily see when you can read the book.

recentrated: Don't link to private profiles

User got an email with one [TTTT ] link to a strange private profile (spam bot?):
The user neither saw the stupid review text nor could do anything with the linked private profile page.

Better:
check reviewer status (extra request?) and if private, indicate [p]rivate account and:

link to the book page
or show the review text in the email
or drop this review

"The Flesh Cartel 12: Paradise Island"
www.goodreads.com/user/show/56528975 [TTTT ]

Won't read shelves with dashes

I can't get the script to accept shelves with dashes in the name. For example, ./likeminded.pl --shelf=read-hugo ######### returns "Loading authors from "read-hugo"...[FATAL] No books found." The script works as expected if I rename the shelf to get rid of the dash(es).

recentrated: check if source profile is private

Cannot load shelf if registered user is not public (to search engines) but visible to Goodreads users only.
Could set_good_cookie_file() but I like this program to run without my cookie.
Should add a check in extract_books() which looks for:
"This Profile Is Restricted to Goodreads Users. [...] Sign in to Goodreads to Learn More About $USER."
Perhaps send a warning email to the user?
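
A sketch of such a check in Python (the real change would go into the Perl extract_books()). The marker is the start of the message quoted above; the function name and the sample pages are made up.

```python
# First part of the message Goodreads shows for restricted profiles (quoted in this issue)
RESTRICTED_MARKER = "This Profile Is Restricted to Goodreads Users"

def is_restricted_profile(html):
    """Return True if the page body carries the restricted-profile notice."""
    return RESTRICTED_MARKER in html

# Hypothetical page bodies
restricted_page = "<h1>This Profile Is Restricted to Goodreads Users.</h1>"
public_page = "<h1>Bookshelves</h1>"
statuses = [is_restricted_profile(restricted_page), is_restricted_profile(public_page)]
```

A caller could skip the shelf and queue a warning email whenever the check is true, instead of reporting "no books".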

Create dockerfile

Could you provide a working dockerfile / docker-compose?

likeminded.pl: also take into account how similarly other users rate books

Hi! I'm super interested in this feature. Would you like to discuss it and implement it together?

Cheers!

likeminded.pl: Include similar authors

At the moment, we compare authors instead of books because combinations of the same books are rarer than combinations of the same authors, while the latter still satisfies the 'same taste' condition. The main assumption with books is that likeminded people have the same exposure to the same books. But that's questionable. Comparing authors widens this, and extending the comparison to similar authors would perhaps widen it even more.

Getting the GR cookie is not user-friendly

At the moment, you have to take a roundabout route to get the Goodreads cookie.
It would be easier to pass the GR password to a program and let it log in in the background and thereby get the cookie.

I'm currently having an issue using your scripts that require the .cookie file:
they act as though I've not created such a file (including the cookie text from Goodreads)
even when I have. I'm using Windows though so that may be the issue.
I don't know whether it is because I saved it in the wrong encoding
(i.e. ANSI / Unicode / UTF-8 etc.), or whether it is in the wrong folder.
(I tried the "goodreads" project folder and all the subfolders).

With [Ubuntu for] Windows, it won't normally allow you to create a file without a name,
but I followed the advice that to get round that, you'd for example name the file .cookie. ,
and then Windows automatically turns it into .cookie

savreviews.pl: Reviewer demographics

Age, gender, location, number of read books, friends, ...
including the user's reading stats page might be faster (1 request) than loading the user's shelves (n requests): number of pages and books per year
https://www.goodreads.com/review/stats/${USERID}
program option
separate report? CSV, HTML?

Undefined subroutine &WWW::Curl::Easy::CURLOPT_TCP_KEEPALIVE

Hi, I'm trying to make the search.pl script work but I get the following:

me@www ~/tmp/goodreads $ ./search.pl --order=stars,num_ratings --ratings=10000 linux
Reporting Bugs:
Report bugs to [email protected] or use Github's issue tracker
https://github.com/andre-st/goodreads/issues

Searching books:

about..... linux
rated by.. 10000 members or more
order by.. stars, num_ratings, year
progress.. Undefined subroutine &WWW::Curl::Easy::CURLOPT_TCP_KEEPALIVE called at /home/akenny/tmp/goodreads/lib/Goodscrapes.pm line 2094.

Any idea?

likeminded.pl: rank similar members with respect to the size of their libraries

At the moment, 'similarity' is the number of common authors regardless of the total number of books in the similar member's library.
Now, if member John has 5000 books and Carol 500, but both members have 10 authors in common with me, Carol should be considered more similar to me and ranked higher than John.
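
One possible normalisation, sketched in Python: divide the number of common authors by the size of the other member's library. This is only one candidate formula, not what likeminded.pl does; the numbers are the John/Carol example from above.

```python
def similarity(common_authors, library_size):
    """Common authors relative to the size of the other member's library."""
    if library_size <= 0:
        return 0.0
    return common_authors / library_size

# name -> (authors in common with me, books in their library)
members = {"John": (10, 5000), "Carol": (10, 500)}
ranked = sorted(members, key=lambda name: similarity(*members[name]), reverse=True)
```

Here Carol scores 10/500 = 0.02 and John 10/5000 = 0.002, so Carol is ranked first. Dividing by the number of authors in the library instead of the number of books would be an alternative if that count is available.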

Add a troubleshooting / FAQ section somewhere

Maybe in INSTALL.txt
If possible, better add or leave troubleshooting info to the error messages; don't duplicate
see #25
notes on performance and throttling

New program: Members popular among your friends

via Goodreads direct message:

Sometimes when I go to a member page, it will say "FRIENDS IN COMMON (5)": my suggestion was a script that could make a list of the friends of my friends, ordered by the number of friends who are friends with each member.

On this goodreads page (for me):
https://www.goodreads.com/friend/of_friends

it will tell me that a member is a friend of one or two people I am a friend of, but often, they may actually be a friend of 12 people I know: also, I'm sure the list on that page is much smaller than it actually should be.

I wish I could explain it more clearly, but an output could look a bit like this:

(Member #): 1111
Friend of 5 members I am friends with: #1112, #1113, #1114, #1115, #1116

(Member #): 1117
Friend of 4 members I am friends with: #1118, #1119, #1120, #1121

(Member #): 1122
Friend of 3 members I am friends with: #1123, #1124, #1125

I thought it would make sense that people with a similar taste would be a good source for finding more people with the same taste.
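
A sketch of the counting step in Python, assuming the friends lists have already been downloaded into a dictionary. The function name, the data layout and the member numbers are illustrative only.

```python
from collections import defaultdict

def friends_of_friends(my_id, my_friends, friends_of):
    """List members who are friends of my friends, most shared friends first.

    `friends_of` maps a member ID to the IDs of that member's friends.
    Members I am already friends with (and myself) are left out.
    """
    known = set(my_friends) | {my_id}
    mutual = defaultdict(set)
    for friend in my_friends:
        for candidate in friends_of.get(friend, ()):
            if candidate not in known:
                mutual[candidate].add(friend)
    return sorted(
        ((member, sorted(via)) for member, via in mutual.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )

# Hypothetical data: I am member 1 with three friends
friends_of = {1112: [1, 1111, 1117], 1113: [1111, 1117, 1112], 1114: [1111]}
result = friends_of_friends(1, [1112, 1113, 1114], friends_of)
for member, via in result:
    print(f"(Member #): {member}")
    print(f"Friend of {len(via)} members I am friends with: " + ", ".join(f"#{v}" for v in via))
```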

Add tool to find people who read the same books

Please add a tool to find people who have read or want to read books similar to those I've read or want to read. Not based on the books' authors, but on the books themselves.

friendrated: Don't list books that I've already read

via E-Mail:

the output file would be improved if it was possible to exclude the books that I had read (so it would show the most popular books that I had not read).

New program: Extended GR backups

Goodreads export doesn't save

reading progress updates
...

See:

https://www.goodreads.com/topic/show/19671542-back-ups
"Date-Read" https://www.goodreads.com/topic/show/19374476
"Date-Read" https://www.goodreads.com/topic/show/19703141
"ASIN" https://www.goodreads.com/topic/show/2138564
"Shelves" https://www.goodreads.com/topic/show/19715918

Maybe:
./backup.pl --input=last-export-file.csv > extended-export-file.csv

[Feature request] More filters for friendrated

It would be nice to filter books output with flags of publishing year and no. of reads:
like 'books with less than 1000 ratings(/read) and (First)published between 1950 to 1980 and rated 4 or 5 stars by my friends'.
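
The requested filter could look roughly like this in Python. The `Book` fields and the option names are assumptions for illustration; friendrated may name or store these differently.

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str
    year: int          # year of first publication
    num_ratings: int   # total Goodreads ratings
    friend_stars: int  # rating given by a friend

def filter_books(books, max_ratings=1000, year_from=1950, year_to=1980, min_stars=4):
    """Keep books with fewer than `max_ratings` ratings, first published
    within [year_from, year_to], and rated at least `min_stars` by a friend."""
    return [
        b for b in books
        if b.num_ratings < max_ratings
        and year_from <= b.year <= year_to
        and b.friend_stars >= min_stars
    ]

# Hypothetical books
books = [
    Book("Obscure classic", 1965, 800, 5),
    Book("Famous classic", 1965, 250000, 5),
    Book("Recent hit", 2015, 500, 4),
    Book("Obscure dud", 1970, 300, 2),
]
kept = [b.title for b in filter_books(books)]
```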

If someone can fix the login bug, post it here

The issues page is open again. If someone has the understanding to fix the login bug, please post it here, so we can copy in the changes. The original developer seems to no longer be active. The problem is in lib/Goodscrapes.pm, involving line 258. It somehow needs to be adapted to the new Goodreads design.
I'm looking forward to this working again.

Find good GR bookclubs

based on friends/followees list or reviewers etc = groups actually used
GR search function is pretty useless

Code accepts mistyped shelf names

When I copied the name of the shelf from GR and pasted it into Bash, for some reason Bash truncated the last letter and the code got the wrong name. However, this never happens when I type the shelf name manually. So I suspect this is a problem with Ubuntu on Windows, but I don't remember it happening before.

Anyway, when given a wrong shelf name, the code loads ALL books. I think it would be better to respond with an error such as "Book shelf is not available", so the user is not misled into thinking that results for ALL books belong to a certain bookshelf.
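
A sketch of the suggested behaviour in Python, assuming the list of the user's shelf names can be fetched before loading any books. `resolve_shelf` and the sample shelf names are hypothetical; "#ALL#" is the decoded form of the "%23ALL%23" shelf that appears in the program output.

```python
ALL_SHELF = "#ALL#"  # "%23ALL%23" URL-decoded

def resolve_shelf(requested, available):
    """Return the shelf name if it exists, otherwise fail instead of falling back to ALL books."""
    if requested == ALL_SHELF or requested in available:
        return requested
    raise ValueError(f'Book shelf "{requested}" is not available')

# Hypothetical shelves of a user
shelves = ["read", "to-read", "read-hugo"]
ok = resolve_shelf("read-hugo", shelves)
try:
    resolve_shelf("read-hug", shelves)  # last letter truncated, as described above
    error = None
except ValueError as exc:
    error = str(exc)
```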

By the way, the code is terrific. For all my books (~440 books), it took around 3-4 days, and without a single network problem. Resulted in 1018 GR users with at least 5% of the authors in 70429298's shelf "%23ALL%23".

The cache helped a lot when running further single-shelf searches or meta-shelf (combination of 2 or more shelves) searches.

Upload the docker container to dockerhub

Would be faster than getting people to rebuild each time they download!

Thanks for building this

Handle Goodreads Maintenance Mode

Abort script(s)

GR login via library currently broken

Goodreads seems to have changed their login process (probably different URLs, OpenID, etc. at first glance). If I find time, I'll take care of it. Unfortunately, I won't be able to do that in the short term. As a result, some programs are currently not working. When using only the library, many operations also work without a login, just more slowly.

likeminded.pl vs private accounts: Load books from CSV-file

I see that to use likeminded, I need to make my shelf public. Could you also provide the option of using an exported shelf (.csv) directly, taking book IDs from it instead of a user ID?
(contact via Goodreads message)

recentrated: Distribute shelf-checks over n days, if > 100 books

Some users register shelves with 500 books, and checking all in one run takes longer than I want for each single user. Since I run this cronjob daily, I should stop at 100 per user and check the next 100 the next day and so on...
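
A sketch of the rotation in Python: split the shelf into batches of 100 and let the day number pick the batch. The function name is made up; the day number could come from something like `datetime.date.today().toordinal()` in a daily cronjob.

```python
def books_for_today(book_ids, day_number, batch_size=100):
    """Return the slice of the shelf to check on the given day.

    Batches rotate: day 0 checks the first `batch_size` books, day 1 the
    next ones, and after the last batch the cycle starts over.
    """
    ids = list(book_ids)
    if not ids:
        return []
    num_batches = -(-len(ids) // batch_size)  # ceiling division
    start = (day_number % num_batches) * batch_size
    return ids[start:start + batch_size]

shelf = list(range(250))  # hypothetical shelf with 250 book IDs
day0 = books_for_today(shelf, 0)
day2 = books_for_today(shelf, 2)
day3 = books_for_today(shelf, 3)
```

The trade-off is latency: with 500 books, a new rating may be noticed up to five days late.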

Don't cache page if prompting sign-in

Goodscrapes caches the sign-in page with url X if Goodreads is showing the sign-in screen instead of page X (because of an invalid cookie). You will not get any data until the file cache has been deleted. It's a show-stopper if this page is the user shelf or friends list (0 users).

Have to check if this is an HTTP 302 or just a message in page X.
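
Until that is checked, a guard could cover both cases. This is a Python sketch only: the sign-in path and the marker text are assumptions that would need to be verified against real Goodreads responses, and the plain dictionary stands in for the Goodscrapes file cache.

```python
SIGN_IN_PATH = "/user/sign_in"           # assumed redirect target, not verified
SIGN_IN_MARKER = "Sign in to Goodreads"  # assumed text on the sign-in screen, not verified

def is_sign_in_response(status, location, body):
    """True if the response is a redirect to the sign-in page or shows a sign-in prompt."""
    redirected = status == 302 and SIGN_IN_PATH in (location or "")
    prompted = SIGN_IN_MARKER in body
    return redirected or prompted

def store_page(cache, url, status, location, body):
    """Cache the page body unless it is a sign-in page; return whether it was stored."""
    if is_sign_in_response(status, location, body):
        return False
    cache[url] = body
    return True
```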

Books by a certain author that your friends have read

List all the books by a certain author that (friends of) your friends have read (Simon)

Error: IO::Socket::SSL 1.42 and Net::SSLeay 1.49 must be installed for https support

Suddenly got this error message:

[CRIT ] ... IO::Socket::SSL 1.42 must be installed for https support 
Net::SSLeay 1.49 must be installed for https support

friendrated: Output most significant instead of most faved books

via E-Mail:

I recently had the thought that the friendrated.pl script could be used to find books that are especially significant (not necessarily the ones with the most "Faves") amongst my friends by calculating
1 / ("Num GR Ratings" / "Faved")
so that a higher number would be better. I tried that in Excel and the results were excellent.
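
The formula simplifies to "Faved" divided by "Num GR Ratings". A small Python sketch with made-up numbers (the column names come from the report, the books and values do not):

```python
import math

def significance(num_gr_ratings, faved):
    """1 / (num_gr_ratings / faved), i.e. the share of all ratings that are faves by friends."""
    if num_gr_ratings <= 0 or faved <= 0:
        return 0.0
    return faved / num_gr_ratings

# Hypothetical rows: title -> (Num GR Ratings, Faved)
rows = {"Bestseller": (1_000_000, 20), "Niche gem": (2_000, 5)}
ranked = sorted(rows, key=lambda title: significance(*rows[title]), reverse=True)
same = math.isclose(significance(2_000, 5), 1 / (2_000 / 5))
```

The bestseller has more faves in absolute terms, but the niche book ranks first because a much larger share of its readers are friends who faved it.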

New program: "Who unfollowed me?"

Similar to Twitter's who.unfollowed.me - perhaps as service too.
incl. "who follows back"?

Can we make search on more than one bookshelf in a single run?

The current update of likeminded.pl is just amazing. I have re-run the search, and for some books I am getting ~4000-5000 members. Your code is getting close to the ideal search that GR could have offered if they had implemented this feature using their own database.

The current update also seems to have solved an issue where, every few hours, the code stopped and said there was no response from Goodreads or something like that. The current run has not stopped once in the last 24 hours. Maybe the earlier problem was related to my PC or internet connection.

The current run will take ~7 days and has not finished yet (ALL-books search for ~350 books), so I could not test whether the --shelf parameter can include more than one shelf. Is that possible?

Several further searches with certain shelves combined would be interesting too. With all books, many of them very popular, the results are generic, as you said.

In case you don't like the idea, the workaround is easy: I will create meta-shelves for major topics and add the books from the related shelves to them. But whenever I add a new book, I then have to add it manually to the meta-shelf as well as to its correct shelf. I wish there were a way for GR to automatically include all assigned child shelves in a meta-shelf. If your code worked with more than one shelf, this meta-shelf would not be necessary.

friendrated.pl returns only books I have already read, gets ratings wrong

I just ran friendrated.pl with no arguments, and it returned a list of books in common with my friends/followers only -- specifically, books on my read shelf. In addition, it showed me some books as faved by my friends that they did not rate 4 or 5 stars.

Running with -x read in the arguments reduces the number of results by about 3/4 but still shows me books from my read shelf only (the -x <shelfname> argument is meant to exclude books from the specified shelf).

Running for one of my friends with the -u <their id> argument returns only books that she, I, and her followers have on our read shelves.

My script for finding books by looking at bookshelves of people who read similar books

Love this toolbox. But it was missing a feature for finding books by looking at the bookshelves of people who read similar books. So I wrote this small Perl script for that today.

Here is how it works:

fetches books with 4 and 5 stars in your profile
crawls reviews of these books to find users who also rated it 4 or 5 stars
looks up the bookshelves of those users to see which books they rated 4 or 5 stars
ranks books based on number of votes from these users
also ranks users by number of books they have in common (min 3)
also gives more votes to users who love the same books as you but also hate the same books as you (i.e. 1 or 2 star)

Output:

Gives you a list of books that were rated highly by people who share the same taste as you
Gives you a list of doppelgangers, i.e. people who have rated books very similar to you.
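
The steps above can be sketched roughly as follows. This is a Python illustration of the described ranking only, not the Perl script itself: it works on ratings that are already loaded, and the exact weighting (one vote per shared liked book plus one per shared disliked book) is an assumption.

```python
from collections import Counter

LIKED = {4, 5}
DISLIKED = {1, 2}
MIN_COMMON = 3  # minimum number of liked books in common, as in the list above

def recommend(my_ratings, other_ratings):
    """my_ratings: {book: stars}; other_ratings: {user: {book: stars}}.

    Returns (books ranked by weighted votes, users ranked by similarity).
    """
    my_liked = {b for b, s in my_ratings.items() if s in LIKED}
    my_disliked = {b for b, s in my_ratings.items() if s in DISLIKED}
    weight = {}
    for user, ratings in other_ratings.items():
        liked = {b for b, s in ratings.items() if s in LIKED}
        disliked = {b for b, s in ratings.items() if s in DISLIKED}
        common = len(my_liked & liked)
        if common < MIN_COMMON:
            continue
        # Extra weight for users who also dislike the same books
        weight[user] = common + len(my_disliked & disliked)
    votes = Counter()
    for user, w in weight.items():
        for book, stars in other_ratings[user].items():
            if stars in LIKED and book not in my_ratings:
                votes[book] += w
    books = [book for book, _ in votes.most_common()]
    doppelgangers = sorted(weight, key=weight.get, reverse=True)
    return books, doppelgangers

# Hypothetical ratings
mine = {"a": 5, "b": 4, "c": 5, "z": 1}
others = {
    "u1": {"a": 5, "b": 5, "c": 4, "z": 2, "x": 5, "y": 4},
    "u2": {"a": 4, "b": 4, "c": 4, "x": 5},
    "u3": {"a": 5, "x": 1, "y": 5},
}
books, doppelgangers = recommend(mine, others)
```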

My Perl is a little rusty, so this isn't the best way to do it, but then the Perl motto is TIMTOWTDI and it did produce some good output.

Let me know what you guys think. Will post the script in the next comment.

friendrated: Most hated books among friends and followees

At the moment, the report includes books rated either 4 or more stars - most liked books.
It could be interesting to see the most hated books too - why?
Having a parameter so that the report only includes books rated 2 or less stars

make has failed for libcurl

Tried it on Ubuntu 14 and got this error:

perl -MCPAN -e 'install Cache::FileCache, WWW::Curl::Easy, Text::CSV, Log::Any, XML::Writer' || (echo "Please send your error messages to [email protected]" && false)
Reading '/home/haider/.cpan/Metadata'
  Database was generated on Fri, 22 Jun 2018 05:54:29 GMT
Cache::FileCache is up to date (undef).
Running install for module 'WWW::Curl::Easy'
Running make for S/SZ/SZBALINT/WWW-Curl-4.17.tar.gz
Checksum for /home/haider/.cpan/sources/authors/id/S/SZ/SZBALINT/WWW-Curl-4.17.tar.gz ok

  CPAN.pm: Building S/SZ/SZBALINT/WWW-Curl-4.17.tar.gz

Locating required external dependency bin:curl-config... missing.
Unresolvable missing external dependency.
Please install 'curl-config' seperately and try again.
NA: Unable to build distribution on this platform.
No 'Makefile' created'YAML' not installed, will not store persistent state
  SZBALINT/WWW-Curl-4.17.tar.gz
  /usr/bin/perl Makefile.PL INSTALLDIRS=site -- NOT OK
Running make test
  Make had some problems, won't test
Running make install
  Make had some problems, won't install
Text::CSV is up to date (1.95).
Log::Any is up to date (1.705).
XML::Writer is up to date (0.625).
Could not read metadata file. Falling back to other methods to determine prerequisites
chmod +x *.pl

A list of ALL dependencies is needed

Ok, I used to be a Perl programmer, so I can slowly look at every script, one by one, and install each missing library. However, some people have never been Perl programmers.

A list of all dependencies is needed, along with an apt(itude), cpan/cpanm or similar command line that installs them all and that a user can copy and paste into their terminal.

After running that command, each of these excellent - they are excellent - scripts should then run without complaint.

e.g. I tried cpanm, then aptitude, and finally google to find the Goodscrapes library. Ah, it's in the /lib/ subdir. Good, but a pointer would have been helpful.