
andre-st / goodreads-toolbox

78.0 7.0 7.0 2.13 MB

9 tools for Goodreads.com: finding people based on the books they've read, finding books popular among the people you follow, following new book reviews, etc.

License: GNU General Public License v3.0

Languages: Perl 78.47%, Shell 1.30%, Makefile 2.30%, Dockerfile 0.87%, HTML 17.05%
Topics: goodreads, goodreads-api, goodreads-shelves, statistics, rating, recommender, reviews, monitoring, notification, similarity

goodreads-toolbox's People

Contributors

andre-st

goodreads-toolbox's Issues

savreviews.pl: Filter by stars

Example: Only save the negative reviews

Alternative to filter: create and save to multiple files by default:

savreviews-book123-5stars.txt
savreviews-book123-4stars.txt
savreviews-book123-3stars.txt
savreviews-book123-2stars.txt
savreviews-book123-1stars.txt
savreviews-book123-0stars.txt   ??
savreviews-book123-all.txt
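Both variants (a star filter and one file per rating) can share one code path. A minimal sketch, assuming the reviews are already loaded as (stars, text) pairs and that 0 means "no stars given"; the function name and the `only` parameter are hypothetical, not part of savreviews.pl:

```python
from collections import defaultdict
from pathlib import Path

def save_reviews_by_stars(book_id, reviews, out_dir=".", only=None):
    """Write one file per star rating plus an '-all' file.

    reviews: iterable of (stars, text) pairs, stars 0..5 (assumed: 0 = no stars)
    only:    optional set of star values to keep, e.g. {1, 2} for negative reviews
    Returns the paths written, highest rating first, '-all' last.
    """
    # Apply the optional star filter once, so every output file agrees with it.
    kept = [(stars, text) for stars, text in reviews if only is None or stars in only]

    buckets = defaultdict(list)
    for stars, text in kept:
        buckets[stars].append(text)

    out = Path(out_dir)
    written = []
    for stars in sorted(buckets, reverse=True):
        path = out / f"savreviews-book{book_id}-{stars}stars.txt"
        path.write_text("\n\n".join(buckets[stars]) + "\n", encoding="utf-8")
        written.append(path)

    all_path = out / f"savreviews-book{book_id}-all.txt"
    all_path.write_text("\n\n".join(text for _, text in kept) + "\n", encoding="utf-8")
    written.append(all_path)
    return written
```

Only ratings that actually occur get a file, so "Only save the negative reviews" becomes `only={1, 2}`.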

Can you make the search random for each run?

If I run the search again, will it give me the same results?

I think you mentioned that a limitation of this code is that it reads at most 300 members per book, because of the risk of our bandwidth being throttled.

But can you make those 300 members per book different on each run? Then we could dig deeper and reduce the impact of this limitation. Perhaps we could run it every couple of days to avoid being throttled by Goodreads, or even from a different IP, etc.
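Randomizing only helps if it changes which pages get requested; picking a random subset of members that were already downloaded would not save any requests. A minimal sketch, assuming a book's members come from paginated review pages whose total count is known and that the pages fetched in earlier runs are remembered somewhere (the function and its parameters are hypothetical):

```python
import random

def pick_review_pages(total_pages, pages_per_run, already_seen=(), seed=None):
    """Choose which review pages of a book to load in this run.

    Pages fetched in earlier runs (already_seen) are skipped, so repeated runs
    gradually cover more members. Returns sorted 1-based page numbers; fewer
    than pages_per_run if not enough unseen pages remain.
    """
    seen = set(already_seen)
    unseen = [p for p in range(1, total_pages + 1) if p not in seen]
    rng = random.Random(seed)  # seed=None gives a different choice each run
    chosen = rng.sample(unseen, min(pages_per_run, len(unseen)))
    return sorted(chosen)
```

Passing a fixed `seed` reproduces a previous run, which helps when comparing results.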

Your work, by the way, is AWESOME!!!!
I felt very bad when Kindle discontinued this feature, and for years the Goodreads community has been begging for it with no response or hope.

Thanks Andre!

Unshelved books of favorite authors

A possible script could output all books by your favorite authors (those you rated 4 stars or more) that you have not added to your library. This could be refined to output only recent books (those recently published or due to be published in the future). A publication date column would be especially useful so you can easily see when you can read the book.
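The core of such a script is a set difference plus a date sort. A minimal sketch, assuming your shelved books and each favorite author's bibliography have already been loaded, and that every book has a publication date; the data layout and names are hypothetical:

```python
from datetime import date

def unshelved_books(my_books, books_by_author, min_rating=4, published_since=None):
    """List books by favorite authors that are not on any of your shelves.

    my_books:        iterable of (book_id, author, rating) for your shelved books
    books_by_author: dict author -> list of (book_id, title, pub_date)
    published_since: optional date; older books are skipped
    Returns (pub_date, author, title) tuples sorted by publication date.
    """
    my_books = list(my_books)
    shelved = {book_id for book_id, _, _ in my_books}
    favorites = {author for _, author, rating in my_books if rating >= min_rating}

    result = []
    for author in favorites:
        for book_id, title, pub_date in books_by_author.get(author, []):
            if book_id in shelved:
                continue  # already in your library
            if published_since is not None and pub_date < published_since:
                continue  # not recent enough
            result.append((pub_date, author, title))
    return sorted(result)  # upcoming books (future dates) end up last
```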

recentrated: Don't link to private profiles

A user got an email with one [TTTT ] link to a strange private profile (spam bot?):
The user neither saw the stupid review text nor could do anything with the linked private profile page.

Better:
check reviewer status (extra request?) and if private, indicate [p]rivate account and:

  • link to the book page
  • or show the review text in the email
  • or drop this review
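The three options above could be one policy switch when the email line is built. A minimal sketch, assuming the reviewer's private status is already known (how it is determined, e.g. via the extra request, is not shown); the dict keys and the function are hypothetical:

```python
def review_line(review, private_policy="book"):
    """Build one line of a recentrated-style notification for a single review.

    review: dict with the (hypothetical) keys 'user_url', 'book_url',
            'rating_label', 'text' and 'is_private'
    private_policy: for private reviewers, 'book' links the book page instead,
            'text' shows the review text, 'drop' omits the review (returns None)
    """
    if not review["is_private"]:
        return f"{review['user_url']} {review['rating_label']}"
    if private_policy == "book":
        return f"[p] {review['book_url']} {review['rating_label']}"
    if private_policy == "text":
        return f"[p] {review['rating_label']} {review['text']}"
    if private_policy == "drop":
        return None
    raise ValueError(f"unknown private_policy: {private_policy!r}")
```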

"The Flesh Cartel 12: Paradise Island"
www.goodreads.com/user/show/56528975 [TTTT ]

Won't read shelves with dashes

I can't get the script to accept shelves with dashes in their names. For example, ./likeminded.pl --shelf=read-hugo ######### returns "Loading authors from "read-hugo"...[FATAL] No books found." The script works as expected if I rename the shelf to remove the dash(es).
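The report doesn't say where the dash gets lost. One possible cause, which is only a guess, is a shelf-name pattern limited to word characters: in regular expressions `\w` matches letters, digits and `_`, but not `-`. A minimal sketch of the difference, plus URL-encoding the name for a request (the names here are hypothetical):

```python
import re
from urllib.parse import quote

# Guessed failure mode: \w covers letters, digits and '_' but not '-'.
WORD_ONLY = re.compile(r"\w+")
# Wider pattern that also accepts dashes, as in "read-hugo".
SHELF_NAME = re.compile(r"[\w-]+")

def shelf_param(name):
    """Validate a shelf name and return it URL-encoded for a request URL."""
    if not SHELF_NAME.fullmatch(name):
        raise ValueError(f"invalid shelf name: {name!r}")
    # quote() leaves '-' untouched, since it is an unreserved URL character.
    return quote(name, safe="")
```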

recentrated: check if source profile is private

Cannot load the shelf if the registered user's profile is not public (to search engines) but visible to Goodreads users only.
Could call set_good_cookie_file(), but I'd like this program to run without my cookie.
Should add a check in extract_books() which looks for:
"This Profile Is Restricted to Goodreads Users. [...] Sign in to Goodreads to Learn More About $USER."
Perhaps send a warning email to the user?
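A check like this could raise a dedicated error that the caller turns into a warning email. A minimal sketch, assuming the page HTML is available as a string; the exception and function names are hypothetical, and only the quoted start of the notice is matched because the rest is elided in the issue:

```python
# Notice text quoted in the issue; the "[...]" part of it is not reproduced here.
RESTRICTED_NOTICE = "This Profile Is Restricted to Goodreads Users"

class RestrictedProfileError(Exception):
    """Raised when a profile/shelf page is visible to signed-in members only."""

def check_not_restricted(html, user_id):
    """Return html unchanged, or raise if it shows the members-only notice."""
    # Collapse whitespace so line breaks in the HTML don't split the phrase.
    if RESTRICTED_NOTICE in " ".join(html.split()):
        raise RestrictedProfileError(
            f"profile {user_id} is restricted to Goodreads users; "
            "a cookie (set_good_cookie_file) would be needed")
    return html
```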

likeminded.pl: Include similar authors

At the moment, we compare authors instead of books because matches on the same books are rarer than matches on the same authors, while the latter still satisfy the 'same taste' condition. The main assumption with books is that like-minded people have had the same exposure to the same books, but that's questionable. Comparing authors widens the match, and extending it with similar authors would perhaps widen it even more.
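likeminded reports members by the percentage of your authors they share (e.g. "at least 5% of the authors"). A minimal sketch of how similar authors could widen that measure, assuming a precomputed author-to-similar-authors map; the function and the map are hypothetical:

```python
def author_overlap(my_authors, their_authors, similar=None):
    """Fraction of my authors that another member has read.

    similar: optional dict author -> iterable of similar authors; one of my
             authors then also counts as matched if the member has read any
             author similar to it.
    """
    mine = set(my_authors)
    theirs = set(their_authors)
    if not mine:
        return 0.0
    similar = similar or {}
    matched = 0
    for author in mine:
        if author in theirs or set(similar.get(author, ())) & theirs:
            matched += 1
    return matched / len(mine)
```

Each of your authors counts at most once, so a member who has read many authors similar to one of yours does not inflate the score.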

Getting the GR cookie is not user-friendly

At the moment, you have to jump through hoops to get the Goodreads cookie.
It would be easier to pass the GR password to a program and let the program log in in the background, thereby getting the cookie.

I'm currently having an issue with your scripts that require the .cookie file:
they act as though I haven't created such a file (containing the cookie text from Goodreads),
even when I have. I'm using Windows, though, so that may be the issue.
I don't know if it is because I saved it in the wrong encoding format
(i.e. ANSI / Unicode / UTF-8 etc.), or if it is in the wrong folder
(I tried the "goodreads" project folder and all the subfolders).

With [Ubuntu for] Windows, it won't normally let you create a file without a name (only an extension, like .cookie),
but I followed the advice that, to get around that, you can name the file .cookie. (with a trailing dot),
and Windows then automatically turns it into .cookie
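Windows editors often save text as UTF-16 ("Unicode" in Notepad) or add a byte-order mark and CRLF line endings, any of which can make a hand-made cookie file unusable. A minimal sketch of a tolerant reader, assuming the cookie is ASCII text on one line; this is not the toolbox's own loader, and the function name is hypothetical:

```python
import codecs
from pathlib import Path

def read_cookie_file(path=".cookie"):
    """Read a hand-saved Goodreads cookie, tolerating common Windows encodings."""
    raw = Path(path).read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")      # the codec consumes the BOM
    else:
        text = raw.decode("utf-8-sig")   # plain UTF-8/ASCII, optional UTF-8 BOM removed
    cookie = text.strip()                # drop trailing newline or CRLF
    if not cookie:
        raise ValueError(f"{path} is empty")
    return cookie
```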

savreviews.pl: Reviewer demographics

  • Age, gender, location, number of read books, friends, ...
  • loading the user's reading stats page (1 request) might be faster than loading the user's shelves (n requests): number of pages and books per year
    https://www.goodreads.com/review/stats/${USERID}
  • program option
  • separate report? CSV, HTML?

Undefined subroutine &WWW::Curl::Easy::CURLOPT_TCP_KEEPALIVE

Hi, I'm trying to make the search.pl script work but I get the following:

me@www ~/tmp/goodreads $ ./search.pl --order=stars,num_ratings --ratings=10000 linux
Reporting Bugs:
Report bugs to [email protected] or use Github's issue tracker
https://github.com/andre-st/goodreads/issues

Searching books:

about..... linux
rated by.. 10000 members or more
order by.. stars, num_ratings, year
progress.. Undefined subroutine &WWW::Curl::Easy::CURLOPT_TCP_KEEPALIVE called at /home/akenny/tmp/goodreads/lib/Goodscrapes.pm line 2094.

Any idea?

New program: Members popular among your friends

via Goodreads direct message:

Sometimes when I go to a member page, it says "FRIENDS IN COMMON (5)". My suggestion was a script that could list the friends of my friends, ordered by how many of my friends are friends with each member.

On this goodreads page (for me):
https://www.goodreads.com/friend/of_friends

It tells me that a member is a friend of one or two people I am friends with, but often they may actually be a friend of 12 people I know. Also, I'm sure the list on that page is much shorter than it should be.

I wish I could explain it more clearly, but an output could look a bit like this:

  1. (Member #): 1111
    Friend of 5 members I am friends with: #1112, #1113, #1114, #1115, #1116

  2. (Member #): 1117
    Friend of 4 members I am friends with: #1118, #1119, #1120, #1121

  3. (Member #): 1122
    Friend of 3 members I am friends with: #1123, #1124, #1125

I thought it would make sense that people with a similar taste would be a good source for finding more people with the same taste.
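The requested ranking can be computed locally from the friend lists of your friends instead of relying on the of_friends page. A minimal sketch, assuming those friend lists have already been loaded into a dict; the function name and data layout are hypothetical:

```python
def friends_of_friends(my_id, friends_of):
    """Rank members by how many of my friends are friends with them.

    friends_of: dict member_id -> set of that member's friend IDs
    Returns (member_id, sorted list of my friends who list them), most first.
    """
    my_friends = friends_of.get(my_id, set())
    via = {}
    for friend in my_friends:
        for candidate in friends_of.get(friend, set()):
            if candidate == my_id or candidate in my_friends:
                continue  # skip myself and people I'm already friends with
            via.setdefault(candidate, []).append(friend)
    ranked = [(member, sorted(friends)) for member, friends in via.items()]
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked
```

Each entry corresponds to one line of the example output: the member ID and the friends they are connected through.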

Add a tool to find people who read the same books

Please add a tool to find people who read or want to read the same books that I've read or want to read. Not based on the books' authors, but on the books themselves.
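Comparing books directly needs some normalization, because members with huge shelves share many books with everyone. A minimal sketch using Jaccard similarity (shared books divided by all books on either list), which is one common choice rather than anything the toolbox does; names and data layout are hypothetical:

```python
def rank_by_shared_books(my_books, others, min_shared=1):
    """Rank members by the Jaccard similarity of their book set to mine.

    my_books: book IDs on my read / to-read shelves
    others:   dict member_id -> book IDs on their shelves
    Returns (member_id, shared_count, jaccard) tuples, best first.
    """
    mine = set(my_books)
    ranking = []
    for member, books in others.items():
        books = set(books)
        shared = len(mine & books)
        if shared < min_shared:
            continue
        ranking.append((member, shared, shared / len(mine | books)))
    # Highest similarity first; ties broken by more shared books, then by ID.
    ranking.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return ranking
```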

[Feature request] More filters for friendrated

It would be nice to filter the book output with flags for publication year and number of reads,
like 'books with fewer than 1000 ratings (/reads), (first) published between 1950 and 1980, and rated 4 or 5 stars by my friends'.
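The example request translates into a few independent conditions on each book. A minimal sketch, assuming each book record carries its rating count, first-publication year and the friends' star ratings; the keys and the function are hypothetical, not friendrated's actual data structures:

```python
def filter_books(books, max_ratings=None, year_from=None, year_to=None, min_stars=4):
    """Keep books matching all given criteria; None means 'no limit'.

    books: dicts with the (hypothetical) keys 'title', 'num_ratings',
           'year' (first published) and 'friend_stars' (friends' ratings)
    """
    kept = []
    for book in books:
        if max_ratings is not None and book["num_ratings"] >= max_ratings:
            continue  # "fewer than N ratings"
        if year_from is not None and book["year"] < year_from:
            continue
        if year_to is not None and book["year"] > year_to:
            continue
        if not any(stars >= min_stars for stars in book["friend_stars"]):
            continue  # no friend rated it min_stars or more
        kept.append(book)
    return kept
```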

If someone can fix the login bug, post it here

The issues page is open again. If someone knows how to fix the login bug, please post the fix here so we can copy in the changes. The original developer seems to no longer be active. The problem is in lib/Goodscrapes.pm, around line 258. It somehow needs to be adapted to the new Goodreads design.
I'm looking forward to this working again.

Find good GR bookclubs

  • based on friends/followees list or reviewers etc = groups actually used
  • GR search function is pretty useless

Code accepts mistyped named shelves

When I copy the shelf name from GR and paste it into Bash, for some reason Bash truncates the last letter and the code gets the name wrong. However, this never happens when I type the shelf name manually. So I suspect this is a problem with Ubuntu on Windows, but I don't remember this happening before.

Anyway, if the wrong shelf name is given, the code loads ALL books. I think it would be better to respond with an error such as "Book shelf is not available", so the user is not misled into thinking that results for ALL books belong to a certain shelf.
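The suggested behavior amounts to validating the name against the user's actual shelves before loading anything. A minimal sketch, assuming the list of shelf names is available; the function and exception names are hypothetical, and the "did you mean" hint uses Python's standard `difflib` to catch truncated names like the one described above:

```python
import difflib

class UnknownShelfError(ValueError):
    """Raised instead of silently falling back to all books."""

def resolve_shelf(requested, available):
    """Return requested if it is one of the user's shelves, else raise."""
    shelves = list(available)
    if requested in shelves:
        return requested
    msg = f'Book shelf "{requested}" is not available'
    hint = difflib.get_close_matches(requested, shelves, n=1)
    if hint:
        msg += f' (did you mean "{hint[0]}"?)'
    raise UnknownShelfError(msg)
```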

By the way, the code is terrific. For all my books (~440 books), it took around 3-4 days without a single network problem. It resulted in 1018 GR users with at least 5% of the authors in 70429298's shelf "%23ALL%23".

The cache helped a lot when running other single-shelf searches or meta-shelf (combination of 2 or more shelves) searches.

GR login via library currently broken

Goodreads seems to have changed their login process (at first glance, probably different URLs, OpenID, etc.). If I find time, I'll take care of it, but unfortunately I won't be able to do that in the short term. As a result, some programs are currently not working. When using only the library, many operations also work without a login, just more slowly.

Don't cache page if prompting sign-in

Goodscrapes caches the sign-in page under URL X if Goodreads shows the sign-in screen instead of page X (because of an invalid cookie). You will not get any data until the file cache has been deleted. It's a show-stopper if this page is the user's shelf or friends list (0 users).

Have to check whether this is an HTTP 302 or just a message in page X.
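Both cases named above (an HTTP 302, or a message inside page X) can be checked before anything is written to the cache. A minimal sketch with an in-memory dict standing in for the file cache; the `Response` type, the URL fragment and the marker text are hypothetical and would need to be checked against the real Goodreads sign-in page:

```python
from dataclasses import dataclass

@dataclass
class Response:
    final_url: str  # URL after any redirects were followed
    status: int
    body: str

# Hypothetical markers; the real Goodreads sign-in page would need checking.
SIGN_IN_URL_PART = "sign_in"
SIGN_IN_TEXT = "Sign in to Goodreads"

def looks_like_sign_in(resp, requested_url):
    if 300 <= resp.status < 400:  # redirect not followed (e.g. HTTP 302)
        return True
    if resp.final_url != requested_url and SIGN_IN_URL_PART in resp.final_url:
        return True  # redirect followed to a sign-in URL
    return SIGN_IN_TEXT in resp.body  # page X itself only shows a message

def fetch_cached(url, fetch, cache):
    """Return the body for url, caching it only if it is not a sign-in page."""
    if url in cache:
        return cache[url]
    resp = fetch(url)
    if looks_like_sign_in(resp, url):
        raise PermissionError(f"sign-in required for {url} (invalid cookie?); not cached")
    cache[url] = resp.body
    return resp.body
```

Raising instead of returning the page makes the invalid cookie visible immediately, rather than leaving an empty shelf or friends list in the cache.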

friendrated: Output most significant instead of most faved books

via E-Mail:

I recently had the thought that the friendrated.pl script could be used to find books that are especially significant among my friends (not necessarily the ones with the most "Faves") by calculating
1 / ("Num GR Ratings" / "Faved") = "Faved" / "Num GR Ratings"
so that a higher number is better. I tried that in Excel and the results were excellent.
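The suggested score is simply the number of friends who faved a book divided by its total Goodreads ratings, so niche books loved by friends rise above bestsellers. A minimal sketch, assuming the two columns are available per book; the function name and tuple layout are hypothetical:

```python
def significance_ranking(books):
    """Rank books by Faved / Num GR Ratings, highest first.

    books: iterable of (title, faved, num_gr_ratings) tuples
    Books without any Goodreads ratings are skipped (score undefined).
    """
    scored = [(faved / num_ratings, title)
              for title, faved, num_ratings in books if num_ratings > 0]
    return sorted(scored, reverse=True)
```

A book with very few ratings overall can score high by chance (1 fave out of 2 ratings gives 0.5), so a minimum rating count might be worth adding.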

Can we make search on more than one bookshelf in a single run?

The current update of likeminded.pl is just amazing. I have re-run the search and am getting ~4000-5000 members for some books. Your code is getting close to the ideal search that GR could offer if they built this feature on top of their database.

I also feel the current update solved an issue where every few hours the code stopped, saying there was no response from Goodreads or something like that. The current run has not stopped once in the last 24 hours. Maybe this was related to my PC or an internet problem.

The current run will take ~7 days and has not finished yet (an ALL-books search for ~350 books), so I could not test whether the --shelf parameter can include more than one shelf. Is that possible?

Several more search runs with certain shelves combined seem interesting too. With all books, many of them very popular, the results are generic, as you said.

In case you don't like the idea, the workaround is easy: I will create meta-shelves for major topics and include the books from the related shelves in those major-topic shelves. But whenever I add a new book, I have to add it manually to the meta-shelf in addition to its correct shelf. I wish GR could automatically add the books of all assigned child shelves to a meta-shelf. If your code worked with more than one shelf, this meta-shelf would not be necessary.
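Supporting several shelves mostly means merging their books once, before the search starts. A minimal sketch, assuming a comma-separated shelf list (a hypothetical syntax, relying on shelf names not containing commas) and a hypothetical loader callable; books on several shelves are counted once:

```python
def books_from_shelves(shelf_arg, load_shelf):
    """Merge the books of several shelves, e.g. 'read-hugo,read-nebula'.

    load_shelf: callable shelf_name -> iterable of book IDs (hypothetical loader)
    Returns book IDs in first-seen order, without duplicates.
    """
    names = [name.strip() for name in shelf_arg.split(",") if name.strip()]
    ordered = {}  # dicts keep insertion order, so this de-duplicates stably
    for name in names:
        for book_id in load_shelf(name):
            ordered.setdefault(book_id, None)
    return list(ordered)
```

With the merged list, the existing per-book search would run unchanged, and the cache would still avoid reloading books that appear in more than one run.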

friendrated.pl returns only books I have already read, gets ratings wrong

I just ran friendrated.pl with no arguments, and it returned a list of books in common with my friends/followers only -- specifically, books on my read shelf. In addition, it showed me some books as faved by my friends that they did not rate 4 or 5 stars.

Running with -x read in the arguments reduces the number of results by about 3/4 but still shows me books from my read shelf only (the -x <shelfname> argument is meant to exclude books from the specified shelf).

Running for one of my friends with the -u <their id> argument returns only books that she, I, and her followers have on our read shelves.

My script for finding books by looking at bookshelves of people who read similar books

Love this toolbox. But it was missing a feature for finding books by looking at bookshelves of people who read similar books. So I wrote this small perl script for that today.

Here is how it works:

  • fetches books with 4 and 5 stars in your profile

  • crawls reviews of these books to find users who also rated it 4 or 5 stars

  • looks up the bookshelves of those users to see which books they rated 4 or 5 stars

  • ranks books based on number of votes from these users

  • also ranks users by number of books they have in common (min 3)

  • also gives more votes to users who love the same books as you but also hate the same books as you (i.e. 1 or 2 star)

Output:

  • Gives you a list of books that were rated highly by people who share the same taste as you
  • Gives you a list of doppelgangers, i.e. people who have rated books very similar to you.
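The steps above map onto a small voting scheme. A minimal sketch of one interpretation, not the commenter's script (which is not part of this page): "love" means 4 or 5 stars, "hate" means 1 or 2 stars, a member qualifies with at least 3 shared loved books, and each shared hated book adds one extra vote to everything that member loves. Fetching the ratings is assumed to have happened already; names and data layout are hypothetical:

```python
from collections import Counter

LOVE = {4, 5}
HATE = {1, 2}

def recommend(my_ratings, their_ratings, min_common=3):
    """my_ratings:    dict book_id -> stars (your profile)
    their_ratings: dict user_id -> dict book_id -> stars (reviewers of your loved books)
    Returns (books ranked by votes, doppelgangers ranked by shared taste)."""
    my_loved = {b for b, s in my_ratings.items() if s in LOVE}
    my_hated = {b for b, s in my_ratings.items() if s in HATE}

    doppelgangers = []
    for user, ratings in their_ratings.items():
        common_love = len(my_loved & {b for b, s in ratings.items() if s in LOVE})
        if common_love < min_common:
            continue
        common_hate = len(my_hated & {b for b, s in ratings.items() if s in HATE})
        doppelgangers.append((user, common_love, common_hate))
    doppelgangers.sort(key=lambda d: (-d[1], -d[2], d[0]))

    votes = Counter()
    for user, _, common_hate in doppelgangers:
        weight = 1 + common_hate  # shared dislikes earn extra votes
        for book, stars in their_ratings[user].items():
            if stars in LOVE and book not in my_ratings:
                votes[book] += weight
    books = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return books, doppelgangers
```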

My Perl is a little rusty, so this isn't the best way to do it, but then Perl's motto is TIMTOWTDI, and it did produce some good outputs.

Let me know what you guys think. Will post the script in the next comment.

friendrated: Most hated books among friends and followees

At the moment, the report includes books rated 4 or more stars (the most liked books).
It could be interesting to see the most hated books too - why?
A parameter could make the report include only books rated 2 stars or fewer.
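Such a parameter only has to swap the star range used for counting. A minimal sketch, assuming friends' ratings are available as (friend, book, stars) triples with 0 meaning "shelved but not rated" (an assumption); the mode names and the function are hypothetical:

```python
from collections import Counter

STAR_RANGES = {
    "liked": (4, 5),  # current report: books rated 4 or more stars
    "hated": (1, 2),  # proposed: books rated 2 stars or fewer (0 = unrated, ignored)
}

def rank_books(friend_ratings, mode="liked"):
    """friend_ratings: iterable of (friend_id, book_id, stars).
    Returns (book_id, number_of_friends) pairs, most friends first."""
    low, high = STAR_RANGES[mode]
    counts = Counter(book for _, book, stars in friend_ratings if low <= stars <= high)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```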

make has failed for libcurl

Hi

Tried it on Ubuntu 14 and got this error:

perl -MCPAN -e 'install Cache::FileCache, WWW::Curl::Easy, Text::CSV, Log::Any, XML::Writer' || (echo "Please send your error messages to [email protected]" && false)
Reading '/home/haider/.cpan/Metadata'
  Database was generated on Fri, 22 Jun 2018 05:54:29 GMT
Cache::FileCache is up to date (undef).
Running install for module 'WWW::Curl::Easy'
Running make for S/SZ/SZBALINT/WWW-Curl-4.17.tar.gz
Checksum for /home/haider/.cpan/sources/authors/id/S/SZ/SZBALINT/WWW-Curl-4.17.tar.gz ok

  CPAN.pm: Building S/SZ/SZBALINT/WWW-Curl-4.17.tar.gz

Locating required external dependency bin:curl-config... missing.
Unresolvable missing external dependency.
Please install 'curl-config' seperately and try again.
NA: Unable to build distribution on this platform.
No 'Makefile' created'YAML' not installed, will not store persistent state
  SZBALINT/WWW-Curl-4.17.tar.gz
  /usr/bin/perl Makefile.PL INSTALLDIRS=site -- NOT OK
Running make test
  Make had some problems, won't test
Running make install
  Make had some problems, won't install
Text::CSV is up to date (1.95).
Log::Any is up to date (1.705).
XML::Writer is up to date (0.625).
Could not read metadata file. Falling back to other methods to determine prerequisites
chmod +x *.pl

a list of ALL dependencies is needed.

Ok, I used to be a Perl programmer, so I can slowly look at every script, one by one, and install each missing library. However, some people have never been Perl programmers.

A list of all dependencies is needed, plus either an apt(itude) or cpan/cpanm (or similar) command line to install them all that a user can copy and paste into their terminal.

After running that command, each of these excellent (they are excellent) scripts should then run without complaint.

e.g. I tried cpanm, then aptitude, and finally Google to find the Goodscrapes library. Ah, it's in the /lib/ subdir. Good, but a pointer would have been helpful.
