Love this toolbox. But it was missing a feature for finding books by looking at booksh


My script for finding books by looking at bookshelves of people who read similar books about goodreads-toolbox HOT 9 OPEN

andre-st commented on May 29, 2024

My script for finding books by looking at bookshelves of people who read similar books

from goodreads-toolbox.

Comments (9)

andre-st commented on May 29, 2024

Hi San Kumar, thanks for sharing your script. I will definitely check this out over the course of the next week.


san-kumar commented on May 29, 2024

#!/usr/bin/env perl

#<--------------------------------- MAN PAGE --------------------------------->|

=pod

=head1 NAME

bookfinder - finding books by looking at bookshelves of people who read similar books

=head1 PURPOSE

=over

=item * fetches books with 4 and 5 stars in your profile

=item * crawls reviews of these books to find users who also rated it 4 or 5 stars

=item * looks up the bookshelves of those users to see which books they rated 4 or 5 stars

=item * ranks books based on number of votes from these users

=item * also ranks users by number of books they have in common (min 3)

=item * users who love the same books as you and also hate the same books as you get special treatment

=back

=head1 SYNOPSIS

B<bookfinder.pl>
[B<-n> F<number>]
[B<-x> F<number>] 
[B<-d> F<filename>] 
[B<-u> F<number>] 
[B<-c> F<numdays>] 
[B<-o> F<filename>] 
[B<-s> F<shelfname> ...] 
[B<-i>]
F<goodloginmail> [F<goodloginpass>]


=head1 OPTIONS

Mandatory arguments to long options are mandatory for short options too.

=over 4

=item B<-n, --maxbooks>=F<number>

Max number of books in a user's bookshelf, default is 500.
People who have hundreds or thousands of books often
add more noise than signal to your results.


=item B<-x, --rigor>=F<numlevel>

We need to find members who rated the books you liked, 
but Goodreads shows only a few ratings per book. 
We exploit ratings filters and the reviews-search to find more members:

 level 1 = filters-based search of book-raters (max 5400 ratings) - default
 level 2 = like 1 plus dict-search if >3000 ratings with stall-time of 2min
 level n = like 1 plus dict-search with stall-time of n minutes

Rigor level 0 is useless here (latest readers only), 
and 2+ (dict-search) has a bad cost/benefit ratio given hundreds of books.


=item B<-d, --dict>=F<filename>

default is F<./list-in/dict.lst>


=item B<-u, --userid>=F<number>

check another member instead of the one identified by the login-mail 
and password arguments. You find the ID by looking at the shelf URLs.


=item B<-c, --cache>=F<numdays>

number of days to store and reuse downloaded data in F</tmp/FileCache/>,
default is 31 days. This helps with cheap recovery on a crash, power blackout 
or pause, and when experimenting with parameters. Loading data from Goodreads
is a very time consuming process.


=item B<-o, --outfile>=F<filename>

name of the CSV file where we write the ranked books to, default is
F<./list-out/bookfinder-books.csv>. The ranked users are always written
to F<./list-out/bookfinder-users.csv>.


=item B<-i, --ignore-errors>

Don't retry on errors, just keep going. 
Sometimes useful if a single Goodreads resource hangs over long periods 
and you're okay with some values missing in your result.
This option is not recommended when you run the program unattended.




=item B<-?, --help>

show full man page

=back


=head1 FILES

F<./list-in/dict.lst>

F<./list-out/bookfinder-users.csv>

F<./list-out/bookfinder-books.csv>

F</tmp/FileCache/>


=head1 EXAMPLES

$ ./bookfinder.pl [email protected] MyPASSword

$ ./bookfinder.pl -c 31 -o myfile.csv  [email protected] pass


=head1 REPORTING BUGS

Report bugs to <[email protected]> or use Github's issue tracker
L<https://github.com/andre-st/goodreads-toolbox/issues>


=head1 COPYRIGHT

This is free software. You may redistribute copies of it under the terms of
the GNU General Public License L<https://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.



=head1 VERSION

2020-01-23 (Since 2018-06-22)

=cut

#<--------------------------------- 79 chars --------------------------------->|


use strict;
use warnings qw(all);
use locale;
use 5.18.0;

# Perl core:
use FindBin;
use lib "$FindBin::Bin/lib/";
use Time::HiRes qw(time tv_interval);
use POSIX qw(strftime floor locale_h);
use File::Spec; # Platform indep. directory separator
use IO::File;
use Getopt::Long;
use Pod::Usage;
# Third party:
use Text::CSV;
# Ours:
use Goodscrapes;


# ----------------------------------------------------------------------------
# Program configuration:
#
setlocale(LC_CTYPE, "en_US"); # GR dates all en_US
STDOUT->autoflush(1);
gsetopt(cache_days => 31);

our $TSTART = time();
our $MINCOMMON = 5;
our $MAXAUBOOKS = 100;
our $RIGOR = 1;
our $MAXBOOKS = 500;
our $DICTPATH = File::Spec->catfile($FindBin::Bin, 'list-in', 'dict.lst');
our $OUTPATH;
our @SHELVES;
our $USERID;

GetOptions('rigor|x=i'          => \$RIGOR,
    'dict|d=s'           => \$DICTPATH,
    'userid|u=s'         => \$USERID,
    'outfile|o=s'        => \$OUTPATH,
    'maxbooks|n=i'       => \$MAXBOOKS,
    'shelf|s=s'          => \@SHELVES,
    'ignore-errors|i'    => sub {gsetopt(ignore_errors => 1);},
    'cache|c=i'          => sub {gsetopt(cache_days => $_[1]);},
    'help|?'             => sub {pod2usage(-verbose => 2);})
    or pod2usage(1);

pod2usage(1) if !$ARGV[0];

glogin(usermail => $ARGV[0], # Login also allows to load 200 books in 1 request
    userpass    => $ARGV[1], # Asks pw if omitted
    r_userid    => \$USERID);

sub bookshelf {
    my $id = shift;
    my %books;

    print "\nLooking up bookshelf of $id..";

    greadshelf(from_user_id => $id,
        ra_from_shelves     => [ 'read' ],
        rh_into             => \%books,
        # on_book       => sub{},
        on_progress         => gmeter('books')
    );

    my (@good, @bad);
    for my $book_id (keys %books) {
        my $book = $books{$book_id};
        #next unless $book->{title} =~ /Club/;

        my $rating = $book->{user_rating} // 0; # 0 = unrated
        push(@good, $book) if ($rating >= 4);
        push(@bad, $book) if ($rating >= 1 && $rating <= 2); # don't count unrated books as disliked

        #warn("cannot find rating for $book->{title} of $id\n") unless ($rating >= 1);
    }

    return (\@good, \@bad);
}

sub bookgenres {
    my $bid = shift;
    my $html = Goodscrapes::_html(Goodscrapes::_book_url($bid));
    my @genres;
    while ($html =~ m[href="/genres/([\w-]+)"]g) {
        push(@genres, $1);
    }

    return \@genres;
}

my ($su_good, $su_bad) = bookshelf($USERID);
my (%good_users, %good_books, %haters);

for my $book (@$su_good) { # not $b: that name is reserved for sort blocks
    print "\nLooking up reviews for $book->{title}..";
    $book->{reviews} = {};
    greadreviews(rh_for_book => $book,
        rh_into              => $book->{reviews},
        rigor                => $RIGOR,
        dict_path            => $DICTPATH,
        on_progress          => gmeter('memb'));

    for my $rev (values %{$book->{reviews}}) {
        my $u      = $rev->{rh_user};
        my $rating = $rev->{rating} // 0; # assumption: 0 means the review carries no rating

        # One vote per shared book: likes go to %good_users, dislikes to %haters
        if ($rating >= 4) {
            $good_users{$u->{id}}{votes}++;
            $good_users{$u->{id}}{user} = $u;
        } elsif ($rating >= 1 && $rating <= 2) {
            $haters{$u->{id}}{votes}++;
            $haters{$u->{id}}{user} = $u;
        }
    }
}

for my $u (keys %good_users) {
    $good_users{$u}->{'bad'} = defined($haters{$u}->{votes}) ? $haters{$u}->{votes} : 0;
}

printf("\nHere are your best users (out of %d users):\n", scalar keys %good_users);
my $filename = File::Spec->catfile($FindBin::Bin, 'list-out', "bookfinder-users.csv");
my $csv = Text::CSV->new({ binary => 1, eol => $/ }) or die "Failed to create a CSV handle: $!";
open my $fh, ">:encoding(utf8)", $filename or die "failed to create $filename: $!";

$csv->print($fh, [ 'uid', 'name', 'good_common', 'bad_common', 'total_common', 'total_books', 'ratio', 'url' ]);

for my $user_id (keys %good_users) {
    my $userHash = $good_users{$user_id};

    if (($user_id ne $USERID) && ($userHash->{votes} >= 2)) {
        my $user = $userHash->{user};
        my ($uBooks) = bookshelf($user_id); # list context: take \@good (scalar context would yield \@bad)
        my $numBooks = scalar @$uBooks;

        if (!$MAXBOOKS || ($numBooks <= $MAXBOOKS)) {
            my $total = $userHash->{votes} + $userHash->{bad};
            $csv->print($fh, [ $user->{id}, $user->{name}, $userHash->{votes}, $userHash->{bad}, $total, $numBooks, $numBooks > 0 ? $total / $numBooks : 0, "https://www.goodreads.com/review/list/$user_id?sort=rating" ]);

            for my $gb (@$uBooks) {
                $good_books{$gb->{id}} = { 'votes' => (defined($good_books{$gb->{id}}->{votes}) ? $good_books{$gb->{id}}->{votes} : 0) + 1, 'book' => $gb };
            }
        } else {
            print "\nskipped books for $user_id: $numBooks > $MAXBOOKS\n";
        }
    }
}

close $fh or die "failed to close $filename: $!";

printf("\nHere are your best books (out of %d books):\n", scalar keys %good_books);
$OUTPATH = File::Spec->catfile($FindBin::Bin, 'list-out', "bookfinder-books.csv") if !$OUTPATH;

$csv = Text::CSV->new({ binary => 1, eol => $/ }) or die "Failed to create a CSV handle: $!";
open $fh, ">:encoding(utf8)", $OUTPATH or die "failed to create $OUTPATH: $!";

$csv->print($fh, [ 'bid', 'title', 'author', 'votes', 'avg_rating', 'num_ratings', 'genres', 'img_url' ]);

for my $bk (sort {$b->{votes} <=> $a->{votes}} values(%good_books)) {
    if ($bk->{votes} > 1) {
        my $book = $bk->{book}; # not $b: that name is reserved for sort blocks
        my $genres = bookgenres($book->{id});
        printf("%s with %d votes\n", $book->{title}, $bk->{votes});
        $csv->print($fh, [ $book->{id}, $book->{title}, $book->{rh_author}->{name}, $bk->{votes}, $book->{avg_rating}, $book->{num_ratings}, join(', ', @$genres), $book->{img_url} ]);
    }
}

close $fh or die "failed to close $OUTPATH: $!";
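
For anyone skimming the script above, the final ranking step can be sketched on toy data roughly like this. It is a minimal sketch, not the script's actual code path: the `%shelves` data and the `tally_votes` helper are invented for illustration, and no Goodreads requests are made.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Toy data (invented): like-minded user => list of [book title, rating].
my %shelves = (
    alice => [ ['Dune', 5], ['Emma', 2], ['Ulysses', 4] ],
    bob   => [ ['Dune', 4], ['Ulysses', 5], ['Walden', 3] ],
    carol => [ ['Dune', 5], ['Walden', 4] ],
);

# Hypothetical helper: one vote per user who rated the book 4 or 5 stars.
sub tally_votes {
    my ($shelves) = @_;
    my %votes;
    for my $user (keys %$shelves) {
        for my $entry (@{ $shelves->{$user} }) {
            my ($title, $rating) = @$entry;
            $votes{$title}++ if $rating >= 4;
        }
    }
    return \%votes;
}

my $votes = tally_votes(\%shelves);

# Most votes first; ties broken alphabetically so the output order is stable.
for my $title (sort { $votes->{$b} <=> $votes->{$a} || $a cmp $b } keys %$votes) {
    next if $votes->{$title} < 2;    # same "more than one vote" cut-off as the script
    printf "%s with %d votes\n", $title, $votes->{$title};
}
```

With this toy data only "Dune" (3 votes) and "Ulysses" (2 votes) pass the cut-off; "Walden" has a single vote and "Emma" has none.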


san-kumar commented on May 29, 2024

For this to work, Goodscrapes.pm needs a minor patch at line 2075:

$bk{ user_rating     } = $row =~            /data-rating="(\d+)"/                   ? ($1?$1:0) : 0;

I guess Goodreads has changed its HTML, so without the patch the user rating always comes out as 0. The line above fixes it.
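
To check what the patched expression does without touching Goodreads, it can be tried on a couple of sample rows. This is only a sketch: `extract_user_rating` is a hypothetical wrapper (it is not part of Goodscrapes), and the two HTML rows are made up, so the real shelf markup may look different.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper that mirrors the patched expression.
sub extract_user_rating {
    my ($row) = @_;
    # Capture the digits in data-rating="N"; fall back to 0 when the attribute is missing.
    return $row =~ /data-rating="(\d+)"/ ? ($1 ? $1 : 0) : 0;
}

# Made-up table rows for illustration only.
my $rated   = '<tr><td class="rating"><div class="stars" data-rating="4"></div></td></tr>';
my $unrated = '<tr><td class="rating"><div class="stars"></div></td></tr>';

printf "rated row   => %d\n", extract_user_rating($rated);
printf "unrated row => %d\n", extract_user_rating($unrated);
```

A row without a `data-rating` attribute, or with `data-rating="0"`, yields 0, which the script then has to treat as "not rated".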


WaterSibilantFalling commented on May 29, 2024

Super like


mcleanle commented on May 29, 2024

This is exactly what I've been looking for! Can it be run in Docker?


san-kumar commented on May 29, 2024

> This is exactly what I've been looking for! Can it be run in Docker?

I haven't tried it, but it shouldn't be hard. Just modify the goodreads-toolbox Dockerfile to copy this script into the container; the rest should be the same.


mcleanle commented on May 29, 2024

I added your script and the patch to the goodreads-toolbox directory, modified the .dockerignore file to include the new script in the exceptions list, and then rebuilt the container from my local drive instead of pointing to GitHub in the build command. However, it seems to have broken my bash prompt, and I get "no such file or directory" when trying to run any of the scripts in the container. Oh well! I'm not a Linux programmer and had never messed around with Docker before today. I realize this isn't a Docker help forum, but if you happen to have any tips I would love to hear them. Thank you for your awesome work on this! I hope the toolbox will be supported again one day and this can be added as an official script.


san-kumar commented on May 29, 2024

I think your Dockerfile may be missing the entrypoint. I haven't tried this in Docker yet and haven't looked at the Dockerfile (I may check on the weekend), but you need to copy the entrypoint from the original Dockerfile into the modified file. OTOH, if you don't want to mess with the Dockerfile, you can just mount a volume (with the -v option) and put this script there. Then use docker exec -it $pid bash (where $pid is the container ID) to enter the container and just run perl script.pl. Sorry, I'm typing all this from memory, so you may have to do some digging around, but I reckon both approaches should work.


mcleanle commented on May 29, 2024

Thanks again for your help! For anyone who stumbles across this in future, here are all the steps I took to eventually get this working in Docker for Windows:

1. Clone the repo
2. Paste @san-kumar's script into a new blank text file called bookfinder.pl
3. Replace line 205 with the following: use local::lib "$FindBin::Bin/lib/local/"; use lib "$FindBin::Bin/lib/";
4. Patch /lib/Goodscrapes.pm as @san-kumar mentions above
5. Add perl-text-csv \ at line 47 in Dockerfile
6. Add !/bookfinder.pl anywhere in .dockerignore
7. Open a command prompt and cd to the repo directory
8. Enter docker build -t goodreads-toolbox . and wait for the build to complete
9. Enter docker run -it --publish=8080:80 goodreads-toolbox
10. At the bash prompt, run perl bookfinder.pl

That's it!
