viciouspotato / safaribooks Goto Github PK

Convert safaribooksonline ebook to epub and Kindle mobi format

Python 95.67% Dockerfile 4.33%

safaribooks's Introduction

THIS PROJECT IS DISCONTINUED, PLEASE CHECKOUT https://github.com/lorenzodifuccia/safaribooks WHICH IS MORE ACTIVELY MAITAINED AND AVAILABLE ON PYPI!

safaribooks

Convert safaribooksonline ebook to epub and Kindle mobi format

Usage

WARNING: use at your own risk.

Installation

If you want Kindle mobi output, download kindlegen from Amazon and put the binary somewhere in PATH.

If you only need epub books this step can be skipped.
Clone the safaribooks repo, let's say to safaribooks/ directory.
Make sure you have Python 2 installed (Tested version Python 2.7) then run:

cd safaribooks

pip install .

safaribooks is the folder you checkout the code.
Download a book (run this in the folder you checked out the code).

safaribooks -u USER/EMAIL -p PASSWORD -b BOOK_ID download-epub.

BOOK_ID is the id in url such as 9781617291920 in https://learning.oreilly.com/library/view/real-world-machine-learning/9781617291920/kindle_split_011.html.

An epub and a mobi file will be generated.

Installation with docker

With credit to @jmagnusson

This should work no matter platform you're on as no dependencies other than docker needs to be installed.

Run docker build -t safaribooks .
Run docker run -it --rm -v $(pwd)/converted:/app/converted safaribooks -u USER/EMAIL -p PASSWORD -b BOOK_ID download and wait for it to complete
The .epub and .mobi should now be in the folder converted of your current working directory.

Command line usage

usage: safaribooks [-h] [-o OUTPUT_DIRECTORY] [-u USER] [-p PASSWORD] [-c COOKIE]
                   [-b BOOK_ID]
                   {download-epub,download,convert-to-mobi} ...

Crawl Safari Books Online book content

positional arguments:
  {download-epub,download,convert-to-mobi}
    download-epub       Download as epub
    download            Download as epub, and convert to mobi
    convert-to-mobi     Convert existing epub file to mobi.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        Directory where converted files are located / should
                        be placed
  -u USER, --user USER  Safari Books Online user / e-mail address
  -p PASSWORD, --password PASSWORD
                        Safari Books Online password
  -b BOOK_ID, --book-id BOOK_ID
                        Safari Books Online book ID
  -c COOKIE, --cookie COOKIE
                        Safari Books Online Cookie. This filed can be retrieved by
                        using Chrome and copying request as curl. Cookie input should
                        be included in ''.
                        Example:
                        'BrowserCookie=XXXXXXXXXXXXXXXXXXXXX; sessionid=XXXXXXXXXXXXXXXXX;' etc.

Create an issue if you find it does not work!

safaribooks's People

Contributors

Stargazers

Watchers

safaribooks's Issues

DOWNLOAD_DELAY parameter seems not to be respected

Any change in parameter always defaults to 0.25

Some images are not picked up for specific book

Book reference number 9781509304608

It seems that https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781509304608 does not include the reference to ch02_images.xhtml as well as ch03.. and ch04... and that's why crawler does not pick them up.

Epub then has those errors (and the images are obviously not shown)

epub not downloaded (just title)

I try to get the book providing cookie (I am logged in browser with my company's SSO):

$ safaribooks -c 'BrowserCookie=0eb1e1a9-2f0f-4034-874f-b72f39f59682;SessionID=18ka8abjrrhd3myc5zljpmpvguscj2e0' -b 9781449340124 download-epub
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-04 15:57:50 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-04 15:57:50 [SafariBooks] INFO: Using `/tmp/tmpo4v1aG` as temporary directory
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 15:57:50 [scrapy.core.engine] INFO: Spider opened
2018-12-04 15:57:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 15:57:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-04 15:57:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-04 15:57:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: None)
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-04 15:58:00 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-04 15:58:00 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-04 15:58:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16754,
 'downloader/request_count': 30,
 'downloader/request_method_count/GET': 30,
 'downloader/response_bytes': 214326,
 'downloader/response_count': 30,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 4, 14, 58, 0, 688254),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 31,
 'log_count/INFO': 34,
 'memusage/max': 62570496,
 'memusage/startup': 62570496,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 30,
 'scheduler/dequeued/memory': 30,
 'scheduler/enqueued': 30,
 'scheduler/enqueued/memory': 30,
 'start_time': datetime.datetime(2018, 12, 4, 14, 57, 50, 251804)}
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 12K
drwxr-xr-x 2 chris chris 4.0K Dec  4 15:58 .
drwxr-xr-x 5 chris chris 4.0K Dec  4 15:58 ..
-rw-r--r-- 1 chris chris 2.7K Dec  4 15:58 Head_First_JavaScript_Programming-9781449340124.epub

The downloaded epub is very small 2.7kB.

It seems like only some metadata are downloaded but without any content.

Any hints?

thanks,
Chris

login failed error

i have created new username password. i am able to login on the web portal but same credentials giving login failed error via this utility.
username: [email protected]
password: test23

same credentials are working on web portal

not working : Error(kindlegen):E30005

getting error:

Error(kindlegen):E30005: Could not find file *.epub

Code Formatting and Styles

The script works great. But the programming examples present in the ebook does not have the syntax highlighting as present in the HTML format.

Is it a limitation?

error logging in

Hey!

i've tried to run few times but it fails to log in. My account is using my company SSO engine so the crawling fails with the error message:

2017-07-10 16:38:19 [SafariBooks] ERROR: Failed login
2017-07-10 16:38:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-10 16:38:19 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function close at 0x7f5165725b18>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 104, in close
    return closed(reason)
  File "/home/hpprszui/safaribooks/safaribook/spiders/safaribooks.py", line 106, in closed
    shutil.move(self.book_name + '.zip', self.book_name + '.epub')
  File "/usr/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '.zip'
2017-07-10 16:38:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2482,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 3,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 22191,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 7, 10, 14, 38, 19, 632179),
 'log_count/DEBUG': 5,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'memusage/max': 50954240,
 'memusage/startup': 50954240,
 'request_depth_max': 1,
 'response_received_count': 2,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2017, 7, 10, 14, 38, 13, 71128)}

Do you have any idea how to override that?

No form element found in 200 https://www.oreilly.com

2018-11-29 14:46:52 [scrapy.core.engine] INFO: Spider opened
2018-11-29 14:46:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 14:46:52 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-11-29 14:46:53 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.oreilly.com> from <GET https://www.safaribooksonline.com/>
2018-11-29 14:46:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.oreilly.com> (referer: None)
2018-11-29 14:46:54 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.oreilly.com> (referer: None)
Traceback (most recent call last):

File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 587, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/usr/local/lib/python2.7/site-packages/safaribooks/spiders/safaribooks.py", line 103, in parse
callback=self.after_login
File "/usr/local/lib/python2.7/site-packages/scrapy/http/request/form.py", line 48, in from_response
form = _get_form(response, formname, formid, formnumber, formxpath)
File "/usr/local/lib/python2.7/site-packages/scrapy/http/request/form.py", line 77, in _get_form
raise ValueError("No
element found in %s" % response)

ValueError: No element found in <200 https://www.oreilly.com>
2018-11-29 14:46:55 [scrapy.core.engine] INFO: Closing spider (finished)
2018-11-29 14:46:55 [SafariBooks] INFO: Did not even got toc, ignore generated file operation.
2018-11-29 14:46:55 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 438,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 8519,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/301': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 11, 29, 7, 46, 55, 95761),
'log_count/DEBUG': 3,
'log_count/ERROR': 1,
'log_count/INFO': 9,
'memusage/max': 48963584,
'memusage/startup': 48963584,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'spider_exceptions/ValueError': 1,
'start_time': datetime.datetime(2018, 11, 29, 7, 46, 52, 397288)}
2018-11-29 14:46:55 [scrapy.core.engine] INFO: Spider closed (finished)

ValueError: No <form> element found in <200 https://www.oreilly.com/member/login/?

2019-05-16 10:54:49 [scrapy.core.engine] INFO: Spider opened
2019-05-16 10:54:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-05-16 10:54:49 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-05-16 10:54:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com>
2019-05-16 10:54:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/login/unified/?next=/home/> from <GET https://learning.oreilly.com/accounts/login/>
2019-05-16 10:54:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://api.oreilly.com/api/v1/auth/openid/authorize?response_type=code&client_id=235442&state=mqBlgigI7WGPpgRn9Ba2kopQiuEAbLgs&redirect_uri=https://learning.oreilly.com/complete/unified/&login_context=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpbWFnZSI6Imljb25fYWxlcnRfdGVhbC5wbmciLCJtZXNzYWdlIjoiTG9naW4gaXMgbm93IHVuaWZpZWQgYWNyb3NzIE8nUmVpbGx5LiBQbGVhc2UgdXNlIHlvdXIgTydSZWlsbHkgY3JlZGVudGlhbHMgdG8gYWNjZXNzIHlvdXIgT25saW5lIExlYXJuaW5nIGFjY291bnQuIiwibGlua3MiOnsic2lnbl91cCI6eyJ0ZXh0IjoiU3RhcnQgYSBmcmVlIHRyaWFsLiIsImxpbmsiOiJodHRwczovL2xlYXJuaW5nLm9yZWlsbHkuY29tL3JlZ2lzdGVyLyJ9fX0.Nh-qjdUCam7vmBES1j5EKu3cLQMExW_mI66N-VISAM6Q5IWO85Rjk1qXjYFC_lszIam4JZiDt5hXXrW0JZvu-QHej5uveFyWBRxzwMJ9p9i5fMRrF7Z5xsV27ku252-3yVzH7rMsjuRjOP8xVcNZTpOg1a4eK9H-I0NSxCEnTL8UQl4FxuW2d9OAsFW6jMxwVNyxTBbsBVXBncGrcla-b1XSY0ndWWqhfds7g3AqAL2BjlfI-4yKkY3Zu-romtDL2mxwqfM_yO9JGbpr6D3ScDS6k9DySojDaXyZBTIPSbLTemwuQUmcy_VPbYwokNZ4GECg4BRD0W11r0L-090bAA&scope=openid+profile+email> from <GET https://learning.oreilly.com/login/unified/?next=/home/>
2019-05-16 10:54:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.oreilly.com/member/login/?next=%2Fapi%2Fv1%2Fauth%2Fopenid%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3D235442%26state%3DmqBlgigI7WGPpgRn9Ba2kopQiuEAbLgs%26redirect_uri%3Dhttps%3A%2F%2Flearning.oreilly.com%2Fcomplete%2Funified%2F%26login_context%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpbWFnZSI6Imljb25fYWxlcnRfdGVhbC5wbmciLCJtZXNzYWdlIjoiTG9naW4gaXMgbm93IHVuaWZpZWQgYWNyb3NzIE8nUmVpbGx5LiBQbGVhc2UgdXNlIHlvdXIgTydSZWlsbHkgY3JlZGVudGlhbHMgdG8gYWNjZXNzIHlvdXIgT25saW5lIExlYXJuaW5nIGFjY291bnQuIiwibGlua3MiOnsic2lnbl91cCI6eyJ0ZXh0IjoiU3RhcnQgYSBmcmVlIHRyaWFsLiIsImxpbmsiOiJodHRwczovL2xlYXJuaW5nLm9yZWlsbHkuY29tL3JlZ2lzdGVyLyJ9fX0.Nh-qjdUCam7vmBES1j5EKu3cLQMExW_mI66N-VISAM6Q5IWO85Rjk1qXjYFC_lszIam4JZiDt5hXXrW0JZvu-QHej5uveFyWBRxzwMJ9p9i5fMRrF7Z5xsV27ku252-3yVzH7rMsjuRjOP8xVcNZTpOg1a4eK9H-I0NSxCEnTL8UQl4FxuW2d9OAsFW6jMxwVNyxTBbsBVXBncGrcla-b1XSY0ndWWqhfds7g3AqAL2BjlfI-4yKkY3Zu-romtDL2mxwqfM_yO9JGbpr6D3ScDS6k9DySojDaXyZBTIPSbLTemwuQUmcy_VPbYwokNZ4GECg4BRD0W11r0L-090bAA%26scope%3Dopenid%2Bprofile%2Bemail&locale=en> from <GET https://api.oreilly.com/api/v1/auth/openid/authorize?response_type=code&client_id=235442&state=mqBlgigI7WGPpgRn9Ba2kopQiuEAbLgs&redirect_uri=https://learning.oreilly.com/complete/unified/&login_context=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpbWFnZSI6Imljb25fYWxlcnRfdGVhbC5wbmciLCJtZXNzYWdlIjoiTG9naW4gaXMgbm93IHVuaWZpZWQgYWNyb3NzIE8nUmVpbGx5LiBQbGVhc2UgdXNlIHlvdXIgTydSZWlsbHkgY3JlZGVudGlhbHMgdG8gYWNjZXNzIHlvdXIgT25saW5lIExlYXJuaW5nIGFjY291bnQuIiwibGlua3MiOnsic2lnbl91cCI6eyJ0ZXh0IjoiU3RhcnQgYSBmcmVlIHRyaWFsLiIsImxpbmsiOiJodHRwczovL2xlYXJuaW5nLm9yZWlsbHkuY29tL3JlZ2lzdGVyLyJ9fX0.Nh-qjdUCam7vmBES1j5EKu3cLQMExW_mI66N-VISAM6Q5IWO85Rjk1qXjYFC_lszIam4JZiDt5hXXrW0JZvu-QHej5uveFyWBRxzwMJ9p9i5fMRrF7Z5xsV27ku252-3yVzH7rMsjuRjOP8xVcNZTpOg1a4eK9H-I0NSxCEnTL8UQl4FxuW2d9OAsFW6jMxwVNyxTBbsBVXBncGrcla-b1XSY0ndWWqhfds7g3AqAL2BjlfI-4yKkY3Zu-romtDL2mxwqfM_yO9JGbpr6D3ScDS6k9DySojDaXyZBTIPSbLTemwuQUmcy_VPbYwokNZ4GECg4BRD0W11r0L-090bAA&scope=openid+profile+email>
2019-05-16 10:54:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.oreilly.com/member/login/?next=%2Fapi%2Fv1%2Fauth%2Fopenid%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3D235442%26state%3DmqBlgigI7WGPpgRn9Ba2kopQiuEAbLgs%26redirect_uri%3Dhttps%3A%2F%2Flearning.oreilly.com%2Fcomplete%2Funified%2F%26login_context%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpbWFnZSI6Imljb25fYWxlcnRfdGVhbC5wbmciLCJtZXNzYWdlIjoiTG9naW4gaXMgbm93IHVuaWZpZWQgYWNyb3NzIE8nUmVpbGx5LiBQbGVhc2UgdXNlIHlvdXIgTydSZWlsbHkgY3JlZGVudGlhbHMgdG8gYWNjZXNzIHlvdXIgT25saW5lIExlYXJuaW5nIGFjY291bnQuIiwibGlua3MiOnsic2lnbl91cCI6eyJ0ZXh0IjoiU3RhcnQgYSBmcmVlIHRyaWFsLiIsImxpbmsiOiJodHRwczovL2xlYXJuaW5nLm9yZWlsbHkuY29tL3JlZ2lzdGVyLyJ9fX0.Nh-qjdUCam7vmBES1j5EKu3cLQMExW_mI66N-VISAM6Q5IWO85Rjk1qXjYFC_lszIam4JZiDt5hXXrW0JZvu-QHej5uveFyWBRxzwMJ9p9i5fMRrF7Z5xsV27ku252-3yVzH7rMsjuRjOP8xVcNZTpOg1a4eK9H-I0NSxCEnTL8UQl4FxuW2d9OAsFW6jMxwVNyxTBbsBVXBncGrcla-b1XSY0ndWWqhfds7g3AqAL2BjlfI-4yKkY3Zu-romtDL2mxwqfM_yO9JGbpr6D3ScDS6k9DySojDaXyZBTIPSbLTemwuQUmcy_VPbYwokNZ4GECg4BRD0W11r0L-090bAA%26scope%3Dopenid%2Bprofile%2Bemail&locale=en> (referer: None)
**2019-05-16 10:54:53 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.oreilly.com/member/login/?**next=%2Fapi%2Fv1%2Fauth%2Fopenid%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3D235442%26state%3DmqBlgigI7WGPpgRn9Ba2kopQiuEAbLgs%26redirect_uri%3Dhttps%3A%2F%2Flearning.oreilly.com%2Fcomplete%2Funified%2F%26login_context%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpbWFnZSI6Imljb25fYWxlcnRfdGVhbC5wbmciLCJtZXNzYWdlIjoiTG9naW4gaXMgbm93IHVuaWZpZWQgYWNyb3NzIE8nUmVpbGx5LiBQbGVhc2UgdXNlIHlvdXIgTydSZWlsbHkgY3JlZGVudGlhbHMgdG8gYWNjZXNzIHlvdXIgT25saW5lIExlYXJuaW5nIGFjY291bnQuIiwibGlua3MiOnsic2lnbl91cCI6eyJ0ZXh0IjoiU3RhcnQgYSBmcmVlIHRyaWFsLiIsImxpbmsiOiJodHRwczovL2xlYXJuaW5nLm9yZWlsbHkuY29tL3JlZ2lzdGVyLyJ9fX0.Nh-qjdUCam7vmBES1j5EKu3cLQMExW_mI66N-VISAM6Q5IWO85Rjk1qXjYFC_lszIam4JZiDt5hXXrW0JZvu-QHej5uveFyWBRxzwMJ9p9i5fMRrF7Z5xsV27ku252-3yVzH7rMsjuRjOP8xVcNZTpOg1a4eK9H-I0NSxCEnTL8UQl4FxuW2d9OAsFW6jMxwVNyxTBbsBVXBncGrcla-b1XSY0ndWWqhfds7g3AqAL2BjlfI-4yKkY3Zu-romtDL2mxwqfM_yO9JGbpr6D3ScDS6k9DySojDaXyZBTIPSbLTemwuQUmcy_VPbYwokNZ4GECg4BRD0W11r0L-090bAA%26scope%3Dopenid%2Bprofile%2Bemail&locale=en> (referer: None)
Traceback (most recent call last):
File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 587, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Library/Python/2.7/site-packages/safaribooks/spiders/safaribooks.py", line 106, in parse
callback=self.after_login
File "/Library/Python/2.7/site-packages/scrapy/http/request/form.py", line 48, in from_response
form = _get_form(response, formname, formid, formnumber, formxpath)
File "/Library/Python/2.7/site-packages/scrapy/http/request/form.py", line 77, in _get_form
raise ValueError("No

element found in %s" % response)
**ValueError: No element found in <200 https://www.oreilly.com/member/login/?**next=%2Fapi%2Fv1%2Fauth%2Fopenid%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3D235442%26state%3DmqBlgigI7WGPpgRn9Ba2kopQiuEAbLgs%26redirect_uri%3Dhttps%3A%2F%2Flearning.oreilly.com%2Fcomplete%2Funified%2F%26login_context%3DeyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpbWFnZSI6Imljb25fYWxlcnRfdGVhbC5wbmciLCJtZXNzYWdlIjoiTG9naW4gaXMgbm93IHVuaWZpZWQgYWNyb3NzIE8nUmVpbGx5LiBQbGVhc2UgdXNlIHlvdXIgTydSZWlsbHkgY3JlZGVudGlhbHMgdG8gYWNjZXNzIHlvdXIgT25saW5lIExlYXJuaW5nIGFjY291bnQuIiwibGlua3MiOnsic2lnbl91cCI6eyJ0ZXh0IjoiU3RhcnQgYSBmcmVlIHRyaWFsLiIsImxpbmsiOiJodHRwczovL2xlYXJuaW5nLm9yZWlsbHkuY29tL3JlZ2lzdGVyLyJ9fX0.Nh-qjdUCam7vmBES1j5EKu3cLQMExW_mI66N-VISAM6Q5IWO85Rjk1qXjYFC_lszIam4JZiDt5hXXrW0JZvu-QHej5uveFyWBRxzwMJ9p9i5fMRrF7Z5xsV27ku252-3yVzH7rMsjuRjOP8xVcNZTpOg1a4eK9H-I0NSxCEnTL8UQl4FxuW2d9OAsFW6jMxwVNyxTBbsBVXBncGrcla-b1XSY0ndWWqhfds7g3AqAL2BjlfI-4yKkY3Zu-romtDL2mxwqfM_yO9JGbpr6D3ScDS6k9DySojDaXyZBTIPSbLTemwuQUmcy_VPbYwokNZ4GECg4BRD0W11r0L-090bAA%26scope%3Dopenid%2Bprofile%2Bemail&locale=en>
2019-05-16 10:54:53 [scrapy.core.engine] INFO: Closing spider (finished)
2019-05-16 10:54:53 [SafariBooks] INFO: Did not even got toc, ignore generated file operation.
2019-05-16 10:54:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 3243,
'downloader/request_count': 5,
'downloader/request_method_count/GET': 5,
'downloader/response_bytes': 13878,
'downloader/response_count': 5,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 4,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2019, 5, 16, 3, 54, 53, 229169),
'log_count/DEBUG': 5,
'log_count/ERROR': 1,
'log_count/INFO': 11,
'memusage/max': 48705536,
'memusage/startup': 48705536,
'response_received_count': 1,
'scheduler/dequeued': 5,
'scheduler/dequeued/memory': 5,
'scheduler/enqueued': 5,
'scheduler/enqueued/memory': 5,
'spider_exceptions/ValueError': 1,
'start_time': datetime.datetime(2019, 5, 16, 3, 54, 49, 162945)}
2019-05-16 10:54:53 [scrapy.core.engine] INFO: Spider closed (finished)

TabError when running via Docker

hi, I tried running the docker command on README and get this stacktrace:

Traceback (most recent call last):
  File "/usr/local/bin/safaribooks", line 11, in <module>
    load_entry_point('safaribooks', 'console_scripts', 'safaribooks')()
  File "/app/safaribooks/__main__.py", line 121, in main
    args.func(args)
  File "/app/safaribooks/__main__.py", line 55, in download
    download_epub(args)
  File "/app/safaribooks/__main__.py", line 21, in download_epub
    process = CrawlerProcess(get_project_settings())
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 249, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 137, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 336, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 61, in from_settings
    return cls(settings)
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 25, in __init__
    self._load_all_spiders()
  File "/usr/local/lib/python3.6/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
    for module in walk_modules(name):
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 63, in walk_modules
    mod = import_module(path)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/safaribooks/spiders/__init__.py", line 6, in <module>
    from .safaribooks import SafariBooksSpider  # noqa
  File "/app/safaribooks/spiders/safaribooks.py", line 134
    style_sheets = page_json.get('stylesheets', [])
                                                  ^
TabError: inconsistent use of tabs and spaces in indentation

Any ideas why this is happening?

Publish to Docker Hub

Would make this project just a little bit more easy to access.

Should be done with Create > Automated Build.

Let me know if you need some help fixing this. I could do it if I'd get access to the repo.

book_id should not be an int because it can have a leading zero

safaribooks/safaribooks/__main__.py

Lines 78 to 83 in 8ae51af

 parser.add_argument( 

 '-b', 

 '--book-id', 

 type=int, 

 help='Safari Books Online book ID', 

 )

For example when the book ID is 0596528124 or "0596528124", the leading 0 is dropped when casting as int. This results in a 404:

2018-01-25 18:26:01 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=596528124> (referer: https://www.safaribooksonline.com/home/)

The correct URL has a leading 0 in book_id:
https://www.safaribooksonline.com/nest/epub/toc/?book_id=0596528124

Not Working Anymore

Many status codes issues.

HTTP status code is not handled or not allowed

get book id: 9781492034131, cause error

File "/Library/Python/2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Library/Python/2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/Library/Python/2.7/site-packages/safaribooks/spiders/safaribooks.py", line 179, in parse_page
    fh.write(template.render(body=body, style=style))
  File "/Library/Python/2.7/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/Library/Python/2.7/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "<template>", line 9, in top-level template code
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 13676: ordinal not in range(128)

ERROR: Spider error processing <GET https://www.safaribooksonline.com/> (referer: None)

2018-11-12 20:26:20 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-11-12 20:26:20 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 3.6.5 (default, Jul  3 2018, 18:38:05) - [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i  14 Aug 2018), cryptography 2.4.1, Platform Darwin-18.2.0-x86_64-i386-64bit
2018-11-12 20:26:20 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'safaribooks', 'DOWNLOAD_DELAY': 0.25, 'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders']}
2018-11-12 20:26:20 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2018-11-12 20:26:20 [SafariBooks] INFO: Using `/var/folders/f6/7jyf72pj343clct4dwq4fq940000gn/T/tmp24r1td0k` as temporary directory
2018-11-12 20:26:20 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-12 20:26:20 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-12 20:26:20 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-12 20:26:20 [scrapy.core.engine] INFO: Spider opened
2018-11-12 20:26:20 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-12 20:26:20 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-11-12 20:26:25 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/> (referer: None)
2018-11-12 20:26:25 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.safaribooksonline.com/> (referer: None)
Traceback (most recent call last):
  File "/Users/Rocky/.pyenv/versions/3.6.5/lib/python3.6/site-packages/twisted/internet/defer.py", line 587, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/Rocky/.pyenv/versions/3.6.5/lib/python3.6/site-packages/safaribooks/spiders/safaribooks.py", line 91, in parse
    cookies = dict(x.strip().split('=') for x in self.cookie.split(';'))
ValueError: dictionary update sequence element #0 has length 1; 2 is required
2018-11-12 20:26:26 [scrapy.core.engine] INFO: Closing spider (finished)
2018-11-12 20:26:26 [SafariBooks] INFO: Did not even got toc, ignore generated file operation.
2018-11-12 20:26:26 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 224,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 11368,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 11, 13, 4, 26, 26, 84559),
 'log_count/DEBUG': 2,
 'log_count/ERROR': 1,
 'log_count/INFO': 9,
 'memusage/max': 52514816,
 'memusage/startup': 52514816,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'spider_exceptions/ValueError': 1,
 'start_time': datetime.datetime(2018, 11, 13, 4, 26, 20, 686228)}
2018-11-12 20:26:26 [scrapy.core.engine] INFO: Spider closed (finished)

it fails. Any ideas? running python 3.6

ERROR: Error downloading <GET https://www.safaribooksonline.com/>

I'm trying to download a book but I'm getting this error. Any ideas?

2017-07-26 17:07:57 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: safaribook)
2017-07-26 17:07:57 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribook.spiders', 'SPIDER_MODULES': ['safaribook.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribook'}
2017-07-26 17:07:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-07-26 17:07:57 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-26 17:07:57 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-26 17:07:57 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-07-26 17:07:57 [scrapy.core.engine] INFO: Spider opened
2017-07-26 17:07:57 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-26 17:07:57 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-07-26 17:07:57 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.safaribooksonline.com/>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 63, in download_request
    return agent.download_request(request)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/handlers/http11.py", line 300, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1633, in request
    endpoint = self._getEndpoint(parsedURI)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1617, in _getEndpoint
    return self._endpointFactory.endpointForURI(uri)
  File "/usr/local/lib/python2.7/dist-packages/twisted/web/client.py", line 1494, in endpointForURI
    uri.port)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/contextfactory.py", line 59, in creatorForNetloc
    return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/contextfactory.py", line 56, in getContext
    return self.getCertificateOptions().getContext()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/downloader/contextfactory.py", line 51, in getCertificateOptions
    acceptableCiphers=DEFAULT_CIPHERS)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/deprecate.py", line 792, in wrapped
    return wrappee(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py", line 1583, in __init__
    self._options |= SSL.OP_SINGLE_DH_USE | SSL.OP_SINGLE_ECDH_USE
AttributeError: 'module' object has no attribute 'OP_SINGLE_ECDH_USE'
2017-07-26 17:07:57 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-26 17:07:57 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function close at 0x7fd5798c9938>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 104, in close
    return closed(reason)
  File "/home/fpoumian/Code/safaribooks/safaribook/spiders/safaribooks.py", line 106, in closed
    shutil.move(self.book_name + '.zip', self.book_name + '.epub')
  File "/usr/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '.zip'
2017-07-26 17:07:57 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/exceptions.AttributeError': 1,
 'downloader/request_bytes': 227,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 7, 26, 23, 7, 57, 664761),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'memusage/max': 68534272,
 'memusage/startup': 68534272,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 7, 26, 23, 7, 57, 352402)}
2017-07-26 17:07:57 [scrapy.core.engine] INFO: Spider closed (finished)

*************************************************************
 Amazon kindlegen(Linux) V2.9 build 1028-0897292 
 A command line e-book compiler 
 Copyright Amazon.com and its Affiliates 2014 
*************************************************************

Error(kindlegen):E30005: Could not find file  *.epub

invalid choice

safaribooks: error: invalid choice: "sessionid=xxxxxxxxxxxxxxxxxxxxxx;'" (choose from 'download-epub', 'download', 'convert-to-mobi')

my input:
safaribooks -o 'C:\Users\xxx\Downloads' -c 'BrowserCookie=xxxxxxxxxxxxxxxxxxxxxxxxx; sessionid=xxxxxxxxxxxxxx;' -b 9781118026694 download

ERROR: Spider error processing <GET

2018-03-02 22:15:34 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.safaribooksonline.com/api/v1/book/9781449342562/chapter-content/ch01.html> (referer: https://www.safaribooksonline.com//api/v1/book/9781449342562/chapter/ch01.html)
Traceback (most recent call last):
File "c:\python27\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
yield next(it)
File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
for x in result:
File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in
return (_set_referer(r) for r in result or ())
File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in
return (r for r in result or () if _filter(r))
File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in
return (r for r in result or () if _filter(r))
File "c:\python27\lib\site-packages\safaribooks\spiders\safaribooks.py", line 179, in parse_page
fh.write(template.render(body=body, style=style))
File "c:\python27\lib\site-packages\jinja2\environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "c:\python27\lib\site-packages\jinja2\environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "", line 9, in top-level template code

Not working either

2017-11-15 22:01:27 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: safaribook)
2017-11-15 22:01:27 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribook.spiders', 'SPIDER_MODULES': ['safaribook.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribook'}
2017-11-15 22:01:27 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2017-11-15 22:01:27 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-11-15 22:01:27 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-11-15 22:01:27 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-11-15 22:01:27 [scrapy.core.engine] INFO: Spider opened
2017-11-15 22:01:27 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-11-15 22:01:27 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-11-15 22:01:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/> (referer: None)
2017-11-15 22:01:29 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/home/> from <POST https://www.safaribooksonline.com/accounts/login/>
2017-11-15 22:01:30 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/home/> (referer: https://www.safaribooksonline.com/)
2017-11-15 22:01:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=install> (referer: https://www.safaribooksonline.com/home/)
2017-11-15 22:01:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com/nest/epub/toc/?book_id=install>: HTTP status code is not handled or not allowed
2017-11-15 22:01:32 [scrapy.core.engine] INFO: Closing spider (finished)
2017-11-15 22:01:32 [scrapy.utils.signal] ERROR: Error caught on signal handler: <function close at 0x10415cb18>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/twisted/internet/defer.py", line 149, in maybeDeferred
result = f(*args, **kw)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pydispatch/robustapply.py", line 55, in robustApply
return receiver(*arguments, **named)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/spiders/init.py", line 104, in close
return closed(reason)
File "/Users/grady/Documents/git/safaribooks/safaribook/spiders/safaribooks.py", line 112, in closed
shutil.move(self.book_name + '.zip', self.book_title + '-' + self.bookid + '.epub')
AttributeError: 'SafariBooksSpider' object has no attribute 'book_title'
2017-11-15 22:01:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2074,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 3,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 53803,
'downloader/response_count': 4,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/302': 1,
'downloader/response_status_count/404': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 11, 16, 3, 1, 32, 342177),
'httperror/response_ignored_count': 1,
'httperror/response_ignored_status_count/404': 1,
'log_count/DEBUG': 5,
'log_count/ERROR': 1,
'log_count/INFO': 8,
'memusage/max': 52678656,
'memusage/startup': 52678656,
'request_depth_max': 2,
'response_received_count': 3,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'start_time': datetime.datetime(2017, 11, 16, 3, 1, 27, 332446)}
2017-11-15 22:01:32 [scrapy.core.engine] INFO: Spider closed (finished)

401 errors

Running it as:

docker run -it --rm -v $(pwd)/converted:/app/converted:Z safaribooks -u user -p pass -b 9781492034131 download

outputs a lot of 401 errors and the result is a bad epub file where the metadata is ok but there is no content (<1mb)

2018-08-04 08:39:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/> (referer: None)
2018-08-04 08:39:16 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.safaribooksonline.com/accounts/login/> (referer: https://www.safaribooksonline.com/)
2018-08-04 08:39:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-08-04 08:39:18 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:18 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:18 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:18 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:18 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:18 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part02.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:18 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:18 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:19 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com//library/cover/9781492034131/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:19 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part01.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:19 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:19 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/preface01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:19 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part04.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:19 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/preface01.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:20 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:20 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:20 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:21 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:21 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:21 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:21 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:22 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part03.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:22 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:22 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:22 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch20.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:23 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch20.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:23 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:23 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:23 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch19.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:23 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch19.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:23 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:23 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch18.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:23 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part06.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:24 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch18.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:24 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:24 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/part05.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:25 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch17.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch17.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:25 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch16.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch16.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:25 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch15.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:25 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch15.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:26 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch14.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:26 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:26 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch14.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:26 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:27 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781492034131)
2018-08-04 08:39:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/9781492034131/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-08-04 08:39:27 [scrapy.core.engine] INFO: Closing spider (finished)

I've tested it with a few more IDs but non of them works so far.

Thanks

command not found

./crawl.sh username my password bookid
./crawl.sh: line 8: scrapy: command not found
./crawl.sh: line 9: kindlegen: command not found
im using ubuntu 17
if i type this
wahhaj@wahhaj:~/Desktop/safaribooks/safaribooks$ ./crawl.sh
Please give a pure digit book id, is not an acceptable book id.
error i have check the book id 3 times it doesnt work

Could not find a version that satisfies the requirement ...

Hi,
I have Python 2.7 installed.

When I run pip install safaribooks I get the following:

Could not find a version that satisfies the requirement safaribooks (from versions: ) No matching distribution found for safaribooks

Any idea how to fix that and be able to install safaribooks locally?

Thanks
Bilal

Support for Safaribooks Through Proquest

My library Provides access to Safari Books through ProQuest/EZproxy using SSO sessions

It would be great to add support to this and I will be happy to provide an account for testing.

Thanks in advance

ImportError: no module named win32api

I tried running this application using the latest windows 10 version. If you need more information please ask.

2018-06-21 22:06:11 [twisted] CRITICAL: Unhandled error in Deferred:


Traceback (most recent call last):
  File "c:\python27\lib\site-packages\safaribooks\__main__.py", line 28, in download_epub
    output_directory=args.output_directory
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 171, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 175, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 1331, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 1185, in _inlineCallbacks
    result = g.send(result)
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "c:\python27\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "c:\python27\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "c:\python27\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "c:\python27\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "c:\python27\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
    mod = import_module(module)
  File "c:\python27\lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "c:\python27\lib\site-packages\scrapy\downloadermiddlewares\retry.py", line 20, in <module>
    from twisted.web.client import ResponseFailed
  File "c:\python27\lib\site-packages\twisted\web\client.py", line 42, in <module>
    from twisted.internet.endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint
  File "c:\python27\lib\site-packages\twisted\internet\endpoints.py", line 34, in <module>
    from twisted.internet.stdio import StandardIO, PipeAddress
  File "c:\python27\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
    from twisted.internet import _win32stdio
  File "c:\python27\lib\site-packages\twisted\internet\_win32stdio.py", line 7, in <module>
    import win32api
exceptions.ImportError: No module named win32api
2018-06-21 22:06:11 [twisted] CRITICAL:
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 1185, in _inlineCallbacks
    result = g.send(result)
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 80, in crawl
    self.engine = self._create_engine()
  File "c:\python27\lib\site-packages\scrapy\crawler.py", line 105, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "c:\python27\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "c:\python27\lib\site-packages\scrapy\core\downloader\__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "c:\python27\lib\site-packages\scrapy\middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "c:\python27\lib\site-packages\scrapy\middleware.py", line 34, in from_settings
    mwcls = load_object(clspath)
  File "c:\python27\lib\site-packages\scrapy\utils\misc.py", line 44, in load_object
    mod = import_module(module)
  File "c:\python27\lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
  File "c:\python27\lib\site-packages\scrapy\downloadermiddlewares\retry.py", line 20, in <module>
    from twisted.web.client import ResponseFailed
  File "c:\python27\lib\site-packages\twisted\web\client.py", line 42, in <module>
    from twisted.internet.endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint
  File "c:\python27\lib\site-packages\twisted\internet\endpoints.py", line 34, in <module>
    from twisted.internet.stdio import StandardIO, PipeAddress
  File "c:\python27\lib\site-packages\twisted\internet\stdio.py", line 30, in <module>
    from twisted.internet import _win32stdio
  File "c:\python27\lib\site-packages\twisted\internet\_win32stdio.py", line 7, in <module>
    import win32api
ImportError: No module named win32api

codec can't encode character u'\xae' (book id 9780133040036)

as the subject states... script is failing with codec on book id 9780133040036

y r u so awesome?

such awesome that blows my socks off :)

403 when fetching chapters from API

I am currently trying one book but I get a 403 error and when I open it in the browser while still logged in I get You do not have permission to perform this action.

2019-03-03 13:34:56 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: safaribooks)
2019-03-03 13:34:56 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 16.4.1, Python 3.6.8 (default, Jan 30 2019, 23:54:38) - [GCC 6.4.0], pyOpenSSL 19.0.0 (OpenSSL 1.0.2q  20 Nov 2018), cryptography 2.6.1, Platform Linux-4.15.0-45-generic-x86_64-with
2019-03-03 13:34:56 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'safaribooks', 'DOWNLOAD_DELAY': 0.25, 'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders']}
2019-03-03 13:34:56 [scrapy.extensions.telnet] INFO: Telnet Password: ba131cc2f5422341
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2019-03-03 13:34:56 [SafariBooks] INFO: Using `/tmp/tmp6l2jbpbq` as temporary directory
2019-03-03 13:34:56 [scrapy.core.downloader.handlers] ERROR: Loading "scrapy.core.downloader.handlers.ftp.FTPDownloadHandler" for scheme "ftp"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/downloader/handlers/__init__.py", line 48, in _load_handler
    dhcls = load_object(path)
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/downloader/handlers/ftp.py", line 36, in <module>
    from twisted.protocols.ftp import FTPClient, CommandFailed
ModuleNotFoundError: No module named 'twisted.protocols.ftp'
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-03-03 13:34:56 [scrapy.core.engine] INFO: Spider opened
2019-03-03 13:34:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-03-03 13:34:56 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-03-03 13:34:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/>
2019-03-03 13:34:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2019-03-03 13:34:58 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/home/> from <POST https://learning.oreilly.com/accounts/login/>
2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/home/> (referer: https://learning.oreilly.com/accounts/login/)
2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681> (referer: https://learning.oreilly.com/home/)
2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch06.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:34:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch06.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:00 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch05.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch05.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:00 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch03.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch03.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch04.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch04.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch02.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch02.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/preface.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/preface.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/foreword.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/foreword.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/toc.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/toc.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/dedication.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/dedication.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/copy.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/copy.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/title.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/title.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm02.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm02.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:04 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/pref00.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/pref00.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:04 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/cover.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/library/cover/9780134757681/> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/cover.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01_images.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01_images.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/index.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm03.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/biblo.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/index.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm03.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/biblo.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch12.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch12.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch11.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch11.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch10.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch10.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch09.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch09.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch08.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch08.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch07.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch07.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] INFO: Closing spider (finished)
2019-03-03 13:35:07 [SafariBooks] INFO: Made archive /app/refactoring-improving-the.zip
2019-03-03 13:35:07 [SafariBooks] INFO: Moving /app/refactoring-improving-the.zip to download/Refactoring__Improving_the_Design_of_Existing_Code-9780134757681.epub
2019-03-03 13:35:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 17319,
 'downloader/request_count': 31,
 'downloader/request_method_count/GET': 30,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 63071,
 'downloader/response_count': 31,
 'downloader/response_status_count/200': 4,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/403': 25,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 3, 3, 13, 35, 7, 954511),
 'httperror/response_ignored_count': 25,
 'httperror/response_ignored_status_count/403': 25,
 'log_count/DEBUG': 31,
 'log_count/ERROR': 1,
 'log_count/INFO': 37,
 'memusage/max': 55988224,
 'memusage/startup': 55988224,
 'request_depth_max': 3,
 'response_received_count': 29,
 'scheduler/dequeued': 31,
 'scheduler/dequeued/memory': 31,
 'scheduler/enqueued': 31,
 'scheduler/enqueued/memory': 31,
 'start_time': datetime.datetime(2019, 3, 3, 13, 34, 56, 344411)}
2019-03-03 13:35:07 [scrapy.core.engine] INFO: Spider closed (finished)

File is not opened in binary mode, which lead to some data not able to be saved.

File is not opened in binary mode, which lead to some data not able to be saved. As the result, in epub version some page can not navigate to the next.

safaribooks/safaribooks/spiders/safaribooks.py

Line 118 in a95db44

with open(cover_img_path, 'w') as fh:

change to fix the issue:
with open(cover_img_path, 'wb') as fh

Cant install it!

Hi, I don't know what am I doing wrong but when I run the command "pip install safaribooks" on a freshly installed copy of python 2.7 (python-2.7.14) on a Win10 machine I get the message:

Collecting safaribooks
Could not find a version that satisfies the requirement safaribooks (from versions: )
No matching distribution found for safaribooks

What could be the issue that prevents me from installing this software?

Thanks.

Support for SSO accounts

It would be great if this supported SSO accounts.
https://www.safaribooksonline.com/enterprise/

Error: 'str' object has no attribute 'decode'

I just ran this through Docker and got this error message all across the log:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/app/safaribooks/spiders/safaribooks.py", line 177, in parse_page
    body = str(BeautifulSoup(response.body, 'lxml').find('body')).decode('utf8')
AttributeError: 'str' object has no attribute 'decode'

Redirecting (302) when downloading images for book

Bookid = 9781935504634

DEBUG ERRORS
2017-12-07 21:31:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/static/images/default_cover.9943b5969cc4.png> from <GET https://www.safaribooksonline.com//library/view/enterprise-architecture-made/9781935504634/images/EnterpriseArchitecture_fmt5.png>
2017-12-07 21:31:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/static/images/default_cover.9943b5969cc4.png> from <GET https://www.safaribooksonline.com//library/view/enterprise-architecture-made/9781935504634/images/EnterpriseArchitectur_fmt51.png>

Can not crawl from safaribooksonline http 404. Double // in http://xxx//api/v1

Sort author field defaults to Douglas Crockford

safaribooks/safaribooks/data/OEBPS/content.opf

Line 11 in 9ab6b91

 <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf" opf:file-as="Douglas Crockford">{{info["authors"]}}</dc:creator> 

After Downloading Book is in Epub Format it given an Error , I gave parameter Download

018-06-17 02:19:37 [SafariBooks] INFO: Moving C:\Users\Abhishek\Downloads\Compressed\safaribooks-master\safaribooks\pandas-for-everyone.zip to C:\Users\Abhishek\Downloads\Compressed\safaribooks-master\safaribooks\converted\Pandas_for_Everyone__Python_Data_Analysis,First_Edition-9780134547046.epub
2018-06-17 02:19:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 132017,
'downloader/request_count': 202,
'downloader/request_method_count/GET': 201,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 7168303,
'downloader/response_count': 202,
'downloader/response_status_count/200': 201,
'downloader/response_status_count/302': 1,
'dupefilter/filtered': 55,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 6, 16, 20, 49, 35, 794000),
'log_count/DEBUG': 1745,
'log_count/INFO': 12,
'request_depth_max': 5,
'response_received_count': 201,
'scheduler/dequeued': 202,
'scheduler/dequeued/memory': 202,
'scheduler/enqueued': 202,
'scheduler/enqueued/memory': 202,
'start_time': datetime.datetime(2018, 6, 16, 20, 47, 32, 366000)}
2018-06-17 02:19:37 [scrapy.core.engine] INFO: Spider closed (finished)
C:\Users\Abhishek\Downloads\Compressed\safaribooks-master\safaribooks\converted
Traceback (most recent call last):
File "C:\Python27\Scripts\safaribooks-script.py", line 11, in
load_entry_point('safaribooks==0.1.1', 'console_scripts', 'safaribooks')()
File "c:\python27\lib\site-packages\safaribooks_main.py", line 121, in main
args.func(args)
File "c:\python27\lib\site-packages\safaribooks_main_.py", line 60, in download
convert_to_mobi(args)
File "c:\python27\lib\site-packages\safaribooks_main_.py", line 51, in convert_to_mobi
subprocess.call(['kindlegen', path])
File "c:\python27\lib\subprocess.py", line 168, in call
return Popen(*popenargs, **kwargs).wait()
File "c:\python27\lib\subprocess.py", line 390, in init
errread, errwrite)
File "c:\python27\lib\subprocess.py", line 640, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified

generated epub triggers render bug on kobo hardware

Generated an epub and loaded it onto my kobo and it displayed using a gigantic font (somewhere around 90pts) with large spacing that only displayed a couple of letters per page. Changing the font and size in the kobo UI do not affect the display.

Documenting here for any other Kobo user(s). I was able to use calibre to convert from epub to epub and it generated an epub that worked fine on the kobo.

This is a problem on the Kobo side, but it'd be nice to find a work around in the CSS. A previous kobo epub rendering problem was "fixed" by deleting custom.css from the epub bundle.

Hopefully I'll get back to this and run some more tests with various tweaks to the inline css to generate a working set. Perhaps using the CSS referenced in the chapter metadata.

ereader: kobo aura one
target action: download_epub
book tested:
- name: Data and Reality
- id: 9781935504214

Docker version doesn't copy file to local machine

It looks like the docker version does generate the files, but doesn't copy to the "converted" folder. Not sure how to actually get the files off the docker vm.

true not define + small improvement

Hi,

I received an error about true not being defined so I added it in ./safaribook/spiders/safaribooks.py just after false:

Line 16 false = False
Line 17 true = True

Because I was getting 2 files with similar path I also added bookid to the name of the epub file:

Line 107 shutil.move(self.book_name + '.zip', self.book_name+'-'+ self.bookid + '.epub')

Line 107 was originally 106, but after inserting true at line 17 as above it changed.

Hope this helps,
Dan

Ror some books not able to create epub only zip is created

Ror some books not able to create epub only zip is created, like try book number 9780080568782

Is there any way to export books as HTML?[Feature Request]

Hello. Is there any way to export books as HTML? Because I want to convert a book to markdown the best way to achieve this downloading books as HTML.

Seems to error out on books divided into parts

e.g. Book 9780071742320

Wrong book id requested if it begins with 0

If I try to download book 0596007124. I get the following error:
2018-01-31 17:35:38 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=596007124>

The argparser is set up to get -b argument as int.

What´s the correct sintax for SSO downloads?

Hi,

I´m trying to use my SSO account to download some ebooks using this command line

safaribooks -o "C:\Users\Admin\Desktop\books" -u **myUsername** -p **myPassword** -b 9781617294167 -c 'BrowserCookie=AAAAA-BBBBB-CCCCC-DDDDD-EEEEEEEE; sessionid=AAABBBCCCDDDEEE' download-epub

But I´m receiving this error

2018-01-31 14:18:20 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-01-31 14:18:20 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.3.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0g 2 Nov 2017), cryptography 2.1.4, Platform Windows-10-10.0.16299
Traceback (most recent call last):
File "C:\Python27\Scripts\safaribooks-script.py", line 11, in
load_entry_point('safaribooks==0.1.0', 'console_scripts', 'safaribooks')()
File "c:\python27\lib\site-packages\safaribooks_main_.py", line 121, in main
args.func(args)
File "c:\python27\lib\site-packages\safaribooks_main_.py", line 28, in download_epub
output_directory=args.output_directory
File "c:\python27\lib\site-packages\scrapy\crawler.py", line 170, in crawl
crawler = self.create_crawler(crawler_or_spidercls)
File "c:\python27\lib\site-packages\scrapy\crawler.py", line 198, in create_crawler
return self._create_crawler(crawler_or_spidercls)
File "c:\python27\lib\site-packages\scrapy\crawler.py", line 202, in _create_crawler
spidercls = self.spider_loader.load(spidercls)
File "c:\python27\lib\site-packages\scrapy\spiderloader.py", line 71, in load
raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: SafariBooks'

What could be the issue here? The program was installed correctly AFAIK.

Thanks.

CentOS 7 Spider not found: SafariBooks

2018-02-24 05:19:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-02-24 05:19:34 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.5 (default, Aug  4 2017, 00:39:18) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0g  2 Nov 2017), cryptography 2.1.4, Platform Linux-3.10.0-693.11.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Traceback (most recent call last):
  File "/usr/bin/safaribooks", line 9, in <module>
    load_entry_point('safaribooks==0.1.0', 'console_scripts', 'safaribooks')()
  File "/usr/lib/python2.7/site-packages/safaribooks/__main__.py", line 121, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/safaribooks/__main__.py", line 28, in download_epub
    output_directory=args.output_directory
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 202, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: SafariBooks'

I installed Python 2.7.14 as an alternative (not deleting 2.7.5). https://tecadmin.net/install-python-2-7-on-centos-rhel/ Should I add something to configuration?

	parser.add_argument(
	'-b',
	'--book-id',
	type=int,
	help='Safari Books Online book ID',
	)