
cum's People

Contributors

counterpillow, hamuko, klaxa, kozec


cum's Issues

Error following a raw in Madokami

$ cum --version
cum version git-4405206 "Miyamo Chio-chan"

Hello. The title is pretty self-explanatory, so let me get right into the error:

$ cum follow 'https://manga.madokami.al/Raws/Yotsubato%21'
==> Invalid URL "https://manga.madokami.al/Raws/Yotsubato%21"

That URL, however, is valid. Is this behaviour intentional? By that I mean: are you intentionally blocking raws? (From here it looks like you're checking /Manga/* only, but I found nothing in this repository about raws, hence my question.)
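For reference, the kind of check I mean would only need to admit /Raws/ paths too; a minimal sketch (the real pattern in the scraper may differ):

import re

# accept both /Manga/... and /Raws/... paths
url_re = re.compile(r'https?://manga\.madokami\.al/(?:Manga|Raws)/.+')

print(bool(url_re.match('https://manga.madokami.al/Raws/Yotsubato%21')))  # True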

PyPI distro tarball does not contain LICENSE and tests

You should probably try to comply with your own license if you want to distribute this software, and also include the test suite so distributions can verify basic functionality when packaging it. Arch doesn't give a shit, but other distros do.
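For a setuptools project, a MANIFEST.in along these lines should get both into the sdist (a sketch; adjust to the actual repo layout):

include LICENSE
recursive-include tests *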

Error following in bato.to

I have the latest version of cum

$ cum --version
cum version 0.8 "Miyamo Chio"

This happens when I follow a comic on bato.to, with or without the trailing slash in the URL:

$ cum follow http://bato.to/comic/_/comics/grancrest-senki-r21482/
Traceback (most recent call last):
  File "/usr/local/bin/cum", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/cum/cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/cum/cum.py", line 209, in follow
    series = utility.series_by_url(url)
  File "/usr/local/lib/python3.6/site-packages/cum/utility.py", line 71, in series_by_url
    return Series(url)
  File "/usr/local/lib/python3.6/site-packages/cum/scrapers/batoto.py", line 18, in __init__
    self.chapters = self.get_chapters()
  File "/usr/local/lib/python3.6/site-packages/cum/scrapers/batoto.py", line 40, in get_chapters
    name = columns[0].img.next_sibling.strip()
AttributeError: 'NoneType' object has no attribute 'strip'
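For what it's worth, the failing line seems to assume the <img> tag is always followed by a text node. A defensive sketch (assuming bs4, where columns comes from the scraper's table rows; NavigableString subclasses str):

# guard against a missing <img> or a non-text sibling
img = columns[0].img
sibling = img.next_sibling if img is not None else None
name = sibling.strip() if isinstance(sibling, str) else ''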

MangaDex support broken

Whenever I try either cum follow or cum get on a mangadex.org URL, I get the following traceback thrown right in my face:

~ » cum follow https://mangadex.org/title/30170
Traceback (most recent call last):
  File "/usr/bin/cum", line 11, in <module>
    load_entry_point('cum==0.9.1', 'console_scripts', 'cum')()
  File "/usr/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cum/cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cum/cum.py", line 209, in follow
    series = utility.series_by_url(url)
  File "/usr/lib/python3.7/site-packages/cum/utility.py", line 71, in series_by_url
    return Series(url)
  File "/usr/lib/python3.7/site-packages/cum/scrapers/mangadex.py", line 24, in __init__
    self._get_page(self.url)
  File "/usr/lib/python3.7/site-packages/cum/scrapers/mangadex.py", line 49, in _get_page
    self.json = json.loads(r.text)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

...which I'm going to assume is not how it's supposed to be.

I am using cum version 0.9.1 on Arch Linux.
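If it helps, a defensive parse along these lines in the scraper's _get_page would at least turn Cloudflare's HTML challenge pages into a readable error instead of a raw JSONDecodeError (a sketch; assumes a requests.Response r):

import json

def parse_api_response(r):
    # Cloudflare interstitials come back as HTML, not JSON
    content_type = r.headers.get('Content-Type', '')
    if r.status_code != 200 or 'json' not in content_type:
        raise ValueError('MangaDex returned non-JSON (HTTP {}, {})'.format(
            r.status_code, content_type or 'no content type'))
    return json.loads(r.text)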

Madokami - option to keep original file names

Whenever I use cum get to download individual or all archives for a manga, the volume number gets stripped from the file name.

This will also help in cases like Shamo, which had two publishers.
The second publisher chose to restart the chapter numbers.
To keep the natural sort working, the filenames on Madokami use d instead of c before the chapter numbers like so: Shamo - d001-008 (v20) [m-s & wfp].zip

Problem is, cum downloads result in the following:

Shamo - c000 [d001-008 v20 [m-s ] [Unknown].zip
Shamo - c000 [d009-017 v21 [m-s ] [Unknown].zip
Shamo - c000 [d018-026 v22 [m-s ] [Unknown].zip
Shamo - c000 [d027-035 v23 [m-s ] [Unknown].zip
Shamo - c000 [d036-044 v24] [WFP].zip
Shamo - c000 [d045-053 v25] [WFP].zip
Shamo - c000 [d054-062 v26] [Illuminati-Manga].zip
Shamo - c000 [d063-071 v27] [Illuminati-Manga].zip
Shamo - c000 [d072-080 v28] [Illuminati-Manga].zip
Shamo - c000 [d081-089 v29] [Illuminati-Manga].zip
Shamo - c000 [d090-098 v30] [Illuminati-Manga].zip
Shamo - c000 [d099-107 v31] [Illuminati-Manga].zip
Shamo - c000 [d108-116 v32] [Illuminati-Manga].zip
Shamo - c000 [d117-125 v33] [wfp].zip
Shamo - c000 [d126-134 v34] [wfp].zip
Shamo - c001-010 [m-s].zip
Shamo - c011-020 [m-s].zip
Shamo - c021-030 [m-s].zip
Shamo - c031-040 [m-s].zip
Shamo - c041-050 [m-s].zip
Shamo - c051-060 [m-s].zip
Shamo - c061-070 [m-s].zip
Shamo - c071-080 [m-s].zip
Shamo - c081-090 [m-s].zip
Shamo - c091-100 [m-s].zip
Shamo - c101-110 [m-s].zip
Shamo - c111-120 [m-s].zip
Shamo - c121-131 [m-s].zip
Shamo - c132-142 [m-s].zip
Shamo - c143-153 [m-s].zip
Shamo - c154-164 [m-s].zip
Shamo - c165-177 [m-s].zip
Shamo - c178-188 [m-s].zip
Shamo - c189-200 [m-s].zip

I think adding an additional Madokami config value, madokami.keep_original_names, might be a good idea.
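A sketch of what I'm proposing (function and argument names are illustrative, not cum's actual API): when the option is on, derive the file name from the remote path instead of rebuilding it.

import os
from urllib.parse import unquote, urlparse

def chapter_filename(chapter_url, rebuilt_name, keep_original=False):
    if keep_original:
        # '.../Shamo%20-%20d001-008%20(v20)%20[m-s%20&%20wfp].zip'
        #   -> 'Shamo - d001-008 (v20) [m-s & wfp].zip'
        return unquote(os.path.basename(urlparse(chapter_url).path))
    return rebuilt_name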

Use platform specific directories

On Linux, the XDG Base Directory specification should be followed. That means the database should be stored in data_dir:

import os

if 'XDG_DATA_HOME' in os.environ:
    data_dir = os.path.join(os.environ['XDG_DATA_HOME'], 'cum')
else:
    data_dir = os.path.join(os.environ['HOME'], '.local', 'share', 'cum')

And the configuration file in config_dir:

if 'XDG_CONFIG_HOME' in os.environ:
    config_dir = os.path.join(os.environ['XDG_CONFIG_HOME'], 'cum')
else:
    config_dir = os.path.join(os.environ['HOME'], '.config', 'cum')

No rollback when aborting adding a madokami url

If a madokami URL is passed to 'get' and cum asks for the login information, but cum is terminated prematurely with a keyboard interrupt, a seemingly incomplete series is added. It cannot be unfollowed through the command line because it doesn't show up, and thus has to be deleted from the database manually. Otherwise, the following exception is thrown if a user tries to update:

Traceback (most recent call last):
  File "./cum.py", line 232, in <module>
    cli()
  File "/usr/lib/python3.4/site-packages/click/core.py", line 664, in __call__
    return self.main(*args, **kwargs)                                                        
  File "/usr/lib/python3.4/site-packages/click/core.py", line 644, in main                   
    rv = self.invoke(ctx)                                                                    
  File "/usr/lib/python3.4/site-packages/click/core.py", line 991, in invoke                 
    return _process_result(sub_ctx.command.invoke(sub_ctx))                                  
  File "/usr/lib/python3.4/site-packages/click/core.py", line 837, in invoke                 
    return ctx.invoke(self.callback, **ctx.params)                                           
  File "/usr/lib/python3.4/site-packages/click/core.py", line 464, in invoke                 
    return callback(*args, **kwargs)                                                         
  File "./cum.py", line 227, in update                                                       
    series = series_by_url(follow.url)                                                       
  File "/home/fratti/Projekte/cum/scrapers/__init__.py", line 19, in series_by_url           
    return MadokamiSeries(url)                                                               
  File "/home/fratti/Projekte/cum/scrapers/madokami.py", line 19, in __init__
    self.chapters = self.get_chapters()
  File "/home/fratti/Projekte/cum/scrapers/madokami.py", line 45, in get_chapters
    chapter = name_parts.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
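The fix could be as simple as rolling the session back on interrupt; a sketch, assuming cum's SQLAlchemy session is db.session and using the series_by_url from the traceback above:

try:
    series = series_by_url(url)
    series.follow()
except (KeyboardInterrupt, EOFError):
    db.session.rollback()   # discard the half-added series
    raise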

fails to update or add new manga from mangadex

When I run cum update, it starts updating all the manga but then crashes with the following JSON error:

==> Updating 69 series
Traceback (most recent call last):
  File "/usr/bin/cum", line 10, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cum/cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/cum/cum.py", line 432, in update
    series = future.result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib/python3.7/site-packages/cum/utility.py", line 71, in series_by_url
    return Series(url)
  File "/usr/lib/python3.7/site-packages/cum/scrapers/mangadex.py", line 24, in __init__
    self._get_page(self.url)
  File "/usr/lib/python3.7/site-packages/cum/scrapers/mangadex.py", line 49, in _get_page
    self.json = json.loads(r.text)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It produces a similar error when I try adding a new series from MangaDex.

Running cum version 0.9.1 "Morino Kirin-chan" from pip.

Unfollowing a series doesn't reset the downloaded column

I was attempting to force a redownload of an entire manga, since it had been downloaded with a previous version that had a filename bug in it.

When doing

cum unfollow trash
#delete the physical folder
cum follow https://bato.to/comic/_/comics/trash-r13308
cum update
cum download trash

the result was that 0 chapters were downloaded. Specifying the --download flag for the follow command did not change this result.

My solution: manually edit the cum.db database to set the series' downloaded flags back to zero, then re-run the download command.

If I recall correctly, the query I ran was something like:

UPDATE chapters SET downloaded = 0 where series_id = (select id from series where alias = "trash");

Possible solutions may be to set the downloaded flag to 0 after an unfollow has occurred (or remove it from the database completely) and/or to add a --force-download flag to some of the commands.
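A sketch of the first option (model and attribute names are my guesses at cum's internals, based on the query above):

def unfollow(series):
    series.following = False
    for chapter in series.chapters:
        chapter.downloaded = 0   # or delete the chapter rows entirely
    db.session.commit()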

NotADirectoryError under Windows

I know you can manually change the directory that cum is downloading the chapters into, but thought I would still report this.
When a manga has special characters in its name (e.g. "Boruto: Naruto Next Generations"), a NotADirectoryError is thrown, as Windows does not allow certain special characters in folder names.

[Madokami] Connection reset by peer?

First, thanks for your software. It helps me a lot.
Now for the problem: I try to do this
cum follow https://manga.madokami.al/Manga/M/MA/MAJI/Majipikoru
but I get this traceback:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 350, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 835, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 323, in connect
    ssl_context=context)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 377, in wrap_socket
    _context=self)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 752, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 988, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 633, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 54] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/adapters.py", line 423, in send
    timeout=timeout
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 643, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 334, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 594, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 350, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 835, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/connection.py", line 323, in connect
    ssl_context=context)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/packages/urllib3/util/ssl_.py", line 324, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 377, in wrap_socket
    _context=self)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 752, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 988, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/ssl.py", line 633, in do_handshake
    self._sslobj.do_handshake()
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/cum", line 11, in <module>
    sys.exit(cli())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cum/cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cum/cum.py", line 204, in follow
    series = utility.series_by_url(url)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cum/utility.py", line 71, in series_by_url
    return Series(url)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cum/scrapers/madokami.py", line 21, in __init__
    r = self.session.get(url)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 501, in get
    return self.request('GET', url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/requests/adapters.py", line 473, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

At first I thought I had fucked something up, but when I tried downloading from Batoto, it didn't have this problem.
Can I have some enlightenment here?
Thanks ~

Subdirectories at Madokami

I noticed some series have subdirectories for certain releases. Attempting to follow a series via the subdirectory URL doesn't work, and the subdirectory is normally skipped when cum scrapes those series.

I guess this is a feature request. I was looking at the code myself and was wondering what would be better: allowing users to follow based on the subdirectory URL, or giving them a parameter to specify the URL used to grab chapters.

Madokami: Missing from remote

After using cum with Madokami for a couple of months, I now get the same error any time I try to download chapters.

==> Removing **MANGA NAME** **CHAPTER NUMBER**: missing from remote

Batoto login error

I was logged into Bato.to via the web browser, and I set my username/password in cum and all worked well.

I assume my session expired (it has been a week or so since I last visited bato.to or used cum), as I was logged out in my browser and cum was returning a "Batoto login error" message:

cum update
==> Updating 3 series
  [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]  3/3
==> Unable to update dead-tube (Batoto login error)
==> Unable to update suicide-island (Batoto login error)
==> Unable to update trash (Batoto login error)

I resolved this by setting the password hash to an empty string and logging back into bato.to via the web browser (I don't know if the browser login affects it at all, because I imagine you have your own handling for cookies and sessions):

cum config set batoto.pass_hash ""

and running the update again successfully grabbed the latest chapters.

Perhaps cum is missing a check somewhere to re-login or generate new cookies/session data once a session has expired.
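Something along these lines is what I imagine is missing (a sketch; the helper names are made up, not cum's API):

def get_with_relogin(session, url, login):
    r = session.get(url)
    if 'login' in r.url.lower():   # redirected to the login form: session expired
        session.cookies.clear()
        login()                    # hypothetical helper that refreshes batoto.pass_hash
        r = session.get(url)
    return r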

Fails to follow a new series if it has the same alias as a previously unfollowed series

First of all, thank you for your software; it is life-saving. I use it every day!

Try the following:

  • cum follow https://bato.to/comic/_/comics/happiness-oshimi-shuzo-r14710
  • cum unfollow happiness-oshimi-shuzo
  • cum follow https://manga.madokami.com/Manga/H/HA/HAPP/Happiness%20%28OSHIMI%20Shuzo%29

https://manga.madokami.com/Manga/H/HA/HAPP/Happiness%20%28OSHIMI%20Shuzo%29 would create a new series with the alias "happiness-oshimi-shuzo"; however, since that alias has been used previously, it returns an error. There seems to be no way to choose an alias while following, so the only possible solution is manually editing cum.db.

Changing between batoto, madokami and dynastyscans can be useful for us, sometimes.
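One possible fix, sketched against the db.Series model that shows up in the traceback below (the helper itself is hypothetical): probe for a free alias before inserting.

def unique_alias(base):
    alias, n = base, 1
    while db.session.query(db.Series).filter_by(alias=alias).count():
        n += 1
        alias = '{}-{}'.format(base, n)   # happiness-oshimi-shuzo-2, -3, ...
    return alias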

And without further ado, here's the traceback:

==> Adding follow for Happiness (OSHIMI Shuzo) (happiness-oshimi-shuzo)
Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/cum/scrapers/base.py", line 44, in follow
    s = db.session.query(db.Series).filter_by(url=self.url).one()
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/query.py", line 2699, in one
    raise orm_exc.NoResultFound("No row was found for one()")
sqlalchemy.orm.exc.NoResultFound: No row was found for one()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlite3.IntegrityError: UNIQUE constraint failed: series.alias

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/cum", line 9, in <module>
    load_entry_point('cum==0.5.1', 'console_scripts', 'cum')()
  File "/usr/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/cum/cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/cum/cum.py", line 199, in follow
    series.follow()
  File "/usr/lib/python3.5/site-packages/cum/scrapers/base.py", line 48, in follow
    db.session.commit()
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/session.py", line 801, in commit
    self.transaction.commit()
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/session.py", line 392, in commit
    self._prepare_impl()
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/session.py", line 372, in _prepare_impl
    self.session.flush()
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2019, in flush
    self._flush(objects)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2137, in _flush
    transaction.rollback(_capture_exception=True)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/util/langhelpers.py", line 60, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/util/compat.py", line 184, in reraise
    raise value
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/session.py", line 2101, in _flush
    flush_context.execute()
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 373, in execute
    rec.execute(self)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/unitofwork.py", line 532, in execute
    uow
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 174, in save_obj
    mapper, table, insert)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/orm/persistence.py", line 800, in _emit_insert_statements
    execute(statement, params)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1146, in _execute_context
    context)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
    exc_info
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/util/compat.py", line 183, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1139, in _execute_context
    context)
  File "/usr/lib/python3.5/site-packages/SQLAlchemy-1.0.12-py3.5-linux-x86_64.egg/sqlalchemy/engine/default.py", line 450, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: series.alias [SQL: 'INSERT INTO series (name, alias, url, following, directory) VALUES (?, ?, ?, ?, ?)'] [parameters: ('Happiness (OSHIMI Shuzo)', 'happiness-oshimi-shuzo', 'https://manga.madokami.com/Manga/H/HA/HAPP/Happiness%20%28OSHIMI%20Shuzo%29', 1, None)]

0.9.1: Installs unnecessary files

setup.py in the release tarball for 0.9.1 installs the following files in error:

/$prefix/LICENSE
/$prefix/tests/broken_config.json
/$prefix/tests/broken_database.db
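If the cause is a data_files entry in setup.py (an assumption, I have not checked the tarball's setup.py), something like the fragment below would install straight into the prefix; dropping it and shipping the files only in the sdist would fix this:

# hypothetical offending fragment: with an empty target directory,
# setuptools installs these files relative to sys.prefix
data_files=[('', ['LICENSE',
                  'tests/broken_config.json',
                  'tests/broken_database.db'])],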

'follow' crash

I wanted to follow a series and got a crash:

cum follow http://bato.to/comic/_/comics/rozen-maiden-ii-r2656
Batoto username: JonnyRobbie
Batoto password: 
Traceback (most recent call last):
  File "/usr/bin/cum", line 11, in <module>
    load_entry_point('cum==0.8', 'console_scripts', 'cum')()
  File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cum/cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/cum/cum.py", line 209, in follow
    series = utility.series_by_url(url)
  File "/usr/lib/python3.6/site-packages/cum/utility.py", line 71, in series_by_url
    return Series(url)
  File "/usr/lib/python3.6/site-packages/cum/scrapers/batoto.py", line 18, in __init__
    self.chapters = self.get_chapters()
  File "/usr/lib/python3.6/site-packages/cum/scrapers/batoto.py", line 40, in get_chapters
    name = columns[0].img.next_sibling.strip()
AttributeError: 'NoneType' object has no attribute 'strip'

$ cum --version
cum version 0.8 "Miyamo Chio"

Permission denied while creating zip + weird directory creation

The creation of the zip archive fails due to a permission denied error on one of the downloaded temporary files.

C:\Users\user\Desktop\manga>cum download suicide-island
==> Downloading 82 chapters
suicide-island 1
  [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]  36/36  100%
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python35-32\Scripts\cum-script.py", line 9, in <module>
    load_entry_point('cum==0.5.1.post4', 'console_scripts', 'cum')()
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\click\core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\click\core.py", line 696, in main
    rv = self.invoke(ctx)
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\click\core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\click\core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\click\core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\cum\cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\cum\cum.py", line 164, in download
    chapter.get()
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\cum\scrapers\base.py", line 239, in get
    self.download()
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\cum\scrapers\batoto.py", line 154, in download
    self.create_zip(files)
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\site-packages\cum\scrapers\base.py", line 153, in create_zip
    z.write(f.name, 'img{num:0>6}{ext}'.format(num=num, ext=ext))
  File "c:\users\user\appdata\local\programs\python\python35-32\lib\zipfile.py", line 1479, in write
    with open(filename, "rb") as fp:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\user\\AppData\\Local\\Temp\\tmp6v7_kcsm.jpe'

Do note that the temp file name is not a typo; I've seen three separate file extensions across multiple runs of this traceback: "jpg", "jpeg", "jpe".

Secondly, inside the download_directory cum creates another directory whose name is the download_directory sans slashes:

C:\Users\user\Desktop\manga>cum config get download_directory
download_directory = C:\Users\user\Desktop\manga

http://i.imgur.com/KQtCYsC.png

This was encountered on the latest microsoft-sucks branch.
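For what it's worth, this smells like the classic Windows tempfile gotcha (an assumption about cum's internals, since the traceback shows z.write(f.name, ...)): a NamedTemporaryFile cannot be opened a second time by name on Windows while the original handle is still open. A sketch of the usual workaround (z and page_data stand in for cum's zip handle and image bytes):

import os
import tempfile

tmp = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False)
try:
    tmp.write(page_data)
    tmp.close()                         # release the handle first
    z.write(tmp.name, 'img000001.jpg')  # now zipfile can read it
finally:
    os.remove(tmp.name)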

Image converter support

This is a feature request to add image conversion support built into cum. I suppose this could be done via ImageMagick externally, or with one of Python's image packages if a suitable one exists.

I find I am forever converting some series from more obscure formats such as WebP to PNG and/or JPEG, and was just wondering if there could ever be built-in support for this via config settings and external calls to ImageMagick's mogrify application.
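For what it's worth, Pillow can already do the conversions I care about without shelling out to ImageMagick; a rough sketch of what a built-in hook could look like (all names hypothetical):

from pathlib import Path

from PIL import Image

def convert_images(directory, src_ext='.webp', dst_ext='.png'):
    for src in Path(directory).glob('*' + src_ext):
        Image.open(src).save(src.with_suffix(dst_ext))  # format inferred from suffix
        src.unlink()                                    # drop the original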

Thanks.

Feature Request: Search for a manga via APIs/GET

As far as I can tell, to follow a manga you currently have to go to Madokami/MangaDex/etc. and manually grab the series URL in order to do cum follow $url. It would be nice to be able to search the download sites (where applicable) to find series to follow. This might be a niche use case, as it really isn't that hard to open a browser and search, but if this is a command line tool modeled after package managers, then I think search would be within scope, considering many package managers can search for packages.

As an idea of what I'm thinking of:

$ cum search "Dragonball" 
==> Searching for "Dragonball"
Found: 
(1) /Manga/D/DR/DRAG/Dragon Ball
(2) /Raws/Dragonball
(3) /Manga/D/DR/DRAG/Dragon Ball Super
(4) /Manga/_Doujinshi/Dragonball
(5) /Manga/_Doujinshi/Dragonball/Dragon Ball Z - Legendary Vegeta (Doujinshi)

Maybe you cut it off at 5 or so results, and the user can refine the query if they can't find what they're looking for (or ultimately just search directly/google it). You could even let them select which ones to follow based on the number.

I'm sure this would be a large amount of work, but perhaps a way to implement it would be a new scraper class, BaseSearch, that would handle both requesting and parsing search results, either through an API (if one exists) or a simple GET with a generated search URI. Then you'd have to add search parsing to each scraper. I'm not sure that's the best way to do it, but from looking at the code a bit, I think it would be feasible. A rough sketch follows below.
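To make the shape concrete (everything here is illustrative, not cum's actual scraper API):

class BaseSearch:
    search_url = None  # each scraper sets its own endpoint

    def search(self, query):
        """Return a list of (title, url) results for `query`."""
        r = self.session.get(self.search_url, params={'q': query})
        return self.parse_results(r.text)

    def parse_results(self, html):
        raise NotImplementedError  # site-specific parsing goes here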

mangadex doesn't seem to work

~ λ cum follow https://mangadex.org/title/19435/back-street-girls
==> Invalid URL "https://mangadex.org/title/19435/back-street-girls"

~ Λ cum --version
cum version 0.9 "Morino Kirin"

(installed from AUR a few days ago)

My system is Arch Linux, my Python version is 3.7.0.

Add support for mangadex.com

Bato.to is shutting down and doki's new website mangadex.com seems to be positioning itself as the main replacement.
RSS seems broken right now, so I would wait to implement the follow function, but scraping archives can probably already be implemented.

A couple of suggestions

First of all, great job with this, it makes my life easier. But I do have a couple of suggestions:

  1. I think "get" should work with both individual chapters and complete manga, similar to "follow --download".
    (screenshot attached)
  2. Allow the use of the --folder flag with any command that involves downloads, but especially "follow", something like the attached screenshot. Something similar with "alias" would be great too!
    (screenshot attached)

Thank you for your attention and thank you for your work!

MangaFox scraping support

Hi, I am wondering if support for MangaFox scraping will be added to cum. There are a few manga whose English versions Batoto removed but MangaFox still has.

Thanks.

Madokami site revamp

Madokami recently switched servers and the site layout has changed. The series overview is no longer accessible; manga.madokami.com redirects to the main page. The FTP folders are visible, but they do not seem to work with cum follow.

Mangadex: Downloading empty chapters

I recently had an issue with the MangaDex scraper (#69). Thankfully it was solved, but now the issue is that while I can update and add manga without any problem, when I download a manga it goes through the whole process, yet when I check the file it is empty. I believe this might have something to do with Cloudflare protection, as did my previous issue.

os.mkdir() will throw an exception on NTFS file systems if the manga name breaks Windows naming conventions

Example: It's Not My Fault That My Friend's Not Popular.
url: https://bato.to/comic/_/comics/its-not-my-fault-that-my-friends-not-popular-r8377

I'm running Arch Linux and Python 3.5.1. My download directory's file system is NTFS. I can manually create directories that end with a dot, but it won't work in Python. Python doesn't allow this as it conflicts with the Windows naming conventions, even if it's technically possible in NTFS:

Do not end a file or directory name with a space or a period. Although the underlying file system may support such names, the Windows shell and user interface does not.

The fact that I do not get this error if my download directory's file system is ext4 supports this conclusion.

I realize this is somewhat of an edge case, but it would be neat if NTFS file systems were detected and the directory and file names were pruned to be acceptable. Specifically, this would entail stripping the characters <>:"/\|?* from directory and file names, plus any trailing . in a directory name (all filenames should end with .zip/.cbz by default, so they won't be an issue).

I couldn't find any details on how to detect whether a file system is NTFS from Python, so an easier solution could be a new option, windows_naming_convention, which defaults to false.
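A sketch of what that option could do, per the Windows guidance quoted above:

import re

def sanitize_windows_name(name):
    name = re.sub(r'[<>:"/\\|?*]', '', name)  # drop the reserved characters
    return name.rstrip('. ')                  # no trailing dots or spaces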

Terminal output:

$ cum get https://bato.to/comic/_/comics/its-not-my-fault-that-my-friends-not-popular-r8377
its-not-my-fault-that-my-friends-not-popular 28
  [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]  5/5  100%
Traceback (most recent call last):
  File "/usr/bin/cum", line 9, in <module>
    load_entry_point('cum==0.4', 'console_scripts', 'cum')()
  File "/usr/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/cum/cum.py", line 16, in wrapper
    return f(*args, **kwargs)
  File "/usr/lib/python3.5/site-packages/cum/cum.py", line 189, in get
    chapter.get(use_db=False)
  File "/usr/lib/python3.5/site-packages/cum/scrapers/base.py", line 208, in get
    self.download()
  File "/usr/lib/python3.5/site-packages/cum/scrapers/batoto.py", line 168, in download
    self.create_zip(files)
  File "/usr/lib/python3.5/site-packages/cum/scrapers/base.py", line 112, in create_zip
    with zipfile.ZipFile(self.filename, 'w') as z:
  File "/usr/lib/python3.5/site-packages/cum/scrapers/base.py", line 191, in filename
    os.makedirs(directory)
  File "/usr/lib/python3.5/os.py", line 241, in makedirs
    mkdir(name, mode)
OSError: [Errno 22] Invalid argument: "/dl/It's Not My Fault That My Friend's Not Popular."

ImportError at launch

I managed to install cum, but launching it with any command results in

Traceback (most recent call last):
  File "/usr/local/bin/cum", line 9, in <module>
    load_entry_point('cum==0.2.1', 'console_scripts', 'cum')()
  File "/usr/local/lib/python2.7/dist-packages/click-5.1-py2.7.egg/click/core.py", line 700, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click-5.1-py2.7.egg/click/core.py", line 680, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click-5.1-py2.7.egg/click/core.py", line 1024, in invoke
    Command.invoke(self, ctx)
  File "/usr/local/lib/python2.7/dist-packages/click-5.1-py2.7.egg/click/core.py", line 873, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click-5.1-py2.7.egg/click/core.py", line 508, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cum-0.2.1-py2.7.egg/cum/cum.py", line 8, in cli
    from cum import db, output
ImportError: cannot import name db

Ability to download only certain chapters from a followed series

So there is this issue I have. After following a series and then listing chapters, I see lots of them marked 'n'ew (if I understand that correctly). But as far as I know, download downloads all chapters from the series. Can I specify only certain chapters to be downloaded? There are multiple releases per chapter, as well as a lot of chapters I've already read. I know I could probably use get with a URL, but with an alias there can be ambiguity between chapters; for example, rozen-maiden-ii has three English releases for c060.

Also, it seems that get doesn't set the 'downloaded' flag in 'chapters'.

Error when downloading Boku no Hero Academia 191 from Mangadex

Chapters 1 through 190 download just fine, but when it gets to 191, it hits this error.

Traceback (most recent call last):
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Comics\_Updater\python-3.6.6.amd64\Scripts\cum.exe\__main__.py", line 9, in <module>
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\cum\cum.py", line 15, in wrapper
    return f(*args, **kwargs)
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\cum\cum.py", line 154, in download
    chapter.get()
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\cum\scrapers\base.py", line 250, in get
    self.download()
  File "D:\Comics\_Updater\python-3.6.6.amd64\lib\site-packages\cum\scrapers\mangadex.py", line 134, in download
    chapter_hash = re.search(self.hash_re, r.text).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Underscore in download directory causes download to fail

I am running this off of a VM and set the download path to the shared folder I use ("/media/sf_VM_Shared"). When I try to download something it returns a Permission Denied error because it tries to write to "/media/sfVMShared", which doesn't exist.

Not sure if this is a VM error or something else. I've temporarily rectified this by symlinking a folder in my home directory to the directory I wanted to use.

Malformed names from source site cause overwritten archives

Specific manga, such as https://bato.to/comic/_/comics/homunculus-r1543, have malformed names in which all chapters are released under the name "ch.0". This is because the series had no chapter breaks, so scanlators released it per volume rather than per chapter.

This causes cum to overwrite the same archive on each download, since it parses only the chapter number from the metadata, not the volume and chapter together.

For instance, the filename for ALL Illuminati-Manga releases is Homunculus - c000 [Illuminati-Manga].zip, so each volume downloaded under this group's name overwrites the previous one.

The Madokami source works a bit better, in that it downloads all the volumes since they have the volume appended to the chapter number, but it still requires manual renaming due to bad naming conventions.

The Bato.to source can be fixed by appending the specified volume to the file name; I am not sure how to fix Madokami easily.
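A sketch of that Bato.to fix (field names are illustrative): fall back to the volume when the chapter number is the meaningless zero.

def archive_name(series, chapter, volume, group):
    if volume and chapter in ('0', '000'):
        return '{} - v{:0>2} [{}].zip'.format(series, volume, group)
    return '{} - c{:0>3} [{}].zip'.format(series, chapter, group)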

Malformed file exceptions for jpeg

I've noticed this on a few series and simply batch-renamed the files myself without thinking about it, but the "jpeg" extension gets saved into the zip archive as "jpe", as shown here:

http://i.imgur.com/k49OJSg.png

The affected manga chapter is "DEAD Tube - c027 [Yo Manga]", I believe from the Batoto source.

Just a minor thing; the images are not corrupted or anything, just saved with the wrong extension.

Edit: I guess JPE is an extension my reader doesn't recognize, but it seems weird that you wouldn't use the source file extension of "jpg". Either way, just letting you know.
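If I had to guess at the culprit (an assumption about cum's internals): Python's mimetypes module lists '.jpe' first among the extensions for image/jpeg on many versions, so anything using guess_extension gets '.jpe':

import mimetypes

ext = mimetypes.guess_extension('image/jpeg')  # '.jpe' on many Python versions
if ext in ('.jpe', '.jpeg'):
    ext = '.jpg'  # normalise to the extension readers expect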

Feature Request: Polling Waiter

I've been meaning to reply to issue #73 for a while, but I thought I'd just go ahead and open a feature request.

It doesn't look like cum is using a waiter, or I haven't crawled over the scraper functions enough to see one. If a waiter were used between page scrapes, I'm sure many connectivity concerns would be put to rest (it's never perfect if it ends up being a cat-and-mouse game). A waiter is typically used to keep from constantly pegging your external resources, so they don't end up effectively DoSed. The idea is to wait random intervals between checks, to give yourself a better opportunity to "get in" if multiple reads are occurring at once (so you're not predictable) and also to appear a bit more like an actual person.

The environment I'm most familiar with is AWS (other cloud providers do it too), where they require one if you're using their APIs to scrape JSON, etc. With them, if you're "bursty" too often you'll be rate-limited, and if you're consistently polling you'll be IP-banned. It's just something you expect to implement when polling over the web... Cloudflare's no different here.

In MangaDex's case, I've had pretty good success just adding a sleep call at random intervals to mangadex.py. There are Python libraries with fancier features, but a waiter is literally just waiting randomly...

   
    def download(self):
        if getattr(self, 'r', None):
            r = self.r
        else:
            r = self.reader_get(1)

        chapter_hash = self.json['hash']
        pages = self.json['page_array']
        files = [None] * len(pages)
        # This can be a mirror server or data path. Example:
        # var server = 'https://s2.mangadex.org/'
        # var server = '/data/'
        mirror = self.json['server']
        server = urljoin('https://mangadex.org', mirror)
        futures = []
        last_image = None
        with self.progress_bar(pages) as bar:
            for i, page in enumerate(pages):
                if guess_type(page)[0]:
                    image = server + chapter_hash + '/' + page
                else:
                    print('Unknown image type for url {}'.format(page))
                    raise ValueError
                ### simple waiter; assumes `from random import randrange`
                ### and `from time import sleep` at the top of mangadex.py
                if i % 2 == 0:
                    sleep(randrange(1, 6))
                ### 
                r = requests.get(image, stream=True)
                if r.status_code == 404:
                    r.close()
                    raise ValueError
                fut = download_pool.submit(self.page_download_task, i, r)
                fut.add_done_callback(partial(self.page_download_finish,
                                              bar, files))
                futures.append(fut)
                last_image = image
            concurrent.futures.wait(futures)
            self.create_zip(files)
   

It was a quick five-minute test, but it's held up for a few months now on my server. Every other chapter, it waits 1-6 seconds before attempting to scrape the next. Previously, I'd have issues with multi-chapter batches or groups of more than 15-ish pages. So far, I've been okay with scrapes involving 40-80 chapters of around 20-30 pages each. Anything above those values can still give me trouble, but the cron job usually catches it on the next run. I haven't messed with it much, but increasing the second-range values has been more reliable... it just takes longer to complete, of course.

Waiting between page or chapter scrapes would be a nice addition, but it's not a perfect solution. I've been briefly looking into picking up at the exact page where Cloudflare cuts me off and an exception is raised. If I could track pages between errors, wait a few seconds or maybe over a minute, and pick up where it left off, then the scraper would be fairly reliable, I think. A sketch of what I mean follows below.
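Untested sketch of that per-page retry, using just requests and the standard library (names are mine, not cum's):

import random
import time

import requests

def fetch_page(url, attempts=4):
    for attempt in range(attempts):
        r = requests.get(url, stream=True)
        if r.status_code == 200:
            return r
        r.close()
        time.sleep(random.uniform(2, 8) * (attempt + 1))  # back off a bit more each time
    raise ValueError('gave up on {}'.format(url))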

With all of this running regularly in the background, I've got some other quick hack jobs going. I split my Madokami and MangaDex downloads into separate processes, since Madokami requests have stayed alive more often than MangaDex ones (and I want to limit bottlenecks). The other is simply waiting randomly between downloads, one series at a time: for series without any updates, 2-7 seconds; for series with actual scrapes going on, 40-80 seconds. With all of that, I've been pretty happy with how much and how often I can pull from MangaDex now.

So, just some ideas for anyone to look into or try.
