voyageur / dagr

This project forked from cutie/dagr

a deviantArt image downloader script written in Python

Python 100.00%

dagr's People

jar4ek lootek arvindsa llcorvinsll orangepole aspoonybard3 tatokis holycrepe zanyterp fak3 techierishi racer46 genminusthree phillmac gardon wambosa rarogcmex

dagr's Issues

Custom output directory

Allow specifying an output directory (instead of the current directory)

Merge deviant_get() and group_get()

Both crawling functions always looked alike (even before I took over development). With recent fixes to repair group crawling, they are even closer than before.

Time to finally merge these into a single function (less code to update when the site structure changes)

Crash: AttributeError: 'NoneType' object has no attribute 'group'

I ran the file like this:
python3 dagr.py -gs name
and every time, no matter what name, it results in this:

dagr.py v0.71.3 - deviantArt gallery ripper
Traceback (most recent call last):
  File "dagr.py", line 608, in <module>
    main()
  File "dagr.py", line 562, in main
    deviant = re.search(r'<title>.[A-Za-z0-9-]*', html,
AttributeError: 'NoneType' object has no attribute 'group'
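
The traceback shows `.group` being called on the `None` that `re.search()` returns when nothing matches. A minimal sketch of a guard, reusing the pattern from the traceback and the `[7:]` slice (the length of `<title>`) visible in the "Backend change breaking everything" report further down this page. The helper name `extract_deviant_name` is hypothetical and is not taken from dagr.

```python
import re

# Pattern copied from the traceback above; the helper name is hypothetical.
TITLE_PATTERN = re.compile(r'<title>.[A-Za-z0-9-]*', re.IGNORECASE)


def extract_deviant_name(html):
    """Return the name that follows <title>, or None when nothing matches."""
    match = TITLE_PATTERN.search(html)
    if match is None:
        # search() returns None on no match; calling .group() on it is what
        # raises the AttributeError in the report.
        return None
    return match.group(0)[len("<title>"):]
```

A caller can then print a readable error and exit when the result is `None`, instead of crashing.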

Crash when guess_extension() returns None

Some galleries crash with the following error:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python37-32\Scripts\dagr.py", line 572, in <module>
    main()
  File "C:\Program Files (x86)\Python37-32\Scripts\dagr.py", line 555, in main
    ripper.deviant_get("gallery")
  File "C:\Program Files (x86)\Python37-32\Scripts\dagr.py", line 342, in deviant_get
    self.get_images(mode, mode_arg, pages)
  File "C:\Program Files (x86)\Python37-32\Scripts\dagr.py", line 303, in get_images
    self.get(filelink, base_dir + "/" + filename)
  File "C:\Program Files (x86)\Python37-32\Scripts\dagr.py", line 160, in get
    get_resp.headers.get("content-type").split(";")[0]))
TypeError: can only concatenate str (not "NoneType") to str

I traced the problem down to guess_extension() returning None. The error above occurs when downloading a file with a content-type of application/rar on my Windows machine.

I think this issue also causes #36.
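
One possible way to avoid the crash is to never let a `None` extension reach the string concatenation. The sketch below uses the standard `mimetypes.guess_extension()`; the helper name `safe_extension` and the `.bin` default are assumptions chosen for illustration, not dagr's actual fix.

```python
import mimetypes


def safe_extension(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension, never None."""
    if not content_type:
        # The header itself may be missing from the response.
        return default
    mime = content_type.split(";")[0].strip()
    # guess_extension() returns None for types absent from the local MIME map.
    return mimetypes.guess_extension(mime) or default
```

Which default to use is a design choice; `.html` would match the request in the "NoneType found" report on this page.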

Crash on gallery retrieval

Downloading 163 of 549 ( https://sandara.deviantart.com/art/desolate-199659377 )
Traceback (most recent call last):
  File "dagr.py", line 627, in <module>
    main()
  File "dagr.py", line 610, in main
    ripper.deviant_get("gallery")
  File "dagr.py", line 315, in deviant_get
    mode + "/" + filename)
  File "dagr.py", line 155, in get
    str(self.browser.response.status_code))
__main__.DagrException: incorrect status code: 404
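
A single 404 aborts the whole run here. A hedged sketch of how a download loop could record the failure and move on instead: the `DagrException` class below is a stand-in for the one named in the traceback, and `download_all` and `fetch` are hypothetical names.

```python
class DagrException(Exception):
    """Stand-in for the exception class named in the traceback."""


def download_all(links, fetch):
    """Call fetch(link) for every link; collect failures instead of aborting."""
    failed = []
    for link in links:
        try:
            fetch(link)
        except DagrException as error:
            # Remember the failure and continue with the next deviation.
            failed.append((link, str(error)))
    return failed
```

The returned list can be printed at the end of the run so that missing deviations are still reported.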

I'd like to test it, but I don't know how, please help me

I have already installed Python, but what do I do now? I mean, which code do I need to use?

Crash: AttributeError: 'NoneType' object has no attribute 'text'

It's similar to #21 I guess but it does not download

~$ ./dagr.py -mgrsv -d "/media/Data1/art/Shabazik" shabazik
Shabazik's gallery page 1 crawled
Shabazik's gallery page 2 crawled
...
Shabazik's gallery page 122 crawled...
Shabazik's gallery page 123 crawled...
Could not find Shabazik's gallery
Total deviations in Shabazik's gallery found: 25
Total deviations to download: 25
Downloading 1 of 25 ( https://www.deviantart.com/shabazik/art/Goblins-of-Aiers-837348094 )
Download link not found, falling back to direct image
Traceback (most recent call last):
  File "./dagr.py", line 610, in <module>
    main()
  File "./dagr.py", line 591, in main
    ripper.deviant_get("gallery")
  File "./dagr.py", line 372, in deviant_get
    self.get_images(mode, mode_arg, pages)
  File "./dagr.py", line 322, in get_images
    filename, filelink = self.find_link(link)
  File "./dagr.py", line 234, in find_link
    "span", {"itemprop": "title"}).text == "Literature":
AttributeError: 'NoneType' object has no attribute 'text'
~ $

Also 25 downloads is obviously not correct.

Search albums and collections by name

Currently to download an album (or a collection), one has to specify the full end of the URL instead of the album name.
For example "8237748/Photo-Art" instead of "Photo Art".

dagr should parse the main gallery/favourites page to find the album URL from its name (duplicate names are not allowed on deviantart, so no problem in dropping the ID)
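
A rough sketch of the matching step, assuming the album paths have already been scraped from the gallery page. It also assumes that the URL slug is the album name with runs of non-alphanumeric characters replaced by hyphens, which is inferred only from the single "Photo Art" example above. `find_album_path` is a hypothetical helper.

```python
import re


def find_album_path(album_name, candidate_paths):
    """Return the "<id>/<slug>" path whose slug matches album_name, else None."""
    # Assumed slug rule: non-alphanumeric runs become a single hyphen.
    wanted = re.sub(r"[^a-z0-9]+", "-", album_name.lower()).strip("-")
    for path in candidate_paths:
        _album_id, _, slug = path.partition("/")
        if slug.lower() == wanted:
            return path
    return None
```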

Had no deviations

Did something change? Or am I doing something incorrectly?

$ python --version
Python 2.7.14

$ python dagr.py -mgs ellysiumn
dagr.py v0.63 - deviantArt gallery ripper
Current deviant: Ellysiumn
Ripping Ellysiumn's gallery...
Ellysiumn's gallery page 1 crawled...
Total deviations in Ellysiumn's gallery found: 2
Ellysiumn's gallery successfully ripped.
Ripping Ellysiumn's scraps...
Ellysiumn's scraps page 1 crawled...
Total deviations in Ellysiumn's scraps found: 2
Ellysiumn's scraps successfully ripped.
Job complete.

$ python dagr.py -mgs jack-13
dagr.py v0.63 - deviantArt gallery ripper
Current deviant: Jack-13
Ripping Jack-13's gallery...
Jack-13's gallery had no deviations.
Ripping Jack-13's scraps...
Jack-13's scraps had no deviations.
Job complete.

Sometimes not downloading full sizes

Hi, @voyageur. I have noticed that sometimes dagr does not download full-sized pictures, but the smaller preview ones. I suspect this is the case for full-sized pictures without Download button (the ones you have to make three mouseclicks to see the original image). As soon as I can give you an example I will post it here, but I wanted to report this behaviour for future reference.

Start download from a specified number

I can't find any way to start downloads from an offset or a specified file number. Is there a way to place such a capability in the code?

Backend change breaking everything

python dagr.py -g yacrical

dagr.py v0.71.3 - deviantArt gallery ripper
Traceback (most recent call last):
  File "dagr.py", line 608, in <module>
    main()
  File "dagr.py", line 563, in main
    re.IGNORECASE).group(0)[7:]
AttributeError: 'NoneType' object has no attribute 'group'

FileNotFoundError every time, perhaps an improper install

Hi there,

First of all, I'm unsure if I installed this correctly, as I'm newish to python. Tried first through pip, but even after a successful install, it was not in my path, so python dagr.py never worked, with the module/package not being found. Tried again from source, which once again seemed to complete successfully with a build and dist folder being generated.

After crawling any gallery, I consistently get a FileNotFoundError. Any info would be super helpful!!

~~~~ python -V
Python 2.7.14
~~~~ python dagr/dagr.py -mgs lucid-light
dagr.py v0.70 - deviantArt gallery ripper
Current deviant: lucid-light
Ripping lucid-light's gallery...
lucid-light's gallery page 1 crawled...
lucid-light's gallery page 2 crawled...
lucid-light's gallery page 3 crawled...
lucid-light's gallery page 4 crawled...
lucid-light's gallery page 5 crawled...
lucid-light's gallery page 6 crawled...
lucid-light's gallery page 7 crawled...
lucid-light's gallery page 8 crawled...
lucid-light's gallery page 9 crawled...
lucid-light's gallery page 10 crawled...
lucid-light's gallery page 11 crawled...
Total deviations in lucid-light's gallery found: 243
Traceback (most recent call last):
  File "dagr/dagr.py", line 548, in <module>
    main()
  File "dagr/dagr.py", line 531, in main
    ripper.deviant_get("gallery")
  File "dagr/dagr.py", line 318, in deviant_get
    self.get_images(mode, mode_arg, pages)
  File "dagr/dagr.py", line 256, in get_images
    except FileNotFoundError as fnf_error:
NameError: global name 'FileNotFoundError' is not defined
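
The `NameError` arises because Python 2 has no built-in `FileNotFoundError`. A minimal compatibility sketch: the alias `MissingFileError` and the helper `read_or_none` are hypothetical names. Note that on Python 2 `IOError` is broader than "file not found".

```python
try:
    MissingFileError = FileNotFoundError  # Python 3
except NameError:
    # Python 2 has no FileNotFoundError; IOError is the closest built-in,
    # although it also covers other I/O failures.
    MissingFileError = IOError


def read_or_none(path):
    """Return the file's bytes, or None if the file cannot be opened."""
    try:
        with open(path, "rb") as handle:
            return handle.read()
    except MissingFileError:
        return None
```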

Convert from mechanize to robobrowser

Current version works fine, but mechanize is python2-only and abandoned. After some searching, it looks like robobrowser is promising: based on beautifulsoup4 and requests, all versions of python supported (including python3), ...

While converting to this new library, the code should be updated to be python3 compliant too

IOError: [Errno 36] File name too long

I'm receiving this error while trying to use it with no username/password:

`
dagr.py v0.70.1 - deviantArt gallery ripper
Current deviant: Satomi-Tadashi
Ripping Satomi-Tadashi's gallery...
Satomi-Tadashi's gallery page 1 crawled...
Satomi-Tadashi's gallery page 2 crawled...
Satomi-Tadashi's gallery page 3 crawled...
Satomi-Tadashi's gallery page 4 crawled...
Total deviations in Satomi-Tadashi's gallery found: 87
Total deviations to download: 87
Downloading 1 of 87 ( https://www.deviantart.com/satomi-tadashi/art/Junko-Kurosu-100341362 )
Traceback (most recent call last):
  File "dagr.py", line 549, in <module>
    main()
  File "dagr.py", line 532, in main
    ripper.deviant_get("gallery")
  File "dagr.py", line 319, in deviant_get
    self.get_images(mode, mode_arg, pages)
  File "dagr.py", line 280, in get_images
    self.get(filelink, base_dir + "/" + filename)
  File "dagr.py", line 138, in get
    local_file = open(file_name, "wb")
IOError: [Errno 36] File name too long: u'/home/eldark/downloads/dagr/Satomi-Tadashi/gallery/file?downloadToken=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ1cm46YXBwOjdlMGQxODg5ODIyNjQzNzNhNWYwZDQxNWVhMGQyNmUwIiwiaXNzIjoidXJuOmFwcDo3ZTBkMTg4OTgyMjY0MzczYTVmMGQ0MTVlYTBkMjZlMCIsImV4cCI6MTU0MTQyMTcyMSwiaWF0IjoxNTQxNDIxMTExLCJqdGkiOiI1YmUwMzg0MWM4OTkyIiwib2JqIjpudWxsLCJhdWQiOlsidXJuOnNlcnZpY2U6ZmlsZS5kb3dubG9hZCJdLCJwYXlsb2FkIjp7InBhdGgiOiJcL2ZcLzEzOWJjMGQ3LWE0ZmMtNDVkYy04NTQ1LTc0Yjg3MjM2M2VlYVwvZDFucW53Mi05MmY0ZTljOC05ZDA4LTQ1YzMtYTRjOC04NDI3ZmI5MTBiNWMuanBnIn19.wQ-HqxpbDjFLfMsqIsIfNSfKSTgJEoxJU7aTf8mcJeg'
`

EDIT: CentOS Linux release 7.5.1804
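
The file name in the error is the last URL segment with the whole `?downloadToken=...` query string still attached. A sketch of deriving the name from the URL path only and capping its length: `filename_from_url` is a hypothetical helper, and the 255-character cap is an assumption based on the usual limit of common Linux filesystems.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url, max_length=255):
    """Return the last path segment of url, without query string, capped in length."""
    path = urlsplit(url).path  # drops "?downloadToken=..." and any fragment
    name = posixpath.basename(path) or "download"
    if len(name) > max_length:
        stem, ext = posixpath.splitext(name)
        # Trim the stem so that the extension survives the cut.
        name = stem[:max_length - len(ext)] + ext
    return name
```

For the URL in this report the result would be just `file`, so the extension would still have to come from somewhere else, such as the response's content type.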

FileNotFoundError: Errno 2

I have been receiving this error in Python 3.6 and 2.7. I am wondering if there is a simple workaround. I am fairly new to Python. Thank you for any assistance.

Enhancements: existing files

Hi, great script that does what it says. Two points:

The script is slow over a high-latency connection, especially when dealing with already downloaded files (I am using it as a sync tool). Can the existing files in the directory be listed and compared with the gallery entries to speed things up when the overwrite flag is off?

Support for threading options? Obviously some sort of sane cap is advised. (well, xargs technically takes care of this for multiple artists, so lower priority item)
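
The first point could be sketched as a single directory listing followed by set lookups. This assumes the target file names are known before each deviation page is fetched, which nothing on this page confirms. `pending_downloads` is a hypothetical helper.

```python
import os


def pending_downloads(directory, filenames):
    """Return the filenames that are not already present in directory."""
    try:
        # One listing up front instead of one existence check per file.
        existing = set(os.listdir(directory))
    except OSError:
        # Directory missing or unreadable: treat everything as pending.
        existing = set()
    return [name for name in filenames if name not in existing]
```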

Instructions aren't clear (noob)

I compile dagr.py and get this:

dagr.py v0.63 - deviantArt gallery ripper
Usage: dagr.py [-d directory] [-u username] [-p password] [-acfghoqrstv] [deviant]...
Example: dagr.py -u user -p 1234 -gsfv derp123 blah55
For extended help and other options, run dagr.py -h

Even with the example I'm unable to figure out what I'm supposed to write. Do I write "dagr.py [-d C://folder] [-u my username] [-p mypassword] [-no idea what this is] [gallery link]"? I can't figure this out, thanks.

NoneType found (usually .html type file)

When using the script, this error sometimes occurs: a file is saved without an extension, the script stops, and the following information appears:

"Traceback (most recent call last):
   File "/usr/local/bin/dagr.py", line 572, in <module>
     main()
   File "/usr/local/bin/dagr.py", line 555, in main
     ripper.deviant_get("gallery")
   File "/usr/local/bin/dagr.py", line 342, in deviant_get
     self.get_images(mode, mode_arg, pages)
   File "/usr/local/bin/dagr.py", line 303, in get_images
     self.get(filelink, base_dir + "/" + filename)
   File "/usr/local/bin/dagr.py", line 160, in get
     get_resp.headers.get("content-type")))
TypeError: coercing to Unicode: need string or buffer, NoneType found "

Usually it is enough to give the last downloaded file an .html extension, and the script then works until the next occurrence of the error.
Would it be possible to change the script so that a file found without an extension gets an .html extension by default?

Issues with unicode in html on Windows

Hello,

I'm getting the following errors when logged in:

Downloading 52 of 124 ( http://silentyeller.deviantart.com/art/James-P-Sullivan-387771623 )
Download error. Possible mature deviation? ( http://silentyeller.deviantart.com/art/James-P-Sullivan-387771623 )
Downloading 62 of 124 ( http://silentyeller.deviantart.com/art/Dtherneth-390737625 )
Download error. Possible mature deviation? ( http://silentyeller.deviantart.com/art/Dtherneth-390737625 )

I think it might have something to do with unicode characters somewhere in the web page, as they both have a smiley somewhere in the description or comments. I do recall previous versions of dagr crashing with some error mentioning unicode, but this version just says (Download error. Possible mature deviation?).

Thank you for your time.

Sometimes not downloading full-size pictures

Sometimes dagr doesn't download full-size pictures, only a thumbnail-sized or similar version, without reporting anything on the command line.
This happens with most of the files I've tried to download.

Fail to download any images on Python 3.7

I'm using dagr.py v0.64 with robobrowser 0.5.3 on Python 3.7.0- amd64 on Windows 10 Home 1803. The script starts up fine, but comes up with the following errors and fails to download any images. Username and password redacted.

$ dagr.py -u username -p password -mgsv Quentinvcastel
dagr.py v0.64 - deviantArt gallery ripper
Current deviant: Quentinvcastel
Ripping Quentinvcastel's gallery...
Quentinvcastel's gallery page 1 crawled...
Quentinvcastel's gallery page 2 crawled...
Quentinvcastel's gallery page 3 crawled...
Quentinvcastel's gallery page 4 crawled...
Quentinvcastel's gallery page 5 crawled...
Total deviations in Quentinvcastel's gallery found: 114
Downloading 1 of 114 ( https://www.deviantart.com/quentinvcastel/art/Saint-Exupery-528691228 )
Traceback (most recent call last):
  File "C:\Users\Main\AppData\Local\Programs\Python\Python37\Scripts\dagr.py", line 4, in <module>
    __import__('pkg_resources').run_script('dagr==0.64', 'dagr.py')
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\pkg_resources\__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\pkg_resources\__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\dagr-0.64-py3.7.egg\EGG-INFO\scripts\dagr.py", line 591, in <module>
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\dagr-0.64-py3.7.egg\EGG-INFO\scripts\dagr.py", line 574, in main
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\dagr-0.64-py3.7.egg\EGG-INFO\scripts\dagr.py", line 285, in deviant_get
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\dagr-0.64-py3.7.egg\EGG-INFO\scripts\dagr.py", line 151, in find_link
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\robobrowser-0.5.3-py3.7.egg\robobrowser\browser.py", line 269, in get_link
    self.parsed, _link_ptn, text=text, *args, **kwargs
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\robobrowser-0.5.3-py3.7.egg\robobrowser\helpers.py", line 51, in find
    soup, name, attrs or {}, recursive, text, 1, **kwargs
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\robobrowser-0.5.3-py3.7.egg\robobrowser\helpers.py", line 39, in find_all
    if match_text(text, tag):
  File "c:\users\Main\appdata\local\programs\python\python37\lib\site-packages\robobrowser-0.5.3-py3.7.egg\robobrowser\helpers.py", line 16, in match_text
    if isinstance(text, re._pattern_type):
AttributeError: module 're' has no attribute '_pattern_type'
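
The failure is inside robobrowser, which relies on the private attribute `re._pattern_type`; the error shows that it is absent on this Python 3.7 install. A version-independent way to perform the same check is to take the type from a compiled pattern. This is only a sketch of the idea, not robobrowser's actual patch, and the names `PatternType` and `is_compiled_pattern` are hypothetical.

```python
import re

# Derive the compiled-pattern type from an instance instead of relying on a
# private attribute of the re module.
PatternType = type(re.compile(""))


def is_compiled_pattern(value):
    """Return True if value is a compiled regular expression object."""
    return isinstance(value, PatternType)
```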

does this work?

Add threading option

Split from #15

Support for threading options? Obviously some sort of sane cap is advised. (well, xargs technically takes care of this for multiple artists, so lower priority item)

Multi-core CPUs may help with global processing time too here. A basic option would be to split the download tasks (leaving pages crawling in main thread)
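
A minimal sketch of the basic option described above, using the standard library's `concurrent.futures`. `download_in_threads` and `fetch` are hypothetical names, and the default cap of 4 workers is arbitrary. It also assumes that `fetch` is safe to call from several threads, which a shared browser session may not be.

```python
from concurrent.futures import ThreadPoolExecutor


def download_in_threads(links, fetch, max_workers=4):
    """Run fetch(link) on a capped thread pool; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises the first exception from
        # fetch when its result is consumed.
        return list(pool.map(fetch, links))
```

Downloads are mostly I/O-bound, so threads should help with latency even though CPython threads do not run Python code on several cores at once.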

Fallback to preview not working?

Hi,

It seems the page layout has changed: every time a download button isn't found and dagr says "Download link not found, falling back to preview image", I get the "download failed. mature content" warning.

It seems the script is unable to find a preview image in the page source?

PDF files not found

Version 0.71.3 on Python 2.7.15rc1 on Ubuntu

When downloading a gallery with PDFs, dagr reports "Download link not found, falling back to direct image," and it downloads the cover image / thumbnail that represents that file. Is there any way to allow for both links to be downloaded? Some story writers leave their work in PDF exclusively.

0.71 does not work with python2

As reported by @rickjthree in #33
$ dagr.py -g phil-cho
dagr.py v0.71 - deviantArt gallery ripper
Current deviant: phil-cho
Ripping phil-cho's gallery...
Traceback (most recent call last):
  File "/usr/bin/dagr.py", line 4, in <module>
    __import__('pkg_resources').run_script('dagr==0.71', 'dagr.py')
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 738, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1506, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/lib/python2.7/site-packages/dagr-0.71-py2.7.egg/EGG-INFO/scripts/dagr.py", line 571, in <module>
  File "/usr/lib/python2.7/site-packages/dagr-0.71-py2.7.egg/EGG-INFO/scripts/dagr.py", line 554, in main
  File "/usr/lib/python2.7/site-packages/dagr-0.71-py2.7.egg/EGG-INFO/scripts/dagr.py", line 334, in deviant_get
  File "/usr/lib/python2.7/site-packages/dagr-0.71-py2.7.egg/EGG-INFO/scripts/dagr.py", line 236, in get_pages
  File "/usr/lib/python2.7/site-packages/dagr-0.71-py2.7.egg/EGG-INFO/scripts/dagr.py", line 143, in get
  File "/usr/lib/python2.7/site-packages/mechanicalsoup/stateful_browser.py", line 346, in download_link
    link = self._find_link_internal(link, args, kwargs)
  File "/usr/lib/python2.7/site-packages/mechanicalsoup/stateful_browser.py", line 299, in _find_link_internal
    return self.find_link(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mechanicalsoup/stateful_browser.py", line 271, in find_link
    links = self.links(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mechanicalsoup/stateful_browser.py", line 253, in links
    all_links = self.get_current_page().find_all(
AttributeError: 'NoneType' object has no attribute 'find_all'

I also tested locally: at least the type checks against str do not work with Python 2, and the later utime calls do not work on Python 2 either:
TypeError: utime() takes no keyword arguments
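
The `utime()` error comes from passing keyword arguments, which the Python 2 version rejects. Passing the `(atime, mtime)` tuple positionally is accepted by both major versions. `set_mtime` is a hypothetical helper, and using the current time for the access time is an arbitrary choice.

```python
import os
import time


def set_mtime(path, mtime):
    """Set a file's modification time using positional arguments only."""
    # The second positional argument is an (atime, mtime) tuple; the access
    # time is simply set to "now" here.
    os.utime(path, (time.time(), mtime))
```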

How to download files to another folder (Windows)

Hello, I have a question about using this application. What is the correct command for downloading files to another partition, into a folder whose name contains spaces? Could somebody send me an example of what it would look like? I keep running into this and cannot download files to another folder.

I have one more question: what should the path look like?

[Dagr]
OutputDirectory: D:\- one folders\- two folders

I am currently using this command

python dagr.py -gsfv name_deviantart

or

dagr.py -g -s -v -m name_deviantart

Which one is correct?

Thank you in advance for your help.

Best Regards
Jarek

Get images via oEmbed API

https://www.deviantart.com/developers/oembed

It may be easier to parse the JSON/XML response instead of searching the entire web page, if the returned values include the full-size image
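
A hedged sketch of what the request and parsing side might look like. The endpoint `https://backend.deviantart.com/oembed` is an assumption that should be checked against the developer page linked above, and the helper names are hypothetical. The `url` field is the image URL that the oEmbed specification defines for `photo` responses; whether DeviantArt returns the full-size image there is exactly the open question of this issue.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; verify against the developer page before relying on it.
OEMBED_ENDPOINT = "https://backend.deviantart.com/oembed"


def oembed_request_url(deviation_url):
    """Build the oEmbed query URL for one deviation page."""
    return OEMBED_ENDPOINT + "?" + urlencode({"url": deviation_url})


def image_url_from_oembed(body):
    """Return the "url" field of an oEmbed JSON body, or None if absent."""
    return json.loads(body).get("url")
```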

Download a single image

Currently dagr only downloads entire galleries or collections. Having the option to download a single image based on the corresponding deviantArt URL would be more useful in a use case like mine, where you're not interested in the entire gallery but just want a simple way to download an image somebody linked to without having to deal with a web page that requires both JavaScript and cookies to display an image.

Not able to download pics without specifying deviant username

It states I can get images based on query terms, but when I try this it says I must specify a deviant's username?

MechanicalSoup migration

RoboBrowser was great, but is a dead project now (last commit on 7 Jun 2015). Also while it still works fine, it does not work out of the box with Python 3.7 (see #25 ).

Investigate the possibility of migrating to https://github.com/MechanicalSoup/MechanicalSoup/ which is still maintained, and has a similar API

Reports that "{{username}}'s gallery had no deviations"

No matter what user's gallery you try to rip, dagr reports that it does not have any deviations.
Example:

dagr.py v0.62 - deviantArt gallery ripper
Attempting to log in to deviantArt...
Logged in!
Current deviant: kahchan97
Ripping kahchan97's gallery...
kahchan97's gallery had no deviations.

Called with my username and password, and with -gv switches.

Album/collections help is not clear

Hi,

I tried dagr and I gotta say I'm enthusiastic about it, since none of the programs/scripts I tried before worked properly.

I have no problems downloading a whole gallery, but when I specify an album (with the -a or -c option), the script finds only the first page of the folder (it detects 24 images whereas the folders I've tried had more).

Could you investigate the issue?
Thanks in advance :)

DeviantArt returns only .jpe files

Version 0.71.3 on Python 2.7.15rc1 on Ubuntu

I've noticed that all files are being downloaded as .jpe files instead of jpg or jpeg. Anyone else noticing this behaviour? Is this a typo or a quirk of deviantArt?

Is there a way to add a rename function to dagr so I don't have to do a batch convert operation after the fact? It's not the end of the world, but it definitely seems like a bug.
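
The `.jpe` suffix likely comes from Python's `mimetypes` module rather than from deviantArt: several extensions map to `image/jpeg`, and `guess_extension()` on some Python versions picks `.jpe`. A small override table is one way around it. `PREFERRED_EXTENSIONS` and `preferred_extension` are hypothetical names.

```python
import mimetypes

# Explicit choices for MIME types whose guessed extension is unusual.
PREFERRED_EXTENSIONS = {"image/jpeg": ".jpg"}


def preferred_extension(mime_type):
    """Return the preferred extension for mime_type, or the mimetypes guess."""
    return PREFERRED_EXTENSIONS.get(mime_type) or mimetypes.guess_extension(mime_type)
```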

Add configuration file

Add a portable way to store configuration. This would be great for some parameters, mostly login/password
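
A minimal sketch with the standard `configparser` module, reusing the `[Dagr]` section and `OutputDirectory` key that appear in the Windows output-folder question on this page. Whether dagr reads exactly this layout is not confirmed here, and `read_output_directory` is a hypothetical helper.

```python
import configparser

# Section and key names taken from the sample quoted in another issue on
# this page; the helper itself is illustrative only.
SAMPLE = """\
[Dagr]
OutputDirectory: D:\\- one folders\\- two folders
"""


def read_output_directory(text, default="."):
    """Return OutputDirectory from the [Dagr] section, or default if unset."""
    # interpolation=None keeps any "%" characters in paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser.get("Dagr", "OutputDirectory", fallback=default)
```

Spaces in the value need no quoting in this format. Storing a password the same way would leave it in plain text on disk, which is worth weighing before adding it.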

Download link text can change sometimes, generating "Possible mature deviation?" message

Hi there, voyageur.
I'm downloading from a gallery, with my user credentials, and I keep receiving the following error:
Possible mature deviation? ( http://cosplaybutterfly.deviantart.com/art/Anime-North-2012-1-1-362993649 )
If you follow the link, it's not a mature deviation at all. Besides, even if it was, I am using login information, so it should download normally, right?

Environment: 32 bit Ubuntu.

Login Issues

I am using this command:
dagr.py -u username -p password -g [artist]
I used the command before and it was fine, but now it seems to have some login issues.

dagr.py v0.63 - deviantArt gallery ripper
Attempting to log in to deviantArt...
Traceback (most recent call last):
  File "C:\Python27\Scripts\dagr.py", line 628, in <module>
    main()
  File "C:\Python27\Scripts\dagr.py", line 576, in main
    ripper.start()
  File "C:\Python27\Scripts\dagr.py", line 93, in start
    self.login()
  File "C:\Python27\Scripts\dagr.py", line 134, in login
    form['username'] = self.username
  File "C:\Python27\lib\site-packages\robobrowser-0.5.3-py2.7.egg\robobrowser\forms\form.py", line 216, in __setitem__
    self.fields[key].value = value
  File "C:\Python27\lib\site-packages\werkzeug-0.12.2-py2.7.egg\werkzeug\datastructures.py", line 781, in __getitem__
    raise exceptions.BadRequestKeyError(key)
werkzeug.exceptions.BadRequestKeyError: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.

Proper packaging

Add the proper packaging parts (dependencies, setup.py, ...), so it can easily be installed, dragging in the new dependencies

Not downloading original picture's name - "Download link not found, falling back to direct image"

Sometimes it doesn't download the picture under its original name.
It returns "Download link not found, falling back to direct image" and renames the picture to something generic like "img_6005_2_by_ghpart-d68dpho" or "untitled_by_ghpart-d7etyt1".
The file downloads fine, though.