redditdownloader's Introduction

Hey there, I'm Mike

I author and maintain a bunch of open source projects. Programming is my profession as well as my hobby, and this profile is filled with projects that I have created in my free time. Some of them are related to - or built off - each other, but most of them are completely unrelated and represent whatever I thought it would be fun to work on at the time.

If it ever seems like I've dropped off the face of the earth, it's probably because I'm working on one of many private projects that aren't ready to see the light of day yet. For questions about specific projects of mine, open an issue or a discussion on the relevant repo. If you have general inquiries about myself, feel free to reach out via shadowmoo.se.

redditdownloader's People

Contributors

akaecho, b13rg, dependabot[bot], jgaruti, powerm4, shadowmoose, suhkotos, the-compiler

redditdownloader's Issues

[Feature] Reddit Saved Posts with Gold

As you may know, reddit allows you to only see 1000 saved posts. But if you buy gold, this goes to 1000 per subreddit. I used your program to download the 1000 posts then learned of the limit and bought gold to be able to download more images. I unsaved the first 1000 posts.

Now the problem is, normally saved posts are accessible from "www.reddit.com/user/USERNAME/saved", but when you have gold, this section is still empty (in my case at least, because I unsaved the first 1000 posts). Now I have to go to "www.reddit.com/user/USERNAME/saved?sr=SUBREDDITNAME", and this again allows me to see 1000 saved posts PER subreddit. But I can't seem to access this using your program.

I don't know if the reddit API allows you to access saved posts like this, so I was wondering if you could also add a new source option, one that allows you to add LINKS from which it would download the posts. So I could give links, let's say "www.reddit.com/user/vargas/saved?sr=aww", and get posts from each subreddit one by one.

I would really appreciate it, thanks!

Tried to run on a Raspberry Pi... AttributeError: 'NoneType' object has no attribute 'erase_screen'

Hi @shadowmoose ,

What can be causing the following error? Seems to be python related. (Colorama is in the newest version)

Thanks a lot and keep up the amazing work.


Downloading from Source: foo
Element loading complete.

Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Traceback (most recent call last):
  File "main.py", line 227, in <module>
    p.run()
  File "main.py", line 124, in run
    self.processor.run()
  File "./classes/processing/elementprocessor.py", line 40, in run
    self.redraw(clear, q)
  File "./classes/processing/elementprocessor.py", line 92, in redraw
    print(out.rstrip(), end='')
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 40, in write
    self.__convertor.write(text)
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 141, in write
    self.write_and_convert(text)
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 167, in write_and_convert
    self.convert_ansi(*match.groups())
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 181, in convert_ansi
    self.call_win32(command, params)
  File "/usr/local/lib/python3.4/dist-packages/colorama/ansitowin32.py", line 212, in call_win32
    winterm.erase_screen(params[0], on_stderr=self.on_stderr)
AttributeError: 'NoneType' object has no attribute 'erase_screen'

Windows Directory Filename Bug

There appears to be an interesting issue with Windows directory names, where Windows isn't able to properly handle any directory ending with a trailing space. Despite this, the OS will allow you to create these directories without complaining - but will then be unable to rename/delete the folder through typical means.

Due to the way duplicate file names were handled, this made it possible to end up with directories that Windows secretly can't work with through the UI. The only solution is to delete the problematic directory using its Windows short path, instead of the normal file name (see here).

With the default RMD file pattern, this issue is unlikely to happen; however, I've pushed out a fix (02eb467) to be sure it won't be capable of generating an offending directory using any dynamically-inserted values. This will be rolled out soon alongside the new Threading patch.
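The idea of keeping dynamically-inserted values from producing such a directory can be sketched as below. This is a minimal illustration, not the code from the fix mentioned above; `sanitize_component` and `sanitize_relative_path` are hypothetical names. Stripping trailing dots as well as spaces is an added assumption, since Windows treats both the same way.

```python
import re


def sanitize_component(name, fallback="_"):
    """Strip trailing spaces and dots, which Windows cannot address normally."""
    cleaned = name.rstrip(" .")
    # An all-space or all-dot name would become empty, so substitute a fallback.
    return cleaned if cleaned else fallback


def sanitize_relative_path(path):
    """Apply sanitize_component to every part of a relative path."""
    parts = re.split(r"[\\/]+", path)
    return "/".join(sanitize_component(part) for part in parts if part)
```

Each path component is cleaned separately, because a trailing space in any intermediate directory name causes the same problem as one in the final name.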

RMD not downloading posts >2000 in any subreddit

Describe the bug
RMD does not discover all posts in a subreddit; it stops at somewhere between 1800 and 2000 posts and doesn't find any more.

Environment Info (please complete the following information):

  • OS: Archlinux
  • RMD Version: 3.0

Error generating user agent on wizard.

I wasn't able to update through the app, so I deleted everything and downloaded again.
Here's what I got:

Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 66, in __init__
    self.settings = Settings(settings_file, can_save=(c_settings is None), can_load=(not args.test) )
  File "./classes\settings.py", line 42, in __init__
    wizard.run(file)
  File "./classes/wizards\wizard.py", line 70, in run
    "user_agent": "RMD-"+random.random(),
TypeError: must be str, not float
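The traceback points at the expression "RMD-"+random.random(), which tries to join a string and a float. A minimal sketch of the likely correction follows; `make_user_agent` is a hypothetical helper name, and only the "RMD-" prefix and the `random.random()` call come from the traceback.

```python
import random


def make_user_agent(prefix="RMD-"):
    """Build a user agent string; the float is converted before joining."""
    return prefix + str(random.random())
```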

Authorize an Account doesn't do anything

Hi all. Fresh install on a Ubuntu 18.04 system. Erased my prior settings and manifest files.
On first run, the web interface opens up and I go to "settings" tab, then click on "Authorize an account" and nothing seems to happen. The account doesn't seem to authorize because subsequent downloading doesn't find anything to download.

Thanks.

RMD backend keeps disconnecting

Hi.
I have been really liking RMD. One particular issue that I have been facing since the first time I used it is that the browser integration, or the WebUI, keeps disconnecting (as shown in the attached screenshot). It becomes very difficult to add sources using the WebUI as it keeps quitting on me. I don't think the port has anything to do with it. Currently, I have tweaked my settings file so that the "keep it open" parameter is set to "true", but the UI is still quitting. I have tried downloading Chrome for this, but the same thing happens there too. I have tried switching between the default browser and Chrome, but that did not help.
I hope you can show me what I may be doing wrong here. Thank you.

Transform Handlers into better Class module

Now that we're beyond the "get it working" stage;

All handler objects should be encapsulated into a "Handler" class module, to supply more generic functionality down the road. This shouldn't significantly impact the main logic, but will enable generic functionality - such as a generic file download function.
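One way the proposal could look is sketched below. Every name here (`Handler`, `DirectImageHandler`, `pick_handler`, `download`) is hypothetical and chosen for illustration; the project's real handler interface is not shown in this issue.

```python
import shutil
import urllib.request


class Handler:
    """Hypothetical base class; concrete handlers override the two hooks."""

    name = "base"

    def can_handle(self, url):
        """Return True if this handler recognises the URL."""
        raise NotImplementedError

    def handle(self, url, target_path):
        """Download the URL's media to target_path."""
        raise NotImplementedError

    def download(self, url, target_path):
        """Generic file download that any subclass can reuse."""
        with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
            shutil.copyfileobj(response, out)
        return target_path


class DirectImageHandler(Handler):
    """Example subclass: accepts URLs that end in a common image extension."""

    name = "direct_image"

    def can_handle(self, url):
        return url.lower().endswith((".jpg", ".png", ".gif"))

    def handle(self, url, target_path):
        return self.download(url, target_path)


def pick_handler(handlers, url):
    """Return the first handler that accepts the URL, or None."""
    for handler in handlers:
        if handler.can_handle(url):
            return handler
    return None
```

Keeping the shared download logic on the base class means the main loop only needs `pick_handler`, while each site-specific handler stays small.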

Syntax Error: Invalid Syntax

I am on Mac OSX High Sierra and downloaded and unzipped the folder. I ran your command "pip install -r requirements.txt" after downloading Python. At first it didn't work, so I ran "sudo easy_install-3.7 pip", which worked. But then when I tried running "python main.py" it gave me
"File "main.py", line 201
print(("\t%3d:%-" + padding_len + "s -> ") % (i, name), end='')
^
SyntaxError: invalid syntax
"

Clean up main reddit class

The whole main loop was written for personal use initially, and now that it's live this code is entirely sub-par.
Split the main logic into a better class module, or at least clean it up a bit.

Download logic seems to expect Windows?

(To get this running I quickly updated my 2FA code in the setting.json, saved, and ran the script)

[parker@inspiron15 RedditDownloader]$ python main.py 

====================================
    Reddit Media Downloader 2.0
====================================
    (By ShadowMoose @ Github)

Loading all settings from file.
Loaded Source:  Source1
Authenticating via OAuth...
Authenticated as [parkerlreed]

Downloading from Source: Source1
Element loading complete.

Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Loaded handlers:  imgur, github, reddit, ytdl, newspaper
Traceback (most recent call last):
  File "main.py", line 228, in <module>
    p.run()
  File "main.py", line 124, in run
    self.processor.run()
  File "./classes/processing/elementprocessor.py", line 40, in run
    self.redraw(clear, q)
  File "./classes/processing/elementprocessor.py", line 92, in redraw
    print(out.rstrip(), end='')
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 40, in write
    self.__convertor.write(text)
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 141, in write
    self.write_and_convert(text)
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 167, in write_and_convert
    self.convert_ansi(*match.groups())
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 181, in convert_ansi
    self.call_win32(command, params)
  File "/usr/lib/python3.6/site-packages/colorama/ansitowin32.py", line 212, in call_win32
    winterm.erase_screen(params[0], on_stderr=self.on_stderr)
AttributeError: 'NoneType' object has no attribute 'erase_screen'

Build Tests

Implement build testing, probably through TravisCI, once major structure changes are complete.

Shouldn't be very difficult to generate some dummy data and make sure it passes tests.

Hangs on long file names in Ubuntu

Thanks for the application. I'm having an issue on Ubuntu with posts that have a long title.

It just hangs with Errno 36 (filename too long) and does nothing. I'm downloading to a share on my network from an Ubuntu host.

Is there a fix for this?
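On Linux, Errno 36 is raised when a single file name exceeds the filesystem's limit, which is 255 bytes on many common filesystems (a network share may enforce a different limit). A possible workaround is to shorten the name before creating the file. The sketch below is an assumption about how that could be done, not the project's actual behaviour; `truncate_filename` is a hypothetical helper.

```python
import os


def truncate_filename(filename, max_bytes=255):
    """Shorten a file name to max_bytes of UTF-8 while keeping its extension."""
    stem, ext = os.path.splitext(filename)
    budget = max_bytes - len(ext.encode("utf-8"))
    if budget <= 0:
        raise ValueError("extension alone exceeds the byte limit")
    # Cut on a byte boundary, then drop any partial multi-byte character.
    stem_bytes = stem.encode("utf-8")[:budget]
    return stem_bytes.decode("utf-8", errors="ignore") + ext
```

The limit is measured in bytes rather than characters, so titles containing non-ASCII text need the byte-level cut shown here.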

Long paths on Windows

RedditDownloader fails to download any posts that would result in a file being created that has a full path of over 260 characters in length on Windows 7.

While this issue is entirely Microsoft's fault, I would still appreciate a workaround. Either some way of shortening names or using APIs that allow the use of long paths, whatever works.
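For the second option, Windows file APIs accept absolute paths prefixed with `\\?\`, which bypasses the 260-character limit. The sketch below shows one way to build such a path; `to_extended_length_path` is a hypothetical helper, and whether it fits RMD's download code is an assumption. It uses `ntpath` so the logic can be exercised on any platform.

```python
import ntpath


def to_extended_length_path(path):
    """Prefix an absolute Windows path with the extended-length marker."""
    if path.startswith("\\\\?\\"):
        return path  # already in extended-length form
    # The prefix disables normalisation, so resolve '/' and '..' first.
    absolute = ntpath.normpath(path)
    if not ntpath.isabs(absolute):
        raise ValueError("extended-length paths must be absolute")
    if absolute.startswith("\\\\"):
        # UNC share: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC\\" + absolute[2:]
    return "\\\\?\\" + absolute
```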

Feeding a list of subreddits

Hi, I have a list of nearly 500 subreddits I'm subscribed to. Is there a way to feed it directly into the app, or do you have to add them manually, one by one? (I normally use cat + parallel to feed a list of subreddits into an "argumentable" script, but yours doesn't seem to have the opportunity to add a subreddit by argument, e.g. script.py --subreddit X.) Thanks
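As a rough sketch of the input side of such a feature, the function below turns a plain-text list (one subreddit per line) into a clean list of names. `parse_subreddit_list` is hypothetical, and how the names would then be turned into RMD sources is not covered here, since the settings format is not shown in this issue.

```python
def parse_subreddit_list(text):
    """Return unique subreddit names from text with one name per line."""
    names = []
    seen = set()
    for line in text.splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue  # skip blank lines and comments
        # Accept "aww", "r/aww" and "/r/aww" alike.
        name = name.lstrip("/")
        if name.lower().startswith("r/"):
            name = name[2:]
        key = name.lower()
        if key not in seen:
            seen.add(key)
            names.append(name)
    return names
```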

PushShift doesn't work

It appears PushShift has been deprecated. The Reddit tools on its website no longer work.

RMD is only downloading first page of user submitted posts

I'm almost certain the post limit on Reddit is 1000, yet RMD only finds either 100 or 200 max source posts when downloading a user name.
There are no issues downloading 1000 posts using other options. The "bug"(??) only occurs when using the "A User's Submission and/or Comment History" setting.

Automatic Handler detection & updating

With the current method of loading Handlers it's dead-simple to first check the existing handler.py files against a list online of updated & new ones.

Since the main logic won't often need to change once finalized, this would allow dynamic updating of sorts, with large potential down the road.

Once #3 is taken care of, this will probably be the next step.

The UI never loads

I run the command to start the app and it opens the chrome browser but the page is just blank. I downloaded a fresh version from the repo and did the same and had the same thing. Even with a fresh settings file. Is there an issue with my python or something?

Can't run the script at all

I did pip install -r requirements.txt but still get this:

PS C:\Users\Ali\git\@shadowmoose\RedditDownloader> python .\main.py
Traceback (most recent call last):
  File ".\main.py", line 56, in <module>
    from classes.webserver import eelwrapper
  File "C:\Users\Ali\git\@shadowmoose\RedditDownloader\classes\webserver\eelwrapper.py", line 1, in <module>
    import eel
ModuleNotFoundError: No module named 'eel'

i.redd.it files images not downloading

Hey there. Just wanted to double check: Is it me, or do images uploaded to i.redd.it fail to download? I'm using the webui version and successfully getting downloads that are hosted on imgur, but not i.redd.it. Thank you!

Hangs on download

Well everything was working. I now get the following output on stderr and the program hangs:

Downloading from Source: default-downloader
Exception in thread Handler - 3:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 42, in run
self.process_ele(item)
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 95, in process_ele
manifest.insert_post(reddit_element) # Update Manifest with completed ele.
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 101, in insert_post
direct_insert_post(ele['id'], ele['author'], ele['source_alias'], ele['subreddit'], ele['title'], ele['type'], ele['files'], ele['parent'], ele['body'])
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 90, in direct_insert_post
(_id, author, source_alias, subreddit, title, _type, parent, body)
sqlite3.OperationalError: table posts has no column named parent

Exception in thread Handler - 5:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 42, in run
self.process_ele(item)
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 95, in process_ele
manifest.insert_post(reddit_element) # Update Manifest with completed ele.
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 101, in insert_post
direct_insert_post(ele['id'], ele['author'], ele['source_alias'], ele['subreddit'], ele['title'], ele['type'], ele['files'], ele['parent'], ele['body'])
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 90, in direct_insert_post
(_id, author, source_alias, subreddit, title, _type, parent, body)
sqlite3.OperationalError: table posts has no column named parent

Exception in thread Handler - 4:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 42, in run
self.process_ele(item)
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 95, in process_ele
manifest.insert_post(reddit_element) # Update Manifest with completed ele.
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 101, in insert_post
direct_insert_post(ele['id'], ele['author'], ele['source_alias'], ele['subreddit'], ele['title'], ele['type'], ele['files'], ele['parent'], ele['body'])
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 90, in direct_insert_post
(_id, author, source_alias, subreddit, title, _type, parent, body)
sqlite3.OperationalError: table posts has no column named parent

Exception in thread Handler - 1:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 42, in run
self.process_ele(item)
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 95, in process_ele
manifest.insert_post(reddit_element) # Update Manifest with completed ele.
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 101, in insert_post
direct_insert_post(ele['id'], ele['author'], ele['source_alias'], ele['subreddit'], ele['title'], ele['type'], ele['files'], ele['parent'], ele['body'])
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 90, in direct_insert_post
(_id, author, source_alias, subreddit, title, _type, parent, body)
sqlite3.OperationalError: table posts has no column named parent

Exception in thread Handler - 2:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 42, in run
self.process_ele(item)
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/processing/handlerthread.py", line 95, in process_ele
manifest.insert_post(reddit_element) # Update Manifest with completed ele.
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 101, in insert_post
direct_insert_post(ele['id'], ele['author'], ele['source_alias'], ele['subreddit'], ele['title'], ele['type'], ele['files'], ele['parent'], ele['body'])
File "/home/ted/bin/shadowmoose-RedditDownloader-eef45da/classes/util/manifest.py", line 90, in direct_insert_post
(_id, author, source_alias, subreddit, title, _type, parent, body)
sqlite3.OperationalError: table posts has no column named parent
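The error suggests the manifest database was created by an older version whose posts table has no parent column. One possible remedy, sketched under that assumption, is to add any missing column before inserting. `ensure_column` is a hypothetical helper; only the table name `posts` and the column name `parent` come from the traceback.

```python
import sqlite3


def ensure_column(conn, table, column, decl="TEXT"):
    """Add a column to an existing table if an older schema lacks it."""
    # Identifiers are interpolated directly, so pass only trusted names.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True
```

Calling `ensure_column(conn, "posts", "parent")` once at startup would leave an up-to-date database untouched and upgrade an old one in place.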

Error during setup after Authentication

After going through the setup, it said: Authenticated as [username]

And then the following error:

Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 66, in __init__
    self.settings = Settings(settings_file, can_save=(c_settings is None), can_load=(not args.test) )
  File "./classes\settings.py", line 42, in __init__
    wizard.run(file)
  File "./classes/wizards\wizard.py", line 69, in run
    "user_agent": "RMD-"+random.random(),
TypeError: must be str, not float

After running python main.py again after this error:

Using file values.
Authenticating via OAuth...
Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 95, in __init__
    reddit.login()
  File "./classes\reddit.py", line 36, in login
    _user = _reddit.user.me()
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\models\user.py", line 99, in me
    user_data = self._reddit.get(API_PATH['me'])
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\reddit.py", line 408, in get
    data = self.request('GET', path, params=params)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\praw\reddit.py", line 534, in request
    params=params)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 185, in request
    params=params, url=url)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 116, in _request_with_retries
    data, files, json, method, params, retries, url)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 101, in _make_request
    params=params)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\rate_limit.py", line 35, in call
    kwargs['headers'] = set_header_callback()
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\sessions.py", line 145, in _set_header_callback
    self._authorizer.refresh()
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\auth.py", line 328, in refresh
    password=self._password)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\auth.py", line 138, in _request_token
    response = self._authenticator._post(url, **data)
  File "C:\Users\Arlo\AppData\Local\Programs\Python\Python36-32\lib\site-packages\prawcore\auth.py", line 31, in _post
    raise ResponseException(response)
prawcore.exceptions.ResponseException: received 401 HTTP response

Thanks

RMD not downloading images from single-image imgur pages

Getting the following error when RMD tries to download from imgur if it's not a direct image link or an album:

 URL: https://imgur.com/nw33ENa
 Checking handler: imgur
         Imgur Error: URL must be a valid Imgur Album
 Checking handler: github
 Checking handler: reddit
 Checking handler: ytdl
         YTDL :: ERROR: No sources found for video nw33ENa. Maybe an image?
 Checking handler: newspaper
         "Newspaper" Generic handler failed. Configuration object being passed incorrectly as title or source_url! Please verify `Article`s __init__() fn.
 !No handlers were able to accept this URL.

It should be noted that I'm running an earlier version so if this was fixed in a later version, let me know what specific changes I need to make to the code.

SyntaxError - OS X

Hi, I'm hitting a snag over here.

"main.py", line 252
print( ("\t%3d:%-"+padding_len+"s -> ") % (i, name) , end='')
^
SyntaxError: invalid syntax

I'm running OS X 10.13.2, running python 2.7

Many thanks in advance for any assistance!
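The `end=''` keyword in that print call is Python 3 syntax, so Python 2.7 rejects the file before running any of it. A minimal sketch of a friendlier failure follows. It assumes a small launcher script (hypothetical, not part of the project) that is itself valid Python 2 and runs this check before importing main.py, because a check placed inside main.py would never be reached.

```python
import sys


def require_python3(version_info=sys.version_info):
    """Exit with a readable message instead of a SyntaxError on Python 2."""
    if version_info[0] < 3:
        sys.exit("This program requires Python 3; try running 'python3 main.py'.")
```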

Main method entry point

There should be a Main method implemented, in order to allow the user to override settings by passing their own custom params in-line rather than needing to build a settings file.

Expand the custom params to enable features such as less-verbose logging, custom output directory, custom output format, etc.
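A rough sketch of what such an entry point could look like is shown below. The flag names (`--settings`, `--output-dir`, `--quiet`) and the settings keys (`output_dir`, `verbose`) are illustrative assumptions, not options the program is known to support.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line overrides."""
    parser = argparse.ArgumentParser(description="Reddit Media Downloader (sketch)")
    parser.add_argument("--settings", default="settings.json",
                        help="path to the settings file")
    parser.add_argument("--output-dir", default=None,
                        help="override the download directory")
    parser.add_argument("--quiet", action="store_true",
                        help="reduce logging output")
    return parser.parse_args(argv)


def apply_overrides(settings, args):
    """Return a copy of settings with any in-line parameters applied."""
    merged = dict(settings)
    if args.output_dir is not None:
        merged["output_dir"] = args.output_dir
    if args.quiet:
        merged["verbose"] = False
    return merged
```

Values from the settings file act as defaults here, and only the parameters actually passed on the command line replace them.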

All results are not shown

I am not getting all posts when downloading. According to the Reddit Search page, search is limited to 1000 results. Is RedditDownloader using the default Reddit search results? If so, that explains why I am not getting all posts downloaded.

Add some form of output manifest

Some programs may want to piggyback on the file structure this creates (say, a personal browser or something).
We should dump a full JSON manifest of whatever appropriate information about the post - and that post's files - that we can assemble. Consider including, and updating with each run, things like comment counts, upvotes, and other metrics.

This likely goes hand-in-hand with #1
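A minimal sketch of such a manifest is given below. The field names and the top-level "posts" key are assumptions made for illustration; the real manifest layout is not defined in this issue.

```python
import json


def build_manifest_entry(post_id, title, subreddit, files, score=None, num_comments=None):
    """Collect the fields a third-party viewer might need for one post."""
    return {
        "id": post_id,
        "title": title,
        "subreddit": subreddit,
        "files": list(files),
        "score": score,
        "num_comments": num_comments,
    }


def dump_manifest(entries):
    """Serialise all entries to a JSON string with stable key order."""
    return json.dumps({"posts": entries}, indent=2, sort_keys=True)
```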

Proliferate User-Agent through all Handlers

All Handlers, where possible to set, should use the supplied User-Agent string - not just the Reddit client.
This will prevent some sites from potentially blocking API calls due to common User-Agent strings.

This is halfway implemented in testing, but needs completion.
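For handlers that make plain HTTP requests, passing the configured string along could look like the sketch below. `build_request` is a hypothetical helper using only the standard library; handlers built on third-party clients would need the equivalent setting in their own APIs.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a request that carries the configured User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```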

Exceptions with installing requirements (RedditDownloader)

First of all thanks for putting together this script! This is exactly what I am looking to use for a school project.

When I ran the pip install -r requirements.txt command I get the following output towards the bottom of the list:

Exception:
Traceback (most recent call last):
File "c:\program files (x86)\python36-32\lib\site-packages\pip\basecommand.py", line 215, in main
status = self.run(options, args)
File "c:\program files (x86)\python36-32\lib\site-packages\pip\commands\install.py", line 342, in run
prefix=options.prefix_path,
File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_set.py", line 784, in install
**kwargs
File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "c:\program files (x86)\python36-32\lib\site-packages\pip\req\req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "c:\program files (x86)\python36-32\lib\site-packages\pip\wheel.py", line 345, in move_wheel_files
clobber(source, lib_dir, True)
File "c:\program files (x86)\python36-32\lib\site-packages\pip\wheel.py", line 316, in clobber
ensure_dir(destdir)
File "c:\program files (x86)\python36-32\lib\site-packages\pip\utils\__init__.py", line 83, in ensure_dir
os.makedirs(path)
File "c:\program files (x86)\python36-32\lib\os.py", line 220, in makedirs
mkdir(name, mode)
PermissionError: [WinError 5] Access is denied: 'c:\program files (x86)\python36-32\Lib\site-packages\certifi'

All of this text is in red and the main.py script will not run because it can't locate a module.
Any ideas on what is causing the requirements to not install properly?
thanks, Steve

Automated downloading

Is it possible to check for new submissions in subreddits every x (adjustable) minutes and download those new submissions automatically?

Thank you.

new ways to sort

One of my sources is a user's submissions. The downloads are sorted based on the subreddit they were posted to. The ability to sort these into a separate folder would be cool.

For example if my source is u/tom and he posts in subreddits r/gaming and r/math, then his posts would be divided into the two folders gaming and math. Is it feasible to store all of his posts into a separate folder titled tom?

Thanks in advance

Filters created through the web gui are incorrect

Hi,
I wanted to set up a few filters through the web gui but the filters created are wrong and result in no files getting downloaded (because they match no filter).

For example creating a "Title matches: 123" filter results in the creation of the following line in the settings.json:
"title": "123"

However, the line should read:
"title.match": "123"

The same thing happens with score: minimum (results in "score": when it should be "score.min":) and I would assume other filters as well.
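The expected key format, field name plus comparison joined by a dot, can be sketched as follows. The helper names are hypothetical and this is not the web GUI's actual code; only the key shapes "title.match" and "score.min" come from the report above.

```python
def filter_key(field, operator):
    """Join a field and a comparison into a dotted key such as 'title.match'."""
    if not operator:
        raise ValueError("a filter needs an operator such as 'match' or 'min'")
    return f"{field}.{operator}"


def build_filters(specs):
    """Turn (field, operator, value) triples into a settings-style mapping."""
    return {filter_key(field, op): value for field, op, value in specs}
```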

2FA not supported

For the initial login I used <password>:<auth-code> to login. That was successful but it seems every time the script is invoked, Reddit is expecting a new auth code on the password. I'd rather not disable my 2FA for this.

[parker@inspiron15 RedditDownloader]$ python main.py

====================================
    Reddit Media Downloader 2.0
====================================
    (By ShadowMoose @ Github)

Loading all settings from file.
Loaded Source:  <source>
Authenticating via OAuth...
Traceback (most recent call last):
  File "main.py", line 227, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 116, in __init__
    reddit.login()
  File "./classes/reddit/reddit.py", line 36, in login
    _user = _reddit.user.me()
  File "/usr/lib/python3.6/site-packages/praw/models/user.py", line 95, in me
    user_data = self._reddit.get(API_PATH['me'])
  File "/usr/lib/python3.6/site-packages/praw/reddit.py", line 371, in get
    data = self.request('GET', path, params=params)
  File "/usr/lib/python3.6/site-packages/praw/reddit.py", line 486, in request
    params=params)
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 182, in request
    params=params, url=url)
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 113, in _request_with_retries
    data, files, json, method, params, retries, url)
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 98, in _make_request
    params=params)
  File "/usr/lib/python3.6/site-packages/prawcore/rate_limit.py", line 32, in call
    kwargs['headers'] = set_header_callback()
  File "/usr/lib/python3.6/site-packages/prawcore/sessions.py", line 142, in _set_header_callback
    self._authorizer.refresh()
  File "/usr/lib/python3.6/site-packages/prawcore/auth.py", line 328, in refresh
    password=self._password)
  File "/usr/lib/python3.6/site-packages/prawcore/auth.py", line 142, in _request_token
    payload.get('error_description'))
prawcore.exceptions.OAuthException: invalid_grant error processing request

RedditDownloader crashes when trying to rip a post with a long title

Unfortunately, right in the middle of a 17,000+ post rip, RedditDownloader failed out with this error:

Traceback (most recent call last):
  File "main.py", line 175, in <module>
    p = Scraper(settings, custom_settings)
  File "main.py", line 105, in __init__
    self.processor.run()
  File "./classes\elementprocessor.py", line 35, in run
    self.process_ele(ele)
  File "./classes\elementprocessor.py", line 60, in process_ele
    file_path = self.process_url(url, file_info)
  File "./classes\elementprocessor.py", line 73, in process_url
    ret = h.handle(url, info)
  File "./classes/handlers\imgur.py", line 276, in handle
    downloader.save_images(targ_dir)
  File "./classes/handlers\imgur.py", line 135, in save_images
    os.makedirs(album_folder)
  File "E:\Program Files\Python\Python36\lib\os.py", line 220, in makedirs
    mkdir(name, mode)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect

The post that it was trying to scrape before it crashed had a very long name, and I suspect that this was the problem.

Headless support

I went through the process of setting up RedditDownloader on a Raspberry Pi and hit a few roadblocks. I run it headless without an X server, so browser auth was a bit of an issue. I'll just go over it here so others who had the same issues can find a way to set the app up. Firstly, I want to preface this by saying this is an absolutely stellar tool; I'm glad to have it working and love the work done on it. I can see that some issues I found probably won't be easily solvable.

The first issue was that I can't use localhost as a webserver as it will prevent any remote connections from accessing it. That's fine, I just changed it to my local network IP of the Pi in settings.json.

I found that port 7505 conflicts with OpenVPN's management interface, which is unfortunate, so I switched to port 7506. However, the current React OAuth launch logic doesn't support using any other port (just got a blank page) so I disabled OpenVPN just for setup.

Having launched the webserver and navigated to the site, I attempted auth but got some issues from Reddit regarding an invalid URL. It turns out that changing localhost to your local IP causes the redirect_uri parameter to mismatch against RMD's reddit app redirect_uri setting (which is set up for localhost).

This meant that I had to re-use an old developer app I created and change its redirect_uri to my local IP and change the client_key to my app's client key in settings.json. With this I was able to obtain a 302 redirect from Reddit containing the authorization code.

From here, the second step of OAuth was attempted but because I had changed the client key to my own app, and RMD doesn't include a client secret, reddit returned a 401 Unauthorized response. So instead I completed the second step manually via Postman with the client key, client secret, authorization code, state and redirect URI parameters to retrieve a refresh token.

Finally, after plugging in the refresh token into settings.json, further requests were failing as the client secret was still needed. So I modified classes/static/praw_wrapper.py and classes/static/settings.py to accept a new key in settings.json defined as auth.rmd_client_secret which would allow a client secret to be specified and sent through with each request. This allowed full authentication to complete successfully.

From this, I learnt that increasing headless support is going to be very challenging due to the redirect_uri parameter not being manipulatable from the app itself. Much of the work was required as I was using my own developer app to run RMD due to the control required over the redirect_uri parameter.

Thinking through the process now, it would be far more robust to simply set up RMD on another machine with a browser, then move settings.json over to the headless machine. I'm aware rclone supports this kind of behaviour and suggests it in the wizard. Perhaps an idea could be to suggest this to the user through setup. Given they've thought this through and implemented it, it may be the best solution. I don't think reverting to the old RMD behaviour where the user is required to create their own app is sustainable and I appreciate the move away from it.
