
Comments (5)

shadowmoose commented on June 12, 2024

Yes, this functionality is currently broken, likely both in this version and in the TypeScript rewrite. Reddit comments and submissions tend to change or disappear over a long enough window, and the official Reddit API is (or was) extremely slow for individual lookups, so PushShift was implemented as the sole solution for single targets.

People using this functionality generally have more saved posts than the official API will return (capped at 1000), so a typical CSV download contains many thousands of posts to scan. Frankly, the Reddit API is unsuitable for this task. Between the harsh rate limiting and the generally slow response times, processing a CSV directly through official means would take a significant amount of time. Even ignoring API response times and skipping the actual download calls, which use additional API queries in some cases, the optimistic run time just to retrieve 1000 individual posts within API limits is 30+ minutes. This also ignores any old deleted or edited posts, whose data will be completely unrecoverable. In the rewritten TS version, PushShift functionality was mandatory in order to reliably build relationships between saved comments and their parents, in the event that the parent submission had been removed from the live site.
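To put the 30+ minute figure in perspective, here is a rough back-of-the-envelope sketch. The rate of 33 requests per minute is only back-derived from the estimate above (1000 posts in roughly 30 minutes). It is an assumption for illustration, not an official Reddit limit, and `estimated_minutes` is a hypothetical helper rather than anything in RMD.

```python
def estimated_minutes(num_posts: int,
                      requests_per_minute: float,
                      requests_per_post: int = 1) -> float:
    """Lower bound on wall-clock minutes when every lookup is rate limited.

    Ignores network latency and any extra download calls, so real runs
    would take longer than this.
    """
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return num_posts * requests_per_post / requests_per_minute


# ASSUMPTION: 33 requests/minute, chosen only to match the "30+ minutes
# for 1000 posts" estimate; it is not a documented limit.
ASSUMED_RATE = 33

if __name__ == "__main__":
    for posts in (1000, 5000):
        print(posts, "posts:", round(estimated_minutes(posts, ASSUMED_RATE), 1), "min")
```

Under that assumed rate, 1000 posts come out at about 30.3 minutes and 5000 posts at about 151.5 minutes, before counting any latency or media downloads.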

This probably isn't the best place to discuss this, but I may as well dump it here on the most recent issue caused by Reddit actions:
Bluntly, it's changes like this that have driven me away from supporting these Reddit-backed applications. They've been teasing for years now that they intend to restrict their API further and further, and it makes investing my energy into these projects seem like a tremendous waste. I've been directing my focus lately towards a more convenient, site-agnostic method of preserving media, which I'd rather push forward with than support a site that doesn't support its users in return.

Suffice it to say that I'm unlikely to expend much effort towards bringing these features back in the short term. I have very limited time to work on my passion projects these days, and I would prefer not to waste that time stepping into adversarial relationships with social media site developers.

If they get things sorted out with PushShift, then everything should start working again and I'll be more encouraged to move forward with completing the rewrite, which also heavily utilizes PS. If not... well, it will likely be impossible to reimplement the lost functionality to the level people expect from the application. The code to add a bandage fix exists - scattered around - within RMD already, and I'm very open to accepting Pull Requests, but I probably won't be the one implementing it. The fix would only get RMD limping along, and honestly that seems likely to only raise more complaints and issues. At this point I'll be keeping an eye out for future Reddit API developments, and should anything come up, I'll be happy to revisit this.

from redditdownloader.

rktfier commented on June 12, 2024

Good news, it seems like they sorted things out with PushShift and it is coming back in the following month. Bad news is that

"use of Pushshift will be limited to moderation use cases only."

"Though access to Pushshift data for research purposes is not available at this time, we are keen to explore possibilities that might allow us to provide researchers with access to datasets essential for their valuable social media research. We understand the significance of empowering the academic community, and we are dedicated to working with Reddit to develop frameworks that responsibly balance data access, data security, and user privacy."

source


shadowmoose commented on June 12, 2024

PushShift is currently broken, due to API restrictions that Reddit staff are implementing. As a result, I will be unable to support any further PushShift development until (and if) they work something out with Reddit.


toadthetoad commented on June 12, 2024

So the CSV download is effectively dead now? The --full_csv flag seems to imply it will bypass the need for PushShift but it fails in the same way. Is there an easy bypass?


JulianKauth commented on June 12, 2024

The psaw library seems to be pretty heavily integrated into the project. Not even direct URL download works without it.

That case might be easier to hack, though. As far as I can tell, pushshift is only needed there to get the metadata from a reddit post in order to create an instance of the processing.wrappers.redditelement.RedditElement class. The unfortunate lack of type annotations in the project doesn't make it easy, though.
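As a very rough illustration of the "only the metadata is needed" idea, here is a sketch that pulls a handful of fields out of one already-decoded Reddit post object (the usual `{"kind": ..., "data": {...}}` shape). `PostMetadata` and `metadata_from_thing` are hypothetical names made up for this example. `PostMetadata` is a stand-in, not the real RedditElement class, whose constructor I have not checked, and the choice of fields is an assumption about what a downloader would need.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PostMetadata:
    """Hypothetical stand-in; NOT the real RedditElement class from RMD."""
    post_id: str
    title: str
    author: str
    subreddit: str
    url: str


def metadata_from_thing(thing: Mapping[str, Any]) -> PostMetadata:
    """Extract the assumed-minimal fields from one decoded post object.

    `thing` is expected to look like {"kind": "t3", "data": {...}}.
    Missing optional fields fall back to empty strings, and a missing
    or null author is reported as "[deleted]".
    """
    data = thing["data"]
    return PostMetadata(
        post_id=data["id"],
        title=data.get("title", ""),
        author=data.get("author") or "[deleted]",
        subreddit=data.get("subreddit", ""),
        url=data.get("url", ""),
    )


if __name__ == "__main__":
    sample = {
        "kind": "t3",
        "data": {
            "id": "abc123",
            "title": "Example post",
            "author": None,
            "subreddit": "pics",
            "url": "https://example.com/a.jpg",
        },
    }
    print(metadata_from_thing(sample))
```

Where the decoded object comes from (the official API, a cached dump, or something else) is deliberately left out, since that is exactly the part PushShift used to cover.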

PS: @vincenzogianfelice I am appalled by the entitlement displayed in your comment. This software is provided entirely free of charge; the least you could do is be nice to the developer.

