Comments (14)

cuducos commented on June 8, 2024

Adding the proper credits for the idea: thanks @arcanj0 ; )

from whistleblower.

arcanj0 commented on June 8, 2024

Very nice!

Would be very helpful for the tool itself, like you said.
The downside is that you could be exposed to bots using the hashtag in an "evil way", but I guess that's not that hard to identify!

The way it is today, followers can't be sure their replies count as help.

I would love to see it working!

Thanks @cuducos !

vmesel commented on June 8, 2024

@cuducos, so you would like to create a dataset with the following data:

recipe_id | count_positive | count_negative | real_classification

If you do, there is no API to check out the replies for tweets, but you can scrape a user profile to find a reply_to_tweet method with the original @RosieDaSerenata tweet ID. The approach is described here: https://stackoverflow.com/questions/2693553/replies-to-a-particular-tweet-twitter-api

cuducos commented on June 8, 2024

First of all:

If you do, there is no API to check out the replies for tweets

Ow, c'mon @vmesel — you're better than that ; )

Would you rather trust a 2010 Stack Overflow answer or the official Twitter API documentation?

BTW there's a Twitter API Python wrapper I've contributed to that offers a GetReplies method to list replies to a certain user.

The main question is: given a list of @RosieDaSerenata replies, can we identify which tweet each reply actually refers to using this wrapper? The Twitter API offers in_reply_to_… properties.

That said, we have to choose an API wrapper or deal directly with the REST API.

create a dataset with the data:
recipe_id | count_positive | count_negative | real_classification

I'd prefer to have the count as something like a query result so later we can filter (eg discard congressperson replies). So I'd go with something like this:

document_id | tweet_id | reply_id | reply_user | suspicion_confirmed

Where:

  • document_id: reimbursement document_id
  • tweet_id: ID of Rosie's original tweet
  • reply_id: reply tweet ID
  • reply_user: user who replied to Rosie
  • suspicion_confirmed: true or false
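
A minimal sketch of how one row per reply could be stored and then counted as a query result, assuming the field names proposed above. The `ReplyRecord` class, the `count_votes` helper and the `ignored_users` parameter are hypothetical names used only for illustration:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplyRecord:
    """One row per reply, mirroring the columns proposed above."""
    document_id: int
    tweet_id: int
    reply_id: int
    reply_user: str
    suspicion_confirmed: bool


def count_votes(records, ignored_users=frozenset()):
    """Count confirmations and rejections per document, skipping ignored users."""
    counts = {}
    for record in records:
        if record.reply_user in ignored_users:
            continue  # e.g. discard replies from congresspeople
        per_document = counts.setdefault(record.document_id, Counter())
        per_document[record.suspicion_confirmed] += 1
    return counts
```

Because every reply is kept, the filter can change later without collecting the data again; the counts are derived at query time.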

cuducos commented on June 8, 2024

Complementing my previous message, a roadmap for a possible implementation would be:

  1. check all new replies @RosieDaSerenata has got
  2. check if in the in_reply_to_… field we have a @RosieDaSerenata tweet with a valid Jarbas URL
  3. fetch the original tweet and extract the document_id from it
  4. scan the reply for a certain hashtag (#falsoPositivo or #RosieAcertaOutraVez for example)
  5. persist that data in a dataset or database
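
Step 4 of the roadmap above could look roughly like this. It is only a sketch: the mapping of #RosieAcertaOutraVez to "suspicion confirmed" and #falsoPositivo to "suspicion rejected" is an assumption, and `classify_reply` is a hypothetical helper name:

```python
CONFIRMS = "#rosieacertaoutravez"  # assumed to mean the suspicion is confirmed
REJECTS = "#falsopositivo"  # assumed to mean the suspicion is a false positive


def classify_reply(text):
    """Return True (confirmed), False (rejected) or None (no clear vote)."""
    # Compare case-insensitively and ignore trailing punctuation.
    words = {word.lower().rstrip(".,!?;:") for word in text.split()}
    confirms = CONFIRMS in words
    rejects = REJECTS in words
    if confirms == rejects:
        return None  # neither hashtag, or both at once
    return confirms
```

Replies with neither hashtag (or with both) return None, so they can be stored or skipped without being counted as a vote.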

vmesel commented on June 8, 2024

@cuducos I was saying that there is no replies endpoint, and I do know that. There is a mentions endpoint, and the example (which you said is old, but it's still gold) uses the same logic you described.

Stack Overflow example:

From status/show you can find the user's id. Then statuses/mentions_timeline will return a list of status for a user. Just parse that return looking for a in_reply_to_status_id matching the original tweet's id.
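
The matching described in that quote can be sketched as below, assuming the mentions are already available as dicts carrying the in_reply_to_status_id field. `group_replies` and `rosie_tweet_ids` are hypothetical names, and fetching the mentions themselves is left out:

```python
def group_replies(mentions, rosie_tweet_ids):
    """Group mentions by the original tweet they reply to.

    Mentions that are not replies to one of the known tweets are dropped.
    """
    grouped = {}
    for mention in mentions:
        original_id = mention.get("in_reply_to_status_id")
        if original_id in rosie_tweet_ids:
            grouped.setdefault(original_id, []).append(mention)
    return grouped
```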

Why should we have individual records for replies if we just want to classify the receipt? This consumes a lot of unnecessary space in the database.

We could just update the row referencing this tweet/document ID.

Another question about this request: how can we recover the Jarbas document ID without making a web request? That would cost time and money (network costs on DO or AWS).

cuducos commented on June 8, 2024

Why should we have individual records for replies if we just want to classify the receipt?

As I've said:

so later we can filter (eg discard congressperson replies)

cuducos commented on June 8, 2024

Oops; sent before finishing… so here we go again.

Why should we have individual records for replies if we just want to classify the receipt?

As I've said:

so later we can filter (eg discard congressperson replies)

I cannot predict which filters will be useful (bots could be created to mass-attack poor Rosie, for example), so I think storing them all is the best choice.

how can we recover the Jarbas document ID without making a web request

Is the URL in the tweet text returned by the API the full URL or Twitter's shortened version? If it's shortened, there's no way out. If it's not, we can parse the URL (it's always https://jarbas.serenatadeamor.org/#/documentId/<document_id>).
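
If the full URL is available, parsing it could be as simple as the sketch below. It assumes the document_id is purely numeric, which is not stated in this thread; `extract_document_id` is a hypothetical helper name:

```python
import re

# Assumption: document IDs are numeric; adjust the pattern otherwise.
DOCUMENT_URL = re.compile(
    r"https://jarbas\.serenatadeamor\.org/#/documentId/(\d+)"
)


def extract_document_id(text):
    """Return the document_id found in a Jarbas URL, or None if absent."""
    match = DOCUMENT_URL.search(text)
    return int(match.group(1)) if match else None
```

A shortened link simply yields None, so the caller can tell when the text alone is not enough.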

Irio commented on June 8, 2024

@cuducos The URLs are shortened. You get the full ones by making a request to https://t.co/.

vmesel commented on June 8, 2024

@Irio so we have a great bandwidth problem here! Depending on the quantity and how we are going to organize the requests to http://t.co/ it might think it's a DDoS attack and block our IPs.

Irio commented on June 8, 2024

@vmesel This is not a problem, since the document_ids of all posts are stored in a database. Check the link I posted again.

cuducos commented on June 8, 2024

we have a great bandwidth problem here!

Do we? Share the code and we can help you.

cuducos commented on June 8, 2024

Depending on the quantity and how we are going to organize the requests to http://t.co/ it might think it's a DDoS attack and block our IPs.

This is not a bandwidth problem: this is a software problem. If it becomes an issue, we can just pause between requests, for example, as we do here.
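
A minimal sketch of pausing between requests, not the code referenced above. `fetch_all` is a hypothetical helper, `fetch` stands for whatever function performs the actual request, and the one-second default is an arbitrary assumption:

```python
import time


def fetch_all(urls, fetch, pause=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, sleeping `pause` seconds between calls."""
    results = []
    for index, url in enumerate(urls):
        if index:  # no need to wait before the first request
            sleep(pause)
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.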

cuducos commented on June 8, 2024

how we are going to organize the requests to http://t.co/

@vmesel we don't need to do any request to t.co: python-twitter fetches the expanded URL directly for us, I'm using it on Jarbas.
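
For reference, a sketch of reading the expanded URL straight from a tweet payload. It assumes the tweet is available as a dict shaped like the Twitter REST API v1.1 JSON, where each item in entities.urls carries an expanded_url; python-twitter's own objects may expose this differently, and `expanded_urls` is a hypothetical helper name:

```python
def expanded_urls(tweet):
    """Return the expanded URLs found in a v1.1-style tweet payload."""
    entities = tweet.get("entities") or {}
    return [
        url["expanded_url"]
        for url in entities.get("urls", [])
        if url.get("expanded_url")
    ]
```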
