twitter-scraping's People

Contributors

thiago-paim

twitter-scraping's Issues

Refactor scraping for new limitations

The scraping task must be refactored because TwitterSearchScraper no longer works. We should also take the opportunity to create a better separation of concerns:

  • record_tweet: Saves a tweet to the database, along with its related tweets (replies_to, conversation, retweets, quoted)
  • scrape_user_tweets (currently scrape_tweets_from_user): Scrapes all tweets from a user
  • scrape_tweet_replies: Scrapes all replies to a specific tweet
  • scrape_user_tweet_replies: Iterates over a user's saved tweets (by scraping_request?) and calls scrape_tweet_answers for each of them
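
The split above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Tweet` dataclass, the in-memory `Store` dict standing in for the database, and the injected `fetch` callables are all hypothetical, since the issue does not say which scraper will replace TwitterSearchScraper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Tweet:
    id: int
    user: str
    content: str
    # Related tweets (replied-to, quoted, retweeted) that were already fetched.
    related: List["Tweet"] = field(default_factory=list)


# Hypothetical in-memory stand-in for the tweets table.
Store = Dict[int, Tweet]


def record_tweet(store: Store, tweet: Tweet) -> None:
    """Save a tweet together with its related tweets."""
    if tweet.id in store:
        return  # already saved; this also stops cycles between related tweets
    store[tweet.id] = tweet
    for other in tweet.related:
        record_tweet(store, other)


def scrape_user_tweets(
    store: Store, username: str, fetch: Callable[[str], Iterable[Tweet]]
) -> List[int]:
    """Save every tweet the (hypothetical) fetcher returns for a user."""
    saved = []
    for tweet in fetch(username):
        record_tweet(store, tweet)
        saved.append(tweet.id)
    return saved


def scrape_tweet_replies(
    store: Store, tweet_id: int, fetch: Callable[[int], Iterable[Tweet]]
) -> List[int]:
    """Save every reply the (hypothetical) fetcher returns for one tweet."""
    saved = []
    for reply in fetch(tweet_id):
        record_tweet(store, reply)
        saved.append(reply.id)
    return saved


def scrape_user_tweet_replies(
    store: Store, username: str, fetch: Callable[[int], Iterable[Tweet]]
) -> List[int]:
    """Iterate over a user's saved tweets and scrape the replies to each."""
    # Snapshot the ids first, because saving replies mutates the store.
    own_ids = [t.id for t in store.values() if t.user == username]
    saved = []
    for tweet_id in own_ids:
        saved.extend(scrape_tweet_replies(store, tweet_id, fetch))
    return saved
```

Passing the fetchers in as arguments keeps the saving logic independent of the scraping library, so the functions can be exercised with canned data.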

Add Tweet field for unaccented content

Having a field for unaccented content will make it easier to search for keywords.

Steps:

  • Add Tweet.unaccented_content field
  • Add TweetManager.contains_unaccented_hate_words

Refactor create_next_scrapping_request to split ScrappingRequest creation and start

As is, the task create_next_scrapping_request both creates new ScrappingRequests (based on lists in values.py) and starts the next one.
A better architecture would decouple the two: one periodic task that only checks for and starts the next request, and another non-periodic task that creates ScrappingRequests and leaves them waiting in the queue.
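
The decoupling could look roughly like this. The sketch uses a plain list as the queue and a simplified `ScrappingRequest` dataclass; the status values and the rule that only one request runs at a time are assumptions, not something stated in the issue.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ScrappingRequest:
    username: str
    status: str = "created"  # assumed states: created -> started -> finished


def create_scrapping_requests(
    queue: List[ScrappingRequest], usernames: Iterable[str]
) -> List[ScrappingRequest]:
    """Non-periodic step: only enqueue new requests, never start them."""
    new_requests = [ScrappingRequest(username) for username in usernames]
    queue.extend(new_requests)
    return new_requests


def start_next_scrapping_request(
    queue: List[ScrappingRequest],
) -> Optional[ScrappingRequest]:
    """Periodic step: start the oldest waiting request if none is running."""
    if any(request.status == "started" for request in queue):
        return None  # assumption: at most one request runs at a time
    for request in queue:
        if request.status == "created":
            request.status = "started"
            return request
    return None
```

With this split, the periodic task never needs to know where the usernames come from, and requests can be created in bulk ahead of time.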

Increase visibility over ScrappingRequest results

Currently we only keep a record of the latest ScrappingRequest that created or updated a Tweet, so if you run similar scrapings you might lose track of their results. Also, this is not visible in the admin (apart from a simple count of tweets).

Ideally it should be easy to:

  • Check which tweets were created by a ScrappingRequest
  • Check which tweets were updated by a ScrappingRequest
  • Filter tweets created/updated by a ScrappingRequest
  • Check all the errors that happened during a scrapping
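
One way to keep this information is a per-request log rather than a single "latest request" pointer on the tweet. The sketch below is in-memory and purely illustrative; `RequestLog`, `log_tweet`, and `log_error` are hypothetical names, and in the project this would presumably be a database table related to both ScrappingRequest and Tweet.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List, Set


@dataclass
class RequestLog:
    created: Set[int] = field(default_factory=set)   # tweet ids created
    updated: Set[int] = field(default_factory=set)   # tweet ids updated
    errors: List[str] = field(default_factory=list)  # error messages

    def touched(self) -> Set[int]:
        """All tweets created or updated by this request (for filtering)."""
        return self.created | self.updated


# Maps a request id to its log; a new log is created on first access.
logs: DefaultDict[int, RequestLog] = defaultdict(RequestLog)


def log_tweet(request_id: int, tweet_id: int, was_created: bool) -> None:
    """Record that a request created or updated a tweet."""
    log = logs[request_id]
    (log.created if was_created else log.updated).add(tweet_id)


def log_error(request_id: int, message: str) -> None:
    """Record an error that happened during a request."""
    logs[request_id].errors.append(message)
```

Because each request keeps its own sets, a later request that updates the same tweet no longer overwrites the record of the earlier one.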

Admin doesn't work with DEBUG disabled

The admin doesn't load static files and is therefore rendered incorrectly.
Error: Refused to apply style from '<URL>' because its MIME type ('text/html') is not a suppor
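
A likely cause is that Django's development server only serves static files while DEBUG is enabled, so with DEBUG disabled the stylesheet URL returns an HTML error page instead. One common fix, sketched below as a settings excerpt, is to collect the static files and serve them with WhiteNoise. This assumes the `whitenoise` package is installed and that `python manage.py collectstatic` is run on deploy; the `staticfiles` directory name and the use of `Path.cwd()` as the base directory are placeholders.

```python
from pathlib import Path

# Placeholder: a real settings.py usually derives BASE_DIR from __file__.
BASE_DIR = Path.cwd()

STATIC_URL = "/static/"
# Directory that `collectstatic` copies all static files into.
STATIC_ROOT = BASE_DIR / "staticfiles"

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise should come directly after SecurityMiddleware.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    # ... the project's remaining middleware stays below ...
]
```

Serving the collected files from the front-end web server instead of WhiteNoise would work equally well; the key point is that something other than the development server has to serve `STATIC_ROOT`.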

Create an easier way of exporting tweets as CSV

Currently, exporting a CSV requires opening the Django shell, which is not ideal for less technical users. The best option would probably be export functions that can be called from the Django admin.

Some options

  • Function in ScrappingRequest admin list page
  • Function in Tweet admin list page
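
Whichever page hosts it, the export itself can be a small function that turns rows into CSV text. The sketch below uses only the standard library; `tweets_to_csv` and the field names in the test data are hypothetical.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence


def tweets_to_csv(rows: Iterable[Mapping[str, object]], fields: Sequence[str]) -> str:
    """Serialize dict-like rows to CSV text, keeping only the given fields."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in `fields`.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()
```

In the admin, this could be wrapped in an admin action that passes `queryset.values(*fields)` as the rows and returns the text in an `HttpResponse` with `content_type="text/csv"`.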
