Twitter-Scraper

This is not a perfect scraper, so feel free to add improvements if you find any.

IMPROVEMENTS:

  • Improved error handling so that tweets are not rejected when certain fields are null.
  • Leveraged the WebDriverWait class to better detect the desired load states.
  • Each record is saved as it is scraped instead of all at the end, minimizing data loss if a session fails.
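
As a rough illustration of the save-as-you-go idea, here is a minimal sketch using only the standard library. The function name `append_tweet_to_csv`, its signature, and the field names in the usage note are assumptions made for this example; they are not taken from the repository's save_tweet_data_to_csv function.

```python
import csv
import os


def append_tweet_to_csv(record, filepath, fieldnames):
    """Append one tweet record (a dict) to a CSV file.

    Hypothetical helper for illustration only; the real
    save_tweet_data_to_csv function may look different.
    """
    # Write the header only when the file is missing or still empty.
    needs_header = not os.path.exists(filepath) or os.path.getsize(filepath) == 0
    with open(filepath, mode="a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if needs_header:
            writer.writeheader()
        # Missing or None fields become empty strings, so a tweet with
        # null fields is still written rather than rejected.
        row = {}
        for name in fieldnames:
            value = record.get(name)
            row[name] = "" if value is None else value
        writer.writerow(row)
```

Calling this once per tweet, for example `append_tweet_to_csv({"user": "a", "text": "hello"}, "tweets.csv", ["user", "text", "likes"])`, means that everything scraped before a crash is already on disk.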

NOTES AND THINGS TO THINK ABOUT:

  • The scroll_down_page function has a num_seconds_to_load argument: the number of seconds the program waits before attempting to scroll again. I'm currently making 5 attempts with a pause between each. You could also increase the maximum number of attempts and decrease num_seconds_to_load. This could speed up the scraping, since you would likely reach a successful scroll sooner.
  • The collect_all_tweets_from_current_view function has a lookback_limit argument that controls how many tweets are processed from each scroll. I've written more about this in the function docstring.
  • I've implemented WebDriverWait in several sections of this updated code. I think this is a much better solution than a hard-coded sleep call because it continues as soon as the specified condition is met and only times out if the condition is not met within a set period. I'm sure many other sections of this code could be improved by leveraging this class.
  • Feel free to replace the save_tweet_data_to_csv function with any other I/O option you want, such as a database save via pyodbc or sqlite3.
  • I encourage you to explore the "Advanced Search" functionality. Try adding your criteria and see how the URL is built. You can then leverage this to customize your searches with date ranges, special keywords, and so on: https://twitter.com/search-advanced?
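
The trade-off between the number of attempts and num_seconds_to_load described in the first note can be sketched roughly as follows. This is not the repository's scroll_down_page function: the name `scroll_with_retries`, the `max_attempts` and `last_position` parameters, and the return value are assumptions for illustration. The only Selenium call it relies on is the driver's `execute_script` method.

```python
import time


def scroll_with_retries(driver, last_position, num_seconds_to_load=0.5, max_attempts=5):
    """Scroll to the bottom of the page, retrying until the position changes.

    Hypothetical sketch. Returns (position, end_of_scroll_region), where
    end_of_scroll_region is True if no attempt moved the page.
    """
    for _ in range(max_attempts):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to load more tweets before checking.
        time.sleep(num_seconds_to_load)
        current_position = driver.execute_script("return window.pageYOffset;")
        if current_position != last_position:
            # The page moved, so new content was loaded.
            return current_position, False
    # Every attempt left the page where it was.
    return last_position, True
```

With many short attempts (say `max_attempts=10, num_seconds_to_load=0.25`), the function returns as soon as the first successful scroll is detected, which is the speed-up the note above suggests. The worst-case wait is still roughly `max_attempts * num_seconds_to_load` seconds.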

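For the Advanced Search note above, one way to build a customized search URL in code is sketched below. The `since:` and `until:` operators and the `q`, `src`, and `f` query parameters are assumptions based on what the Advanced Search form typically produces; check them against the URL your own search generates. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(keywords, since=None, until=None, live=False):
    """Build a search URL with optional date-range operators.

    Hypothetical helper; the operator and parameter names are assumptions
    and should be verified against the Advanced Search page.
    """
    terms = [keywords]
    if since:
        terms.append(f"since:{since}")  # e.g. "2020-01-01"
    if until:
        terms.append(f"until:{until}")
    params = {"q": " ".join(terms), "src": "typed_query"}
    if live:
        # Assumed parameter for the "Latest" tab instead of "Top".
        params["f"] = "live"
    # urlencode handles escaping of spaces, colons, and other characters.
    return "https://twitter.com/search?" + urlencode(params)
```

The resulting string could then be passed to the driver's `get` method in place of a hand-typed search URL.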
CONTRIBUTORS:

  • israel-dryer
  • mega-barrel
  • vijayshankarrealdeal
