gnip-analysis-tools's Introduction

The gnip_analysis_config Package

This package provides useful class definitions for configuring scripts in the Gnip-Analysis-Pipeline package. The intention is that you work from a working directory (we'll call this "TEST"), and that both gnip_analysis_pipeline and gnip_analysis_config are installed as packages. Remember that these packages can be installed from the cloned repo location with:

[REPO] $ pip install -e .

Enrichments

According to the Gnip-Analysis-Pipeline docs, we configure enrichments by defining the enrichment_class_list variable in a configuration file.

The enrichments directory in this package contains files that define a base enrichment class along with some other helpful enrichment classes, including a simple example. To use the test enrichment from your working directory, you would create an enrichments configuration file (called my_enrichments.py):

from gnip_analysis_config.enrichments import test_enrichment

enrichment_class_list = [test_enrichment.TestEnrichment]

We can then enrich the Tweets in my_tweets.json as follows:

[TEST] $ cat my_tweets.json | tweet_enricher.py -c my_enrichments.py > my_enriched_tweets.json
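How the pipeline scripts read the file passed with -c is not described here. As a rough sketch of one common approach, a Python configuration file can be imported by path and the list variable read from it. The load_class_list function, the "user_config" module name, and the Demo class are hypothetical names used only for illustration.

```python
import importlib.util
import os
import tempfile


def load_class_list(config_path, variable_name):
    """Import a Python config file by path and return one of its variables.

    Hypothetical helper: the real pipeline scripts may load their
    configuration differently.
    """
    spec = importlib.util.spec_from_file_location("user_config", config_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, variable_name)


# Demonstrate with a throwaway config file that defines one dummy class.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "my_enrichments.py")
    with open(config_path, "w") as config_file:
        config_file.write(
            "class Demo:\n"
            "    pass\n"
            "\n"
            "enrichment_class_list = [Demo]\n"
        )
    loaded = load_class_list(config_path, "enrichment_class_list")
```

The point of the sketch is that the configuration file is ordinary Python, so anything importable from your working directory can appear in the list.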

To configure an NLP enrichment with NLTK, we provide nltk_enrichment.py, which can be configured like:

from gnip_analysis_config.enrichments import nltk_enrichment

enrichment_class_list = nltk_enrichment.nltk_enrichments_list

Notice that this module has conveniently defined the list of enrichment classes.

A custom enrichment class can be defined locally:

from gnip_analysis_config.enrichments import enrichment_base

class MyEnrichment(enrichment_base.BaseEnrichment):
    def enrichment_value(self, tweet):
        # the returned value is the enrichment attached to this tweet
        return "my_test_enrichment_value"
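To make the role of enrichment_value more concrete, the following self-contained sketch shows how a base class might call it. StandInBaseEnrichment, its enrich method, the "enrichments" key, and the enrich_tweet helper are all assumptions made for illustration, not the package's actual implementation; consult the enrichments directory for the real base class.

```python
class StandInBaseEnrichment:
    """Hypothetical stand-in for the package's base enrichment class."""

    def enrich(self, tweet):
        # Assumption: results are stored under an "enrichments" key,
        # keyed by the name of the enrichment class.
        enrichments = tweet.setdefault("enrichments", {})
        enrichments[type(self).__name__] = self.enrichment_value(tweet)
        return tweet

    def enrichment_value(self, tweet):
        raise NotImplementedError("subclasses must define enrichment_value")


class MyEnrichment(StandInBaseEnrichment):
    def enrichment_value(self, tweet):
        return "my_test_enrichment_value"


def enrich_tweet(tweet, enrichment_class_list):
    """Apply each configured enrichment class to a single tweet dict."""
    for enrichment_class in enrichment_class_list:
        enrichment_class().enrich(tweet)
    return tweet


enriched = enrich_tweet({"body": "hello"}, [MyEnrichment])
```

Under these assumptions, a custom enrichment only has to implement enrichment_value; the base class decides where the result is stored.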

Measurements

According to the Gnip-Analysis-Pipeline docs, we configure measurements by defining the measurement_class_list variable in a configuration file.

The measurements directory in this package contains a variety of base and helper classes for constructing measurement classes. To use the test measurements from your working directory, you would create a measurements configuration file (called my_measurements.py):

from gnip_analysis_config.measurements.test_measurements import TweetCounter, ReTweetCounter

measurement_class_list = [TweetCounter, ReTweetCounter]

We can then build time series from the Tweets in my_enriched_tweets.json as follows:

[TEST] $ cat my_enriched_tweets.json | tweet_time_series_builder.py -c my_measurements.py > time_series.csv

(Note that none of the enrichments we added in the previous section are required to build the specified time series.)

To construct a time series for each observed hashtag, we can define a class locally that inherits key functionality from classes in measurement_base.py:

from gnip_analysis_config.measurements.measurement_base import Counters

class HashtagCounters(Counters):
    def update(self, tweet):
        for item in tweet['twitter_entities']['hashtags']:
            # put a # in front of the term,
            # since they've been removed in the payload
            self.counters['#' + item['text']] += 1

measurement_class_list = [ HashtagCounters ]

See measurement_base.py for a full description of how to create custom measurement classes.
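For experimenting with the hashtag logic outside the pipeline, here is a minimal self-contained sketch. StandInCounters is a hypothetical replacement for the Counters helper, assuming only that it exposes a counters mapping whose entries start at zero; the sample tweets are invented and contain just the fields the update method reads.

```python
from collections import defaultdict


class StandInCounters:
    """Hypothetical stand-in for the Counters helper in measurement_base.py."""

    def __init__(self):
        # Assumption: one integer counter per key, starting at zero.
        self.counters = defaultdict(int)

    def update(self, tweet):
        raise NotImplementedError("subclasses must define update")


class HashtagCounters(StandInCounters):
    def update(self, tweet):
        for item in tweet["twitter_entities"]["hashtags"]:
            # The payload stores hashtag text without the leading "#".
            self.counters["#" + item["text"]] += 1


# Invented sample payloads with only the fields used above.
sample_tweets = [
    {"twitter_entities": {"hashtags": [{"text": "python"}, {"text": "data"}]}},
    {"twitter_entities": {"hashtags": [{"text": "python"}]}},
    {"twitter_entities": {"hashtags": []}},
]

measurement = HashtagCounters()
for sample_tweet in sample_tweets:
    measurement.update(sample_tweet)

hashtag_counts = dict(measurement.counters)
```

The sketch does no time bucketing; it only shows how one update call per Tweet accumulates a separate counter for each hashtag.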

The gnip_analysis_tools Package

This package is a repository for common analysis tools. For example, nlp/utils.py defines some commonly used NLP utility functions.
