The gnip_analysis_tools Package

This package provides useful class definitions for configuring scripts in the Gnip-Analysis-Pipeline package. The intention is that you work from a working directory (we'll call this "TEST"), and that both gnip_analysis_pipeline and gnip_analysis_tools are installed as packages. Remember that these packages can be installed from the cloned repo location with:

[REPO] $ pip install -e .

Enrichments

According to the Gnip-Analysis-Pipeline docs, we configure enrichments by defining the enrichment_class_list variable in a configuration file.

The enrichments directory in this package contains files that define a base enrichment class along with some other helpful enrichment classes, including a simple example. To use the test enrichment from your working directory, you would create an enrichments configuration file (called my_enrichments.py):

from gnip_analysis_tools.enrichments import test_enrichment

enrichment_class_list = [test_enrichment.TestEnrichment]

We can then enrich the Tweets in my_tweets.json as follows:

[TEST] $ cat my_tweets.json | tweet_enricher.py -c my_enrichments.py > my_enriched_tweets.json

To configure an NLP enrichment with NLTK, we provide nltk_enrichment.py, which can be configured like:

from gnip_analysis_tools.enrichments import nltk_enrichment

enrichment_class_list = nltk_enrichment.nltk_enrichments_list

Notice that this module has conveniently defined the list of enrichment classes.

A custom enrichment class can be defined locally:

from gnip_analysis_tools.enrichments import enrichment_base

class MyEnrichment(enrichment_base.BaseEnrichment):
    def enrichment_value(self,tweet):
        return "my_test_enrichment_value"

enrichment_class_list = [MyEnrichment] 
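
To make the role of enrichment_value concrete, here is a minimal, self-contained sketch of how a base class might apply it. SketchBaseEnrichment, its enrich method, the "enrichments" key, and apply_enrichments are all hypothetical stand-ins written for illustration; the package's real base class may store values differently.

```python
# Hypothetical stand-in for the package's base enrichment class.
# The real class may use different method names and storage keys.
class SketchBaseEnrichment:
    def enrich(self, tweet):
        # Assumption: each value is stored under an "enrichments" key,
        # keyed by the name of the enrichment class.
        enrichments = tweet.setdefault("enrichments", {})
        enrichments[type(self).__name__] = self.enrichment_value(tweet)
        return tweet

    def enrichment_value(self, tweet):
        raise NotImplementedError("subclasses must define enrichment_value")


class MyEnrichment(SketchBaseEnrichment):
    def enrichment_value(self, tweet):
        return "my_test_enrichment_value"


def apply_enrichments(tweet, enrichment_class_list):
    # Instantiate each configured class and apply it to the tweet in order.
    for enrichment_class in enrichment_class_list:
        enrichment_class().enrich(tweet)
    return tweet


enriched = apply_enrichments({"body": "hello"}, [MyEnrichment])
```

In this sketch, a subclass only has to supply enrichment_value; the base class decides where the result lands in the Tweet payload.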

Measurements

According to the Gnip-Analysis-Pipeline docs, we configure measurements by defining the measurement_class_list variable in a configuration file.

The measurements directory in this package contains files that define a variety of base and helper classes for constructing measurement classes. To use the test measurements from your working directory, you would create a measurements configuration file (called my_measurements.py):

from gnip_analysis_tools.measurements.test_measurements import TweetCounter,ReTweetCounter

measurement_class_list = [TweetCounter,ReTweetCounter]

We can then build time series from the Tweets in my_enriched_tweets.json as follows:

[TEST] $ cat my_enriched_tweets.json | tweet_time_series_builder.py -c my_measurements.py > time_series.csv

(Note that none of the enrichments we added in the previous section are required to build the specified time series.)

To construct a time series for each observed hashtag, we can define a class locally that inherits key functionality from classes in measurement_base.py:

from gnip_analysis_tools.measurements.measurement_base import Counters

class HashtagCounters(Counters):
    def update(self,tweet):
        for item in tweet['twitter_entities']['hashtags']:
            # put a # in front of the term,
            # since they've been removed in the payload
            self.counters['#'+item['text']] += 1

measurement_class_list = [ HashtagCounters ]

See measurement_base.py for a full description of how to create custom measurement classes.
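
The hashtag counter can be tried out without the package installed. The sketch below substitutes a hypothetical SketchCounters class for Counters, on the assumption that the real class exposes a counters mapping that defaults to zero; the sample Tweets are made up for illustration.

```python
from collections import defaultdict


class SketchCounters:
    """Hypothetical stand-in for Counters in measurement_base.py."""

    def __init__(self):
        # Assumption: missing keys start at zero, so "+= 1" works directly.
        self.counters = defaultdict(int)

    def update(self, tweet):
        raise NotImplementedError("subclasses must define update")


class HashtagCounters(SketchCounters):
    def update(self, tweet):
        # Tolerate Tweets that carry no entities at all.
        hashtags = tweet.get("twitter_entities", {}).get("hashtags", [])
        for item in hashtags:
            # The payload stores the term without its leading "#".
            self.counters["#" + item["text"]] += 1


# Made-up sample payloads, reduced to the fields the counter reads.
sample_tweets = [
    {"twitter_entities": {"hashtags": [{"text": "python"}, {"text": "data"}]}},
    {"twitter_entities": {"hashtags": [{"text": "python"}]}},
    {"twitter_entities": {"hashtags": []}},
]

measurement = HashtagCounters()
for tweet in sample_tweets:
    measurement.update(tweet)

hashtag_counts = dict(measurement.counters)
```

Each distinct hashtag gets its own counter, which is what lets the pipeline emit one time series per observed hashtag.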

gnip-analysis-tools's Issues

Expand image module build script for conda

The current bash script that prepares the Python libraries checks that the user is in a virtualenv environment by looking for an appropriate shell environment variable. This check could be generalized to accept an environment built by either virtualenv or conda. A quick check suggested that one or more environment variables might serve this purpose. As a start, this addition could still use pip to install the relevant libraries.
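
One possible shape for the generalized check, sketched in Python rather than bash: virtualenv's activate script sets VIRTUAL_ENV, and conda activation sets CONDA_PREFIX. The function name active_python_env and its return format are hypothetical, chosen only for this illustration.

```python
import os


def active_python_env(environ=None):
    """Return (kind, path) for the active environment, or None if none is set."""
    environ = os.environ if environ is None else environ
    # virtualenv's activate script exports VIRTUAL_ENV.
    if environ.get("VIRTUAL_ENV"):
        return ("virtualenv", environ["VIRTUAL_ENV"])
    # conda activation exports CONDA_PREFIX (this includes the base environment).
    if environ.get("CONDA_PREFIX"):
        return ("conda", environ["CONDA_PREFIX"])
    return None
```

Because CONDA_PREFIX is also set for conda's base environment, a stricter check might additionally inspect CONDA_DEFAULT_ENV before installing anything.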
