DepSenSo_nlp

Natural Language Processing part of the depsenso project

Preprocessing

demojize_text.py

Libraries: emoji
Converts all emojis in the passed string to text format (i.e. :emoji_name:). This lets us turn emojis into keywords that are useful for sentiment analysis. For example, "\U0001f642" is 🙂.
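
The idea can be illustrated without the emoji library. The sketch below is a standard-library approximation, not the code in demojize_text.py: it assumes that every character in the Unicode category "So" (Symbol, other) should be replaced by its Unicode name, which also catches symbols such as © and does not handle multi-codepoint emoji. The function name demojize_sketch is hypothetical.

```python
import unicodedata


def demojize_sketch(text: str) -> str:
    """Replace each 'Symbol, other' character with :its_unicode_name:."""
    out = []
    for ch in text:
        # Most single-codepoint emojis fall in category "So".
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                slug = name.lower().replace(" ", "_").replace("-", "_")
                out.append(":" + slug + ":")
                continue
        out.append(ch)
    return "".join(out)


print(demojize_sketch("I feel fine \U0001f642"))
# prints: I feel fine :slightly_smiling_face:
```

The resulting :name: strings can then be treated like ordinary words by the later preprocessing steps.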

tokenizer_userMentionsCount.py

Libraries: nltk, sys, unicodedata and string
Tokenizes text using TweetTokenizer (from nltk). To prepare the text for further analysis, we remove user handles (in Reddit or Twitter format), capitalization, hyperlinks and punctuation. Additionally, TweetTokenizer normalizes elongated words by capping runs of repeated characters at 3 ("lloooooonnnnnnng" -> "llooonnng"), and we count every distinct user mention to prepare for the network analysis of the user.
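
The steps described above can be sketched with regular expressions alone. This is a simplified standard-library approximation, not the TweetTokenizer-based script: the patterns for URLs and for "@name", "u/name" and "/u/name" handles are assumptions, and the function name preprocess_sketch is hypothetical.

```python
import re
import string
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumed handle formats: @name (Twitter), u/name or /u/name (Reddit).
MENTION_RE = re.compile(r"(?<![\w/])(?:@|/?u/)([A-Za-z0-9_-]+)")
# Any character repeated three or more times in a row.
REPEAT_RE = re.compile(r"(.)\1{2,}")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_sketch(text):
    """Return (tokens, mention_counts) for one post."""
    text = URL_RE.sub(" ", text)                 # drop hyperlinks first
    mentions = Counter(name.lower() for name in MENTION_RE.findall(text))
    text = MENTION_RE.sub(" ", text).lower()     # drop handles, lowercase
    text = REPEAT_RE.sub(r"\1\1\1", text)        # cap repeated runs at 3
    tokens = text.translate(PUNCT_TABLE).split() # strip punctuation, split
    return tokens, mentions


tokens, mentions = preprocess_sketch(
    "@Alice this is lloooooonnnnnnng, u/bob! See https://example.com @alice"
)
print(tokens)    # ['this', 'is', 'llooonnng', 'see']
print(mentions)  # Counter({'alice': 2, 'bob': 1})
```

Counting mentions before removing them keeps the information needed for the user network while leaving the token list free of handles.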

lemmatize_wordnet.py

Libraries: nltk
Lemmatizes a list of tokens using the WordNetLemmatizer. The correct lemma is chosen using the POS tag of each token.
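
WordNetLemmatizer expects WordNet part-of-speech codes ("n", "v", "a", "r"), while nltk's default tagger emits Penn Treebank tags, so a small mapping step is usually needed. The helper below is a minimal sketch of that mapping. The name penn_to_wordnet is hypothetical, and falling back to noun for unknown tags is an assumption that matches the lemmatizer's default.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, also used as the fallback


# With nltk and its tagger and WordNet data installed, the mapping
# would be used roughly like this:
#
#   from nltk import pos_tag
#   from nltk.stem import WordNetLemmatizer
#   lemmatizer = WordNetLemmatizer()
#   lemmas = [lemmatizer.lemmatize(tok, pos=penn_to_wordnet(tag))
#             for tok, tag in pos_tag(tokens)]

print([penn_to_wordnet(t) for t in ["NN", "VBD", "JJ", "RB", "IN"]])
# prints: ['n', 'v', 'a', 'r', 'n']
```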

Training

train_model_lda_tweetdata.py

One-time-use code to train our LDA classifier using the Twitter data available at https://github.com/AshwanthRamji/Depression-Sentiment-Analysis-with-Twitter-Data. This code is not meant to be used in the actual DepSenSo application.

Topic Modeling

topic_modeling_lda.py

Libraries: gensim
Using the trained classifier, finds the most likely topics from a list of lemmatized tokens. The result is a Python dictionary in the form {topic : probability}.
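
A gensim LDA model reports the topics of a document as a list of (topic_id, probability) pairs. The sketch below shows one way such pairs could be turned into the dictionary described above. The function name topics_to_dict, the min_probability parameter and the sample pairs are illustrative assumptions, not output from the trained model.

```python
def topics_to_dict(topic_pairs, min_probability=0.0):
    """Convert (topic_id, probability) pairs to {topic: probability},
    ordered from most to least likely."""
    ranked = sorted(topic_pairs, key=lambda pair: pair[1], reverse=True)
    return {
        topic_id: float(prob)
        for topic_id, prob in ranked
        if prob >= min_probability
    }


# Made-up pairs in the shape gensim returns, for illustration only.
sample_pairs = [(0, 0.1), (3, 0.7), (5, 0.2)]
print(topics_to_dict(sample_pairs))
# prints: {3: 0.7, 5: 0.2, 0: 0.1}
```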

topic_modeling_empath.py

Libraries: empath
Returns the distribution of lexical categories, similarly to LIWC. For the purposes of DepSenSo we define three custom categories: Depression, Mental_health and Anxiety.
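
To make the idea of a category distribution concrete, the sketch below counts how many tokens fall into each category and divides by the number of tokens. It is a simplified standard-library illustration, not the empath library. The word lists are hypothetical placeholders rather than the project's actual category terms, and normalizing by token count is an assumption.

```python
from collections import Counter

# Hypothetical seed words for the three custom categories.
CATEGORIES = {
    "Depression": {"hopeless", "worthless", "empty"},
    "Mental_health": {"therapy", "therapist", "diagnosis"},
    "Anxiety": {"panic", "worried", "nervous"},
}


def category_distribution(tokens, categories=CATEGORIES):
    """Return {category: share of tokens belonging to that category}."""
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    total = len(tokens)
    return {name: (counts[name] / total if total else 0.0)
            for name in categories}


print(category_distribution(["i", "feel", "hopeless", "and", "worried"]))
# prints: {'Depression': 0.2, 'Mental_health': 0.0, 'Anxiety': 0.2}
```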

Sentiment Analysis

We express sentiment polarity as a number: 0 means neutral, greater than 0 positive, and less than 0 negative.

sentiment_analysis_classifier.py

Libraries: json, pandas and sklearn
After creating a naive Bayes classifier using the Twitter data available at https://github.com/AshwanthRamji/Depression-Sentiment-Analysis-with-Twitter-Data, we can calculate the polarity of a string or a list of strings.

sentiment_analysis_sentiwordnet.py

Libraries: nltk and lemmatize_wordnet.py
Using SentiWordNet, we calculate the polarity of a text as the sum of the polarity of every lemmatized token.
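
The summation can be sketched as follows. SentiWordNet assigns each sense a positive and a negative score, and this sketch assumes the polarity of a token is the positive score minus the negative score. The score table is a toy stand-in with made-up values, not real SentiWordNet data, and the names TOY_SCORES, text_polarity and polarity_label are hypothetical.

```python
# Made-up (positive, negative) scores, for illustration only.
TOY_SCORES = {
    "happy": (0.875, 0.0),
    "sad": (0.125, 0.75),
    "table": (0.0, 0.0),
}


def text_polarity(lemmas, scores=TOY_SCORES):
    """Sum (positive - negative) over all lemmas; unknown lemmas count as 0."""
    total = 0.0
    for lemma in lemmas:
        pos, neg = scores.get(lemma, (0.0, 0.0))
        total += pos - neg
    return total


def polarity_label(score):
    """Apply the sign convention: 0 neutral, > 0 positive, < 0 negative."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


score = text_polarity(["happy", "sad", "table", "unknown"])
print(score, polarity_label(score))
# prints: 0.25 positive
```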

Data Collection

At present we are able to get data from reddit.com and twitter.com. Both sites use OAuth2 to identify third-party applications that use their data. As such, we need accounts for both sites.

get_tweets_from_specified_account.py

Libraries: tweepy and datetime
Imports Twitter data from a specified account. After connecting to Twitter using our application credentials, we can pull the complete history of a Twitter user, provided we know their username.

get_reddit_from_specified_account.py

Libraries: praw
Imports Reddit data from a specified account. After connecting to Reddit using our application credentials, we can pull the last 1000 submissions and comments of a Reddit user, provided we know their username. The 1000 limit is hardcoded in the praw library.

Contributors

nzotto