Data Mining Approach to the Detection of Suicide in Social Media: A Case Study of Singapore

IS470 Guided Research Project

In this research, we focus on the social phenomenon of suicide. Specifically, we perform social sensing on digital traces obtained from Reddit: we analyze posts and comments related to suicide and apply natural language processing to better understand the aspects of human life that relate to suicide.

The poster paper for this project was accepted to the 2018 IEEE International Conference on Big Data.

Data

The dataset consists of 406 submissions dating from July 2010 to November 2018 (161 of which were deemed relevant to suicide and depression) and 9,010 comments responding to those 161 submissions. All of them were crawled from https://www.reddit.com/r/singapore/ using the PRAW API. Search queries were informed by previous work, "Tracking suicide risk factors through Twitter in the US" by J. Jashinsky, S.H. Burton, C.L. Hanson, J. West, C. Giraud-Carrier, M.D. Barnes, and T. Argyle (2013).
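As a rough illustration of how one crawled submission could be flattened into a CSV row with the columns shown in the sample below, here is a minimal sketch. It assumes the standard attributes of a PRAW `Submission` object (`title`, `score`, `id`, `url`, `num_comments`, `created_utc`, `selftext`, `author`). The helper name `submission_to_row`, the `"[deleted]"` placeholder and the UTC formatting of `created` are illustrative assumptions, not the repository's actual code.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def submission_to_row(submission, query):
    """Flatten a PRAW-style Submission into a dict keyed by the sample's CSV columns.

    Hypothetical helper: the column mapping is inferred from the sample table,
    not taken from crawl_reddit.py.
    """
    # author is None when the posting account has been deleted
    author = submission.author
    # Assumption: timestamps are rendered in UTC; the repository may use local time
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "title": submission.title,
        "score": submission.score,
        "id": submission.id,
        "url": submission.url,
        "comms_num": submission.num_comments,
        "created": created.strftime("%Y-%m-%d %H:%M:%S"),
        "body": submission.selftext,
        "author_name": author.name if author is not None else "[deleted]",
        "query": query,
    }


if __name__ == "__main__":
    # Stand-in object so the sketch runs without Reddit API credentials
    fake = SimpleNamespace(
        title="Example title", score=5, id="abc123",
        url="https://www.reddit.com/r/singapore/comments/abc123/",
        num_comments=2, created_utc=0.0, selftext="Example body",
        author=SimpleNamespace(name="example_user"),
    )
    print(submission_to_row(fake, "feel alone depressed"))
```

With a live PRAW client, submissions would typically come from a search such as `reddit.subreddit("singapore").search(query)`. `Subreddit.search` is part of PRAW, but whether crawl_reddit.py calls it exactly this way is an assumption.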

All data was saved in CSV format and can be found in the data folder of this repository.

Sample

|     | title | score | id | url | comms_num | created | body | author_name | query |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Update from the Depressed Asshole | 59 | 8vgrob | https://www.reddit.com/r/singapore/comments/8v... | 30 | 2018-07-02 23:26:04 | About 3-4 months ago, I posted on reddit about... | GramTooNoob | feel alone depressed |
| 1 | Friends, family, acquaintances of someone who ... | 20 | 5scrdi | https://www.reddit.com/r/singapore/comments/5s... | 12 | 2017-02-06 22:45:00 | How did you feel then, and how has it affected... | depressings | friend suicide |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
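A minimal sketch of loading a CSV with the columns above using only the standard library, and counting how many submissions each search query contributed. The exact file names in the data folder are not given here, so the example reads from an in-memory string; the function names are hypothetical.

```python
import csv
import io
from collections import Counter


def load_submissions(fileobj):
    """Read submission rows (sample-style header) from a CSV file object into dicts."""
    rows = list(csv.DictReader(fileobj))
    # csv yields strings, so cast the numeric columns back to int
    for row in rows:
        row["score"] = int(row["score"])
        row["comms_num"] = int(row["comms_num"])
    return rows


def submissions_per_query(rows):
    """Count how many submissions each search query returned."""
    return Counter(row["query"] for row in rows)


if __name__ == "__main__":
    sample = io.StringIO(
        "title,score,id,url,comms_num,created,body,author_name,query\n"
        "A,59,aaa111,https://example.com/a,30,2018-07-02 23:26:04,text,user1,feel alone depressed\n"
        "B,20,bbb222,https://example.com/b,12,2017-02-06 22:45:00,text,user2,friend suicide\n"
    )
    print(submissions_per_query(load_submissions(sample)))
```

`csv.DictReader` handles quoted fields, so post bodies containing commas or line breaks load correctly as long as the file was written with standard CSV quoting.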

Setup

pip install -r requirements.txt

This repository was written in Python 3.7.0. Its dependencies are listed in requirements.txt; the libraries referenced in this README include PRAW, Gensim, Scikit-learn and pyLDAvis.

Scripts

The crawl_reddit.py script can:

  1. --s: crawl submissions from the Singapore subreddit using search terms from a vocabulary file,
  2. --l: label the crawled submissions, and/or
  3. --c: crawl comments on the submissions labelled "relevant".

python crawl_reddit.py [optional flags]
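The three stages can be combined freely. The following is a hypothetical reconstruction of how such flags might be declared with `argparse`; only the flag names come from the list above, while the help texts and the `run` helper are assumptions for illustration.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of crawl_reddit.py's command-line flags."""
    parser = argparse.ArgumentParser(description="Crawl and label r/singapore submissions")
    # Each flag switches on one stage; any combination may be passed
    parser.add_argument("--s", action="store_true", help="crawl submissions using a vocabulary file")
    parser.add_argument("--l", action="store_true", help="label crawled submissions")
    parser.add_argument("--c", action="store_true", help="crawl comments on relevant submissions")
    return parser


def run(argv):
    """Return the stages that would run, in pipeline order: crawl, label, comments."""
    args = build_parser().parse_args(argv)
    stages = []
    if args.s:
        stages.append("crawl submissions")
    if args.l:
        stages.append("label")
    if args.c:
        stages.append("crawl comments")
    return stages


if __name__ == "__main__":
    print(run(["--s", "--l"]))
```

`argparse` strips the leading dashes, so `--s` is stored as `args.s`.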

The preprocess.py script prepares the data for topic modeling with Gensim and/or Scikit-learn:

python preprocess.py --gensim --sklearn
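One plausible reading of the two flags is that they emit the cleaned text in the input format each library expects: Gensim's `corpora.Dictionary` consumes lists of tokens, while Scikit-learn's `CountVectorizer` consumes raw strings. The sketch below assumes that reading. The URL pattern is a deliberately simplified stand-in (the repository uses @gruber's urlmarker.py), and the tiny stop-word list is illustrative only.

```python
import re

# Simplified URL pattern; the repository uses @gruber's urlmarker.py instead
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Alphabetic tokens only; digits and punctuation are dropped
TOKEN_RE = re.compile(r"[a-z]+")
# Tiny illustrative stop-word list; a real pipeline would use a fuller one
STOPWORDS = frozenset({"a", "an", "and", "the", "to", "of", "i", "it", "is", "on", "in", "about", "my"})


def tokenize(text, min_len=3):
    """Lowercase, strip URLs, tokenize, then drop stop words and short tokens."""
    text = URL_RE.sub(" ", text.lower())
    return [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS and len(t) >= min_len]


def prepare(docs):
    """Return token lists (Gensim-style input) and joined strings (Scikit-learn-style input)."""
    tokenized = [tokenize(d) for d in docs]
    joined = [" ".join(toks) for toks in tokenized]
    return tokenized, joined


if __name__ == "__main__":
    print(prepare(["I posted on reddit about feeling alone https://www.reddit.com/r/singapore"]))
```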

Visualization

Both Gensim and Scikit-learn implementations of LDA topic modeling can be run from their respective notebooks. The output plots from pyLDAvis can be found in the plots folder.
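To make the underlying model concrete, here is a minimal, dependency-free sketch of LDA fitted with collapsed Gibbs sampling. Note the swapped-in inference method: Gensim's `LdaModel` and Scikit-learn's `LatentDirichletAllocation` use variational Bayes, whereas collapsed Gibbs sampling is the approach used by MALLET (the `ldamallet` model mentioned in the acknowledgements). The function names and default hyperparameters are assumptions chosen for illustration; this is a toy for small corpora, not a replacement for the notebooks.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on lists of tokens with collapsed Gibbs sampling.

    Returns (vocab, topic_word), where topic_word[k][w] estimates p(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    docs_ids = [[index[w] for w in doc] for doc in docs]

    # Count tables: document-topic, topic-word, and per-topic totals
    n_dk = [[0] * K for _ in docs_ids]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    assignments = []
    for d, doc in enumerate(docs_ids):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)  # random initial topic for each token
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs_ids):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token's assignment from the counts
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a normalizing constant
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed estimate of each topic's word distribution
    topic_word = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return vocab, topic_word


def top_words(vocab, topic_word, n=3):
    """List the n most probable words for each topic."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in topic_word
    ]


if __name__ == "__main__":
    corpus = [["exam", "stress", "school"], ["family", "lonely", "home"],
              ["school", "exam", "grades"], ["home", "family", "parents"]]
    vocab, topic_word = lda_gibbs(corpus, num_topics=2)
    print(top_words(vocab, topic_word))
```

Fixing `seed` makes runs reproducible, which helps when comparing topics across preprocessing choices.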

Work-in-Progress

Acknowledgements

urlmarker.py, written by @gruber, was used to match URLs with regular expressions during data cleaning.

The malletmodel2ldamodel() method provided by Roger Mähler improves on Gensim's implementation for converting an LdaMallet model to an LdaModel.

Many thanks to Dr Kyong Jin Shim for her superb support and guidance.

Contributors

dependabot[bot], shingkid
