This repository contains the data processing for my Master's Thesis and example notebooks for further reference.

Data should be added to a Data folder; add its path to the datapath variable in preprocessing_notebook.py. The data should be a .csv file with the following columns (see also dummy_data.csv):
- discussion_id
- topic
- post_id
- text
- username
- parent_post_id
- parent_missing
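As a quick sanity check before running the notebooks, the expected columns can be validated after loading. This is a minimal sketch, not part of the repo; the file path and the `validate_columns` helper are assumptions for illustration:

```python
import pandas as pd

# Columns the preprocessing expects (see dummy_data.csv)
REQUIRED_COLUMNS = [
    "discussion_id", "topic", "post_id", "text",
    "username", "parent_post_id", "parent_missing",
]

def validate_columns(df: pd.DataFrame) -> None:
    """Raise a ValueError listing any expected column missing from the data."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Data is missing columns: {missing}")

# Hypothetical usage; point the path at your own .csv in the Data folder:
# df = pd.read_csv("Data/my_discussions.csv")
# validate_columns(df)
```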
After adding the data, the following notebooks can be run step by step to investigate the interplay between sentiment and alignment:
- Preprocessing => preprocessing.py
- Alignment analysis => compute_lexical_word_alignment_v2_all.py
- Sentiment analysis => sentiment_analysis.py
- Interplay analysis => interplay_analysis.py
Cleaning up the code is still a work in progress; the cleaned-up notebooks will appear here:
- Preprocessing => preprocessing_notebook.py
- Alignment analysis => lexical_alignment_notebook.py
- Sentiment analysis => sentiment_notebook.py
- Interplay analysis => interplay_alignment_sentiment_notebook.py
The code for the time-based overlap lives in linguistic_alignment_analysis/time_based_overlap.py
This repo requires the following packages (and their dependencies):
- matplotlib
- nltk
- numba
- numpy
- pandas
- stanza
- scipy
- transformers
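The list above can be captured in a requirements.txt so the environment is reproducible with `pip install -r requirements.txt`. This file is not part of the repo; it is a sketch with the packages unpinned (pin versions as needed for your setup):

```
matplotlib
nltk
numba
numpy
pandas
stanza
scipy
transformers
```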