This repository contains code for the paper "Training language models for deeper understanding improves brain alignment".
pip install -r requirements.txt
- Download the already preprocessed data here. This data contains fMRI recordings for 8 subjects reading one chapter of Harry Potter. The data have been detrended, smoothed, and trimmed to remove the first 20 TRs and the last 15 TRs. For more information about the data, refer to the paper. We also provide the precomputed voxel neighborhoods that we used to compute the searchlight classification accuracies.
- Place it under the data folder in this repository (e.g. ./data/fMRI/ and ./data/voxel_neighborhoods).
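Before running any experiment, it may help to confirm that the data landed in the expected place. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_data_dirs` is hypothetical, and it assumes the two folder names mentioned above and that it is run from the repository root.

```python
from pathlib import Path

# Folder names taken from the instructions above; everything else is an assumption.
EXPECTED_DIRS = ("data/fMRI", "data/voxel_neighborhoods")


def missing_data_dirs(repo_root="."):
    """Return the expected data folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data folders:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```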
Below, we provide instructions for how to run the various experiments and code that we use in our paper.
- The sections below are ordered to match the flow of the paper as closely as possible.
- In our paper, we run a large number of experiments (many models, layers, sequence lengths, subjects, discourse features, brain ROIs, etc). Hence, we provide scripts to automate the process of running experiments across the various models, layers, etc. Hopefully, this will make it as easy as possible for others to use our code efficiently.
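The outer/inner script pattern used throughout amounts to sweeping over a grid of settings. As an illustration only, the same idea can be sketched in Python. The function `experiment_grid` and the placeholder values are hypothetical and do not correspond to variables in the provided shell scripts.

```python
from itertools import product


def experiment_grid(models, layers, sequence_lengths):
    """Yield one configuration dict per (model, layer, sequence length) combination."""
    for model, layer, seq_len in product(models, layers, sequence_lengths):
        yield {"model": model, "layer": layer, "sequence_length": seq_len}


if __name__ == "__main__":
    # Placeholder values for illustration; the real settings live in the outer scripts.
    for config in experiment_grid(["model_a", "model_b"], [1, 2], [20]):
        print(config)
```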
- Change the variables in this main script and run it:
all_scripts/scripts/outer_nlp_extract.sh
- The main script calls this inner script:
all_scripts/scripts/inner_nlp_extract.sh
- The inner script calls this python file:
extract_nlp_features.py
- The output will be generated in:
1-nlp_features/
- Change the variables in this main script and run it:
all_scripts/scripts/outer_encoding.sh
- The main script calls this inner script:
all_scripts/scripts/inner_encoding.sh
- The inner script calls this python file:
predict_brain_from_nlp.py
- The output will be generated in:
2-encoding_predictions/
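For readers unfamiliar with encoding models, the sketch below shows one common way to predict brain responses from NLP features: a linear map fit with ridge regression. This is an assumption made for illustration. The README does not state which regression predict_brain_from_nlp.py uses, and the function names and the `lam` parameter here are hypothetical.

```python
import numpy as np


def fit_ridge(features, responses, lam=1.0):
    """Closed-form ridge regression.

    features:  (n_TRs, n_features) NLP features
    responses: (n_TRs, n_voxels) fMRI recordings
    Returns weights of shape (n_features, n_voxels).
    """
    n_features = features.shape[1]
    gram = features.T @ features + lam * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ responses)


def predict(features, weights):
    """Predict voxel responses for held-out features."""
    return features @ weights
```

In practice the regularization strength would be chosen by cross-validation on the training data rather than fixed in advance.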
- The script described in the section above also performs the 20v20 evaluation:
all_scripts/scripts/outer_encoding.sh
- The main script calls this inner script:
all_scripts/scripts/inner_encoding.sh
- The inner script calls this python file:
evaluate_brain_predictions.py
- The output will be generated in:
3-eval-results/
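The idea behind a 20v20 evaluation can be sketched as follows: take a chunk of 20 predicted TRs and check whether it is closer to the matching chunk of real recordings than to a randomly chosen chunk. This is one plausible formulation under stated assumptions (Euclidean distance, randomly sampled TRs, all voxels pooled). The details in evaluate_brain_predictions.py may differ, and the function name is hypothetical.

```python
import numpy as np


def twenty_v_twenty_accuracy(pred, true, n_trials=1000, chunk=20, seed=0):
    """Fraction of trials in which predictions are closer to the matching recordings.

    pred, true: arrays of shape (n_TRs, n_voxels); n_TRs must be at least `chunk`.
    """
    rng = np.random.default_rng(seed)
    n_trs = pred.shape[0]
    correct = 0
    for _ in range(n_trials):
        idx_a = rng.choice(n_trs, size=chunk, replace=False)  # chunk being classified
        idx_b = rng.choice(n_trs, size=chunk, replace=False)  # distractor chunk
        dist_match = np.linalg.norm(pred[idx_a] - true[idx_a])
        dist_mismatch = np.linalg.norm(pred[idx_a] - true[idx_b])
        correct += int(dist_match < dist_mismatch)
    return correct / n_trials
```

Chance performance under this formulation is 0.5, so accuracies are typically compared against that baseline.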
- Change the variables in this main script and run it:
all_scripts/scripts/outer_perplexity.sh
- The main script calls this inner script:
all_scripts/scripts/inner_perplexity.sh
- The inner script calls this python file:
calculate_perplexity.py
- The output will be generated in:
4-perplexity-results/
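As a reminder of what is being measured, perplexity is the exponential of the average negative log-likelihood per token. The sketch below assumes the per-token natural-log probabilities have already been obtained from a language model. The function name is hypothetical and this is not the code in calculate_perplexity.py.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from a sequence of per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

For example, a model that assigns probability 0.25 to every token has a perplexity of about 4.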
- Change the variables in this main script and run it:
all_scripts/scripts/outer_finetune_booksum.sh
- The main script calls this python file:
finetune_booksum.py
- The output will be generated in:
5-finetune-booksum/
- Change the variables in this main script and run it:
all_scripts/scripts/outer_rouge.sh
- The main script calls this python file:
compute_rouge.py
- The output will be generated in:
6-rouge-score/
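To make the metric concrete, here is a simplified ROUGE-1 F1 computation based on unigram overlap. It assumes whitespace tokenization and lowercasing, with no stemming, so its scores will not exactly match a full ROUGE implementation or necessarily the output of compute_rouge.py. The function name is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```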
- First, we need to label the words in the Harry Potter text with their discourse features
- Download it from: http://www.cs.cmu.edu/afs/cs/project/theo-73/www/plosone/
- Place it at
data/story_features.mat
- Next, run
align_story_feature_TRs.ipynb
to map the labeled words to fMRI TRs
- Finally, extract the TRs corresponding to each discourse feature and compute the Pearson correlation score for that feature
- Use the python file:
all_scripts/plot_pearson.py
- The output will be generated in:
7-pearson-saved/
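The last step above can be sketched as follows: select the TRs labeled with a discourse feature, then compute the Pearson correlation between predicted and recorded activity for each voxel. This is a minimal sketch under assumptions. The function name and the boolean-mask representation of the labels are hypothetical, and all_scripts/plot_pearson.py may organize the computation differently.

```python
import numpy as np


def pearson_for_feature(pred, true, feature_mask):
    """Per-voxel Pearson r restricted to the TRs where a discourse feature is present.

    pred, true:   arrays of shape (n_TRs, n_voxels)
    feature_mask: boolean array of shape (n_TRs,)
    Voxels that are constant over the selected TRs yield NaN.
    """
    p = pred[feature_mask]
    t = true[feature_mask]
    p = p - p.mean(axis=0)
    t = t - t.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return (p * t).sum(axis=0) / denom
```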
- Use the python file:
all_scripts/plot_discourse_and_RoI.py
- The output will be generated in:
8-RoI-and-pearson-saved/
- Change the variables in this main script and run it:
all_scripts/scripts/outer_generate_voxel_values.sh
- The main script calls this python file:
all_scripts/generate_voxel_values.py
- The output will be generated in:
9-pearson-voxels-for-brain-plot/
- To plot the key figures used in the paper, see:
all_scripts/figures_paper/
- To plot other figures for visualization, see:
all_scripts/plot_{}
- To run significance tests and false discovery rate (FDR) correction using the Benjamini–Hochberg (BH) procedure, see:
compute_stat_significance.ipynb
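For reference, the Benjamini-Hochberg procedure sorts the p-values, finds the largest rank k such that the k-th smallest p-value is at most alpha * k / m, and rejects the k smallest hypotheses. The sketch below is a generic implementation written for illustration, with a hypothetical function name. It is not taken from the notebook.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which hypotheses are rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                           # indices that sort p ascending
    thresholds = alpha * np.arange(1, m + 1) / m    # alpha * k / m for rank k
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()              # largest rank meeting its threshold
        reject[order[: k + 1]] = True               # reject everything up to that rank
    return reject
```

The returned mask is in the same order as the input p-values, so it can be applied directly to the array of voxels or ROIs that produced them.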
- Our repository uses code from the following GitHub repository