Giter Club home page Giter Club logo

brain_language_deeper's Introduction

Training language models for deeper understanding improves brain alignment

This repository contains code for the paper "Training language models for deeper understanding improves brain alignment"

Setup

Install packages

  • pip install -r requirements.txt

Data - fMRI Recordings of 8 Subjects Reading Harry Potter

  • Download the already preprocessed data here. This data contains fMRI recordings for 8 subjects reading one chapter of Harry Potter. The data been detrended, smoothed, and trimmed to remove the first 20TRs and the last 15TRs. For more information about the data, refer to the paper. We have also provided the precomputed voxel neighborhoods that we have used to compute the searchlight classification accuracies.
  • Place it under the data folder in this repository (e.g. ./data/fMRI/ and ./data/voxel_neighborhoods).

Running Code

Below, we provide instructions for how to run the various experiments and code that we use in our paper.

  • The sections below are ordered to match the flow of the paper as closely as possible.
  • In our paper, we run a large number of experiments (many models, layers, sequence lengths, subjects, discourse features, brain ROIs, etc). Hence, we provide scripts to automate the process of running experiments across the various models, layers, etc. Hopefully, this will make it as easy as possible for others to use our code efficiently.

1. Extract NLP representations

  • Change the variables in this main script and run it: all_scripts/scripts/outer_nlp_extract.sh
  • The main script calls this inner script: all_scripts/scripts/inner_nlp_extract.sh
  • The inner script calls this python file: extract_nlp_features.py
  • The output will be generated in: 1-nlp_features/

2. Align NLP representations to human brain activity (i.e. linear encoding process)

  • Change the variables in this main script and run it: all_scripts/scripts/outer_encoding.sh
  • The main script calls this inner script: all_scripts/scripts/inner_encoding.sh
  • The inner script calls this python file: predict_brain_from_nlp.py
  • The output will be generated in: 2-encoding_predictions/

3. Evaluate brain-NLP alignment using 20v20 classification accuracy

  • The script described in the section above also performs the 20v20 evaluation: all_scripts/scripts/outer_encoding.sh
  • The main script calls this inner script: all_scripts/scripts/inner_encoding.sh
  • The inner script calls this python file: evaluate_brain_predictions.py
  • The output will be generated in: 3-eval-results/

4. Compute Language modeling ability (i.e. perplexity or cross-entropy loss)

  • Change the variables in this main script and run it: all_scripts/scripts/outer_perplexity.sh
  • The main script calls this inner script: all_scripts/scripts/inner_perplexity.sh
  • The inner script calls this python file: calculate_perplexity.py
  • The output will be generated in: 4-perplexity-results/

5. Train language models on BookSum (not used for paper)

  • Change the variables in this main script and run it: all_scripts/scripts/outer_finetune_booksum.sh
  • The main script calls this python file: finetune_booksum.py
  • The output will be generated in: 5-finetune-booksum/

6. Compute performance of models on BookSum dataset (i.e. ROUGE score) (not used for paper)

  • Change the variables in this main script and run it: all_scripts/scripts/outer_rouge.sh
  • The main script calls this python file: compute_rouge.py
  • The output will be generated in: 6-rouge-score/

7. Interpretability method to compute Pearson correlation for various discourse features

  • First, we need to label the words in the Harry Potter text with their discourse features
  • Next, run align_story_feature_TRs.ipynb to map the labeled words to fMRI TRs
  • Finally, extract the TRs corresponding to each discourse feature, and compute the Pearson correlation score for the discourse feature
    • Use the python file: all_scripts/plot_pearson.py
    • The output will be generated in: 7-pearson-saved/

8. Compute Pearson correlation for each pair of (discourse feature, brain ROI) (not used for paper)

  • Use the python file: all_scripts/plot_discourse_and_RoI.py
  • The output will be generated in: 8-RoI-and-pearson-saved/

9. Generate brain voxel values for visualizing on brain plots

  • Change the variables in this main script and run it: all_scripts/scripts/outer_generate_voxel_values.sh
  • The main script calls this python file: all_scripts/generate_voxel_values.py
  • The output will be generated in: 9-pearson-voxels-for-brain-plot/

Others

  • To plot the key figures used for paper, see: all_scripts/figures_paper/
  • To plot other figures for visualization, see: all_scripts/plot_{}
  • To run significance tests and false discovery rate (FDR) correction using the Benjamini–Hochberg (BH) procedure, see: compute_stat_significance.ipynb
  • Our repository uses code from the following GitHub repository

brain_language_deeper's People

Contributors

brain-nlp avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.