shashank-srikant / braincode

This project is forked from benlipkin/braincode

An analysis of representations of computer programs learned by ML models and those seen in our brains

License: MIT License

BrainCode

Project investigating human and artificial neural representations of code.

This branch is currently under development, and should be considered unstable. To replicate specific papers, git checkout the corresponding branch, e.g., NeurIPS2022, and follow instructions in the README.md.

This pipeline supports several major functions.

  • MVPA (multivariate pattern analysis) evaluates decoding of code properties or code model representations from their respective brain representations within a collection of canonical brain regions.
  • RSA (representational similarity analysis) is also supported as an alternative to MVPA.
  • VWEA (voxel-wise encoding analysis) evaluates prediction of voxel-level activation patterns using code properties and code model representations as features.
  • NLEA (network-level encoding analysis) uses the same features to evaluate encoding of mean network-level activation strength.
  • PRDA (program representation decoding analysis) evaluates decoding of code properties from code model representations.
  • PREA (program representation encoding analysis) evaluates encoding of code model representations using the set of code properties explored in this work.
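
As a rough illustration of the RSA idea listed above, the sketch below builds a dissimilarity matrix over stimuli for each of two representations and then correlates the two matrices. It is not this repository's implementation: the function names, the choice of 1 - Pearson r as the distance, and the Pearson comparison of the two matrices are all assumptions, picked to keep the example dependency-free.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rdm_upper(reps: List[Sequence[float]]) -> List[float]:
    """Upper triangle of the stimulus-by-stimulus dissimilarity matrix.

    Each entry is 1 - Pearson r between the representations of two stimuli
    (an assumed distance, not necessarily the one the pipeline uses).
    """
    n = len(reps)
    return [
        1.0 - pearson(reps[i], reps[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def rsa_score(reps_a: List[Sequence[float]], reps_b: List[Sequence[float]]) -> float:
    """Similarity of two representational geometries over the same stimuli."""
    return pearson(rdm_upper(reps_a), rdm_upper(reps_b))


# Toy data: four stimuli, three dimensions each (made-up numbers).
brain = [[1, 2, 3], [2, 1, 3], [3, 1, 2], [1, 3, 2]]
# A model whose vectors are a linear rescaling of the brain vectors
# has the same geometry, so the two dissimilarity matrices agree.
model = [[2 * v + 1 for v in row] for row in brain]
print(rsa_score(brain, model))
```

Because only the pairwise geometry is compared, the two representations do not need to share a dimensionality, only the same set of stimuli in the same order.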

Note: VWEA and NLEA also support ceiling estimates at the network level. These are computed with an identical pipeline, except that the features are other participants' brain representations of the same stimuli rather than properties extracted from those stimuli. To invoke a ceiling analysis, prefix the requested analysis type with a "C", e.g., CNLEA.
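
The "C" prefix convention can be sketched as a small name parser. This is a hypothetical helper written for illustration only: `parse_analysis` and the two constant sets are not part of the package, and the package may resolve analysis names differently.

```python
# Analysis names taken from the list above; only VWEA and NLEA are
# described as supporting ceiling estimates.
BASE_ANALYSES = {"mvpa", "rsa", "vwea", "nlea", "prda", "prea"}
CEILING_CAPABLE = {"vwea", "nlea"}


def parse_analysis(name: str):
    """Split an analysis name into (base_analysis, is_ceiling).

    Hypothetical helper: "cnlea" -> ("nlea", True), "mvpa" -> ("mvpa", False).
    """
    name = name.lower()
    if name in BASE_ANALYSES:
        return name, False
    if name.startswith("c") and name[1:] in CEILING_CAPABLE:
        return name[1:], True
    raise ValueError(f"unsupported analysis type: {name!r}")


print(parse_analysis("CNLEA"))
print(parse_analysis("mvpa"))
```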

Supported Brain Regions

  • brain-md_lh (Multiple Demand Network: Left Hemisphere)
  • brain-md_rh (Multiple Demand Network: Right Hemisphere)
  • brain-lang_lh (Language Network: Left Hemisphere)
  • brain-lang_rh (Language Network: Right Hemisphere)

Supported Code Features

Code Properties

  • task-structure (seq vs. for vs. if) *ControlFlow
  • task-content (math vs. str) *DataType
  • task-nodes (# of nodes in AST) *ASTNodes
  • task-lines (# of runtime steps during execution) *LinesExecuted
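
To make the task-nodes property concrete, here is a minimal sketch that counts AST nodes with Python's standard `ast` module. It assumes the stimuli are Python source and counts every node returned by `ast.walk`, including context nodes such as `Load` and `Store`; the function name is hypothetical and the project's own counting rule may differ.

```python
import ast


def count_ast_nodes(source: str) -> int:
    """Count all nodes in the abstract syntax tree of a Python snippet."""
    tree = ast.parse(source)
    # ast.walk yields the root and every descendant node exactly once.
    return sum(1 for _ in ast.walk(tree))


# Two made-up stimuli: a straight-line program and a loop.
straight = "x = 1\ny = x + 2\n"
looped = "total = 0\nfor i in range(3):\n    total += i\n"
print(count_ast_nodes(straight), count_ast_nodes(looped))
```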

Code Models

Baseline:

  • code-tokens (arbitrary projection encoding presence of individual tokens)

LLM Suite (CodeGen1):

  • code-llm_350m_nl
  • code-llm_2b_nl
  • code-llm_6b_nl
  • code-llm_16b_nl
  • code-llm_350m_mono
  • code-llm_2b_mono
  • code-llm_6b_mono
  • code-llm_16b_mono

Note: checkpoints vary in size and pre-training data (nl: ThePile; mono: ThePile+BigQuery+BigPython)
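
The eight LLM feature names follow a `code-llm_<size>_<pretraining>` pattern, so a sweep over all checkpoints can be scripted. The sketch below only builds name and command strings; `llm_feature_names` and `prda_commands` are hypothetical helpers, and the command format is copied from the PRDA sample call in this README.

```python
from itertools import product

SIZES = ("350m", "2b", "6b", "16b")
PRETRAINING = ("nl", "mono")


def llm_feature_names():
    """All checkpoint names, in the order listed above (nl first, then mono)."""
    return [
        f"code-llm_{size}_{corpus}"
        for corpus, size in product(PRETRAINING, SIZES)
    ]


def prda_commands(target: str = "task-lines"):
    """One PRDA command line per checkpoint, decoding the given code property."""
    return [
        f"python -m braincode prda -f {name} -t {target}"
        for name in llm_feature_names()
    ]


for command in prda_commands():
    print(command)
```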

Installation

Requirements: Anaconda, GNU Make

git clone --branch main --depth 1 https://github.com/benlipkin/braincode
cd braincode
make setup

Run

usage: __main__.py [-h] [-f FEATURE] [-t TARGET] [-m METRIC] [-d CODE_MODEL_DIM] [-p BASE_PATH] [-s] [-b] {mvpa,rsa,vwea,nlea,cvwea,cnlea,prda,prea}

run specified analysis type

positional arguments:
  {mvpa,rsa,vwea,nlea,cvwea,cnlea,prda,prea}

optional arguments:
  -h, --help            show this help message and exit
  -f FEATURE, --feature FEATURE
  -t TARGET, --target TARGET
  -m METRIC, --metric METRIC
  -d CODE_MODEL_DIM, --code_model_dim CODE_MODEL_DIM
  -p BASE_PATH, --base_path BASE_PATH
  -s, --score_only
  -b, --debug

Note: if the base path in setup.sh was changed from its default, pass the same path via -p/--base_path so the two match.
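
For readers who want to drive the pipeline from their own scripts, the usage text above can be mirrored with a standard `argparse` parser. This is a reconstruction from the help output, not the package's actual `__main__.py`: the `analysis` destination name is an assumption, and every default is left as `None` because the help text does not show the real defaults.

```python
import argparse

ANALYSES = ["mvpa", "rsa", "vwea", "nlea", "cvwea", "cnlea", "prda", "prea"]


def build_parser() -> argparse.ArgumentParser:
    """Parser that accepts the same flags as the usage text above."""
    parser = argparse.ArgumentParser(
        prog="braincode", description="run specified analysis type"
    )
    # Positional analysis type; "analysis" is an assumed attribute name.
    parser.add_argument("analysis", choices=ANALYSES)
    parser.add_argument("-f", "--feature")
    parser.add_argument("-t", "--target")
    parser.add_argument("-m", "--metric")
    parser.add_argument("-d", "--code_model_dim")
    parser.add_argument("-p", "--base_path")
    parser.add_argument("-s", "--score_only", action="store_true")
    parser.add_argument("-b", "--debug", action="store_true")
    return parser


args = build_parser().parse_args(
    ["mvpa", "-f", "brain-md_lh", "-t", "task-structure"]
)
print(args.analysis, args.feature, args.target, args.score_only)
```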

Sample calls

# basic examples
python -m braincode mvpa -f brain-md_lh -t task-structure # brain -> {task, model}
python -m braincode rsa -f brain-lang_lh -t code-llm_2b_nl # brain <-> {task, model}
python -m braincode vwea -f brain-md_rh -t code-tokens # brain <- {task, model}
python -m braincode nlea -f brain-lang_rh -t task-content # brain <- {task, model}
python -m braincode prda -f code-llm_350m_mono -t task-lines # model -> task
python -m braincode prea -f code-tokens -t task-content # model <- task

# more complex examples
python -m braincode cnlea -f all -m SpearmanRho --score_only # check metrics module for all options
python -m braincode mvpa -f brain-lang_lh+brain-lang_rh -t code-tokens -d 64 -p $BASE_PATH
python -m braincode vwea -t task-content+task-structure+task-nodes+task-lines
# note how `+` operator can be used to join multiple representations via concatenation
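
The concatenation behind the `+` operator can be pictured as joining per-stimulus feature rows side by side. The sketch below is an assumption about that behavior, not the package's loader: `join_representations` is a hypothetical function, and the feature values are made up.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one row of features per stimulus


def join_representations(spec: str, table: Dict[str, Matrix]) -> Matrix:
    """Concatenate the named representations column-wise, one row per stimulus."""
    parts = [table[name] for name in spec.split("+")]
    n_stimuli = len(parts[0])
    if any(len(part) != n_stimuli for part in parts):
        raise ValueError("all representations must cover the same stimuli")
    joined = []
    for i in range(n_stimuli):
        row: List[float] = []
        for part in parts:
            row.extend(part[i])  # append this representation's columns
        joined.append(row)
    return joined


# Made-up values for two stimuli.
features = {
    "task-content": [[0.0], [1.0]],
    "task-nodes": [[5.0], [9.0]],
}
print(join_representations("task-content+task-nodes", features))
```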

Citation

If you use this work, please cite XXX (under review)

License

License: MIT
