
coffeine's Introduction

Covariance Data Frames for Predictive M/EEG Pipelines


Coffeine is designed for building biomedical prediction models from M/EEG signals. The library provides a high-level interface facilitating the use of M/EEG covariance matrices as representations of the signal. The methods implemented here build on tools and concepts from PyRiemann. The API is fully compatible with scikit-learn and naturally integrates with MNE.

import mne
from coffeine import compute_coffeine, make_filter_bank_regressor

# load EEG data from linguistic experiment
eeg_fname = mne.datasets.kiloword.data_path() / "kword_metadata-epo.fif"
epochs = mne.read_epochs(eeg_fname)[:50]  # 50 samples

# compute covariances in different frequency bands 
X_df, feature_info = compute_coffeine(  # (defined by IPEG consortium)
    epochs, frequencies=('ipeg', ('delta', 'theta', 'alpha1'))
)  # ... and put results in a pandas DataFrame.
y = epochs.metadata["WordFrequency"]  # regression target

# compose a pipeline
model = make_filter_bank_regressor(method='riemann', names=X_df.columns)
model.fit(X_df, y)
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(8, 3))
for ii, name in enumerate(('delta', 'theta', 'alpha1')):
    axes[ii].matshow(X_df[name].mean(), cmap='PuOr')
    axes[ii].set_title(name)

Background

To support this workflow, coffeine uses DataFrames to handle multiple covariance matrices alongside scalar features. Vectorization and model-composition functions are provided that turn covariances, together with other types of features, into valid scikit-learn modeling pipelines.
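To illustrate the data layout, here is a minimal sketch of such a DataFrame built from toy data: each covariance column holds one matrix per row, next to an ordinary scalar column. The column names and random matrices are made up for this example and are not produced by coffeine.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_samples, n_channels = 10, 4

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)  # symmetric positive definite

# one covariance matrix per row and frequency band, plus a scalar covariate
X_df = pd.DataFrame({
    'alpha': [random_spd(n_channels) for _ in range(n_samples)],
    'beta': [random_spd(n_channels) for _ in range(n_samples)],
    'age': rng.uniform(20, 80, n_samples),
})
print(X_df.dtypes)  # the covariance columns are object columns holding 2D arrays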

The filter-bank pipelines (e.g. across multiple frequency bands or conditions) can be thought of as follows:

M/EEG covariance-based modeling pipeline from Sabbagh et al. 2020, NeuroImage

After preprocessing, covariance matrices can be projected to a subspace by spatial filtering to mitigate field spread and deal with rank-deficient signals. Subsequently, vectorization is performed to extract column features from the variance, the covariance, or both. Every path combining different lines in the graph describes one particular prediction model. The Riemannian embedding is special in that it mitigates field spread and provides vectorization in one step. It can be combined with dimensionality reduction in the projection step to deal with rank deficiency. Finally, a statistical learning algorithm can be applied.

The representation, projection and vectorization steps are done separately for each frequency band (or condition).
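As a rough illustration of these steps outside of coffeine's helper functions, one band of such a pipeline can be sketched directly with PyRiemann and scikit-learn. The toy data, regularization grid and metric choice below are illustrative assumptions, not coffeine defaults.

import numpy as np
from pyriemann.tangentspace import TangentSpace
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_channels = 50, 6
A = rng.standard_normal((n_samples, n_channels, n_channels))
covs = A @ A.transpose(0, 2, 1) + np.eye(n_channels)  # toy SPD matrices for one band
y = rng.standard_normal(n_samples)                    # toy regression target

# The Riemannian embedding maps each covariance to the tangent space at the
# geometric mean, which vectorizes it while mitigating field spread; a linear
# model is then fit on the resulting features.
model = make_pipeline(TangentSpace(metric='riemann'),
                      RidgeCV(alphas=np.logspace(-3, 3, 7)))
model.fit(covs, y)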

Installation of Python package

You can clone this repository and then run:

$ pip install -e .

The installation worked if the following command does not return an error:

$ python -c 'import coffeine'
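
Alternatively, assuming the package is also published on PyPI under the same name, a regular (non-editable) install should work as well:

$ pip install coffeine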

Citation

When publishing research using coffeine, please cite our core paper.

@article{sabbagh2020predictive,
  title={Predictive regression modeling with MEG/EEG: from source power to signals and cognitive states},
  author={Sabbagh, David and Ablin, Pierre and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Engemann, Denis A},
  journal={NeuroImage},
  volume={222},
  pages={116893},
  year={2020},
  publisher={Elsevier}
}

Please cite additional references highlighted in the documentation of specific functions and tutorials when using these functions and examples.

Please also cite the upstream software this package is building on, in particular PyRiemann.

coffeine's People

Contributors

agramfort, antoinecollas, apmellot, davidsabbagh, dengemann, hubertjb


coffeine's Issues

API: return value of compute_features

When computing covariances, we return a 3D array where the first axis indexes the frequency band. But this breaks the link with the frequency_band parameter and makes it unnecessarily difficult to build our fancy data frame later.
Should we return a dict of covariances or even a data frame?
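
For concreteness, a rough sketch of the two options discussed here; the band names, shapes and variable names are illustrative only.

import numpy as np
import pandas as pd

n_channels = 32
bands = ['alpha', 'beta']

# current return: 3D array whose first axis indexes the frequency band;
# the link to the band names is lost
covs = np.zeros((len(bands), n_channels, n_channels))

# option 1: a dict keyed by band name keeps that link explicit
covs_dict = {band: covs[ii] for ii, band in enumerate(bands)}

# option 2: a DataFrame with one column per band and one covariance matrix per
# cell (one row per recording) plugs directly into the filter-bank pipelines
covs_df = pd.DataFrame({band: [covs[ii]] for ii, band in enumerate(bands)})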

API: return value of make_pipelines

I find it somewhat heavy that all possible pipelines are computed, even if only one is used.
I think it would be nicer to have something like a "pipeline" keyword and do if/else blocks and just return the pipeline asked for. In the process we could think of renaming the function too (still thinking of a good name).

riemann = make_pipelines(fb_cols=fbands.keys(), pipeline='riemann')

ENH: apply post-scaling to enable handling SSS rank when using grad and mag

Currently the pipeline assumes that one sensor type is passed.
In practice, SSS is applied on both magnetometers and gradiometers and we should not require users to pick one of them.
The solution should be to apply post-scaling inside the functions computing covariances (and other features) from epochs.
With post-scaling applied based on the MNE defaults, the ProjCommonSpace should be able to handle everything.
If SSS is not applied, different options may need to be elaborated.
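
A rough sketch of what such post-scaling could look like: rescale each channel type by a fixed factor, in the spirit of MNE's default scalings, before computing covariances. The helper name and exact factors below are illustrative assumptions, not a coffeine API.

import mne
import numpy as np

def rescale_by_channel_type(epochs, scalings=None):
    """Multiply each channel type by a fixed factor before covariance estimation."""
    scalings = scalings or dict(mag=1e15, grad=1e13, eeg=1e6)
    data = epochs.get_data().copy()
    picks_kwargs = dict(mag=dict(meg='mag', eeg=False),
                        grad=dict(meg='grad', eeg=False),
                        eeg=dict(meg=False, eeg=True))
    for ch_type, factor in scalings.items():
        picks = mne.pick_types(epochs.info, **picks_kwargs[ch_type])
        data[:, picks, :] *= factor  # no-op if this channel type is absent
    return data  # shape (n_epochs, n_channels, n_times)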

ENH/FIX: more canonical naming patterns

I find the names used for the modules and some of the functions not ideal.

Here is a non-exhaustive list:

  • featuring.py -> covariance_transformers.py / transformers.py
  • spfiltering.py -> spatial_filtering.py
  • power_features.py -> covariance_features.py / spectral_features.py / spectral.py / cov.py

I think at the level of function names there were a few more issues.

ENH/DOC: roadmap

Here are my thoughts for the roadmap of this project.

I see this becoming a library to implement various clinically relevant prediction models using different types of data, that is, data of different shapes and data arising from different generative mechanisms.

This would flexibly allow implementing not only the models from our reference papers, but also future models.

What I have in mind is a benchmark / tutorial paper which showcases the following use cases:

  1. M/EEG covariance-based prediction (Riemann, Spatial filters): Sabbagh et al 2020
  2. Stacking models: Engemann et al 2020
  3. Interaction effects in first-layer learners (e.g. linear models): e.g. ridge regression with M/EEG covariance but slopes differing by gender
  4. Passing through stand-alone variables to second-layer learners (random forests) : e.g. pass gender indicator to random forest
  5. flexible combinations of 1-4

@DavidSabbagh @agramfort

API: name of the repo

I am not happy with the name of the repo.
Let's spell out what the library does and use acronymify or a similar tool to find a name that can be remembered easily.

BUG: Deprecated MNE functions

Hi,

I've noticed that your package does not work with newer versions of MNE, as mne.time_frequency.psd_welch was replaced a while back by mne.compute_psd(method='welch') as shown here.

Could you update your package to use new MNE functions instead?
Thank you
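
For reference, a minimal sketch of the API change (assuming MNE >= 1.2), using the same kiloword epochs as in the introduction; the frequency limits are arbitrary:

import mne

eeg_fname = mne.datasets.kiloword.data_path() / "kword_metadata-epo.fif"
epochs = mne.read_epochs(eeg_fname)

# removed: psds, freqs = mne.time_frequency.psd_welch(epochs, fmin=1., fmax=40.)
# current API:
spectrum = epochs.compute_psd(method='welch', fmin=1., fmax=40.)
psds, freqs = spectrum.get_data(return_freqs=True)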

API: generalising expander class

Currently, the expander code rigidly assumes that the last column in the data frame is the indicator for computing interaction effects.

This is problematic in at least two ways.

  1. It is very implicit; a label would be better.
  2. It is only one out of many possible ways of doing feature expansions.

I suggest we generalise the API so that different (current and future) expander objects can be passed, instead of using a boolean parameter.
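
A possible shape for such an API, sketched as a scikit-learn style transformer that identifies the indicator column by label rather than by position. The class and parameter names here are hypothetical and not part of the current coffeine API; scalar-valued columns are assumed.

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class LabeledInteractionExpander(BaseEstimator, TransformerMixin):
    """Expand features with interactions against an explicitly named indicator column."""

    def __init__(self, indicator='group'):
        self.indicator = indicator  # column label instead of 'the last column'

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        indicator = X[self.indicator].to_numpy()
        others = X.drop(columns=[self.indicator]).to_numpy()
        interactions = others * indicator[:, None]
        return np.hstack([others, indicator[:, None], interactions])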

ENH: ExpandFeatures tests

The interaction effect expansion is not nested. In addition to #5, this should be fixed too.
I think it would be good to test this with small toy data against the equivalent model produced by patsy or R formula.
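
A minimal test sketch along these lines, building the expected design matrix with patsy on toy data; the exact call signature of coffeine's ExpandFeatures is not assumed here, only the patsy reference is spelled out.

import numpy as np
import pandas as pd
from patsy import dmatrix

toy = pd.DataFrame({'x': [0.1, 0.5, 0.9, 1.3],
                    'group': [0., 1., 0., 1.]})

# reference expansion: main effects plus interaction, no intercept
expected = np.asarray(dmatrix('x + group + x:group - 1', toy))

# the corresponding output of ExpandFeatures would then be compared against
# `expected`, e.g. with np.testing.assert_allclose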
