
coffeine's Introduction

Covariance Data Frames for Predictive M/EEG Pipelines


Coffeine is designed for building biomedical prediction models from M/EEG signals. The library provides a high-level interface facilitating the use of M/EEG covariance matrices as representations of the signal. The methods implemented here build on tools and concepts from PyRiemann. The API is fully compatible with scikit-learn and naturally integrates with MNE.

import mne
from coffeine import compute_coffeine, make_filter_bank_regressor

# load EEG data from linguistic experiment
eeg_fname = mne.datasets.kiloword.data_path() / "kword_metadata-epo.fif"
epochs = mne.read_epochs(eeg_fname)[:50]  # 50 samples

# compute covariances in different frequency bands 
X_df, feature_info = compute_coffeine(  # (defined by IPEG consortium)
    epochs, frequencies=('ipeg', ('delta', 'theta', 'alpha1'))
)  # ... and put results in a pandas DataFrame.
y = epochs.metadata["WordFrequency"]  # regression target

# compose a pipeline
model = make_filter_bank_regressor(method='riemann', names=X_df.columns)
model.fit(X_df, y)
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(8, 3))
for ii, name in enumerate(('delta', 'theta', 'alpha1')):
    axes[ii].matshow(X_df[name].mean(), cmap='PuOr')
    axes[ii].set_title(name)

Background

To support this workflow, coffeine uses DataFrames to handle multiple covariance matrices alongside scalar features. Vectorization and model-composition functions are provided that turn covariances, together with other types of features, into valid scikit-learn modeling pipelines.
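To illustrate the data layout, here is a minimal sketch of such a DataFrame built from toy data: each covariance column holds one matrix per row, next to an ordinary scalar column. The column names and random matrices are made up for this example and are not produced by coffeine.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_samples, n_channels = 10, 4

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)  # symmetric positive definite

# one covariance matrix per row and frequency band, plus a scalar covariate
X_df = pd.DataFrame({
    'alpha': [random_spd(n_channels) for _ in range(n_samples)],
    'beta': [random_spd(n_channels) for _ in range(n_samples)],
    'age': rng.uniform(20, 80, n_samples),
})
print(X_df.dtypes)  # the covariance columns are object columns holding 2D arrays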

The filter-bank pipelines (e.g. across multiple frequency bands or conditions) can be thought of as follows:

M/EEG covariance-based modeling pipeline from Sabbagh et al. 2020, NeuroImage

After preprocessing, covariance matrices can be projected to a subspace by spatial filtering to mitigate field spread and deal with rank-deficient signals. Subsequently, vectorization is performed to extract column features from the variance, the covariance, or both. Every path combining different lines in the graph describes one particular prediction model. The Riemannian embedding is special in that it mitigates field spread and provides vectorization in one step. It can be combined with dimensionality reduction in the projection step to deal with rank deficiency. Finally, a statistical learning algorithm can be applied.

The representation, projection and vectorization steps are done separately for each frequency band (or condition).
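As a rough illustration of these steps outside of coffeine's helper functions, one band of such a pipeline can be sketched directly with PyRiemann and scikit-learn. The toy data, regularization grid and metric choice below are illustrative assumptions, not coffeine defaults.

import numpy as np
from pyriemann.tangentspace import TangentSpace
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_channels = 50, 6
A = rng.standard_normal((n_samples, n_channels, n_channels))
covs = A @ A.transpose(0, 2, 1) + np.eye(n_channels)  # toy SPD matrices for one band
y = rng.standard_normal(n_samples)                    # toy regression target

# The Riemannian embedding maps each covariance to the tangent space at the
# geometric mean, which vectorizes it while mitigating field spread; a linear
# model is then fit on the resulting features.
model = make_pipeline(TangentSpace(metric='riemann'),
                      RidgeCV(alphas=np.logspace(-3, 3, 7)))
model.fit(covs, y)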

Installation of Python package

You can clone this repository and then run:

$ pip install -e .

The installation worked if the following command does not return an error:

$ python -c 'import coffeine'
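
Alternatively, assuming the package is also published on PyPI under the same name, a regular (non-editable) install should work as well:

$ pip install coffeine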

Citation

When publishing research using coffeine, please cite our core paper.

@article{sabbagh2020predictive,
  title={Predictive regression modeling with MEG/EEG: from source power to signals and cognitive states},
  author={Sabbagh, David and Ablin, Pierre and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Engemann, Denis A},
  journal={NeuroImage},
  volume={222},
  pages={116893},
  year={2020},
  publisher={Elsevier}
}

Please cite additional references highlighted in the documentation of specific functions and tutorials when using these functions and examples.

Please also cite the upstream software this package is building on, in particular PyRiemann.

coffeine's People

Contributors

agramfort, antoinecollas, apmellot, davidsabbagh, dengemann, hubertjb


coffeine's Issues

API: return value of compute_features

When computing covariances, we return a 3D array where the first axis indexes the frequency band. But this breaks the link with the frequency_band parameter and makes it unnecessarily difficult to build our fancy data frame later.
Should we return a dict of covariances or even a data frame?
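
For concreteness, a rough sketch of the two options discussed here; the band names, shapes and variable names are illustrative only.

import numpy as np
import pandas as pd

n_channels = 32
bands = ['alpha', 'beta']

# current return: 3D array whose first axis indexes the frequency band;
# the link to the band names is lost
covs = np.zeros((len(bands), n_channels, n_channels))

# option 1: a dict keyed by band name keeps that link explicit
covs_dict = {band: covs[ii] for ii, band in enumerate(bands)}

# option 2: a DataFrame with one column per band and one covariance matrix per
# cell (one row per recording) plugs directly into the filter-bank pipelines
covs_df = pd.DataFrame({band: [covs[ii]] for ii, band in enumerate(bands)})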

API: return value of make_pipelines

I find it somewhat heavy that all possible pipelines are computed, even if only one is used.
I think it would be nicer to have something like a "pipeline" keyword and do if/else blocks and just return the pipeline asked for. In the process we could think of renaming the function too (still thinking of a good name).

riemann = make_pipelines(fb_cols=fbands.keys(), pipeline='riemann')

ENH: apply post-scaling to enable handling SSS rank when using grad and mag

Currently the pipeline assumes that one sensor type is passed.
In practice, SSS is applied on both magnetometers and gradiometers and we should not require users to pick one of them.
The solution should be to apply post-scaling inside the functions computing covariances (and other features) from epochs.
With post-scaling applied based on the MNE defaults, the ProjCommonSpace should be able to handle everything.
If SSS is not applied, different options may need to be elaborated.
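
A rough sketch of what such post-scaling could look like: rescale each channel type by a fixed factor, in the spirit of MNE's default scalings, before computing covariances. The helper name and exact factors below are illustrative assumptions, not a coffeine API.

import mne
import numpy as np

def rescale_by_channel_type(epochs, scalings=None):
    """Multiply each channel type by a fixed factor before covariance estimation."""
    scalings = scalings or dict(mag=1e15, grad=1e13, eeg=1e6)
    data = epochs.get_data().copy()
    picks_kwargs = dict(mag=dict(meg='mag', eeg=False),
                        grad=dict(meg='grad', eeg=False),
                        eeg=dict(meg=False, eeg=True))
    for ch_type, factor in scalings.items():
        picks = mne.pick_types(epochs.info, **picks_kwargs[ch_type])
        data[:, picks, :] *= factor  # no-op if this channel type is absent
    return data  # shape (n_epochs, n_channels, n_times)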

ENH/FIX: more canonical naming patterns

I find the names used for the modules and some of the functions not ideal.

Here is a non-exhaustive list:

  • featuring.py -> covariance_transformers.py / transformers.py
  • spfiltering.py -> spatial_filtering.py
  • power_features.py -> covariance_features.py / spectral_features.py / spectral.py / cov.py

I think at the level of function names there were a few more issues.

ENH/DOC: roadmap

Here are my thoughts for the roadmap of this project.

I see this becoming a library to implement various clinically relevant prediction models using different types of data, that is, data of different shapes and data arising from different generative mechanisms.

This would flexibly allow implementing not only the models from our reference papers, but also future models.

What I have in mind is a benchmark / tutorial paper which showcases the following use cases:

  1. M/EEG covariance-based prediction (Riemann, Spatial filters): Sabbagh et al 2020
  2. Stacking models: Engemann et al 2020
  3. Interaction effects in first-layer learners (e.g. linear models): e.g. ridge regression with M/EEG covariance but slopes differing by gender
  4. Passing through stand-alone variables to second-layer learners (random forests) : e.g. pass gender indicator to random forest
  5. flexible combinations of 1-4

@DavidSabbagh @agramfort

API: name of the repo

I am not happy with the name of the repo.
Let's spell out what the library does and use acronymify or a similar tool to find a name that can be remembered easily.

BUG: Deprecated MNE functions

Hi,

I've noticed that your package does not work with newer versions of MNE, as mne.time_frequency.psd_welch was replaced a while back by mne.compute_psd(method='welch') as shown here.

Could you update your package to use new MNE functions instead?
Thank you
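
For reference, a minimal sketch of the API change (assuming MNE >= 1.2), using the same kiloword epochs as in the introduction; the frequency limits are arbitrary:

import mne

eeg_fname = mne.datasets.kiloword.data_path() / "kword_metadata-epo.fif"
epochs = mne.read_epochs(eeg_fname)

# removed: psds, freqs = mne.time_frequency.psd_welch(epochs, fmin=1., fmax=40.)
# current API:
spectrum = epochs.compute_psd(method='welch', fmin=1., fmax=40.)
psds, freqs = spectrum.get_data(return_freqs=True)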

API: generalising expander class

Currently, the expander code rigidly assumes that the last column in the data frame is the indicator for computing interaction effects.

This is problematic in at least two ways.

  1. It is very implicit; a label would be better.
  2. It is only one out of many possible ways of doing feature expansions.

I suggest we generalise the API so that different (current and future) expander objects can be passed, instead of using a boolean parameter.
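
A possible shape for such an API, sketched as a scikit-learn style transformer that identifies the indicator column by label rather than by position. The class and parameter names here are hypothetical and not part of the current coffeine API; scalar-valued columns are assumed.

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class LabeledInteractionExpander(BaseEstimator, TransformerMixin):
    """Expand features with interactions against an explicitly named indicator column."""

    def __init__(self, indicator='group'):
        self.indicator = indicator  # column label instead of 'the last column'

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = pd.DataFrame(X)
        indicator = X[self.indicator].to_numpy()
        others = X.drop(columns=[self.indicator]).to_numpy()
        interactions = others * indicator[:, None]
        return np.hstack([others, indicator[:, None], interactions])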

ENH: ExpandFeatures tests

The interaction effect expansion is not nested. In addition to #5, this should be fixed too.
I think it would be good to test this with small toy data against the equivalent model produced by patsy or R formula.
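
A minimal test sketch along these lines, building the expected design matrix with patsy on toy data; the exact call signature of coffeine's ExpandFeatures is not assumed here, only the patsy reference is spelled out.

import numpy as np
import pandas as pd
from patsy import dmatrix

toy = pd.DataFrame({'x': [0.1, 0.5, 0.9, 1.3],
                    'group': [0., 1., 0., 1.]})

# reference expansion: main effects plus interaction, no intercept
expected = np.asarray(dmatrix('x + group + x:group - 1', toy))

# the corresponding output of ExpandFeatures would then be compared against
# `expected`, e.g. with np.testing.assert_allclose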
