braindecode / braindecode

Deep learning software to decode EEG, ECG or MEG signals

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

deep-learning eeg electroencephalogram electroencephalography neuroimaging neuroscience python pytorch meg magnetoencephalography electrocorticography ecog

braindecode's Issues

Shuffle train iterator by default

skorch does not shuffle the training iterator by default (see skorch-dev/skorch#419). We should shuffle by default; it is much more intuitive.

Ensure small training batches are dropped

Our training loop should behave as if drop_last were set to True in the PyTorch DataLoader (https://pytorch.org/docs/stable/data.html), so that a trailing batch smaller than the batch size is dropped.
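A minimal sketch of the requested behaviour, in plain Python rather than against the actual training loop. `iter_batches` is a hypothetical helper, not part of braindecode, skorch or PyTorch; it only mimics what drop_last=True does.

```python
def iter_batches(n_samples, batch_size, drop_last=True):
    """Yield lists of sample indices, one list per batch.

    Hypothetical helper for illustration only.
    """
    for start in range(0, n_samples, batch_size):
        batch = list(range(start, min(start + batch_size, n_samples)))
        if drop_last and len(batch) < batch_size:
            # Skip the trailing incomplete batch.
            continue
        yield batch


batches = list(iter_batches(n_samples=10, batch_size=4))
print([len(b) for b in batches])  # [4, 4]
```

With 10 samples and a batch size of 4, the last two samples are not used in that epoch.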

CircleCI should not redownload data all the time

For unclear reasons, CircleCI sometimes downloads the physionet data multiple times (https://circleci.com/gh/braindecode/braindecode/153 2 times, https://circleci.com/gh/braindecode/braindecode/108 3 times).

And it puts them in exactly the same locations, e.g.:

First download returns (among other names):
'/home/circleci/mne_data/MNE-eegbci-data/physiobank/database/eegmmidb/S001/S001R04.edf'
Second download returns (among other names):
'/home/circleci/mne_data/MNE-eegbci-data/physiobank/database/eegmmidb/S001/S001R04.edf'

Other times it seems to have it cached/not download it, https://circleci.com/gh/braindecode/braindecode/147

I don't understand the behavior and how to make it work properly. @agramfort

Analysing the performance of different methods to get windows

I've started looking at the performance of various ways of getting windows:
1- MNE: epochs.get_data(ind)[0] with lazy loading (preload=False)
2- MNE: epochs.get_data(ind)[0] with eager loading (preload=True)
3- MNE: direct access to the internal numpy array with epochs._data[index] (requires eager loading)
4- HDF5: using h5py (lazy loading)

The script that I used to run the comparison is here:
https://github.com/hubertjb/braindecode/blob/profiling-mne-epochs/test/others/profiling_mne_epochs.py
Also, I ran the comparison on a single CPU using:
>>> taskset -c 0 python profiling_mne_epochs.py

Here's the resulting figure, where the x-axis is the number of time samples in the continuous recording:

For the moment, it looks like:
1- ._data[index] is unsurprisingly the fastest; however, it requires loading the entire data into memory.
2- hdf5 is very close, with around 0.5 ms per loop, which is great knowing it's able to only load one window at a time.
3- get_data(index) is much slower, but this is expected as we know it creates a new mne.Epochs object every time it's called. Also, the gap between preload=True and preload=False is about 1.5 ms, which might be OK. The main issue though seems to be the linear increase of execution time as the continuous data gets bigger and bigger.
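For readers who want to try the same kind of comparison without MNE or HDF5, here is a rough sketch of the timing pattern using only NumPy and timeit. It does not reproduce the measurements above; the array sizes and the two accessor functions are made up for illustration.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n_channels, n_times, window_size = 4, 10_000, 500
continuous = rng.standard_normal((n_channels, n_times))

# Eager: all windows are cut once and kept in memory (similar in spirit
# to indexing a preloaded array).
starts = np.arange(0, n_times - window_size + 1, window_size)
eager_windows = np.stack([continuous[:, s:s + window_size] for s in starts])


def get_eager(i):
    return eager_windows[i]


def get_lazy(i):
    # Lazy: slice the continuous signal only when a window is requested.
    s = starts[i]
    return continuous[:, s:s + window_size].copy()


for name, fn in [("eager", get_eager), ("lazy", get_lazy)]:
    seconds = timeit.timeit(lambda: fn(3), number=1000)
    print(f"{name}: {seconds / 1000 * 1e6:.2f} us per call")
```

Repeating the loop for increasing n_times would show whether the access time depends on the length of the continuous recording.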

Next steps

Considering the benefits of using MNE for handling the EEG data inside the Dataset classes, I think it would be important to dive deeper into the inner workings of get_data() to see whether simple changes could make this more efficient. I can do some actual profiling on that. What do you think @agramfort @robintibor @gemeinl ?

Note: I haven't included the extraction of labels in this test.

Remove resampy dependency

The resampy library is used to resample continuous signals, see https://github.com/braindecode/braindecode/blob/master/braindecode/mne_ext/signalproc.py#L34-L72. The dependency should be removed, and resampling should be implemented through transforms (https://github.com/braindecode/braindecode/blob/master/braindecode/datautil/transforms.py#L18-L49) using mne.

Transforms->Preprocessors?

While rewriting and refactoring for new tutorials, I thought about one possible renaming: Transforms->Preprocessors.
The reason is that our current transforms are applied directly to the data, not on-the-fly like torchvision transforms. This renaming would also allow us to distinguish to-be-added transforms that are applied on-the-fly, as in torchvision. However, I don't feel 100% sure about it. What do you think @agramfort @sliwy @gemeinl

Move to new Skorch-based API, remove old Experiment/Model API

Create a new package (or not, see below) with a name such as engine, trainer(s) or training for the skorch-based training/experiment loop API. Check bcic_iv_2a_cropped.py (bcic_iv_2a.py already checked) and then replace it in the other examples as well (also reply to Amir's email).

experiments/experiment.py -> remove
experiments/loggers.py -> remove
experiments/monitors.py -> move code needed for cropped decoding computations out (temporarily to scoring? or other place? maybe cropped.py in new engine/trainer module?), then remove
experiments/stopcriteria.py -> remove
models/base.py -> remove, rewrite existing models to not inherit from it anymore, adapt examples (also grep for create_network in all of braindecode and remove)
torch_ext/losses.py -> remove
torch_ext/schedulers.py -> remove
*datautil/splitters.py -> remove

For name for new package, some ways of others:

Change License author to braindecode developers 2020

Rethink where to put nn inits

Right now, one init that is only used by EEGNet is in nn_init.py, while another one is inside eegresnet. Should both be in nn_init? Or should both be in models and private?

Preprocessing / scaling of targets

Some applications might benefit from preprocessing / scaling of targets, e.g. age regression. How could we include this in braindecode? Do we introduce another function next to transform_concat_ds to apply transforms to targets, or would you argue for a combination of both?
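To make the question concrete, here is one possible shape for target scaling, as a sketch only. `TargetScaler` is a hypothetical helper, not an existing braindecode class; it assumes the statistics are fitted on the training targets and later used to map predictions back to the original unit (e.g. years).

```python
import numpy as np


class TargetScaler:
    """Hypothetical helper: standardize regression targets."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.mean_ = y.mean()
        # Fall back to 1.0 so constant targets do not cause division by zero.
        self.std_ = y.std() or 1.0
        return self

    def transform(self, y):
        return (np.asarray(y, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, y_scaled):
        return np.asarray(y_scaled, dtype=float) * self.std_ + self.mean_


train_ages = [23, 35, 47, 59, 71]
scaler = TargetScaler().fit(train_ages)
scaled = scaler.transform(train_ages)
restored = scaler.inverse_transform(scaled)
```

Whether this lives next to transform_concat_ds or inside it is exactly the open design question.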

Make examples look beautiful for docs

Some examples currently don't have much formatting or explanation. They should be much nicer and easy to understand for new users.

Different results using skorch and my training function

I was trying to translate my code to use braindecode. Previously I had a function train that took a PyTorch neural net and trained the model. Now I wanted to replace this function with a skorch classifier. I created an example that should use the same network, the same parameters and the same dataset, and it gives different results (with my train function it reaches 1.0 train accuracy after around 10-15 epochs, whereas with skorch the results are not even similar after 50 epochs). On my dataset it does not work at all, which is why I got interested in the skorch performance.
I'm stuck and I don't have any idea what causes this behavior. If anyone has some time and would like to help me, here is a link to the Python file with the example; maybe a fresh look can help: https://github.com/sliwy/braindecode/blob/strange_example/examples/skorch_slow_learning.py

X/y dataset and mne dataset

Clear functionality how/where to enter X/y dataset or mne epochs(?) datasets with tutorial documentation

Translate Amir Example Code

Search your mail for WG: example code and translate once new API is integrated

What's new page

@agramfort you wanted to set something up, right? Similar to this https://mne.tools/dev/whats_new.html

Polished tutorials

Turn https://braindecode.org/auto_examples/plot_bcic_iv_2a_moabb_trial.html and https://braindecode.org/auto_examples/plot_bcic_iv_2a_moabb_cropped.htm into two proper tutorials, similar to https://tntlfreiburg.github.io/braindecode/notebooks/Cropped_Decoding.html

ResNet

Use different, quite successful architecture!

check this repository https://github.com/robintibor/adamw-eeg-eval/blob/master/adamweegeval/resnet.py
see how it can be nicely included

Improve Dataset classes and Windower interaction

1 Refine the definition of a BaseDataset:

does it inherit from pytorch Dataset?
if it is a pytorch Dataset, length should be n_times
is there a use case for calling getitem?

2 Have a BaseConcatDataset

should implement a 'split' method

3 Datasets should not know the windower

move EventWindower / FixedLengthWindower out of MOABBDataset/TUHAbnormal

4 Windowers should be applied to datasets

Windowers could be functions instead of classes
windowers should accept a ConcatDataset on call and return a ConcatDataset

5 Avoid variable name ambiguities

for example with mne.Raw.info

EpochScoring Callbacks should be always for all sets

Ensure that all epochscoring computed metrics are computed on all sets, maybe in braindecodeClassifier constructor.

Update Readme

Create more structure in the Readme for https://braindecode.org/.
Should allow beginners directly to understand how to get started using braindecode, "Quickstart"-like, similar to before https://tntlfreiburg.github.io/braindecode/ .

How to calculate input-feature unit-output maps?

Hi! First of all, thank you for your interesting work and for publishing and documenting everything here! :-)

For my master's thesis, I'm trying to classify the attentive state of patients with disorders of consciousness. The major challenge is that there is no ground truth available for these patients. To tackle this, I have two approaches. First, I'm using a Keras implementation of your deep network in combination with learning mechanisms that are robust to label noise. After training, gaining insight into what the network learned is the second aspect I'd like to investigate. So the input-feature unit-output correlation maps seem to be a promising method. I followed your tutorial on Amplitude Perturbation Visualization, which worked using the learned model's prediction function.
But regarding the input-feature unit-output correlation maps (without the perturbation), I don't know how I can calculate them. I read through the classes in the visualization package, but I'm not sure where to start. My goal would be an evaluation similar to figures 6/7 in your paper.

Can you point me in the right direction?

Kind regards, Constantin

Tests refactor/cleanup

test_cropped_decoding.py, test_trialwise_decoding.py refactor to use new dataset classes

Rename input time length and n input predictions

Idea was:
input_time_length -> n_in_times
n_preds_per_input -> n_out_times

For more consistency and easier understanding. Can be in many places inside braindecode.

Existing Metrics Check

Easy to understand interface

for existing metrics, check necessary inputs and outputs to define common interface

Create base dataset class with common dataset methods

The MOABBDataset object has attributes and methods that should probably be moved to a base dataset class.

BCIC IV 2a examples without local path

Fetch the BCIC IV 2a data for the realistic examples (examples/bcic_iv_2a_*) without relying on downloaded files in a specific folder.

Checkpointing Save Load

Run Large Experiments, save resources

Note down which parts have to be saved
Check for any local functions etc.

Remove mne_ext

mne_apply and common_average_reference_cnt should be removed and replaced with mne functions through the transforms API (https://github.com/braindecode/braindecode/blob/master/braindecode/datautil/transforms.py#L18)
concatenate_raws_with_events could potentially be replaced by mne.concatenate_raws. However, this modifies the first raw in-place which is probably not intended. Furthermore, there might be a bug, where events of raws are not properly combined. To be tested. If there actually is a bug, report to mne.

Put on pypi

Put new braindecode on pypi

EEGClassifier docstrings

Make the EEGClassifier docstring display all the parameters (including those of the base class). We should probably use an approach similar to skorch, which takes the __doc__ attribute of the base class and modifies it (see here https://github.com/skorch-dev/skorch/blob/560e72149914b2d99ce6477226dd91835db8c37f/skorch/classifier.py#L61).
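The general pattern could look like the sketch below. This is not skorch's actual code; `Base`, `Child` and the `cropped` parameter are placeholder names, and the example assumes numpydoc-style docstrings where the parameter list is the last section.

```python
class Base:
    """Base estimator.

    Parameters
    ----------
    lr : float
        Learning rate.
    """

    def __init__(self, lr=0.01):
        self.lr = lr


class Child(Base):
    # Parameter entries that only the subclass adds.
    _extra_doc = """    cropped : bool
        Whether to use cropped decoding.
    """

    def __init__(self, lr=0.01, cropped=False):
        super().__init__(lr=lr)
        self.cropped = cropped


# Append the subclass parameters to the inherited parameter list.
Child.__doc__ = Base.__doc__.rstrip() + "\n" + Child._extra_doc
print(Child.__doc__)
```

The documentation tooling then sees one docstring that lists both the base class and the subclass parameters.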

Improving the transforms

The most recent implementation of transforms asks for an OrderedDict of callables or mne.Raw/Epochs methods that are called successively on the internal Raw/Epochs objects in a ConcatDataset (https://github.com/braindecode/braindecode/blob/master/braindecode/datautil/transforms.py#L18). I wanted to discuss possible improvements to this design:

The current design requires the callables or methods to act in-place, i.e., they must directly modify the Raw/Epochs object. However, apart from .resample and .filter, most mne methods do not seem to act in-place. As for callables, they can be made to modify the objects in-place, but this would mean all the transforms would have to be mne object-aware. I don't have a specific solution in mind, but maybe we could discuss this here.
There is currently no way to apply a transform on-the-fly, i.e., every time a window is about to be returned. This is central to preprocessing steps in a lazy loading scenario however. The design we had started to implement during the sprint used Transform objects that were saved in the dataset object, and called inside __getitem__. This is how torch-vision does it for instance.
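A rough sketch of the second point, i.e. a transform stored in the dataset and applied inside __getitem__. `WindowsDataset` and `demean` are hypothetical names and the dataset works on a plain NumPy array instead of mne objects, so this only illustrates the call pattern.

```python
import numpy as np


class WindowsDataset:
    """Hypothetical lazy dataset: cuts a window and transforms it on access."""

    def __init__(self, signal, window_size, transform=None):
        self.signal = signal  # shape (n_channels, n_times)
        self.window_size = window_size
        self.transform = transform

    def __len__(self):
        return self.signal.shape[1] // self.window_size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.window_size
        window = self.signal[:, start:start + self.window_size]
        if self.transform is not None:
            # Applied every time a window is requested, never stored.
            window = self.transform(window)
        return window


def demean(window):
    # Returns a new array, so the underlying signal is left untouched.
    return window - window.mean(axis=1, keepdims=True)


signal = np.arange(24, dtype=float).reshape(2, 12)
dataset = WindowsDataset(signal, window_size=4, transform=demean)
print(len(dataset), dataset[0].shape)  # 3 (2, 4)
```

Because the transform returns a new array, it does not need to act in-place, which also sidesteps the first point for on-the-fly transforms.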

What are your thoughts @gemeinl @sbbrandt ?

Improve braindecode.signalproc

Functions exponential_running_standardize and exponential_running_demean should be adapted to accept data input as n_channels x n_times.
highpass_cnt, low_pass_cnt, bandpass_cnt, and filter_is_stable should be removed and their usages replaced with mne functions, following the transform API (https://github.com/braindecode/braindecode/blob/master/braindecode/datautil/transforms.py#L18).
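For the first point, a channels-first version could be sketched as below. This is an illustrative NumPy loop, not the existing braindecode implementation; the function name, the recursive update rule and the default values of `factor_new` and `eps` are assumptions made for the example.

```python
import numpy as np


def running_standardize(data, factor_new=0.001, eps=1e-4):
    """Exponential running standardization of (n_channels, n_times) data.

    Illustrative sketch only; parameter defaults are assumptions.
    """
    data = np.asarray(data, dtype=float)
    n_channels, n_times = data.shape
    mean = data[:, 0].copy()      # running mean, one value per channel
    var = np.zeros(n_channels)    # running variance, one value per channel
    out = np.empty_like(data)
    for t in range(n_times):
        x = data[:, t]
        mean = factor_new * x + (1 - factor_new) * mean
        var = factor_new * (x - mean) ** 2 + (1 - factor_new) * var
        # eps avoids division by (almost) zero for flat signals.
        out[:, t] = (x - mean) / np.maximum(np.sqrt(var), eps)
    return out


rng = np.random.default_rng(0)
standardized = running_standardize(rng.standard_normal((3, 200)))
print(standardized.shape)  # (3, 200)
```

The time axis is the last axis throughout, so the function can be applied directly to arrays laid out as n_channels x n_times.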

tmin in EventWindower and FixedLengthWindower collide with window_size_samples

Currently, defining tmin will change the actual size of the windows. This should not be the case.
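One way to express the expected behaviour, as a sketch under the assumption that the offset should only shift a window and never resize it. `compute_window_bounds` and its argument names are hypothetical and not the windowers' actual signature.

```python
def compute_window_bounds(onsets, window_size_samples, start_offset_samples=0):
    """Return (start, stop) sample indices for each event onset.

    Hypothetical helper: the offset moves the window, the size stays fixed.
    """
    bounds = []
    for onset in onsets:
        start = onset + start_offset_samples
        stop = start + window_size_samples
        bounds.append((start, stop))
    return bounds


bounds = compute_window_bounds(
    [100, 600], window_size_samples=250, start_offset_samples=-50
)
print(bounds)  # [(50, 300), (550, 800)]
```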

Overview dataset / tutorial

Description:

what is windower wrapping?
what is the purpose of this class? what does it produce?
what is the input to windower class, moabb dataset or anything?

Code Style and Naming

When we have the first integrated version skorch-api+dataset, we can make a pass over the code to enforce some more consistency and see what style we prefer (beyond just pep8).

Let's collect open questions for now:

to indicate a variable that specifies an index/counter, e.g., the index of a trial, should we prefix with i(=i_trial) or with idx=idx_trial

moving to the annotations object

As mentioned in the last PR, we thought about moving to the annotations objects to get the events.
So far, the events are read from the stimulus channel for the BCIC IV 2a dataset. Other datasets from MOABB might not have a stimulus channel, but come with an annotation object right away.

I have implemented a method to set the annotations object for the MOABB datasets, but Lukas and I came up with a few questions.

The event description in an annotations object is the description string, e.g. 'left_feet', instead of the integer values. For the event windower, we would need to call events_from_annotations(raw) to get the indices of the events. Here, a new mapping is needed/generated as the events now become integers again. The default option here would result in events starting from 1 on. For classification we would need events from 0 on. Anyway it would be handy to have the resulting mapping saved in the dataset.

The main questions here are:

would we define a custom default mapping to have events started from 0 on in the windower or move that part to the training step?
should we have an events_desc attribute containing the mapping for the dataset?
also I thought about discarding the stimulus channel when setting the annotations, as it is not needed anymore afterwards.
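As a possible starting point for the mapping question, the sketch below builds a zero-based mapping from description strings in plain Python. `zero_based_mapping` is a hypothetical helper; a dict of this form could presumably be passed as the event_id argument of mne.events_from_annotations, but that is not tested here.

```python
def zero_based_mapping(descriptions):
    """Map each unique description to an integer, starting at 0.

    Sorting makes the mapping independent of the order of the events.
    """
    unique = sorted(set(descriptions))
    return {desc: i for i, desc in enumerate(unique)}


descriptions = ["left_hand", "feet", "right_hand", "feet", "left_hand"]
mapping = zero_based_mapping(descriptions)
targets = [mapping[d] for d in descriptions]
print(mapping)  # {'feet': 0, 'left_hand': 1, 'right_hand': 2}
print(targets)  # [1, 0, 2, 0, 1]
```

Storing `mapping` on the dataset would cover the events_desc idea from the second question.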

Besides that, the onset of the annotations would start at the actual experiment onset, e.g. shifted by 2s for bcic, and the duration is contained as well. This information can be accessed via MOABB.

What do you think @hubertjb @gemeinl

Collect Use Cases in Pseudocode

Ensure we meet people's needs/cover use cases

define use cases we want to cover in pseudocode

Make EEGClassifier have very nice interface

Put very good defaults so calling without anything will usually give you a good result.

Make braindecode.org use apex domain

Follow @robintibor https://help.github.com/en/github/working-with-github-pages/managing-a-custom-domain-for-your-github-pages-site#configuring-an-apex-domain

Deprecate old braindecode

Put only link to new braindecode in readme, move old readme, remove parts about pip install. make sure github pages as well as github readme is updated.

Tests for dataset classes / functions

Need unit tests and possibly acceptance tests for the dataset classes and functions.

Less Lines for Assignment of Constructor arguments

See old discussion under

Originally posted by @robintibor in https://github.com/braindecode/braindecode/pull/83/files

Implement test cases for exponential_running_standardize() and exponential_running_demean()

"Trialwise Decoding" notebook does not give "stable" results

(I'm new to machine-learning, so forgive me if this bug-report is misguided.)

I downloaded the "plot_bcic_iv_2a_moabb_trial.ipynb" example from here.

I noticed this code:

seed = 20200220  # random seed to make results reproducible
# Set random seed to be able to reproduce results
set_random_seeds(seed=seed, cuda=cuda)

Shouldn't this code make it so that every time I run the notebook, the results are the same in the learning-progress logging, and in the displayed plot?

However, it gives different results each time it runs, despite no code modifications: Screen Capture

Questions:

Is it supposed to get different results each time?
If it is supposed to, is there any way I can modify the code to instead give "stable" results?
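For context, this is roughly what seeding is expected to guarantee, shown with the standard library and NumPy only. `set_seeds` here is a stand-in written for this example, not braindecode's set_random_seeds, and the sketch does not establish why the notebook differs between runs.

```python
import random

import numpy as np


def set_seeds(seed):
    # Stand-in for illustration: seed every random number generator in use.
    random.seed(seed)
    np.random.seed(seed)


def noisy_run(seed):
    set_seeds(seed)
    return random.random(), float(np.random.rand())


first = noisy_run(20200220)
second = noisy_run(20200220)
print(first == second)  # True
```

When training on a GPU, seeding alone may not be enough: PyTorch's reproducibility notes also describe the torch.backends.cudnn.deterministic and torch.backends.cudnn.benchmark flags, since some GPU operations can be nondeterministic.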

Dataloader Use Cases

Allow many people to use it

write down all different use cases (classification(single label/multi label)/regression/continuous labels etc.)
describe how implementation could work in pytorch dataloader logic

Naming of models

Rename Deep4Net -> Deep4Model, ShallowFBCSPNet -> ShallowFBCSPModel

And/But: EEGNet ->EEGNetModel

Remove assumption on 3d/4d from code

Easier to understand interface, less code

if model assumes 4d input, put into model ensure_4d or something like that
remove code everywhere possible that adds empty dimensions etc.
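A sketch of what such a helper could do, written with NumPy so it runs without PyTorch. The name ensure_4d comes from the suggestion above; the choice to append trailing singleton dimensions is an assumption, and a tensor version would do the same with unsqueeze.

```python
import numpy as np


def ensure_4d(x):
    """Append trailing singleton dimensions until x has 4 dimensions."""
    x = np.asarray(x)
    while x.ndim < 4:
        x = x[..., np.newaxis]
    return x


batch_3d = np.zeros((8, 22, 1000))  # (batch, channels, time)
print(ensure_4d(batch_3d).shape)    # (8, 22, 1000, 1)
```

Calling this at the start of a model's forward pass would let the rest of the code pass 3d input everywhere.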

Integrating BrainDecode in MOABB benchmarking pipelines

Using the scikit-learn API, BrainDecode could be used as a regular scikit-learn pipeline and benchmarked against other classical BCI pipelines.

Training metrics/scores should be recomputed at end of epoch

Right now, skorch seems to take the predictions obtained during the training loop iteration to compute metrics for the training set (EpochScoring with on_train=True).
This is not what we want.

CircleCI redownloads

Is there a reason for the force_update=True introduced in 01aa0ce ?

These lines

braindecode/.circleci/config.yml

Lines 93 to 97 in 01aa0ce

  if [[ $(cat $FNAME | grep -x ".*datasets.*eegbci.*" | wc -l) -gt 0 ]]; then
  python -c "import mne; print([mne.datasets.eegbci.load_data(s, [4, 5, 6, 8, 9, 10, 12, 13, 14], update_path=True, force_update=True) for s in range(1, 51)])";
  fi;
  if [[ $(cat $FNAME | grep -x ".*datasets.*sleep_physionet.*" | wc -l) -gt 0 ]]; then
  python -c "import mne; print(mne.datasets.sleep_physionet.age.fetch_data([0, 1], recording=[1], update_path=True, force_update=True))";

As far as I understand, these lines seem to lead to quite long downloads whenever an example differs from the master branch.

See e.g. https://app.circleci.com/pipelines/github/braindecode/braindecode/340/workflows/75971e52-7fec-47ea-a4b9-ffdc114e2410/jobs/398/steps

Wouldn't update_path=False, force_update=False work? This is what we do inside code usually (force_update=False per default from mne), e.g.

braindecode/examples/plot_skorch_crop_decoding.py

Lines 43 to 45 in 01aa0ce

 # and then return the paths to the files.
 physionet_paths = mne.datasets.eegbci.load_data(
 subject_id, event_codes, update_path=False

Or will this break CircleCI completely again? Annoying to wait for long times in new pull requests always redownloading same stuff, even multiple times :/ @agramfort

