
biopsykit's Introduction

BioPsyKit


A Python package for the analysis of biopsychological data.

With this package you have everything you need for analyzing biopsychological data, including:

  • Data processing pipelines for various physiological signals (ECG, EEG, Respiration, Motion, ...).
  • Algorithms and data processing pipelines for sleep/wake prediction and computation of sleep endpoints based on activity or IMU data.
  • Functions to import and process data from sleep trackers (e.g., Withings Sleep Analyzer).
  • Functions for processing and analysis of salivary biomarker data (cortisol, amylase).
  • Implementation of various psychological and HCI-related questionnaires.
  • Implementation of classes representing different psychological protocols (e.g., TSST, MIST, Cortisol Awakening Response Assessment, etc.).
  • Functions for easily setting up statistical analysis pipelines.
  • Functions for setting up and evaluating machine learning pipelines.
  • Plotting wrappers optimized for displaying biopsychological data.

Details

Analysis of Physiological Signals

ECG Processing

BioPsyKit provides a whole ECG data processing pipeline, consisting of:

  • Loading ECG data from:
    • Generic .csv files
    • NilsPod binary (.bin) files (requires NilsPodLib)
    • Other sensor types (coming soon)
  • Splitting data into single study parts (based on time intervals) that will be analyzed separately
  • Performing ECG processing, including:
    • R peak detection (using Neurokit)
    • R peak outlier removal and interpolation
    • HRV feature computation
    • ECG-derived respiration (EDR) estimation for respiration rate and respiratory sinus arrhythmia (RSA) (experimental)
    • Instantaneous heart rate resampling
    • Computing aggregated results (e.g., mean and standard error) per study part
  • Creating plots for visualizing processing results
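Two of the steps listed above, instantaneous heart rate and aggregated results per study part, can be sketched in plain Python. This is only an illustrative sketch and not BioPsyKit's implementation: the function names `heart_rate_from_r_peaks` and `mean_and_standard_error` are hypothetical, and the R peak times are made-up values.

```python
import math
import statistics


def heart_rate_from_r_peaks(r_peak_times_s):
    """Convert R peak timestamps (in seconds) into instantaneous heart rate (bpm)."""
    # RR interval = time between two successive R peaks
    rr_intervals = [t1 - t0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]
    # 60 seconds per minute divided by the RR interval gives beats per minute
    return [60.0 / rr for rr in rr_intervals]


def mean_and_standard_error(values):
    """Aggregate a list of values into mean and standard error of the mean."""
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# made-up R peak times (seconds) for one study part
r_peaks = [0.0, 1.0, 2.0, 2.8, 3.6]
heart_rate = heart_rate_from_r_peaks(r_peaks)
hr_mean, hr_se = mean_and_standard_error(heart_rate)
print(hr_mean, hr_se)
```

In a real analysis the R peak times would come from the R peak detection and outlier correction steps, not from a hand-written list.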

Quick Example

from biopsykit.signals.ecg import EcgProcessor
from biopsykit.example_data import get_ecg_example

ecg_data, sampling_rate = get_ecg_example()

ep = EcgProcessor(ecg_data, sampling_rate)
ep.ecg_process()

print(ep.ecg_result)

... more biosignals coming soon!

Sleep/Wake Prediction

BioPsyKit allows you to process sleep data collected from IMU or activity sensors (e.g., Actigraphs). This includes:

  • Detection of wear periods
  • Detection of time spent in bed
  • Detection of sleep and wake phases
  • Computation of sleep endpoints (e.g., sleep and wake onset, net sleep duration, wake after sleep onset, etc.)
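To make the notion of sleep endpoints more concrete, the following sketch derives a few of them from an epoch-wise sleep/wake prediction. It is a simplified illustration under stated assumptions, not BioPsyKit's algorithm: the function `sleep_endpoints`, the returned keys, and the choice to define the sleep period as the span from the first to the last sleep epoch are all hypothetical.

```python
def sleep_endpoints(epochs, epoch_length_min=1.0):
    """Compute simple sleep endpoints from a list of epochs (1 = sleep, 0 = wake)."""
    if 1 not in epochs:
        return None  # no sleep detected at all

    first_sleep = epochs.index(1)
    last_sleep = len(epochs) - 1 - epochs[::-1].index(1)
    # assumption: the sleep period spans from the first to the last sleep epoch
    sleep_period = epochs[first_sleep:last_sleep + 1]

    sleep_epochs = sum(sleep_period)
    wake_epochs = len(sleep_period) - sleep_epochs
    return {
        "sleep_onset_epoch": first_sleep,
        "wake_onset_epoch": last_sleep + 1,
        "net_sleep_duration_min": sleep_epochs * epoch_length_min,
        "wake_after_sleep_onset_min": wake_epochs * epoch_length_min,
    }


# made-up prediction with one-minute epochs
prediction = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
endpoints = sleep_endpoints(prediction, epoch_length_min=1.0)
print(endpoints)
```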

Quick Example

import biopsykit as bp
from biopsykit.example_data import get_sleep_imu_example

imu_data, sampling_rate = get_sleep_imu_example()

sleep_results = bp.sleep.sleep_processing_pipeline.predict_pipeline_acceleration(imu_data, sampling_rate)
sleep_endpoints = sleep_results["sleep_endpoints"]

print(sleep_endpoints)

Salivary Biomarker Analysis

BioPsyKit provides several methods for the analysis of salivary biomarkers (e.g. cortisol and amylase), such as:

  • Import data from Excel and csv files into a standardized format
  • Compute standard features (maximum increase, slope, area-under-the-curve, mean, standard deviation, ...)
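As a rough illustration of two of the standard features named above, the sketch below computes an area under the curve and a maximum increase for a single subject. It assumes a trapezoidal AUC with respect to ground and defines the maximum increase relative to the first sample. Both definitions and the function names `auc_ground` and `max_increase` are assumptions made for this example and may differ from what BioPsyKit computes.

```python
def auc_ground(sample_times, values):
    """Area under the curve with respect to ground (trapezoidal rule)."""
    pairs = list(zip(sample_times, values))
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for (t0, v0), (t1, v1) in zip(pairs, pairs[1:])
    )


def max_increase(values):
    """Largest increase of any later sample relative to the first sample."""
    return max(values[1:]) - values[0]


# made-up cortisol values for the sample times used in the README example
sample_times = [-20, 0, 10, 20, 30, 40, 50]
cortisol = [5.0, 4.0, 6.0, 9.0, 8.0, 6.0, 5.0]

auc_all = auc_ground(sample_times, cortisol)
# drop the first sample (t=-20) before computing the AUC
auc_without_s0 = auc_ground(sample_times[1:], cortisol[1:])
increase = max_increase(cortisol)
print(auc_all, auc_without_s0, increase)
```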

Quick Example

import biopsykit as bp
from biopsykit.example_data import get_saliva_example

saliva_data = get_saliva_example(sample_times=[-20, 0, 10, 20, 30, 40, 50])

max_inc = bp.saliva.max_increase(saliva_data)
# remove the first saliva sample (t=-20) from computing the AUC
auc = bp.saliva.auc(saliva_data, remove_s0=True)

print(max_inc)
print(auc)

Questionnaires

BioPsyKit implements various established psychological (state and trait) questionnaires, such as:

  • Perceived Stress Scale (PSS)
  • Positive and Negative Affect Schedule (PANAS)
  • Self-Compassion Scale (SCS)
  • Big Five Inventory (BFI)
  • State Trait Depression and Anxiety Questionnaire (STADI)
  • Trier Inventory for Chronic Stress (TICS)
  • Primary Appraisal Secondary Appraisal Scale (PASA)
  • ...
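Questionnaire scoring typically combines range checks, reverse-scored items, and a sum or mean. The sketch below shows this for the 10-item Perceived Stress Scale, where items are answered on a 0 to 4 scale and items 4, 5, 7, and 8 are reverse-scored. The function `pss10_total` is a hypothetical stand-alone helper written for this example; it is not the BioPsyKit function.

```python
REVERSED_ITEMS = {4, 5, 7, 8}  # 1-based item numbers that are reverse-scored
SCALE_MIN, SCALE_MAX = 0, 4


def pss10_total(answers):
    """Compute the PSS-10 total score from ten answers given in item order."""
    if len(answers) != 10:
        raise ValueError("PSS-10 expects exactly 10 answers")
    total = 0
    for item_number, answer in enumerate(answers, start=1):
        if not SCALE_MIN <= answer <= SCALE_MAX:
            raise ValueError(f"item {item_number} is outside the range 0-4")
        if item_number in REVERSED_ITEMS:
            # reverse scoring maps 0 -> 4, 1 -> 3, ..., 4 -> 0
            answer = SCALE_MAX - answer
        total += answer
    return total


# made-up answers of one participant
pss_total = pss10_total([1, 2, 3, 4, 0, 1, 2, 3, 4, 0])
print(pss_total)
```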

Quick Example

import biopsykit as bp
from biopsykit.example_data import get_questionnaire_example

data = get_questionnaire_example()

pss_data = data.filter(like="PSS")
pss_result = bp.questionnaires.pss(pss_data)

print(pss_result)

List Supported Questionnaires

import biopsykit as bp

print(bp.questionnaires.utils.get_supported_questionnaires())

Psychological Protocols

BioPsyKit implements methods for easy handling and analysis of data recorded with several established psychological protocols, such as:

  • Montreal Imaging Stress Task (MIST)
  • Trier Social Stress Test (TSST)
  • Cortisol Awakening Response Assessment (CAR)
  • ...

Quick Example

from biopsykit.protocols import TSST
from biopsykit.example_data import get_saliva_example
from biopsykit.example_data import get_hr_subject_data_dict_example
# specify TSST structure and the durations of the single phases
structure = {
   "Pre": None,
   "TSST": {
       "Preparation": 300,
       "Talk": 300,
       "Math": 300
   },
   "Post": None
}
tsst = TSST(name="TSST", structure=structure)

saliva_data = get_saliva_example(sample_times=[-20, 0, 10, 20, 30, 40, 50])
hr_subject_data_dict = get_hr_subject_data_dict_example()
# add saliva data collected during the whole TSST procedure
tsst.add_saliva_data(saliva_data, saliva_type="cortisol")
# add heart rate data collected during the "TSST" study part
tsst.add_hr_data(hr_subject_data_dict, study_part="TSST")
# compute heart rate results: normalize ECG data relative to "Preparation" phase; afterwards, use data from the 
# "Talk" and "Math" phases and compute the average heart rate for each subject and study phase, respectively
tsst.compute_hr_results(
    result_id="hr_mean",
    study_part="TSST",
    normalize_to=True,
    select_phases=True,
    mean_per_subject=True,
    params={
        "normalize_to": "Preparation",
        "select_phases": ["Talk", "Math"]
    }
)
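The phase durations in such a structure can be read as consecutive time intervals. The following sketch shows that interpretation only; it assumes that the phases follow each other without gaps, and `phase_intervals` is a hypothetical helper that is not part of BioPsyKit.

```python
def phase_intervals(phases):
    """Turn an ordered {phase name: duration in seconds} dict into (start, end) offsets."""
    intervals = {}
    start = 0
    for name, duration in phases.items():
        intervals[name] = (start, start + duration)
        start += duration  # the next phase starts where this one ends
    return intervals


tsst_phases = {"Preparation": 300, "Talk": 300, "Math": 300}
intervals = phase_intervals(tsst_phases)
print(intervals)
```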

Statistical Analysis

BioPsyKit implements methods for simplified statistical analysis of biopsychological data by offering an object-oriented interface for setting up statistical analysis pipelines, displaying the results, and adding statistical significance brackets to plots.

Quick Example

import matplotlib.pyplot as plt
from biopsykit.stats import StatsPipeline
from biopsykit.plotting import multi_feature_boxplot
from biopsykit.example_data import get_stats_example

data = get_stats_example()

# configure statistical analysis pipeline which consists of checking for normal distribution and performing paired 
# t-tests (within-variable: time) on each questionnaire subscale separately (grouping data by subscale).
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("test", "pairwise_ttests")],
    params={"dv": "PANAS", "groupby": "subscale", "subject": "subject", "within": "time"}
)

# apply statistics pipeline on data
pipeline.apply(data)

# plot data and add statistical significance brackets from statistical analysis pipeline
fig, axs = plt.subplots(ncols=3)
features = ["NegativeAffect", "PositiveAffect", "Total"]
# generate statistical significance brackets
box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="within", plot_type="single", x="time", features=features, subplots=True
)
# plot data
multi_feature_boxplot(
    data=data, x="time", y="PANAS", features=features, group="subscale", order=["pre", "post"],
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues}, ax=axs
)

Machine Learning Analysis

BioPsyKit implements methods for simplified and systematic evaluation of different machine learning pipelines.

Quick Example

# Utils
from sklearn.datasets import load_breast_cancer
# Preprocessing & Feature Selection
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Cross-Validation
from sklearn.model_selection import KFold

from biopsykit.classification.model_selection import SklearnPipelinePermuter

# load example dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# specify estimator combinations
model_dict = {
  "scaler": {
    "StandardScaler": StandardScaler(),
    "MinMaxScaler": MinMaxScaler()
  },
  "reduce_dim": {
    "SelectKBest": SelectKBest(),
  },
  "clf": {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
  }
}
# specify hyperparameter for grid search
params_dict = {
  "StandardScaler": None,
  "MinMaxScaler": None,
  "SelectKBest": {"k": [2, 4, "all"]},
  "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
  "DecisionTreeClassifier": {"criterion": ['gini', 'entropy'], "max_depth": [2, 4]},
}

pipeline_permuter = SklearnPipelinePermuter(model_dict, params_dict)
pipeline_permuter.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))

# print summary of all relevant metrics for the best pipeline for each evaluated pipeline combination
print(pipeline_permuter.metric_summary())

Installation

BioPsyKit requires Python >=3.8. First, install a compatible version of Python. Then install BioPsyKit via pip.

Installation from PyPI:

pip install biopsykit

Installation from PyPI with extras (e.g., jupyter to directly install all required dependencies for the use with Jupyter Lab):

pip install "biopsykit[jupyter]"

Installation from local repository copy:

git clone https://github.com/mad-lab-fau/BioPsyKit.git
cd BioPsyKit
pip install .

For Developers

If you are a developer and want to contribute to BioPsyKit you can install an editable version of the package from a local copy of the repository.

BioPsyKit uses poetry to manage dependencies and packaging. Once you have installed poetry, run the following commands to clone the repository, initialize a virtual env, and install all development dependencies:

Without Extras

git clone https://github.com/mad-lab-fau/BioPsyKit.git
cd BioPsyKit
poetry install

With all Extras (e.g., extended functionalities for IPython/Jupyter Notebooks)

git clone https://github.com/mad-lab-fau/BioPsyKit.git
cd BioPsyKit
poetry install -E mne -E jupyter 

To run any of the tools required for the development workflow, use the poe commands of the poethepoet task runner:

$ poe
docs                 Build the html docs using Sphinx.
format               Reformat all files using black.
format_check         Check, but not change, formatting using black.
lint                 Lint all files with Prospector.
test                 Run Pytest with coverage.
update_version       Bump the version in pyproject.toml and biopsykit.__init__ .
register_ipykernel   Register a new IPython kernel named `biopsykit` linked to the virtual environment.
remove_ipykernel     Remove the associated IPython kernel.

Some Notes

  • The poe commands are only available if you are in the virtual environment associated with this project. You can either activate the virtual environment manually (e.g., source .venv/bin/activate) or use the poetry shell command to spawn a new shell with the virtual environment activated.

  • In order to use Jupyter notebooks with the project, you need to register a new IPython kernel associated with the venv of the project (poe register_ipykernel, see above). When creating a notebook, make sure to select this kernel (top right corner of the notebook).

  • In order to build the documentation, you need to additionally install pandoc.


See the Contributing Guidelines for further information.

Examples

See the Examples Gallery for examples of how to use BioPsyKit.

Citing BioPsyKit

If you use BioPsyKit in your work, please report the version you used in the text. Additionally, please also cite the corresponding paper:

Richer et al., (2021). BioPsyKit: A Python package for the analysis of biopsychological data. Journal of Open Source Software, 6(66), 3702, https://doi.org/10.21105/joss.03702

If you use a specific algorithm, please also make sure to cite the original paper of the algorithm! We recommend the following citation style:

We used the algorithm proposed by Author et al. [paper-citation], implemented by the BioPsyKit package [biopsykit-citation].

biopsykit's People

Contributors

aksei, akuederle, danielkrauss2, janiszen, juliajorkowitz, livhe, lucaabel, rebecca243, richrobe, rouzbeh, ullrimar, victoria1509


biopsykit's Issues

[JOSS review] Get example datasets in README error

Hi there, after installing biopsykit with ease, I've encountered some errors when running the examples listed on the readme, particularly those that are related to retrieving example datasets:

import biopsykit as bp
from biopsykit.example_data import get_saliva_example

saliva_data = get_saliva_example(sample_times=[-20, 0, 10, 20, 30, 40, 50])

I get an error like so:


  File "c:\users\zen juen\downloads\wpy64-3850\python-3.8.5.amd64\lib\site-packages\pandas\io\parsers.py", line 1862, in __init__
    self._open_handles(src, kwds)

  File "c:\users\zen juen\downloads\wpy64-3850\python-3.8.5.amd64\lib\site-packages\pandas\io\parsers.py", line 1357, in _open_handles
    self.handles = get_handle(

  File "c:\users\zen juen\downloads\wpy64-3850\python-3.8.5.amd64\lib\site-packages\pandas\io\common.py", line 642, in get_handle
    handle = open(

FileNotFoundError: [Errno 2] No such file or directory: 'c:\\users\\zen juen\\downloads\\wpy64-3850\\python-3.8.5.amd64\\lib\\example_data\\cortisol_sample.csv'

This also happens for get_sleep_imu_example(), get_questionnaire_example() and get_mist_hr_example(). Might this be to do with the way functions are setting the data path?
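One common cause of this kind of error is a data path that is built relative to the wrong base directory. A frequently used remedy is to anchor the path at the module file itself, as sketched below. The helper `example_file_path`, the folder name `example_data`, and the package name `demo_package` are hypothetical and do not describe BioPsyKit's actual layout.

```python
from pathlib import Path


def example_file_path(module_file, file_name):
    """Build the path to a bundled example file next to the given module file."""
    # resolve() makes the path absolute, so the result does not depend on
    # the directory from which the interpreter was started
    package_dir = Path(module_file).resolve().parent
    return package_dir / "example_data" / file_name


# inside a package module one would pass __file__ here
path = example_file_path("demo_package/loader.py", "cortisol_sample.csv")
print(path)
```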

(Note: as part of openjournals/joss-reviews#3702)

WearDetection - algorithm reference

Thanks very much for putting together this lovely package :-) I was wondering if you could point me towards the reference/publication for the no-wear detection algorithm implemented in the class WearDetection here. Furthermore, why did you choose that particular algorithm, if I may ask? Thanks!

[JOSS Review] Paper

Thanks for the submission of BioPsyKit to JOSS! This Python package appears to be a useful software contribution to the biopsychology research field that aims to combine main tools and methods into one. At large, the package wraps around existing analysis and visualization tools like scipy, pandas, scikit-learn, seaborn, matplotlib etc. and implements fairly standard protocols encountered in the field, ranging from handling questionnaire data, biomarker data and time-continuous electrophysiology data like ECG and EEG. The software implementation appears sound and appears well tested in the form of many unit tests, but I did have various problems running the different example notebooks (#14 ) and did find the installation cumbersome due to difficulties in resolving dependencies (#13 ) which may be a dealbreaker for some.

Otherwise, I will keep it short here as the other reviewer already raised a few valid points (#12 ). In particular, I agree that a comparison with existing/alternative tools (also outside the Python realm) with overlapping functionality would be most useful. Does BioPsyKit for instance incorporate new features not available in its alternatives, beyond its somewhat uniform user high(er) level interface and data structures?

questionnaire functions return 0.0 when row entries are NaN --> should return NaN

Description
Certain questionnaire functions in BioPsyKit return 0.0 instead of NaN when the entries of a row contain only NaN values, particularly when a sum is calculated. In contrast, NaN is returned when the mean is calculated for a subscale/score.

To Reproduce
Steps to reproduce the behavior:

  1. Use BioPsyKit functions that involve sum calculations for subscales/scores.
  2. Provide a dataset with rows containing only NaN values.
  3. Calculate the scores/subscales.
  4. Check the NaN rows in the returned df.

Expected behavior
When calculating the sum for a subscale/score, the function should return NaN if the participant's row contains only NaN values.
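The reported behavior matches the default of pandas, where summing an all-NaN row yields 0.0, while the `min_count` parameter of `DataFrame.sum` restores NaN. The sketch below only demonstrates this pandas behavior on made-up data; whether BioPsyKit's questionnaire functions should be changed in exactly this way is an assumption.

```python
import numpy as np
import pandas as pd

# made-up item answers; subject "s3" did not answer any item
items = pd.DataFrame(
    {"item_1": [1.0, np.nan, np.nan], "item_2": [2.0, 3.0, np.nan]},
    index=["s1", "s2", "s3"],
)

# default: NaN values are skipped, so an all-NaN row sums to 0.0
default_sum = items.sum(axis=1)

# min_count=1: at least one valid value is required, otherwise the result is NaN
strict_sum = items.sum(axis=1, min_count=1)

print(default_sum)
print(strict_sum)
```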

System Specifications:

  • Operating System: macOS
  • BioPsyKit version: 0.9.0
  • Python version: 3.9

[JOSS Review] Installation

This project's README suggests Poetry as the main means of installing this package and pins dependencies to very specific versions. This is an issue on my machine, an M1 MacBook Pro, as not every dependency is readily available from PyPI, mainly scipy v1.7.1. As of now, only macOS x86_64 wheels are provided. Arm64 builds are also not yet available from unofficial sources like conda-forge. Building such dependencies from source may be an issue for many users. I tried installing BioPsyKit in a conda environment (using the miniforge python distribution) and in a virtualenv using packages installed via macports. The only way I could get the different example notebooks to run was to add src to PYTHONPATH and install the closest matching dependencies manually in the python environment I set up.

There is nothing in this package that should prevent use of somewhat older (and future) versions of BioPsyKit's dependencies like scipy.

As the pyproject.toml file is provided, I think the README file could mention pip install . as an installation option without the need to install Poetry.

Another minor issue with the README: cd biopsykit should be cd BioPsyKit in the installation instructions.

Linked issue: openjournals/joss-reviews#3702

Regression

Regression Architectures are not working so far

ERG signal processing Module

Is your feature request related to a problem? Please describe.
A module for processing electroretinogram signals, similar to how the ECG/EEG signal processing module works, but specifically for ERG signals (Full-Field, Pattern, Multi-focal).

Describe alternatives you've considered
Currently there are no alternatives that do exactly that, only scientific libraries (such as SciPy) that have signal processing modules built in.

question: sleep.plotting module

I have additional questions on this site.
https://biopsykit.readthedocs.io/en/latest/examples/_notebooks/Sleep_IMU_Example.html

  1. The 'sleep_processing_pipeline.predict_pipeline_acceleration()' function uses the legacy algorithm based on Cole/Kripke as the default, right?

Looking at the tutorial right after that:

  2. In the 'Cut Data to Wear Period' part, the wear_detection.WearDetection() function is used, and the internal logic shows that it is based on the van Hees heuristic. So is the plot underneath obtained with the van Hees method?

  3. Some papers apply a band-pass filter to the 50 Hz acceleration sensor data before processing the signal. No filtering code is visible in the internal logic, but I wonder whether that doesn't matter because the method is a heuristic.

  4. The 'get_major_wear_block(data)' function selects the longest period during which the IMU was worn, and the sample data was only recorded at night. If the sensor is also worn for a long time during the day, how can I handle that?


Thank you!

Question

Hi @richrobe, hope you're doing great ☺️
I just stumbled across this nice package, and there are some very useful and interesting stuff here, looking forward to see its future development!

And looking at the ECG code, there are actually several interesting features that we don't have in neurokit afaik, such as some outliers detection methods or one of the EDR method computation, etc (pinging @zen-juen to double-check). I was wondering if you would be okay if we added them in NK too as it could also benefit our users? It could potentially decrease the complexity / length of your code if you wanted then to call these functions from there (I'm saying that coz I saw you already have NK as a dependency) Anyway, let me know what you think! Take care

[JOSS REVIEW] Documentation

Hi there, when ticking off the documentation checks on openjournals/joss-reviews#3702 I realize that there aren’t guidelines yet for potential contributors, collaborators, and those seeking support for software use. You may want to refer to https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors for doing so, as well as add some templates for writing issues and creating pull requests!

Overall, I think the README is clear with short and succinct examples of use and are easily referred to when going through each section on different types of analysis in the paper (e.g., sleep analysis, biomarker analysis…). However, there is no documentation of how statistical analyses and machine learning pipelines (using stats and classification) can be readily implemented. As it's not intuitive yet on how this can be done, I think a short example script could be nice - perhaps the one used to generate the Figure 5 boxplot in the paper?

Let me know what you think!

van Hees 2015 sleep / wake classification algorithm

Is your feature request related to a problem? Please describe.
Current sleep/wake classification algorithms all rely on activity counts, which are difficult to replicate properly for a variety of reasons.

Describe the solution you'd like
A sleep/wake classification algorithm based off the van Hees, 2015 paper

This method was purposely designed to be an easy-to-describe heuristic. It also showed similar performance to other sleep/wake classification algorithms. It uses two thresholds: (1) the z-angle range and (2) a length of time. The default parameters are 5 degrees and 5 minutes. So, if the z-angle changes by less than 5 degrees in 5 minutes, the 5-minute period is classified as sleep. If the z-angle changes by more than 5 degrees in 5 minutes, the 5-minute period is classified as wake.
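The heuristic described above can be sketched in a few lines. This is a simplified illustration of the description in this issue, not a validated implementation of the published algorithm: it uses non-overlapping windows and the function names `z_angle_deg` and `classify_windows` are hypothetical. The z-angle is computed as the arctangent of the z acceleration over the magnitude of the x and y acceleration.

```python
import math


def z_angle_deg(acc_x, acc_y, acc_z):
    """Angle of the sensor's z-axis relative to the horizontal plane, in degrees."""
    return math.degrees(math.atan2(acc_z, math.hypot(acc_x, acc_y)))


def classify_windows(z_angles, samples_per_window, max_range_deg=5.0):
    """Label each non-overlapping window as 'sleep' or 'wake' by its z-angle range."""
    labels = []
    last_start = len(z_angles) - samples_per_window
    for start in range(0, last_start + 1, samples_per_window):
        window = z_angles[start:start + samples_per_window]
        angle_range = max(window) - min(window)
        labels.append("sleep" if angle_range < max_range_deg else "wake")
    return labels


# made-up z-angles, one value per minute, so 5 samples form one 5-minute window
z_angles = [10, 11, 12, 11, 10, 10, 30, 50, 20, 10]
labels = classify_windows(z_angles, samples_per_window=5, max_range_deg=5.0)
print(labels)
```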

[JOSS Review] Broken example notebooks

There are several errors and issues with the provided example notebooks:

  • There are many abbreviations like IMU, PSS, PANAS, HRV that are not explained. Perhaps these are standard terms in the field, but for a broader acceptance of this tool, these abbreviations should be defined.
  • ECG_Processing_Example.ipynb: throws an error at the line fig, axs = ecg.plotting.ecg_plot(ep, key='Data', figsize=(10,5)); NameError: name 'ecg' is not defined
  • ECG_Analysis_Example.ipynb fails on line display_dict_structure(study_data_dict) because of the above error in ECG_Processing_Example.ipynb (no results to load).
  • EEG_Example.ipynb: Please add some documentation of what functionality this notebook is supposed to demonstrate. Some time series are shown but it’s not very clear what is going on.
  • Log_Data_Example.ipynb; ditto.
  • Protocol_Example.ipynb fails because of the above error in ECG_Processing_Example.ipynb (no results to load)
  • Questionnaire_Example.ipynb: Incorporating purposely failing code (pss = bp.questionnaires.pss(data_pss) resulting in ValueRangeError … ) in the notebook, I don’t think that’s a good idea whatsoever (prevents running all cells and such). One could capture the error in a more controlled manner.
  • SklearnPipelinePermuter_Example.ipynb: Throws an error NameError: name 'datasets' is not defined on line breast_cancer = datasets.load_breast_cancer()
  • StatsPipeline_Plotting_Example.ipynb: Typo “cortsol”
  • STROOP_Example.ipynb: Crashes at line stroop.hr_ensemble_plot(data=dict_phase,figsize=(10,8),ylims=(40,120)) with TypeError: hr_ensemble_plot() missing 1 required positional argument: 'ensemble_id'
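Regarding the purposely failing cell in the questionnaire notebook, one way to demonstrate an error without interrupting "run all cells" is to catch it explicitly. The sketch below uses a locally defined stand-in exception and a hypothetical `score_questionnaire` function, since it is meant to run without BioPsyKit installed; in the notebook, the real exception class and scoring function would take their place.

```python
class ValueRangeError(Exception):
    """Stand-in for the error raised when answers are outside the expected range."""


def score_questionnaire(answers, score_range):
    """Hypothetical scoring function that validates the answer range before summing."""
    low, high = score_range
    if any(not low <= answer <= high for answer in answers):
        raise ValueRangeError(f"answers must be within [{low}, {high}]")
    return sum(answers)


try:
    # the value 5 is deliberately outside the allowed range of 0 to 4
    score_questionnaire([1, 2, 5], score_range=(0, 4))
except ValueRangeError as error:
    # the error is shown to the reader, but execution continues
    message = f"Expected failure: {error}"
    print(message)
```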

Linked issue: openjournals/joss-reviews#3702

MIST code

Hi, thanks for BioPsyKit. I have a question: does this package include the MIST task itself, or does it only process data recorded during the MIST? I would like to implement the MIST in an experimental setting, but I can't run it using this package. Do you have the MIST as Python code?

[JOSS REVIEW] Paper

As part of openjournals/joss-reviews#3702

Summary

I think the summary nicely conveys in layman terms, the aim of the package, which is to allow for different methods in biopsychology to be made accessible and performed systematically using a single package. However, the third paragraph in the summary which talks about the use cases of these data isn't the most relevant to building an argument for why BioPsyKit is needed. Rather than a generic description of when the data is collected/analysis types, the purpose of this package may be made more convincing by stating what are the gaps in current biopsychology methodologies and their consequences. Although it is briefly mentioned (in the statement of need) that researchers conventionally use different assessment modalities, I think it is the implications of this approach that need to be highlighted here. For example, how does using different assessment modalities impact reproducibility in research? How does combining tools in BioPsyKit facilitate/streamline analyses and why is it important? At the analysis level, are the algorithm pipelines made accessible to researchers, or are they too opaque, and how does BioPsyKit address this? These are some questions that I think would be important for addressing the gap BioPsyKit is specifically fulfilling.

State of the field

Currently, other related packages in the field are not yet acknowledged in the paper. A good start may be to compare with, for example, more signal-specific packages, like pyHRV and antropy, both of which have overlapping functionalities with BioPsyKit as they focus on ECG and EEG signals respectively. Apart from mentioning neurokit2 as a dependency, it may also be helpful to further elaborate on how BioPsyKit’s aim and functionalities are distinct/and or complementary, given that neurokit similarly accommodates for a variety of biosignals.

On a related note, I do think that the scope of BioPsyKit is currently not very well-defined because of the multiple modalities included (even though this can also be perceived as a strength of the package). I understand that from the outset, it is explicitly stated that the purpose was to combine tools in biopsychology. However, I am not sure what the added benefits are of having, for example, questionnaire/protocols implementations packaged together with electrophysiological data processing (i.e., is this just for convenience?). If the aim is to facilitate simultaneous processing of different data, then I think some integration of functionalities needs to be available, as the existing modules seem quite independent as of now alongside other miscellaneous utilities like data wrangling and stats implementation - because if not, users of BioPsyKit would have processing/analysis pipelines as lengthy as if they were to use signal-specific packages that are already well-established. 🤔 Just some thoughts!

Quality of writing

Overall, I think the paper flows nicely and descriptions are concise and straight to the point. I just have two minor comments:

  1. Figure 1 nicely depicts the structure of BioPsyKit, but I have some questions regarding sleep_wake and sleep_endpoints. If they differ based on the former detecting when individuals wake up, and the latter detecting when sleep ends, are they functionally equivalent? Providing some elaboration of these submodules’ features in this figure may help here. Additionally, I realized from the repo that data_handling isn’t a submodule on its own like the others, as it seems like its listed functionalities are subsumed under the utils submodule. Perhaps to change data_handling to utils to be consistent with the structuring of submodules in this figure!

  2. The need for psychological protocols in a software package is not very clear to me yet. I understand that this may be a word limit issue, but I think some clarity of their practical functionality can be provided, on top of just stating what protocols are available. Intuitively from the code, it seems like their purpose is to provide a “data structure” for the organization of different modalities of experimental data – if so, I think this is important to state in the paper.

Let me know what you think! :)

Classification - add test-indices to summary

  1. In nested_cv.py nested_cv_param_search(), add train and test indices to cols and results_dict:
cols = [
    "param_search",
    "cv_results",
    "best_estimator",
    "conf_matrix",
    "predicted_labels",
    "true_labels",
    "train_indices",  # add this line
    "test_indices",  # add this line
]
results_dict["train_indices"].append(train)  # add this line
results_dict["test_indices"].append(test)  # add this line
results_dict["predicted_labels"].append(cv_obj.predict(x_test))
results_dict["true_labels"].append(y_test)
results_dict["cv_results"].append(cv_obj.cv_results_)
results_dict["best_estimator"].append(cv_obj.best_estimator_)
results_dict["conf_matrix"].append(confusion_matrix(y_test, cv_obj.predict(x_test), normalize=None))
  2. In sklearn_pipeline_permuter.py metric_summary(), get the test indices and add them to df_metric:
for param_key, param_value in self.param_searches.items():
    ...
    test_indices = np.array(param_value["test_indices"], dtype="object").ravel()
    ...
    df_metric["test_indices"] = [test_indices]

    for key in param_values:
        if "test" in key:
            if "test_indices" in key:
                continue
  3. Optional: Find a more elegant way to exclude test_indices from metric calculation ;)
