gillesvandewiele / ehg-oversampling
Reproducing feature engineering & oversampling experiments on TPEHG DB and assessing the real impact of over-sampling
Has an entire list of features (some are similar to other related work):
Proposes the following features:
A support vector machine (SVM) was implemented to classify the features; notably, using the 10 best-ranked features, the accuracy, sensitivity, and specificity were 97.1%, 95%, and 99%, respectively.
Like Acharya, but with Yule-Walker AR
In this work, SVM classifier with RBF kernel function is used for classification of term and preterm delivery using EHG signal records.
Using the ADASYN method, we resample the obtained features, increasing the dataset by 514 synthetic samples.
The system achieves 95.5% accuracy on publicly available Term-Preterm EHG Database.
In this research, four types of features are extracted from the EHG signals for categorizing the EHG waveforms: median frequency [33], Shannon energy [34], log energy [35], and the Lyapunov exponent [36].
This research uses support vector machine (SVM)
In this research, the adaptive synthetic sampling approach (ADASYN) [31, 37, 38] is used
Our approach shows an improvement on existing studies with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.
['Weight', 'Rectime', 'Age', 'Parity', 'Abortions']
|| Categorical: ['Hypertension', 'Diabetes', 'Placental_position', 'Bleeding_first_trimester', 'Bleeding_second_trimester', 'Funneling', 'Smoker']
logistic classifier
SMOTE
the proposed approach shows an improvement on existing studies with 89% sensitivity, 91% specificity, 90% positive predictive value, 90% negative predictive value, and an overall accuracy of 90%
root mean squares, peak frequency, median frequency, and sample entropy.
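These four features can be sketched with plain NumPy as below. The sampling rate, embedding dimension `m`, and tolerance factor are assumed defaults for illustration, not values taken from the paper:

```python
import numpy as np

def basic_ehg_features(signal, fs=20.0, m=2, r_factor=0.15):
    """Sketch of the four features above; fs, m, r_factor are assumptions."""
    # Root mean square of the signal
    rms = np.sqrt(np.mean(signal ** 2))

    # One-sided power spectrum via the FFT
    psd = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

    # Peak frequency: frequency bin with maximal power
    peak_freq = freqs[np.argmax(psd)]

    # Median frequency: frequency below which half of the total power lies
    cumulative = np.cumsum(psd)
    median_freq = freqs[np.searchsorted(cumulative, cumulative[-1] / 2.0)]

    # Sample entropy: -log of the ratio of (m+1)- to m-length template matches
    r = r_factor * np.std(signal)
    def count_matches(mm):
        templates = np.array([signal[i:i + mm] for i in range(len(signal) - mm)])
        dists = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return np.sum(dists <= r) - len(templates)  # exclude self-matches
    b, a = count_matches(m), count_matches(m + 1)
    samp_en = -np.log(a / b) if a > 0 and b > 0 else np.inf

    return {'rms': rms, 'peak_freq': peak_freq,
            'median_freq': median_freq, 'sample_entropy': samp_en}
```

The pairwise-distance sample entropy is O(n²) in memory, so for long EHG signals a windowed or chunked variant would be needed in practice.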
The self-organised network inspired by the immune algorithm is developed to improve recognition and generalization capability of the backpropagation neural networks.
The first evaluation uses the original TPEHG dataset (38 preterm and 262 term); the preterm records are oversampled using min and max to produce 262 preterm records.
--> Very unclear...
Sounds like a modeling technique that never gets used... But other, more well-known techniques are reported in the paper as well, so we can use one of those (Decision Trees, SVM).
Extract all features for each of the signals and for each of the channels. It is important to note that some preprocessing of the signal might be required, which should be checked against the literature or tuned as a hyper-parameter. E.g. digital filtering (although this is already done for the TPEHGDB dataset), removal of the first and last measurements (I found that cutting off 3000 values from the start and end better reproduced the provided features), normalization, ...
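A per-channel extraction loop following the trimming/normalization note above might look like the sketch below. The `TRIM` value comes from the note; the `feature_fn` callable is a placeholder for whatever extractor is used (the real ones live in `ehgfeatures`):

```python
import numpy as np

# Empirically, cutting 3000 samples from both ends better reproduced the
# provided features (see the note above); this value is an assumption here.
TRIM = 3000

def extract_per_channel(record, feature_fn):
    """record: array of shape (n_channels, n_samples); feature_fn is any
    callable mapping a 1-D signal to a dict of features (hypothetical)."""
    features = {}
    for ch, signal in enumerate(record):
        trimmed = signal[TRIM:-TRIM]                          # drop transients
        trimmed = (trimmed - trimmed.mean()) / trimmed.std()  # normalise
        for name, value in feature_fn(trimmed).items():
            features['ch{}_{}'.format(ch, name)] = value
    return features
```

Whether normalization helps is itself something to tune, per the note above.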
All the ranked features are fed to a support vector machine (SVM) classifier for automated differentiation, achieving an accuracy of 96.25%, sensitivity of 95.08%, and specificity of 97.33% using only ten EHG signal features
The signals are decomposed only up to 11 IMFs
In this work, a 6-level WPD is applied to each IMF of 300 EHG signals using Daubechies 8 (db8) wavelets, obtaining a total of 12 coefficients.
Feature selection (and oh yes, they of course used all data to do this...) is applied to obtain 10 features:
we have employed data balancing using adaptive synthetic sampling approach (ADASYN)
Overall, our results show a clear improvement in prediction accuracy of preterm delivery risk compared with previous approaches, achieving an impressive maximum AUC value of 0.986 when using signals from an electrode positioned below the navel
Entropy is one of the most widely used complexity measures in biomedical signal analysis [41]. In our study Shannon entropy was used to calculate the average uncertainty or unpredictability of the instantaneous amplitude and the instantaneous frequency of the first ten IMF components of the uterine EMG signals obtained by EMD. In this way twenty entropy values can be derived from each EMG recording.
Hence, in our study the entropy ratios of the instantaneous amplitude and the instantaneous frequency of each two IMFs of the uterine EMG signals were calculated for the purpose of exploring the intrinsic relations between IMFs, given by Eqs (7) and (8).
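As a rough sketch of the entropy-ratio idea: histogram-based Shannon entropies per IMF, then pairwise ratios. Assumptions: the bin count is arbitrary, and this operates on raw IMF samples rather than the Hilbert instantaneous amplitude/frequency the paper actually uses (that would need an analytic-signal step not shown here):

```python
import numpy as np

def shannon_entropy(x, bins=64):
    """Shannon entropy of the empirical distribution of x.
    The bin count is an assumed parameter, not from the paper."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log(p)))

def entropy_ratios(imfs):
    """Pairwise entropy ratios H(IMF_i)/H(IMF_j) for i < j, in the spirit
    of Eqs (7) and (8) of the paper."""
    ents = [shannon_entropy(imf) for imf in imfs]
    return {'H{}/H{}'.format(i, j): ents[i] / ents[j]
            for i in range(len(ents)) for j in range(i + 1, len(ents))}
```

With 10 IMFs and both amplitude and frequency series, this pairwise scheme is where the large per-channel feature counts come from.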
Table 1 shows the classification performances of the features extracted from channel 3 based on both the EMD (180 entropy ratios) and non-EMD methods (P. Fergus et al. used four extracted features together: root mean square, median frequency, peak frequency and sample entropy [30]).
In addition, when we used all the features extracted from three uterine EMG signal channels (180 features per channel, 540 features in total) to classify the preterm and term delivery recordings, this still only achieved an average AUC value of 0.778. However, if we only use the features extracted from channel 3 (180 features) alone, the average AUC value can reach up to 0.89.
AdaBoost
A previous study has applied the synthetic minority over-sampling technique (SMOTE) to classify the records of preterm and term delivery groups in the TPEHG dataset [32]. In our study, the same SMOTE approach was also used.
Not exactly sure how they reach 180 features per channel...
Let's clarify the scope of the present work, with a deadline of 01/08.
Proposes "Empirical Mode Decomposition"
Create a notebook that analyses each of the features (individually):
The achieved classification accuracy was 100% for early records (recorded around the 23rd week of pregnancy), and 96.33%, with an area under the curve of 99.44%, for all records of the database.
For the classification of the entire preterm and term EHG records, the sample entropy, SE, median frequency, MF, and peak amplitude, PA, of the normalized power spectrum were derived, in each of the frequency bands B0, B1, B2, and B3, and for each of the EHG signals, S1, S2, and S3, of the database. Due to the normalization of each power spectrum, the PA from the frequency band B0 was omitted, resulting in 11 features per signal per record.
For these reasons, the QDA classifier seems a suitable choice for this study.
The ADASYN technique, used to balance the class distribution, increased the number of samples in the preterm minority class from 19 to 140 for early records, and from 38 to 256 for all records.
This study focuses on two datasets (TPEHGDB and TPEHG DS), and has a few features which are not yet included (using frequency bands).
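For illustration, here is a minimal NumPy-only sketch of the ADASYN mechanism used above (imbalanced-learn's `ADASYN` is the usual implementation; this toy version only shows the core idea that harder minority samples receive more synthetic points):

```python
import numpy as np

def adasyn_sketch(X_min, X_maj, n_new, k=5, rng=None):
    """Minimal sketch of ADASYN, not the reference implementation.
    Minority samples with more majority-class points among their k nearest
    neighbours are considered harder and get proportionally more synthetics."""
    rng = np.random.default_rng(rng)
    X_all = np.vstack([X_min, X_maj])
    # Fraction of majority points among each minority point's k nearest
    # neighbours in the combined set (column 0 is the point itself).
    d_all = np.linalg.norm(X_min[:, None] - X_all[None, :], axis=2)
    nn_all = np.argsort(d_all, axis=1)[:, 1:k + 1]
    r = (nn_all >= len(X_min)).mean(axis=1)
    if r.sum() == 0:  # classes fully separated: fall back to uniform weights
        weights = np.full(len(X_min), 1.0 / len(X_min))
    else:
        weights = r / r.sum()
    base = rng.choice(len(X_min), size=n_new, p=weights)
    # SMOTE-style interpolation towards random minority-class neighbours.
    d_min = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k + 1]
    picked = nn_min[base, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))
    return X_min[base] + gaps * (X_min[picked] - X_min[base])
```

This would turn 38 preterm records into 256 by generating 218 synthetic minority samples.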
The results illustrate that the Random Forest performed best, with a sensitivity of 97%, specificity of 85%, area under the receiver operator curve (AUROC) of 94%, and a mean square error rate of 14%.
Root Mean Square of EHG Signal, Peak Frequency of EHG Signal, Median Frequency, Sample Entropy
Random Forest
To address this issue, the minority class (preterm) has been oversampled using the Synthetic Minority Over-Sampling Technique (SMOTE).
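A minimal sketch of the SMOTE interpolation step referenced above (for real experiments, imbalanced-learn's `SMOTE` is the standard choice; this only illustrates the mechanism):

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """Toy SMOTE: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    dists = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    neighbours = np.argsort(dists, axis=1)[:, 1:k + 1]  # column 0 is self
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gaps = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gaps * (X_min[picked] - X_min[base])
```

Unlike ADASYN, plain SMOTE spreads the synthetic points uniformly over the minority samples rather than concentrating them near the class boundary.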
S., Bhandary, S.V.: Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals.
Proposes the following features:
Hi Gyuri,
I thought it might be interesting to have an early discussion about the author list as well. As you may have noticed, the author list is already quite extensive, because this study has been part of a larger project.
Would you agree with being third author in that list? Or would you prefer another position?
The final list would then be:
Gilles Vandewiele, Isabelle Dehaene, György Kovács, Lucas Sterckx, Olivier Janssens, Femke Ongenae, Femke De Backere, Filip De Turck, Kristien Roelens, Sofie Van Hoecke, and Thomas Demeester.
Proposes the following features:
Based on MMFE features, an improvement in the classification accuracy of term-preterm deliveries was achieved, with a maximum area under the curve (AUC) value of 0.99.
Then both MMFE (Fuzzy Entropy) and MMSE (Sample Entropy) analyses were performed on each one-min epoch (which had 60 × 20 = 1200 samples) and afterwards averaged over the 27 epochs to produce the MMFE or MMSE curves for each record. In this multiscale study, we considered 10 scales for each epoch, so that the coarse graining process of MMFE/MMSE analysis
yielded only 120 samples at the highest scale, which was nonetheless sufficient for MFSampEn calculation. These MSampEn or MFSampEn values, calculated on 10 different coarse-graining scales, were used as features in the classification stage
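The coarse-graining step described above (1200 samples reduce to 120 at scale 10) is simply a block average over non-overlapping windows:

```python
import numpy as np

def coarse_grain(signal, scale):
    """Coarse-graining step of multiscale entropy: average consecutive,
    non-overlapping windows of length `scale`."""
    n = len(signal) // scale
    return signal[:n * scale].reshape(n, scale).mean(axis=1)
```

A sample entropy (or its fuzzy variant) is then computed on each coarse-grained series, one value per scale, to form the MSampEn/MFSampEn curve.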
Gaussian (??) SVM? Is this RBF?
In this study, to solve the class skew problem, the Adaptive Synthetic Sampling (ADASYN) [44,45] technique was used.
We have no Fuzzy Entropy, but I think it is enough to only consider Sample Entropy
Uses the following features:
Proposes a variant on the Sample Entropy (Multivariate Fuzzy Sample Entropy)
There seems to be a bug in FeaturesSadiAhmed: when calling `FeaturesAllEHG().extract(signal_ch3[3000:-3000])` for tpehg929, the following exception occurs:
```
Traceback (most recent call last):
  File "all_features.py", line 38, in <module>
    results_ch3 = fe.extract(signal_ch3[3000:-3000])
  File "/usr/local/lib/python3.6/dist-packages/ehgfeatures-0.0.1-py3.6.egg/ehgfeatures/features/_FeatureGroup.py", line 17, in extract
    results = {**results, **(f.extract(signal))}
  File "/usr/local/lib/python3.6/dist-packages/ehgfeatures-0.0.1-py3.6.egg/ehgfeatures/features/_FeaturesSadiAhmed.py", line 49, in extract
    emd = emds['emd_' + str(i)]
KeyError: 'emd_6'
```
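The KeyError suggests the EMD produced fewer IMFs than the hard-coded index range expects for this record. A defensive loop over only the IMFs that actually exist could fix it; the names below mirror the traceback, but the library internals are assumptions:

```python
def iterate_imfs(emds, max_imfs=10):
    """Yield (index, imf) pairs for the IMFs present in the `emds` dict,
    stopping gracefully when a level is missing instead of raising KeyError."""
    for i in range(max_imfs):
        imf = emds.get('emd_' + str(i))
        if imf is None:  # fewer IMFs than expected: stop here
            break
        yield i, imf
```

Downstream features would then be computed only for the available IMFs (padding with NaN if a fixed-length feature vector is required).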
The results illustrate that the combination of the Levenberg-Marquardt trained Feed-Forward Neural Network, Radial Basis Function Neural Network and the Random Neural Network classifiers performed the best, with 91% for sensitivity, 84% for specificity, 94% for the area under the curve and 12% for the mean error rate
Some weird flavors of neural nets...
To address this issue, the minority class (preterm) is oversampled using the Synthetic Minority Over-Sampling Technique (SMOTE).
I would not focus too much on the specifics of their neural net and just use a simple feed-forward network.
--> Should be provided with the original data (so no work here)
--> Uses four different frequency bands to extract the following features from:
Also applies some extra pre-processing of the signal (Fourier, Hanning windows, ...)
After employing the adaptive synthetic sampling approach and six-fold cross-validation, the accuracy (ACC), sensitivity, specificity and area under the curve (AUC) were applied to evaluate RF classification. For PL and TL group, RF achieved the ACC of 0.93, sensitivity of 0.89, specificity of 0.97, and AUC of 0.80. Similarly, their corresponding values were 0.92, 0.88, 0.96 and 0.88 for PE and TE group, indicating that RF could be used to recognize preterm delivery effectively with EHG signals recorded before the 26th week of gestation.
31 features per EHG recording:
Random Forest
Adasyn
Seems like we have all features in place except for time reversibility... I implemented that one in a previous project:
```python
import numpy as np

def time_reversibility(data):
    """Third-order time-reversibility statistic of a 1-D signal:
    the mean cubed first difference."""
    norm = 1 / (len(data) - 1)
    lagged_data = data[1:]
    return norm * np.sum(np.power(lagged_data - data[:-1], 3))
```
Additionally, we could also use TSFRESH and HCTSA to extract a few thousand extra features. These are very generic time-series features (and thus maybe not suited for high-frequency biomedical signals), but could contain a few interesting ones. Of course, this would make our feature elimination much more expensive...
Proposes wavelet-based features (four-step algorithm)
We can perform an analysis of the predictive power of the extracted features using different datasets: