
ehg-oversampling's People

Contributors

gillesvandewiele, gykovacs


ehg-oversampling's Issues

Hoseinzadeh et al. "Use of Electro Hysterogram (EHG) Signal to Diagnose Preterm Birth"

Reported result

Support vector machine (SVM) was implemented to classify the features, and it is worth noting that by using 10 most superior features, the accuracy rate, sensitivity, and specificity were obtained as 97.1%, 95%, and 99%, respectively.

Features

Like Acharya, but with Yule-Walker AR
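"Yule-Walker AR" presumably means autoregressive coefficients estimated by solving the Yule-Walker equations. Below is a minimal numpy sketch. It assumes biased autocovariance estimates and a placeholder order of 4, because the order used by Hoseinzadeh et al. is not stated here. statsmodels also provides a `yule_walker` function in `statsmodels.regression.linear_model`.

```python
import numpy as np

def yule_walker_ar(x, order=4):
    """Estimate AR coefficients a_1..a_p of x_t = sum_k a_k x_{t-k} + e_t by
    solving the Yule-Walker equations R a = r (order=4 is a placeholder)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Symmetric Toeplitz matrix R[i, j] = r[|i - j|]
    R = r[np.abs(np.subtract.outer(np.arange(order), np.arange(order)))]
    return np.linalg.solve(R, r[1:])
```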

Model

In this work, SVM classifier with RBF kernel function is used for classification of term and preterm delivery using EHG signal records.

Oversampling

Using the ADASYN method, we resample the obtained features and increase the assume data by 514 data
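ADASYN comes up here and in several of the papers below, so a numpy sketch may be useful. It is our own minimal reading of He et al. (2008) and not the code used by any of these papers. imbalanced-learn provides a tested `ADASYN` class. In the sketch, `beta=1` requests fully balanced classes and `k=5` is an assumed, commonly used default.

```python
import numpy as np

def adasyn(X, y, minority=1, beta=1.0, k=5, rng=None):
    """Minimal ADASYN sketch: like SMOTE, but minority samples with more
    majority-class neighbours receive more synthetic samples."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    X_min = X[min_idx]
    # Total number of samples to generate (beta = 1 gives balanced classes)
    G = int(round((len(y) - 2 * len(min_idx)) * beta))
    # Difficulty r_i: share of majority samples among the k nearest neighbours
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=-1)
    d_all[np.arange(len(min_idx)), min_idx] = np.inf  # exclude the sample itself
    nn_all = np.argsort(d_all, axis=1)[:, :k]
    r = np.mean(y[nn_all] != minority, axis=1)
    if r.sum() == 0:  # no majority neighbours at all: fall back to uniform weights
        r = np.ones_like(r)
    g = np.rint(r / r.sum() * G).astype(int)  # per-sample counts, summing to roughly G
    # Interpolate towards random minority-class neighbours, as in SMOTE
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d_min, np.inf)
    km = min(k, len(X_min) - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :km]
    base = np.repeat(np.arange(len(X_min)), g)
    nb = nn_min[base, rng.integers(0, km, size=len(base))]
    gap = rng.random((len(base), 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

The synthetic rows should only ever be added to the training folds. Oversampling before the train/test split lets interpolated copies of test samples leak into training.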

[NEW] Khan et al. "Characterization of Term and Preterm Deliveries using Electrohysterograms Signatures"

Reported result

The system achieves 95.5% accuracy on publicly available Term-Preterm EHG Database.

Features

In this research four type of features are extracted from the EHG signatures such as; Median frequency [33], Shannon energy [34], Log energy [35], Lyapunov exponent [36] for the categorization of the EHG waveforms.
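The exact definitions behind refs [34] and [35] are not quoted here. The sketch below assumes the common wavelet-entropy style definitions: Shannon energy is -Σ x² log(x²) and log energy is Σ log(x²). The small `eps` guards against log(0) and is our own addition.

```python
import numpy as np

def shannon_energy(x, eps=1e-12):
    """Assumed definition: -sum(x^2 * log(x^2)); eps avoids log(0)."""
    s = np.asarray(x, dtype=float) ** 2
    return -np.sum(s * np.log(s + eps))

def log_energy(x, eps=1e-12):
    """Assumed definition: sum(log(x^2)); exact zeros contribute log(eps)."""
    s = np.asarray(x, dtype=float) ** 2
    return np.sum(np.log(s + eps))
```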

Model

This research uses support vector machine (SVM)

Oversampling

In this research, adaptive synthetic sampling approach (ADASYN) [31, 37, 38] is used

Comments

Fergus et al. "Prediction of preterm deliveries from EHG signals using machine learning"

Reported result

Our approach shows an improvement on existing studies with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.

Features

  • peak frequency, median frequency, root mean squares and sample entropy
  • Clinical Features (unspecified). --> Continuous: ['Weight', 'Rectime', 'Age', 'Parity', 'Abortions'] || Categorical: ['Hypertension', 'Diabetes', 'Placental_position', 'Bleeding_first_trimester', 'Bleeding_second_trimester', 'Funneling', 'Smoker']
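Three of the four signal features above can be computed from a plain FFT periodogram. The sketch below assumes the TPEHG sampling rate of 20 Hz and defines the median frequency as the frequency that splits the spectral power into two equal halves. It applies no windowing or Welch averaging, so it may not reproduce the exact values reported by Fergus et al.

```python
import numpy as np

def spectral_features(x, fs=20.0):
    """RMS, peak frequency and median frequency from an FFT periodogram."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    x = x - x.mean()  # remove DC so it does not dominate the spectrum
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    peak = freqs[np.argmax(psd)]
    cum = np.cumsum(psd)
    # First frequency at which the cumulative power reaches half the total
    median = freqs[np.searchsorted(cum, cum[-1] / 2)]
    return {"rms": rms, "peak_freq": peak, "median_freq": median}
```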

Model

logistic classifier

Oversampling

SMOTE
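For comparison with ADASYN, here is a minimal numpy sketch of SMOTE (Chawla et al., 2002). Each synthetic sample is a random point on the segment between a minority sample and one of its k nearest minority neighbours. The choice of `k=5` is an assumed default, and imbalanced-learn offers a tested `SMOTE` class.

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

For the TPEHG counts used elsewhere in this tracker (38 preterm, 262 term), `smote(X_preterm, 262 - 38)` would add 224 synthetic preterm rows.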

Hussain et al. "Dynamic neural network architecture inspired by the immune algorithm to predict preterm deliveries in pregnant women"

Reported result

the proposed approach shows an improvement on existing studies with 89% sensitivity, 91% specificity, 90% positive predicted value, 90% negative predicted value, and an overall accuracy of 90%

Features

root mean squares, peak frequency, median frequency, and sample entropy.
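Sample entropy appears in almost every paper here, so a reference sketch may help when checking our implementation. It follows the Richman and Moorman definition: count template matches under the Chebyshev distance, exclude self-matches, and use the same N - m templates for both lengths. `m=2` and `r=0.2` are common defaults and are assumed here.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn = -ln(A / B): B counts pairs of length-m templates within
    tolerance r * std (Chebyshev distance), A does the same for length m + 1."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    n = len(x)

    def count_matches(length):
        templates = np.array([x[i:i + length] for i in range(n - m)])
        d = np.max(np.abs(templates[:, None] - templates[None]), axis=-1)
        # Remove the diagonal (self-matches) and count each unordered pair once
        return (np.sum(d <= tol) - len(templates)) / 2

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf
```

The pairwise distance matrix needs quadratic memory. This is fine for short epochs but not for full 30-minute recordings.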

Model

The self-organised network inspired by the immune algorithm is developed to improve recognition and generalization capability of the backpropagation neural networks.

Oversampling

The first evaluation uses the original TPEHG dataset (38 preterm and 262 term); the preterm records are oversampled using min and max to produce 262 preterm records.

--> Very unclear...

Comments

Sounds like a modeling technique that never gets used... But other, more well-known techniques are reported in the paper as well, so we can use one of these (Decision Trees, SVM).

Create feature files

Extract all features for each signal and each channel. Note that the signals might require some preprocessing, which should be checked in the literature or tuned as a hyper-parameter, e.g. digital filtering (although this is already done for the TPEHGDB dataset), removal of the first and last measurements (I found that cutting off 3000 values from the start and end better reproduced the provided features), normalization, ...
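A minimal sketch of the trimming and normalization step is shown below. `preprocess` is a hypothetical helper name, and z-normalization is only one of the options listed above. At the TPEHG sampling rate of 20 Hz, trimming 3000 samples removes 150 s from each end.

```python
import numpy as np

def preprocess(signal, trim=3000, normalize=True):
    """Hypothetical helper: drop `trim` samples at both ends and optionally
    z-normalize. Assumes len(signal) > 2 * trim."""
    x = np.asarray(signal, dtype=float)
    if trim:  # guard: x[0:-0] would return an empty array
        x = x[trim:-trim]
    if normalize:
        x = (x - x.mean()) / x.std()
    return x
```

Usage mirrors the bug report further down: `preprocess(signal_ch3)` is equivalent to `signal_ch3[3000:-3000]` followed by z-normalization.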

Acharya et al. "Automated detection of premature delivery using empirical mode and wavelet packet decomposition techniques with uterine electromyogram signals"

Reported result

All the ranked features are fed to support vector machine (SVM) classifier for automated differentiation and achieved an accuracy of 96.25%, sensitivity of 95.08%, and specificity of 97.33% using only ten EHG signal features

Features

  • The signals are decomposed only up to 11 IMFs

  • In this work, 6 level WPD is implemented on each IMFs of 300 EHG signals using Daubechies 8 (db 8) and obtained a total of 12 coefficients.

  • Feature selection (and oh yes, they of course used all the data to do this...) is applied to obtain 10 features:

[Image: screenshot of the 10 selected features (Screenshot from 2020-01-02 14-40-52)]

Model

  • SVM

Oversampling

we have employed data balancing using adaptive synthetic sampling approach (ADASYN)

Comments

Ren et al. "Improved prediction of preterm delivery using empirical mode decomposition analysis of uterine electromyography signals"

Reported result

Overall, our results show a clear improvement in prediction accuracy of preterm delivery risk compared with previous approaches, achieving an impressive maximum AUC value of 0.986 when using signals from an electrode positioned below the navel

Features

  • Entropy is one of the most widely used complexity measures in biomedical signal analysis [41]. In our study Shannon entropy was used to calculate the average uncertainty or unpredictability of the instantaneous amplitude and the instantaneous frequency of the first ten IMF components of the uterine EMG signals obtained by EMD. In this way twenty entropy values can be derived from each EMG recording.

  • Hence, in our study the entropy ratios of the instantaneous amplitude and the instantaneous frequency of each two IMFs of the uterine EMG signals were calculated for the purpose of exploring the intrinsic relations between IMFs, given by Eqs (7) and (8).

  • Table 1 shows the classification performances of the extracted features from channel 3 based on both the EMD (180 entropy ratios) and non-EMD methods (P. Fergus et al. used four extracted features together: root mean square, median frequency, peak frequency and sample entropy [30]),

  • In addition, when we used all the features extracted from three uterine EMG signal channels (180 features per channel, 540 features in total) to classify the preterm and term delivery recordings, this still only achieved an average AUC value of 0.778. However, if we only use the features extracted from channel 3 (180 features) alone, the average AUC value can reach up to 0.89.
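The entropy step can be sketched as follows, assuming the IMFs are already available as an array (the EMD itself is not implemented here). The instantaneous amplitude and frequency come from the analytic signal, built the same way as `scipy.signal.hilbert`. Shannon entropy is estimated from a histogram with 100 bins, which is an assumption because the bin count is not given in the quoted text. Ten IMFs yield the twenty entropy values mentioned above.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.fft.ifft(np.fft.fft(x) * h)

def shannon_entropy(values, bins=100):
    """Histogram-based Shannon entropy (natural log); bins=100 is assumed."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def imf_entropies(imfs, fs=20.0, bins=100):
    """Entropies of instantaneous amplitude and frequency for each IMF
    (imfs: array of shape (n_imfs, n_samples), e.g. the first ten from EMD)."""
    h_amp, h_freq = [], []
    for imf in imfs:
        z = analytic_signal(imf)
        inst_freq = np.diff(np.unwrap(np.angle(z))) * fs / (2 * np.pi)
        h_amp.append(shannon_entropy(np.abs(z), bins))
        h_freq.append(shannon_entropy(inst_freq, bins))
    return h_amp, h_freq
```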

Model

AdaBoost

Oversampling

A previous study has applied the synthetic minority over-sampling technique (SMOTE) to classify the records of preterm and term delivery groups in the TPEHG dataset [32]. In our study, the same SMOTE approach was also used.

Comments

Not exactly sure how they reach 180 features per channel...
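One reading that does give 180 assumes a ratio H_i / H_j for every ordered pair of distinct IMFs (i ≠ j), computed separately for the amplitude and frequency entropies. That yields 2 × 10 × 9 = 180. Unordered pairs would give only 2 × 45 = 90. This is a hypothesis, since Eqs (7) and (8) are not reproduced here, and `entropy_ratios` is a hypothetical name.

```python
from itertools import permutations

def entropy_ratios(h_amp, h_freq):
    """Hypothetical reading of Ren et al.'s Eqs (7)-(8): ratio H_i / H_j for
    every ordered pair of distinct IMFs, for IA and IF entropies separately.
    Assumes no entropy is exactly zero."""
    feats = {}
    for name, h in (("ia", h_amp), ("if", h_freq)):
        for i, j in permutations(range(len(h)), 2):
            feats[f"{name}_ratio_{i}_{j}"] = h[i] / h[j]
    return feats
```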

Create notebook/script to analyse the predictive power of features

Create a notebook that analyses each of the features (individually):

  • hypothesis testing between early <-> late recordings & preterm <-> term records
  • plot distributions & calculate AUC (same targets as listed above)
  • create t-SNE or PCA plots to see if there is signal present
  • ...
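For the per-feature AUC, no classifier is needed. The AUC of a single feature is the probability that a random positive (e.g. a preterm record) has a higher value than a random negative, with ties counting 1/2. This equals the Mann-Whitney U statistic divided by n_pos · n_neg, so `scipy.stats.mannwhitneyu` gives the matching p-value for the hypothesis test. The sketch below is quadratic in the group sizes, which is fine for 300 records.

```python
import numpy as np

def feature_auc(values, labels):
    """AUC of one feature used directly as a score (labels: 1 = positive)."""
    values, labels = np.asarray(values, dtype=float), np.asarray(labels, dtype=bool)
    pos, neg = values[labels], values[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC below 0.5 means the feature is informative in the opposite direction, so `max(auc, 1 - auc)` is a convenient way to rank features.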

Jager et al. "Characterization and automatic classification of preterm and term uterine records"

Reported result

The achieved classification accuracy was 100% for early records, recorded around the 23rd week of pregnancy; and 96.33%, the area under the curve of 99.44%, for all records of the database.

Features

For the classification of the entire preterm and term EHG records, the sample entropy, SE, median frequency, MF, and peak amplitude, PA, of the normalized power spectrum were derived, in each of the frequency bands B0, B1, B2, and B3, and for each of the EHG signals, S1, S2, and S3, of the database. Due to the normalization of each power spectrum, the PA from the frequency band B0 was omitted, resulting in 11 features per signal per record.
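The band-limited median frequency and peak amplitude can be sketched as below. Two assumptions apply. First, the band edges in `BANDS` are placeholders that must be checked against the paper. Second, the spectrum is normalized so that its maximum within B0 equals 1. That second assumption is consistent with PA of B0 being dropped, since it would always be 1. Band-limited sample entropy would require band-pass filtering and is left out. The sketch yields 4 MF + 3 PA = 7 features; adding the 4 sample entropies gives the 11 per signal quoted above.

```python
import numpy as np

# Placeholder band edges in Hz (assumption; verify against Jager et al.)
BANDS = {"B0": (0.08, 5.0), "B1": (0.08, 1.0), "B2": (1.0, 2.2), "B3": (2.2, 3.5)}

def band_features(x, fs=20.0, bands=BANDS):
    """Median frequency per band, and peak amplitude per band except B0,
    of a power spectrum normalized to peak 1 within B0 (assumed)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    lo, hi = bands["B0"]
    psd = psd / psd[(freqs >= lo) & (freqs <= hi)].max()
    feats = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs <= hi)
        f, p = freqs[mask], psd[mask]
        cum = np.cumsum(p)
        feats[f"MF_{name}"] = f[np.searchsorted(cum, cum[-1] / 2)]
        if name != "B0":  # PA of B0 is always 1 after this normalization
            feats[f"PA_{name}"] = p.max()
    return feats
```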

Model

For these reasons, the QDA classifier seems suitable choice for this study.

Oversampling

The ADASYN technique, used in order to balance the representation of data distribution in two separate classes, increased the number of samples in preterm minority class for early records from 19 to 140, and for all records from 38 to 256.

Comments

This study focuses on two datasets (TPEHGDB and TPEHG DS) and uses a few features that we have not yet included (features computed per frequency band).

Idowu et al. "Artificial Intelligence for detecting preterm uterine activity in gynecology and obstetric care"

Reported result

The results illustrate that the Random Forest performed the best of sensitivity 97%, specificity of 85%, Area under the Receiver Operator curve (AUROC) of 94% and mean square error rate of 14%.

Features

Root Mean Square of EHG Signal, Peak Frequency of EHG Signal, Median Frequency, Sample Entropy

Model

Random Forest

Oversampling

To address this issue, the minority class (preterm) has been oversampled using the Synthetic Minority Over-Sampling Technique (SMOTE).

Comments

Author list

Hi Gyuri,

I thought it might be good to discuss the author list early on as well. As you may have noticed, the author list is already quite extensive because this study has been part of a larger project.

Would you agree with being third author in that list? Or would you prefer another position?

The final list would then be:

Gilles Vandewiele, Isabelle Dehaene, György Kovács, Lucas Sterckx, Olivier Janssens, Femke Ongenae, Femke De Backere, Filip De Turck, Kristien Roelens, Sofie Van Hoecke, and Thomas Demeester.

Ahmed et al. "A multivariate multiscale fuzzy entropy algorithm with application to uterine EMG complexity analysis"

Reported result

Based on MMFE features, an improvement in the classification accuracy of term-preterm deliveries was achieved, with a maximum area under the curve (AUC) value of 0.99.

Features

Then both MMFE (Fuzzy Entropy) and MMSE (Sample Entropy) analyses were performed on each one-min epoch (which had 60 × 20 = 1200 samples) and afterwards averaged over the 27 epochs to produce the MMFE or MMSE curves for each record. In this multiscale study, we considered 10 scales for each epoch, so that the coarse graining process of MMFE/MMSE analysis yielded only 120 samples at the highest scale, which however was sufficient for MFSampEn calculation. These MSampEn or MFSampEn values calculated on 10 different coarse-graining scales were used as features in classification stage
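The 120-sample figure follows directly from coarse-graining: at scale s, consecutive non-overlapping windows of s samples are averaged, so a 1200-sample epoch shrinks to 1200 / 10 = 120 samples at scale 10. Below is a univariate sketch. The multivariate MMSE/MMFE variant applies the same step to each channel before computing a multivariate entropy. The single-scale entropy measure is then evaluated on each coarse-grained series for scales 1 to 10.

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`,
    dropping an incomplete final window."""
    x = np.asarray(x, dtype=float)
    n = len(x) // scale
    return x[: n * scale].reshape(n, scale).mean(axis=1)
```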

Model

Gaussian (??) SVM? Is this RBF?

Oversampling

In this study, to solve the class skew problem, the Adaptive Synthetic Sampling (ADASYN) [44,45] technique was used.

Comments

We have no Fuzzy Entropy implementation, but I think it is enough to only consider Sample Entropy.

FeaturesSadiAhmed: KeyError

There seems to be a bug in the FeaturesSadiAhmed:

When calling FeaturesAllEHG().extract(signal_ch3[3000:-3000]) for tpehg929, the following exception occurs:

Traceback (most recent call last):
  File "all_features.py", line 38, in <module>
    results_ch3 = fe.extract(signal_ch3[3000:-3000])
  File "/usr/local/lib/python3.6/dist-packages/ehgfeatures-0.0.1-py3.6.egg/ehgfeatures/features/_FeatureGroup.py", line 17, in extract
    results= {**results, **(f.extract(signal))}
  File "/usr/local/lib/python3.6/dist-packages/ehgfeatures-0.0.1-py3.6.egg/ehgfeatures/features/_FeaturesSadiAhmed.py", line 49, in extract
    emd= emds['emd_' + str(i)]
KeyError: 'emd_6'
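The traceback suggests that `extract` looks up a fixed number of IMF keys (`'emd_' + str(i)`), while the EMD of this trimmed channel apparently returned fewer IMFs. The internals of `_FeaturesSadiAhmed.py` have not been checked here. The sketch below only shows one defensive pattern, and its names (`imf_features`, `n_expected`, `feature_fn`) are hypothetical. It emits NaN for missing IMFs so that every record still gets the same feature columns.

```python
import math

def imf_features(emds, n_expected, feature_fn):
    """Hypothetical guard: compute feature_fn for emd_0..emd_{n_expected-1},
    returning NaN instead of raising KeyError when an IMF is missing."""
    results = {}
    for i in range(n_expected):
        imf = emds.get(f"emd_{i}")
        results[f"imf_{i}"] = feature_fn(imf) if imf is not None else math.nan
    return results
```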

Fergus et al. "Advanced artificial neural network classification for detecting preterm births using EHG records"

Reported result

The results illustrate that the combination of the Levenberg-Marquardt trained Feed-Forward Neural Network, Radial Basis Function Neural Network and the Random Neural Network classifiers performed the best, with 91% for sensitivity, 84% for specificity, 94% for the area under the curve and 12% for the mean error rate

Features

[Image: screenshot of the feature list (Screenshot from 2020-01-02 14-25-10)]

Model

Some weird flavors of neural nets...

Oversampling

To address this issue, the minority class (preterm) is oversampled using the Synthetic Minority Over-Sampling Technique (SMOTE).

Comments

I would not focus too much on the specifics of their neural net and just use a simple feed-forward network.

[NEW] Peng et al. "Evaluation of electrohysterogram measured from different gestational weeks for recognizing preterm delivery: a preliminary study using random Forest"

Reported result

After employing the adaptive synthetic sampling approach and six-fold cross-validation, the accuracy (ACC), sensitivity, specificity and area under the curve (AUC) were applied to evaluate RF classification. For PL and TL group, RF achieved the ACC of 0.93, sensitivity of 0.89, specificity of 0.97, and AUC of 0.80. Similarly, their corresponding values were 0.92, 0.88, 0.96 and 0.88 for PE and TE group, indicating that RF could be used to recognize preterm delivery effectively with EHG signals recorded before the 26th week of gestation.

Features

31 features per EHG recording:

  • RMS
  • Autocorrelation zero-crossing
  • Peak frequency
  • Median frequency
  • Mean frequency
  • Features from the wavelet decomposition mainly included the maximum, energy, singular and variance values
  • Features extracted from autoregressive (AR) model
  • Time reversibility
  • Lyapunov exponent
  • Sample entropy
  • Correlation dimension
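Two of the less common features in this list can be sketched with numpy. These are assumed, common definitions and not necessarily Peng et al.'s exact ones. The autocorrelation zero-crossing is taken as the first lag where the autocorrelation becomes non-positive. The mean frequency is the power-weighted average frequency, assuming fs = 20 Hz. The autocorrelation is computed via a zero-padded FFT to keep full-length recordings cheap.

```python
import numpy as np

def acf_zero_crossing(x):
    """First lag at which the (biased) autocorrelation is <= 0, or None."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)  # zero-padding avoids circular wrap-around
    acf = np.fft.irfft(spec * np.conj(spec), 2 * n)[:n]
    below = np.flatnonzero(acf <= 0)
    return int(below[0]) if below.size else None

def mean_frequency(x, fs=20.0):
    """Power-weighted mean frequency of an FFT periodogram."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return np.sum(freqs * psd) / np.sum(psd)
```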

Model

Random Forest

Oversampling

ADASYN

Comments

Seems like we have all features in place except for time reversibility... I implemented that one in a previous project:

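import numpy as np

def time_reversibility(data):
    # Lag-1 time reversibility: (1 / (N - 1)) * sum of (x[i+1] - x[i])^3
    data = np.asarray(data, dtype=float)
    norm = 1 / (len(data) - 1)
    lagged_data = data[1:]
    return norm * np.sum(np.power((lagged_data - data[:-1]), 3))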

Consider general timeseries feature extraction packages (HCTSA & TSFRESH)

Additionally, we could use TSFRESH and HCTSA to extract a few thousand extra features. These are very generic time-series features (and thus maybe not suited for high-frequency biomedical signals), but a few of them could be interesting. Of course, this would make our feature elimination much more expensive...
