I tried to predict the remaining useful life (RUL) for the 100 trajectories in the FD003 dataset from the Turbofan Engine Degradation Simulation Data Set using two different models: a neural network and a support vector machine.
My code can be found in the `turbofan.ipynb` file.
I computed the RUL value for each row in the training dataset and simplified the model by treating each sample as an independent observation.
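The RUL labelling step can be sketched as follows. This is a minimal illustration, not the code from the notebook: it assumes each training trajectory runs until failure, so the RUL of a row is the unit's last recorded cycle minus the row's current cycle. The function name `add_rul` and the `(unit_id, cycle)` row layout are hypothetical.

```python
from collections import defaultdict


def add_rul(rows):
    """Return one RUL value per row.

    rows: list of (unit_id, cycle) tuples (hypothetical layout).
    Assumption: the last recorded cycle of a unit is its failure point,
    so RUL = last_cycle_of_unit - current_cycle.
    """
    last_cycle = defaultdict(int)
    for unit, cycle in rows:
        last_cycle[unit] = max(last_cycle[unit], cycle)
    return [last_cycle[unit] - cycle for unit, cycle in rows]


# Two toy trajectories: unit 1 fails after 3 cycles, unit 2 after 2 cycles.
rows = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
rul = add_rul(rows)
```

Each row then carries its own label and can be fed to a regressor as an independent sample, which is the simplification described above.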
I used formula (3) in [1] to normalize each feature and then kept only the features with standard deviation > 0 for the prediction. The transformed training dataset contains 24720 samples and 20 features.
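A rough sketch of the feature selection and scaling is shown below. Formula (3) from [1] is not reproduced here; a plain z-score is used as a stand-in, which is an assumption. The sketch also drops constant features before scaling so that it never divides by zero. The names `select_and_scale`, `sensor_a`, and `sensor_b` are hypothetical.

```python
from statistics import mean, pstdev


def select_and_scale(columns):
    """Drop constant features and z-score the rest.

    columns: dict mapping feature name -> list of values.
    Assumption: z-score scaling stands in for formula (3) in [1].
    """
    scaled = {}
    for name, values in columns.items():
        sd = pstdev(values)
        if sd == 0:
            # A feature with zero standard deviation carries no information.
            continue
        mu = mean(values)
        scaled[name] = [(v - mu) / sd for v in values]
    return scaled


features = {
    "sensor_a": [1.0, 2.0, 3.0],  # varies, so it is kept
    "sensor_b": [5.0, 5.0, 5.0],  # constant, so it is dropped
}
scaled = select_and_scale(features)
```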
The first model is a neural network implemented using `DNNRegressor` from the TensorFlow library. After some experimentation, I settled on a network with 3 hidden layers containing 15, 30, and 15 units.
The use of a support vector machine (SVM) model is suggested in [2]. The authors recommend using a non-linear radial basis function (RBF) kernel. An SVM implementation is provided in the scikit-learn library. I used the default settings with the RBF kernel function.
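For illustration, an SVM regressor with default settings might be set up as in the sketch below. It assumes the `SVR` class from scikit-learn (the text does not name the exact class) and uses random placeholder data in place of the 20 transformed features; the variable names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder data standing in for the transformed training set (20 features).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 20))
y_train = rng.uniform(0, 300, size=200)

# kernel="rbf" is already the default in SVR; it is spelled out for clarity.
svm_model = SVR(kernel="rbf")
svm_model.fit(X_train, y_train)

predicted_rul = svm_model.predict(X_train[:5])
```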
I chose three different metrics to assess the predictions of each model: mean squared error (MSE), the median of the absolute differences between the real and predicted RULs, and the Score in [3]. I modified the Score formula (11) in [3] by dividing the overall value by the number of testing samples.
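The three metrics can be sketched in plain Python as follows. The Score here assumes the asymmetric exponential form commonly used for this dataset, with d = predicted minus real RUL, exp(-d/13) - 1 for early predictions and exp(d/10) - 1 for late ones; the constants 13 and 10 are an assumption and should be checked against formula (11) in [3]. The function names are hypothetical.

```python
import math
from statistics import mean, median


def mse(y_true, y_pred):
    """Mean squared error between real and predicted RULs."""
    return mean((p - t) ** 2 for t, p in zip(y_true, y_pred))


def median_abs_diff(y_true, y_pred):
    """Median of the absolute differences between real and predicted RULs."""
    return median(abs(p - t) for t, p in zip(y_true, y_pred))


def mean_score(y_true, y_pred, a_early=13.0, a_late=10.0):
    """Asymmetric exponential score divided by the number of samples.

    Assumption: a_early=13 and a_late=10; overshooting (d > 0) is
    penalized more heavily than undershooting.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        if d < 0:
            total += math.exp(-d / a_early) - 1
        else:
            total += math.exp(d / a_late) - 1
    return total / len(y_true)


# Toy example: one overshoot, one undershoot, one exact prediction.
y_true = [100, 50, 20]
y_pred = [110, 37, 20]
```

Because the penalty grows exponentially with the error, a handful of large overshoots can dominate the averaged Score even when the MSE of two models is similar.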
Metric | Neural Network | SVM |
---|---|---|
MSE | 4.71 | 3.54 |
Median absolute difference | 24.88 | 22.96 |
Score | 9.15e8 | 677.2 |
All three metrics show the SVM as the better-performing model. The large difference in the Score is caused by the neural network model "overshooting" (predicting an RUL greater than the real value), which the Score function heavily penalizes [3].