I am using XGBoost for a sequence-to-sequence prediction task where the objective is to predict the next m hours of prices given n hours of historical prices.
When I use the LINEX loss with XGBoost for this time series forecasting problem, 100% of the model's predicted values are negative, despite very few of the observations in the dataset being negative.
I suspect this is because the model learns that producing negative predictions leads to the lowest loss. That is, exponentiating negative values gives small values, so the exponential term dominates the gradient and Hessian computations. Below, we can see that underpredicting by 2 produces gradient and Hessian values that are much smaller in magnitude than those for overpredicting by 2.
import numpy as np

underpred = 8
overpred = 12
y = 10
a = 1.5
gradient_underpred = (2/a) * ( np.exp(a * (underpred-y)) - 1 ) # -1.2669505755095147
gradient_overpred = (2/a) * ( np.exp(a * (overpred-y)) - 1 )  # 25.447382564250223
hessian_underpred = 2 * np.exp(a * (underpred-y))             # 0.09957413673572789
hessian_overpred = 2 * np.exp(a * (overpred-y))               # 40.171073846375336
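Since I left out the LINEX code itself (see below), here is a minimal sketch of what the objective computes, based on the gradient/Hessian formulas above. The name `linex_objective` and the factory structure are placeholders of mine; the real `linex.get_linex_function` also takes a `mode` argument that I am not reproducing here:

```python
import numpy as np

def linex_objective(a=1.5):
    # Sketch of a LINEX custom objective for XGBoost's sklearn API.
    # Gradient and Hessian of L(e) = (2/a**2) * (np.exp(a*e) - a*e - 1),
    # with e = prediction - target, matching the formulas in the snippet above.
    def objective(y_true, y_pred):
        e = y_pred - y_true
        grad = (2.0 / a) * (np.exp(a * e) - 1.0)
        hess = 2.0 * np.exp(a * e)
        return grad, hess
    return objective
```

Plugging such a callable into `XGBRegressor(objective=...)` is how custom objectives are passed via the sklearn interface.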
This is, of course, how LINEX is intended to work, but it seems to do its job so well that predictions made with an XGBoost model using the LINEX objective become useless.
The negative predictions might also be caused by something else that I haven't discovered.
Any suggestions for what may cause the problem and how to mitigate it?
Below is the full code, except for the LINEX code, which is a copy-paste of your code.
Note that the data are np.arrays.
Shapes of data:
xgb_training_data_x: (37117, 72), i.e. 37117 instances of sequences that are 72 values long
xgb_training_data_y: (37117, 58), i.e. 37117 instances of sequences that are 58 values long
xgb_test_data_x: (4469, 72), i.e. 4469 instances of sequences that are 72 values long
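The real arrays are not included here; for a reproducible run, dummy arrays with the same shapes (and strictly positive values, like almost all of my price observations) could stand in:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in arrays matching the shapes listed above; values are positive,
# mimicking the real price data
xgb_training_data_x = rng.uniform(1, 100, size=(37117, 72))
xgb_training_data_y = rng.uniform(1, 100, size=(37117, 58))
xgb_test_data_x = rng.uniform(1, 100, size=(4469, 72))
```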
a = 0.001
mode = "underestimate"
model_linex = xgb.XGBRegressor(
    learning_rate=0.2,
    n_estimators=10,  # 150
    max_depth=4,  # 8
    min_child_weight=1,
    gamma=0.0,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=1,
    seed=42,
    objective=linex.get_linex_function(a=a, mode=mode),
)
model = xgb.XGBRegressor(
    learning_rate=0.2,
    n_estimators=10,  # 150
    max_depth=4,  # 8
    min_child_weight=1,
    gamma=0.0,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=1,
    seed=42,
)
# The fit() procedure produces a warning:
# "UserWarning: Use subset (sliced data) of np.ndarray is not recommended"
# But the warning is expected behavior:
# https://github.com/dmlc/xgboost/issues/6908
# https://stackoverflow.com/questions/67225016/warning-occuring-in-xgboost
# It seems the data becomes sliced because of the MultiOutputRegressor wrapper
xgb_trained_linex = MultiOutputRegressor(model_linex).fit(xgb_training_data_x, xgb_training_data_y)
xgb_prediction_linex = xgb_trained_linex.predict(xgb_test_data_x)
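The 100%-negative figure at the top comes from a check along these lines (`fraction_negative` is just a helper I am writing out here for clarity):

```python
import numpy as np

def fraction_negative(values):
    # Share of entries that are strictly negative
    values = np.asarray(values)
    return float((values < 0).mean())

# For me, fraction_negative(xgb_prediction_linex) returns 1.0,
# while fraction_negative(xgb_training_data_y) is close to 0.0.
```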