Comments (9)

tblume1992 commented on August 28, 2024

It looks like your implementation uses the autofit method, which runs its own optimization with time series cross-validation for you.

My guess is that you are passing in a data split that ThymeBoost then splits again, and at some point this produces an empty DataFrame. That happens because the default settings for autofit are:

autofit(time_series,
        seasonal_period=[0],
        optimization_type='grid_search',
        optimization_strategy='cv',
        optimization_steps=3,
        lag=2,
        optimization_metric='mse',
        test_set='all',
        verbose=1)

This means autofit runs 3 rounds of rolling forecasts, each with a test set of size 2. But you are already creating that split with the TSCV class!
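The arithmetic behind that nested-split problem can be sketched with a generic rolling-origin splitter. This is a plain-Python illustration that assumes each of `steps` rounds holds out `lag` consecutive points, with the last window ending at the final observation. `rolling_origin_splits` is a hypothetical helper, not ThymeBoost's internal code.

```python
def rolling_origin_splits(series, steps=3, lag=2):
    """Return `steps` (train, test) pairs, each test window `lag` points long.

    The final test window ends at the last observation; each earlier round
    moves the cut-off back by `lag`. Generic illustration, not ThymeBoost code.
    """
    n = len(series)
    splits = []
    for k in range(steps, 0, -1):
        cut = n - k * lag
        if cut <= 0:
            # Not enough data left to train on before the first test window.
            raise ValueError(
                f"series of length {n} is too short for {steps} rounds of {lag}"
            )
        splits.append((series[:cut], series[cut:cut + lag]))
    return splits


if __name__ == "__main__":
    for train, test in rolling_origin_splits(list(range(10))):
        print(len(train), test)
```

On a 10-point series the three training sets have 4, 6 and 8 points. On a 5-point fold, such as one of the shorter TSCV folds, 3 rounds of 2 need 6 held-out points, so the sketch raises; without the guard, `series[:cut]` with a negative `cut` would silently slice from the end instead of failing.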

You have a few options:

  1. Cut out the TSCV piece and let autofit handle the whole process, but then you lose most of the benefits of optuna.
  2. Use the plain fit method and let optuna test out a range of trend estimators and other parameters.

As an example of what that space could look like, this is what autofit searches:

param_dict = {'trend_estimator': ['linear',
                                  ['linear', 'ses'],
                                  'ses',
                                  'arima',
                                  ThymeBoost.combine(['ses', 'des', 'damped_des'])],
              'arima_order': ['auto'],
              'seasonal_estimator': ['fourier'],
              'seasonal_period': [{your seasonal period}, 0],
              'fit_type': ['global'],
              'global_cost': ['mse'],
              'additive': {if contains data zero then False, else [True, False]}
              }

That should get you started (I added remarks for the parameters that should change depending on your data). The only other major thing you could add later is trying fit_type='local', which looks for changepoints.
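To get a feel for how quickly a grid like this grows, here is a minimal stdlib sketch that expands a dict of candidate lists into individual parameter combinations, which is the cartesian expansion a grid search performs. `expand_grid` is a hypothetical helper, the string `"combine(ses, des, damped_des)"` only stands in for the `ThymeBoost.combine(...)` object, and `seasonal_period = 12` and `additive_options` are placeholders for values you would choose from your own data.

```python
from itertools import product


def expand_grid(param_dict):
    """Turn {name: [candidates]} into a list of {name: value} combinations."""
    names = list(param_dict)
    return [dict(zip(names, combo))
            for combo in product(*(param_dict[name] for name in names))]


seasonal_period = 12              # placeholder: use your data's seasonal period
additive_options = [True, False]  # placeholder: narrow this per the remark in the grid

grid = {
    "trend_estimator": ["linear", ["linear", "ses"], "ses", "arima",
                        "combine(ses, des, damped_des)"],  # stand-in string
    "arima_order": ["auto"],
    "seasonal_estimator": ["fourier"],
    "seasonal_period": [seasonal_period, 0],
    "fit_type": ["global"],
    "global_cost": ["mse"],
    "additive": additive_options,
}

combos = expand_grid(grid)
print(len(combos))  # 20 = 5 trend options x 2 seasonal periods x 2 additive options
```

Each dict in `combos` is one candidate setting; narrowing `additive` to a single option halves the grid.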

Either way, if the error persists, definitely send over the full traceback and the input you are passing to ThymeBoost!

from thymeboost.

tblume1992 commented on August 28, 2024

Just following up here: I made a quick gist of optimizing with hyperopt. I would imagine the patterns are similar with optuna. Let me know if the issue is resolved!

tblume1992 commented on August 28, 2024

Going to close this issue, feel free to reopen if the problem persists!

ThomasMeissnerDS commented on August 28, 2024

Hi,

Apologies for coming back so late.
I tried the autofit method.

  • When passing a DataFrame, it threw an error because it expects one-dimensional data
  • Then I converted the data via values.ravel()

So running:

param = {
    "seasonal_period": trial.suggest_int("seasonal_period", 2, 24),
}
model = tb.ThymeBoost(verbose=0)
model.autofit(
    X_train.values.ravel(), seasonal_period=param["seasonal_period"]
)

throws another error. The error message is:

[W 2022-08-01 22:16:58,659] Trial 0 failed because of the following error: NameError("name 'y' is not defined")
Traceback (most recent call last):
File "/home/thomas/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/optuna/study/_optimize.py", line 208, in _run_trial
value_or_values = func(trial)
File "/home/thomas/IdeaProjects/e2e_ml/e2eml/time_series/time_series_models.py", line 197, in objective
model.autofit(
File "/home/thomas/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/ThymeBoost/ThymeBoost.py", line 580, in autofit
for i in range(len(y)):
NameError: name 'y' is not defined


NameError Traceback (most recent call last)
/tmp/ipykernel_184043/3800967605.py in
----> 1 ts.ml_bp106_univariate_timeseries_full_processing_thymeboost()

~/IdeaProjects/e2e_ml/e2eml/time_series/time_series_blueprints.py in ml_bp106_univariate_timeseries_full_processing_thymeboost(self, df, n_forecast)
203 pass
204 else:
--> 205 self.thymeboost_train()
206 algorithm = "thymeboost"
207 self.thymeboost_predict(n_forecast=n_forecast)

~/IdeaProjects/e2e_ml/e2eml/time_series/time_series_models.py in thymeboost_train(self)
215 direction="minimize", sampler=sampler, study_name=f"{algorithm}"
216 )
--> 217 study.optimize(
218 objective,
219 n_trials=self.hyperparameter_tuning_rounds[algorithm],

~/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/optuna/study/study.py in optimize(self, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
404 """
405
--> 406 _optimize(
407 study=self,
408 func=func,

~/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize(study, func, n_trials, timeout, n_jobs, catch, callbacks, gc_after_trial, show_progress_bar)
66 try:
67 if n_jobs == 1:
---> 68 _optimize_sequential(
69 study,
70 func,

~/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/optuna/study/_optimize.py in _optimize_sequential(study, func, n_trials, timeout, catch, callbacks, gc_after_trial, reseed_sampler_rng, time_start, progress_bar)
160
161 try:
--> 162 frozen_trial = _run_trial(study, func, catch)
163 except Exception:
164 raise

~/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
250 and not isinstance(func_err, catch)
251 ):
--> 252 raise func_err
253 return frozen_trial
254

~/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/optuna/study/_optimize.py in _run_trial(study, func, catch)
206
207 try:
--> 208 value_or_values = func(trial)
209 except exceptions.TrialPruned as e:
210 # TODO(mamu): Handle multi-objective cases.

~/IdeaProjects/e2e_ml/e2eml/time_series/time_series_models.py in objective(trial)
195 }
196 model = tb.ThymeBoost(verbose=0)
--> 197 model.autofit(
198 X_train.values.ravel(), seasonal_period=[param["seasonal_period"]]
199 )

~/anaconda3/envs/rapids-21.12/lib/python3.8/site-packages/ThymeBoost/ThymeBoost.py in autofit(self, time_series, seasonal_period, optimization_type, optimization_strategy, optimization_steps, lag, optimization_metric, test_set, verbose)
578 seasonal_sample_weights = []
579 weight = 1
--> 580 for i in range(len(y)):
581 if (i) % max_seasonal_pulse == 0:
582 weight += 1

NameError: name 'y' is not defined

I don't understand how to get around this.

tblume1992 commented on August 28, 2024

No worries!
ThymeBoost expects either a pandas Series or a NumPy array as the input time series.
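If the data starts out as a single-column table, it has to be reduced to one dimension first. With pandas, selecting the column (for example `X_train.iloc[:, 0]` or `X_train.squeeze()`) yields a Series. The stdlib sketch below shows the same idea on nested lists, assuming rows are lists or tuples; `to_1d` is a hypothetical helper.

```python
def to_1d(values):
    """Flatten a single-column table (rows of length 1) into a flat list.

    Does for nested lists what .values.ravel() does for an (n, 1) array.
    A flat list passes through unchanged; multi-column rows are rejected
    because a univariate model would not know which column is the target.
    """
    flat = []
    for row in values:
        if isinstance(row, (list, tuple)):
            if len(row) != 1:
                raise ValueError(f"expected one column, got {len(row)}")
            flat.append(row[0])
        else:
            flat.append(row)
    return flat


if __name__ == "__main__":
    print(to_1d([[112.0], [118.0], [132.0]]))
```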

As for the other issue, that was on my side: I pushed a bug in autofit. I would upgrade to 0.1.13 and see if that resolves it.
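A quick way to confirm which version is actually installed, using only the standard library. This is a sketch: it assumes the package's distribution name is `ThymeBoost`, and `parse_version` and `has_min_version` are hypothetical helpers with a deliberately simple parser (pre-release suffixes such as `rc1` are ignored, so `0.1.13rc1` counts as `0.1.13`).

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn '0.1.13' into (0, 1, 13); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '13rc1': keep 13, drop the suffix
            break
    return tuple(parts)


def has_min_version(dist_name, minimum):
    """True if `dist_name` is installed at `minimum` or newer."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print(has_min_version("ThymeBoost", "0.1.13"))
```

Comparing integer tuples rather than raw strings matters here: as strings, `"0.1.9"` would sort after `"0.1.13"`.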

ThomasMeissnerDS commented on August 28, 2024

Hi,

the new version fixed the previous bug, but I have a new issue now.

I use the autofit method and use optuna only to tune the seasonal_period param. Autofit itself runs fine, but when I want to predict I get an error:

y_true and y_pred have different number of output (1!=6)

The output DataFrame from autofit looks like this:

Params ensembled: False
        y        yhat  yhat_upper  yhat_lower  seasonality       trend
0   112.0  181.255541  276.476494   86.034589     0.883303  205.201995
1   118.0  121.793098  217.014051   26.572146     0.986241  123.492209
2   132.0  117.843000  213.063953   22.622048     0.942961  124.971248
3   129.0  143.371477  238.592430   48.150524     1.008141  142.213654
4   121.0  128.901463  224.122415   33.680510     1.028445  125.336236
..    ...         ...         ...         ...          ...         ...
94  271.0  307.218213  402.439166  211.997261     1.011550  303.710285
95  306.0  287.095010  382.315963  191.874058     1.009355  284.434119
96  315.0  347.107045  442.327998  251.886093     1.053563  329.460218
97  301.0  294.137285  389.358238  198.916333     0.989435  297.278016
98  356.0  342.813006  438.033959  247.592054     1.048039  327.099487

[99 rows x 6 columns]

The function is called like this:
X_train, X_test, Y_train, Y_test = self.unpack_test_train_dict()
algorithm = "thymeboost"

def objective(trial):
    param = {
        "seasonal_period": trial.suggest_int("seasonal_period", 2, 24),
    }
    model = tb.ThymeBoost(verbose=0)
    model.autofit(
        X_train.values.ravel(), seasonal_period=param["seasonal_period"]
    )
    try:
        output = model.autofit(
            X_train.values.ravel(), seasonal_period=param["seasonal_period"]
        )
        print(output)
        preds = model.predict(output, forecast_horizon=len(X_test.index))
        mae = mean_absolute_error(Y_test, preds)
    except Exception as e:
        mae = 9999999999
        print(e)
    return mae

tblume1992 commented on August 28, 2024

This error comes from sklearn's mean_absolute_error: the predict method returns a DataFrame with several columns, not just the predictions. So you just need to grab the predictions column from it:

prediction_output = model.predict(output, forecast_horizon=len(X_test.index))
preds = prediction_output['predictions']
mae = mean_absolute_error(Y_test, preds)
# or, in case the index is a potential problem:
mae = mean_absolute_error(Y_test, preds.values)
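The `(1!=6)` in the error is the number of output columns: one for `Y_test`, six for the forecast DataFrame. The plain-Python sketch below mimics that shape check (it is not sklearn's implementation) on a hypothetical three-column forecast table, then selects the point-forecast column the way `prediction_output['predictions']` does. `n_outputs`, `mae` and `forecast_table` are all illustrative names.

```python
def n_outputs(y):
    """Return 1 for a flat sequence, otherwise the width of the first row."""
    first = y[0]
    return len(first) if isinstance(first, (list, tuple)) else 1


def mae(y_true, y_pred):
    """Mean absolute error for single-output data, with a shape check first."""
    if n_outputs(y_true) != n_outputs(y_pred):
        raise ValueError(
            "y_true and y_pred have different number of output "
            f"({n_outputs(y_true)}!={n_outputs(y_pred)})"
        )
    if n_outputs(y_true) != 1 or len(y_true) != len(y_pred):
        raise ValueError("this sketch only handles equal-length, single-output data")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 110.0]
# Hypothetical forecast table: one row per step, first column = point forecast,
# remaining columns = interval bounds (a stand-in for predict()'s DataFrame).
forecast_table = [[98.0, 90.0, 106.0], [113.0, 105.0, 121.0]]

try:
    mae(y_true, forecast_table)  # whole table: 1 output vs 3
except ValueError as err:
    print(err)  # y_true and y_pred have different number of output (1!=3)

point_forecasts = [row[0] for row in forecast_table]  # like ['predictions']
print(mae(y_true, point_forecasts))  # 2.5
```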

Besides that, there isn't much for optuna to do here with autofit. Autofit already tries both seasonal and non-seasonal settings, so searching over seasonality again with optuna is probably a waste of compute. What you could tune instead are optimization_strategy, lag and optimization_steps, since right now they are left at their defaults, which are probably not the best choice for general use cases.
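If you do tune lag and optimization_steps, one constraint is worth encoding, assuming each of the optimization_steps rounds holds out lag points: `lag * optimization_steps` must leave some training data. Below is a stdlib random search standing in for optuna (with optuna you would draw the values with `trial.suggest_int` instead of `rng.randint`). The search ranges and the `evaluate` callback, which would wrap an autofit call and return a validation error, are hypothetical; optimization_strategy is left out because the values it accepts depend on your ThymeBoost version.

```python
import random


def random_search(evaluate, n_obs, n_trials=30, seed=0):
    """Randomly sample lag / optimization_steps and keep the lowest score.

    `evaluate(lag=..., optimization_steps=...)` is a hypothetical callback that
    would fit the model with those settings and return a validation error.
    Settings whose hold-out windows would leave no training data are skipped.
    Returns (best_score, best_params), or None if every draw was infeasible.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "lag": rng.randint(1, 12),               # hypothetical range
            "optimization_steps": rng.randint(1, 6),  # hypothetical range
        }
        if params["lag"] * params["optimization_steps"] >= n_obs:
            continue  # infeasible: nothing left to train on
        score = evaluate(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


if __name__ == "__main__":
    # Toy objective standing in for a real autofit + validation run.
    toy = lambda lag, optimization_steps: abs(lag - 4) + abs(optimization_steps - 3)
    print(random_search(toy, n_obs=99, n_trials=50))
```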

Let me know if this fixes it!

ThomasMeissnerDS commented on August 28, 2024

Thank you. It works now.

tblume1992 commented on August 28, 2024

Great, let me know if you have any more questions!
