
thymeboost's Introduction

ThymeBoost v0.1.16

Changes in 0.1.16:

New trend estimators: 'lbf' and 'decision_tree'. 'lbf' stands for linear basis functions and fits a linear changepoint model similar in spirit to Prophet's. When boosted, the fit becomes much smoother.



ThymeBoost combines time series decomposition with gradient boosting to provide a flexible, mix-and-match time series framework for forecasting. At the most granular level are the trend/level models (from here on simply called 'trend'), seasonal models, and exogenous models. These approximate their respective components at each 'boosting round', and sequential rounds are fit on the residuals in the usual boosting fashion.
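To make the idea concrete, here is a toy sketch of a boosted decomposition loop. It is not ThymeBoost's code: `boost_decompose` and `seasonal_means` are hypothetical helpers, the first round uses a median level (as the verbose logs further below show ThymeBoost doing), later rounds use a global linear trend, and simple per-phase averages stand in for the Fourier seasonal model.

```python
import numpy as np

def seasonal_means(resid, period):
    """Average the residuals at each seasonal phase and tile them back to full length."""
    phase = np.arange(len(resid)) % period
    means = np.array([resid[phase == p].mean() for p in range(period)])
    return means[phase]

def boost_decompose(y, period, n_rounds=3):
    """Toy boosted decomposition: each round fits a trend, then a seasonal
    component on what is left, and the next round works on the residuals."""
    t = np.arange(len(y))
    resid = np.asarray(y, dtype=float).copy()
    trend = np.zeros(len(y))
    seasonal = np.zeros(len(y))
    for r in range(n_rounds):
        if r == 0:
            trend_r = np.full(len(y), np.median(resid))  # initial round: median level
        else:
            slope, intercept = np.polyfit(t, resid, 1)   # later rounds: linear trend
            trend_r = intercept + slope * t
        seas_r = seasonal_means(resid - trend_r, period)
        trend += trend_r
        seasonal += seas_r
        resid = resid - trend_r - seas_r                 # next round fits these residuals
    return trend, seasonal, resid

t = np.arange(100)
rng = np.random.default_rng(100)
y = np.linspace(-1, 1, 100) + rng.normal(0, 1, 100) + 10 * np.sin(2 * np.pi * t / 25) + 50
trend, seasonal, resid = boost_decompose(y, period=25)
```

By construction the components plus the final residuals add back up to the series, and each round only ever sees what earlier rounds left unexplained.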

Documentation is under construction at: https://thymeboost.readthedocs.io/en/latest/

[Figure: basic flow of the algorithm]

Quick Start

pip install ThymeBoost

Some basic examples:

We start with a very simple example: a series made of a simple trend + seasonality + noise.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ThymeBoost import ThymeBoost as tb

sns.set_style('darkgrid')

#Here we will just create a random series with seasonality and a slight trend
seasonality = ((np.cos(np.arange(1, 101))*10 + 50))
np.random.seed(100)
true = np.linspace(-1, 1, 100)
noise = np.random.normal(0, 1, 100)
y = true + noise + seasonality
plt.plot(y)
plt.show()


First we will build the ThymeBoost model object:

boosted_model = tb.ThymeBoost(approximate_splits=True,
                              n_split_proposals=25,
                              verbose=1,
                              cost_penalty=.001)

The arguments passed here are also the defaults. Most importantly, we choose whether to use 'approximate splits' and how many splits to propose. If we pass approximate_splits=False, ThymeBoost will exhaustively try every data point as a split when looking for changepoints. If we are not looking for changepoints, this setting is ignored.
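The trade-off between approximate and exhaustive splitting can be sketched as a single-changepoint search. This is an illustration under assumptions, not ThymeBoost's split logic: the hypothetical `best_split` fits one line on each side of a candidate split, scores it by mean squared error (in the spirit of split_cost='mse'), and, when approximate, only tries `n_split_proposals` evenly spaced candidates. ThymeBoost's real proposal scheme may choose candidates differently; `min_size` is also an invented parameter.

```python
import numpy as np

def segment_sse(t, y):
    """Sum of squared errors of a least-squares line fitted to one segment."""
    coeffs = np.polyfit(t, y, 1)
    return float(np.sum((y - np.polyval(coeffs, t)) ** 2))

def best_split(y, approximate=True, n_split_proposals=25, min_size=5):
    """Return (split_index, mse) for the best two-segment linear fit."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n)
    if approximate:
        # Evaluate only a fixed number of evenly spaced candidate points.
        candidates = np.unique(
            np.linspace(min_size, n - min_size, n_split_proposals).astype(int))
    else:
        # Exhaustive: every admissible index is a candidate.
        candidates = np.arange(min_size, n - min_size + 1)
    best = None
    for s in candidates:
        cost = (segment_sse(t[:s], y[:s]) + segment_sse(t[s:], y[s:])) / n
        if best is None or cost < best[1]:
            best = (int(s), cost)
    return best

# A kink at index 100: slope changes from 10/99 to 50/99.
y = np.concatenate([np.linspace(0, 10, 100), np.linspace(10, 60, 100)])
print(best_split(y, approximate=False))  # tries ~190 candidates
print(best_split(y, approximate=True))   # tries at most 25 candidates
```

The approximate search can never beat the exhaustive one on cost, but it does a fixed amount of work per round regardless of series length.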

ThymeBoost uses a standard fit => predict procedure. Let's use the fit method. Everything passed to fit is converted to an itertools cycle object inside ThymeBoost; we will refer to these as 'generator' parameters from here on. This might not make sense yet, but it is shown further in the examples!
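Here is a minimal sketch of what "converted to an itertools cycle" means, using only the standard library. The wrapper `as_generator` is hypothetical, and treating only lists (not tuples such as an arima_order) as sequences to cycle is an assumption that matches how the examples below pass arima_order=(1, 0, 1) as a single value.

```python
from itertools import cycle

def as_generator(param):
    """Wrap a fit argument so one value can be drawn per boosting round.

    Lists are cycled through; any other value is repeated every round."""
    if isinstance(param, list):
        return cycle(param)
    return cycle([param])

trend = as_generator(['mean', 'linear'])
period = as_generator(25)
schedule = [(next(trend), next(period)) for _ in range(4)]
print(schedule)
# [('mean', 25), ('linear', 25), ('mean', 25), ('linear', 25)]
```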

output = boosted_model.fit(y,
                           trend_estimator='linear',
                           seasonal_estimator='fourier',
                           seasonal_period=25,
                           split_cost='mse',
                           global_cost='maicc',
                           fit_type='global')

We pass the input time series and the parameters used to fit. The parameters more specific to ThymeBoost are the cost function used to evaluate each split (split_cost) and the global cost function (global_cost), which controls how many boosting rounds to do. Additionally, fit_type='global' designates that we are NOT looking for changepoints and simply fits our trend_estimator globally.

With verbose=1, ThymeBoost prints out some relevant information for us.

Now that we have fitted our series we can take a look at our results

boosted_model.plot_results(output)


The fit looks good enough, but let's take a look at the individual components we fitted.

boosted_model.plot_components(output)


Alright, the decomposition looks reasonable as well, but let's complicate the task by adding a changepoint.

Adding a changepoint

true = np.linspace(1, 50, 100)
noise = np.random.normal(0, 1, 100)
y = np.append(y, true + noise + seasonality)
plt.plot(y)
plt.show()


In order to fit this we will change fit_type='global' to fit_type='local'. Let's see what happens.

boosted_model = tb.ThymeBoost(
                            approximate_splits=True,
                            n_split_proposals=25,
                            verbose=1,
                            cost_penalty=.001,
                            )

output = boosted_model.fit(y,
                           trend_estimator='linear',
                           seasonal_estimator='fourier',
                           seasonal_period=25,
                           split_cost='mse',
                           global_cost='maicc',
                           fit_type='local')
predicted_output = boosted_model.predict(output, 100)

Here we add the predict method, which takes the fitted results as well as the forecast horizon. You will notice that the printout now states that we are fitting locally and that we do an additional round of boosting. Let's plot the results and see if the new round was ThymeBoost picking up the changepoint.

boosted_model.plot_results(output, predicted_output)


Ok, cool. It looks like it worked about as expected here. We did spend one wasted round where ThymeBoost just made a slight adjustment at split 80, but that can be fixed, as you will see!

Once again looking at the components:

boosted_model.plot_components(output)


There is a kink in the trend right around 100, as expected.

Let's further complicate this series.

Adding a large jump

#Pretty complicated model
true = np.linspace(1, 20, 100) + 100
noise = np.random.normal(0, 1, 100)
y = np.append(y, true + noise + seasonality)
plt.plot(y)
plt.show()


So here we have 3 distinct trend lines and one large upward shift. Overall, it is pretty nasty, and automatically fitting this with any model (including ThymeBoost) can give extremely wonky results.

But... let's try anyway. Here we will use the 'generator' variables. As mentioned before, everything passed into the fit method is a generator variable. This means we can pass a list for a parameter, and that list will be cycled through at each boosting round. So if we pass trend_estimator=['mean', 'linear'], then after the initial trend estimation (which uses the median) we use mean, then linear, then mean and linear again until boosting terminates. We can also use this to approximate a potentially complex seasonality just by passing a list of the seasonal periods it may be composed of. Let's fit with these generator variables and pay close attention to the printout, as it shows what ThymeBoost is doing at each round.

boosted_model = tb.ThymeBoost(
                            approximate_splits=True,
                            verbose=1,
                            cost_penalty=.001,
                            )

output = boosted_model.fit(y,
                           trend_estimator=['mean'] + ['linear']*20,
                           seasonal_estimator='fourier',
                           seasonal_period=[25, 0],
                           split_cost='mae',
                           global_cost='maicc',
                           fit_type='local',
                           connectivity_constraint=True,
                           )

predicted_output = boosted_model.predict(output, 100)

The log tells us what we need to know:

********** Round 1 **********
Using Split: None
Fitting initial trend globally with trend model:
median()
seasonal model:
fourier(10, False)
cost: 2406.7734967780552
********** Round 2 **********
Using Split: 200
Fitting local with trend model:
mean()
seasonal model:
None
cost: 1613.03414289753
********** Round 3 **********
Using Split: 174
Fitting local with trend model:
linear((1, None))
seasonal model:
fourier(10, False)
cost: 1392.923553270366
********** Round 4 **********
Using Split: 274
Fitting local with trend model:
linear((1, None))
seasonal model:
None
cost: 1384.306737800115
==============================
Boosting Terminated 
Using round 4

The initial trend round is always the same median fit (this idea is pretty core to the boosting framework), but after that we fit with mean, and the next 2 rounds are fit with linear estimation. The alternating seasonality behaves exactly as we expect, switching back and forth between the 2 periods we gave it, where a period of 0 means no seasonality is estimated.
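The per-round schedule in that log can be reconstructed with plain itertools. This is a sketch under an assumption read off the log rather than ThymeBoost's source: round 1 always uses a median trend and does not consume the trend list, while the seasonal list starts cycling at round 1. `round_schedule` is a hypothetical helper.

```python
from itertools import cycle

def round_schedule(trend_estimator, seasonal_period, n_rounds):
    """Return (round, trend model, seasonal period or None) for each round.

    Assumes round 1 is always a median trend that does not consume the
    trend cycle, while the seasonal cycle starts at round 1."""
    trends = cycle(trend_estimator)
    periods = cycle(seasonal_period)
    schedule = []
    for r in range(1, n_rounds + 1):
        trend = 'median' if r == 1 else next(trends)
        period = next(periods)
        schedule.append((r, trend, period if period else None))  # 0 -> no seasonality
    return schedule

for row in round_schedule(['mean'] + ['linear'] * 20, [25, 0], 4):
    print(row)
# (1, 'median', 25)
# (2, 'mean', None)
# (3, 'linear', 25)
# (4, 'linear', None)
```

Rows 1 to 4 line up with the log: fourier seasonality in rounds 1 and 3, none in rounds 2 and 4, and mean followed by linear trends after the median round.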

Let's take a look at the results:

boosted_model.plot_results(output, predicted_output)


Hmmm, that looks very wonky.

But since we used a mean estimator, we are saying that there is a change in the overall level of the series. That's not exactly true: by appending that last series with just another trend line, we essentially changed both the slope and the intercept of the series.

To account for this, let's relax the connectivity constraint and use only linear estimators. Once again, EVERYTHING passed to the fit method is a generator variable, so we will relax the connectivity constraint for the first linear fit to hopefully capture the large jump. After that we use the constraint for 10 rounds, and then ThymeBoost simply cycles through the list we provided again.
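What the connectivity constraint changes can be illustrated with two least-squares fits around a known split. This is a sketch, not ThymeBoost's implementation: the hypothetical `fit_two_segments` expresses the connected case with a hinge basis a + b*t + c*max(0, t - split), one common way to force two lines to meet, and fits the unconstrained case as two independent lines.

```python
import numpy as np

def fit_two_segments(y, split, connected):
    """Fit two linear segments around `split`.

    connected=True: hinge model, so both segments join at the split (no jump).
    connected=False: an independent line on each side, so a jump is allowed."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    if connected:
        X = np.column_stack([np.ones_like(t), t, np.maximum(0.0, t - split)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return X @ coef
    left = np.polyval(np.polyfit(t[:split], y[:split], 1), t[:split])
    right = np.polyval(np.polyfit(t[split:], y[split:], 1), t[split:])
    return np.concatenate([left, right])

# Two trend lines with a large upward jump at index 100.
y = np.concatenate([np.linspace(0, 10, 100), np.linspace(110, 130, 100)])
for connected in (True, False):
    fit = fit_two_segments(y, split=100, connected=connected)
    print(connected, float(np.sum((y - fit) ** 2)))
```

The connected fit has no way to represent the jump and leaves a large error around it, while the unconstrained fit captures it almost exactly; that is why relaxing the constraint for the first linear round helps here.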

#Without connectivity constraint
boosted_model = tb.ThymeBoost(
                            approximate_splits=True,
                            verbose=1,
                            cost_penalty=.001,
                            )

output = boosted_model.fit(y,
                           trend_estimator='linear',
                           seasonal_estimator='fourier',
                           seasonal_period=[25, 0],
                           split_cost='mae',
                           global_cost='maicc',
                           fit_type='local',
                           connectivity_constraint=[False] + [True]*10,
                           )
predicted_output = boosted_model.predict(output, 100)
boosted_model.plot_results(output, predicted_output)


Alright, that looks a ton better. There is some underfitting in the middle, which is typical since we are using binary segmentation to find the changepoints. Other than that, it seems reasonable. Let's take a look at the components:

boosted_model.plot_components(output)


Looks like the model is catching on to the underlying process creating the data. The trend is clearly composed of three segments and has that large jump right at 200 just as we hoped to see!

Controlling the boosting rounds

We can control the number of boosting rounds, and therefore the complexity of our model, in a couple of different ways. The most direct is setting n_rounds.

#n_rounds=1
boosted_model = tb.ThymeBoost(
                            approximate_splits=True,
                            verbose=1,
                            cost_penalty=.001,
                            n_rounds=1
                            )

output = boosted_model.fit(y,
                           trend_estimator='arima',
                           arima_order=[(1, 0, 0), (1, 0, 1), (1, 1, 1)],
                           seasonal_estimator='fourier',
                           seasonal_period=25,
                           split_cost='mae',
                           global_cost='maicc',
                           fit_type='global',
                           )
predicted_output = boosted_model.predict(output, 100)
boosted_model.plot_components(output)


By passing n_rounds=1 we only allow ThymeBoost to do the initial trend estimation (a simple median) and one shot at approximating the seasonality.

Additionally, we are trying out a new trend_estimator along with its related parameter, arima_order. Although boosting never got to them, the arima_order values we pass are ordered from simple to complex.

Let's try forcing ThymeBoost to go through all of our provided ARIMA orders by setting n_rounds=4

boosted_model = tb.ThymeBoost(
                            approximate_splits=True,
                            verbose=1,
                            cost_penalty=.001,
                            n_rounds=4,
                            regularization=1.2
                            )

output = boosted_model.fit(y,
                           trend_estimator='arima',
                           arima_order=[(1, 0, 0), (1, 0, 1), (1, 1, 1)],
                           seasonal_estimator='fourier',
                           seasonal_period=25,
                           split_cost='mae',
                           global_cost='maicc',
                           fit_type='global',
                           )
predicted_output = boosted_model.predict(output, 100)

Looking at the log:

********** Round 1 **********
Using Split: None
Fitting initial trend globally with trend model:
median()
seasonal model:
fourier(10, False)
cost: 2406.7734967780552
********** Round 2 **********
Using Split: None
Fitting global with trend model:
arima((1, 0, 0))
seasonal model:
fourier(10, False)
cost: 988.0694403606061
********** Round 3 **********
Using Split: None
Fitting global with trend model:
arima((1, 0, 1))
seasonal model:
fourier(10, False)
cost: 991.7292716360867
********** Round 4 **********
Using Split: None
Fitting global with trend model:
arima((1, 1, 1))
seasonal model:
fourier(10, False)
cost: 1180.688829140743

Because n_rounds is set, the cost, which typically decides when boosting stops, is ignored: it actually increases in rounds 3 and 4. An alternative way to limit complexity is to pass a larger regularization parameter when building the model class.
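The difference between cost-driven stopping and a forced n_rounds can be sketched with the costs printed in the two logs above. The stopping rule here is a deliberate simplification and an assumption (keep adding rounds while the cost strictly improves); ThymeBoost's real rule also involves cost_penalty and regularization, which this sketch ignores. `rounds_to_keep` is a hypothetical helper.

```python
def rounds_to_keep(costs, n_rounds=None):
    """Number of boosting rounds to keep given each round's cost.

    Simplified rule: with n_rounds set, ignore the cost entirely; otherwise
    stop at the last round before the cost first fails to improve."""
    if n_rounds is not None:
        return n_rounds
    keep = 1
    for prev, cur in zip(costs, costs[1:]):
        if cur >= prev:
            break
        keep += 1
    return keep

arima_log = [2406.7734967780552, 988.0694403606061, 991.7292716360867, 1180.688829140743]
local_log = [2406.7734967780552, 1613.03414289753, 1392.923553270366, 1384.306737800115]

print(rounds_to_keep(arima_log))              # 2: cost rises in round 3
print(rounds_to_keep(arima_log, n_rounds=4))  # 4: cost ignored
print(rounds_to_keep(local_log))              # 4: cost falls every round
```

The last case agrees with the earlier log, which ended with "Using round 4" after four decreasing costs.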

Component Regularization with a Learning Rate

Another idea taken from gradient boosting is the use of a learning rate. ThymeBoost, however, allows component-specific learning rates. The main benefit of this is that we keep the same fitting procedure (always trend => seasonality => exogenous) while still accounting for the different ways we may want to fit. For example, let's say our series is responding to an exogenous variable that is seasonal. Since seasonality is fit BEFORE the exogenous component, the seasonal model could eat up that signal. However, we can simply pass a seasonality_lr (or trend_lr / exogenous_lr), which penalizes the seasonality approximation and leaves the signal for the exogenous component to fit.
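A toy sketch of why shrinking the seasonal component helps in that scenario. It assumes a series driven entirely by a seasonal exogenous variable with coefficient 2, uses per-phase means as a stand-in seasonal model, and fits the exogenous effect by least squares without an intercept; the helper names are hypothetical and this is not how ThymeBoost computes either component.

```python
import numpy as np

def seasonal_means(resid, period):
    """Per-phase averages tiled back to full length (stand-in seasonal model)."""
    phase = np.arange(len(resid)) % period
    means = np.array([resid[phase == p].mean() for p in range(period)])
    return means[phase]

def exo_coef_after_seasonality(y, x, period, seasonality_lr):
    """Fit seasonality first, shrunk by seasonality_lr, then regress what is left on x."""
    seasonal = seasonality_lr * seasonal_means(y, period)
    resid = y - seasonal
    return float(x @ resid / (x @ x))  # least squares without intercept

t = np.arange(120)
x = np.cos(2 * np.pi * t / 12)  # seasonal exogenous driver
y = 2.0 * x                     # the series responds to x with coefficient 2

print(exo_coef_after_seasonality(y, x, 12, 1.0))  # about 0: seasonality absorbed it
print(exo_coef_after_seasonality(y, x, 12, 0.1))  # about 1.8: most of it is left for x
```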

Here is a quick example. As always, we could pass it as a list if we want seasonality to return to normal after the first round.

#seasonality regularization
boosted_model = tb.ThymeBoost(
                            approximate_splits=True,
                            verbose=1,
                            cost_penalty=.001,
                            n_rounds=2
                            )

output = boosted_model.fit(y,
                           trend_estimator='arima',
                           arima_order=(1, 0, 1),
                           seasonal_estimator='fourier',
                           seasonal_period=25,
                           split_cost='mae',
                           global_cost='maicc',
                           fit_type='global',
                           seasonality_lr=.1
                           )
predicted_output = boosted_model.predict(output, 100)

Parameter Optimization

ThymeBoost has an optimizer which will try to find the 'optimal' parameter settings based on all combinations that are passed.

Importantly, all parameters that are normally passed to fit must now be passed as lists.

Let's take a look:

boosted_model = tb.ThymeBoost(
                           approximate_splits=True,
                           verbose=0,
                           cost_penalty=.001,
                           )

output = boosted_model.optimize(y, 
                                verbose=1,
                                lag=20,
                                optimization_steps=1,
                                trend_estimator=['mean', 'linear', ['mean', 'linear']],
                                seasonal_period=[0, 25],
                                fit_type=['local', 'global'])
100%|██████████| 12/12 [00:00<00:00, 46.63it/s]
Optimal model configuration: {'trend_estimator': 'linear', 'fit_type': 'local', 'seasonal_period': 25, 'exogenous': None}
Params ensembled: False

First off, I disabled verbose output in the constructor so it won't print everything for each model. Instead, passing verbose=1 to the optimize method prints a tqdm progress bar and the best model configuration. lag is the number of points to hold out for our test set, and optimization_steps allows you to roll through the holdout.

Another important thing to note: one of the elements in the trend_estimator list is itself a list. Optimization simply tries every combination of the given parameters, so each element of a provided list is passed to the normal fit method. If that element is itself a list, it is used as a generator variable for that configuration.
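Enumerating the combinations with itertools.product reproduces the 12 configurations counted by the progress bar above. This sketch only builds the grid; the holdout scoring that optimize performs is not shown, and the dictionary layout is illustrative.

```python
from itertools import product

trend_estimator = ['mean', 'linear', ['mean', 'linear']]
seasonal_period = [0, 25]
fit_type = ['local', 'global']

# Each list element is one candidate value; the inner list ['mean', 'linear']
# is a single candidate that fit will then cycle through as a generator.
grid = [dict(trend_estimator=te, seasonal_period=sp, fit_type=ft)
        for te, sp, ft in product(trend_estimator, seasonal_period, fit_type)]
print(len(grid))  # 12
```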

After optimizing, we retain all the other methods we have been using after fit.

predicted_output = boosted_model.predict(output, 100)

boosted_model.plot_results(output, predicted_output)


So this output looks wonky around the changepoint, but it recovers in time to produce a forecast good enough to do well on the holdout.

Ensembling

Instead of iterating through the settings and choosing the best one, we can also ensemble them into a simple average over every parameter setting.

Everything stated about the optimizer holds for ensemble as well, except now we just call the ensemble method.

boosted_model = tb.ThymeBoost(
                           approximate_splits=True,
                           verbose=0,
                           cost_penalty=.001,
                           )

output = boosted_model.ensemble(y, 
                                trend_estimator=['mean', 'linear', ['mean', 'linear']],
                                seasonal_period=[0, 25],
                                fit_type=['local', 'global'])

predicted_output = boosted_model.predict(output, 100)

boosted_model.plot_results(output, predicted_output)


Obviously, this output is quite wonky, primarily because of the 'global' fit_type, which pulls everything toward the center of the data. However, ensembling has been shown to be quite effective in the wild.
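The ensemble described above is an equal-weight average across parameter settings, so a single flat forecast (like one from a 'global' fit) pulls the combined forecast toward the middle. A minimal sketch with made-up forecast numbers; `ensemble_average` is a hypothetical helper, not ThymeBoost's ensemble method.

```python
import numpy as np

def ensemble_average(forecasts):
    """Equal-weight average of forecasts, one row per parameter setting."""
    return np.asarray(forecasts, dtype=float).mean(axis=0)

# Illustrative forecasts from three parameter settings over a 4-step horizon.
fcst = [[10, 11, 12, 13],   # trending up
        [12, 12, 12, 12],   # flat, like a 'global' level fit
        [ 8,  9, 10, 11]]   # trending up from a lower level
print(ensemble_average(fcst))  # roughly [10, 10.67, 11.33, 12]
```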

Optimization with Ensembling?

So what if we want to try an ensemble out during optimization, is that possible?

The answer is yes!

But to do it we have to use a new method, combine, inside our optimize call. Here is an example:

boosted_model = tb.ThymeBoost(
                           approximate_splits=True,
                           verbose=0,
                           cost_penalty=.001,
                           )

output = boosted_model.optimize(y, 
                                lag=10,
                                optimization_steps=1,
                                trend_estimator=['mean', boosted_model.combine(['ses', 'des', 'damped_des'])],
                                seasonal_period=[0, 25],
                                fit_type=['global'])

predicted_output = boosted_model.predict(output, 100)

For every parameter we want treated as an ensemble while optimizing, we must wrap the parameter list in the combine function, as seen here: boosted_model.combine(['ses', 'des', 'damped_des']).

And now in the log:

Optimal model configuration: {'trend_estimator': ['ses', 'des', 'damped_des'], 'fit_type': ['global'], 'seasonal_period': [25], 'exogenous': [None]}
Params ensembled: True

We see that everything returned is a list and 'Params ensembled' is now True, signifying to ThymeBoost that this is an ensemble.

Let's take a look at the outputs:

boosted_model.plot_results(output, predicted_output)


ToDo

The package is still under heavy development, and given the large number of combinations that arise from the framework, if you find any issues, definitely raise them!

Logging and error handling are still basic to nonexistent, so they are among our top priorities.

thymeboost's People

Contributors

tblume1992


thymeboost's Issues

Multiple seasonal periods?

Hi,

Suppose I have a daily time series with weekly, monthly and yearly seasonalities.

Is it possible to mix the 3 seasonalities together?

Because if we write seasonal_period=[7, 30, 365] in boosted_model.optimize, it will choose the best of the 3... however, the best one is the combination of the 3 seasonalities!

Thanks!

Exogenous Features

Hi,

I don't know if it's a bug, but adding exogenous features does not seem to work.

Fitting works but predicting doesn't if dtype=object

If I make my data like this y = np.array(df.cust), I have an array with dtype=object.
array([2280, 2158, 2067, ..., 1696, 2035, 2083], dtype=object)

With this array I am able to fit the model but predicting throws me an error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_1721/2125226056.py in <module>
----> 1 predicted_output = boosted_model.predict(output, 100)
      2 boosted_model.plot_results(output, predicted_output, figsize = (18,12))

~/.local/lib/python3.8/site-packages/ThymeBoost/ThymeBoost.py in predict(self, fitted_output, forecast_horizon, future_exogenous, damp_factor, trend_cap_target)
    345             assert len(future_exogenous) == forecast_horizon, 'Given future exogenous not equal to forecast horizon'
    346         if self.ensemble_boosters is None:
--> 347             trend, seas, exo, predictions = predict_rounds(self.booster_obj,
    348                                                            forecast_horizon,
    349                                                            future_exogenous)

~/.local/lib/python3.8/site-packages/ThymeBoost/predict_functions.py in predict_rounds(booster_obj, forecast_horizon, future_exo)
    112                                            boosting_round,
    113                                            forecast_horizon)
--> 114         seasonal_predictions += predict_seasonality(booster_obj,
    115                                                     boosting_round,
    116                                                     forecast_horizon)

TypeError: ufunc 'add' output (typecode 'O') could not be coerced to provided output parameter (typecode 'd') according to the casting rule ''same_kind''

If I make my data like this y = np.array(df.cust, dtype=int), then dtype is not 'object' and both fitting and predicting work
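The traceback points at an in-place addition (seasonal_predictions += ...) where an object-dtype result meets a float64 array. The same casting failure can be reproduced in plain NumPy; whether ThymeBoost's internals hit it in exactly this way is an inference from the traceback, not something confirmed here. Converting the series to a numeric dtype before fitting avoids it.

```python
import numpy as np

y_obj = np.array([2280, 2158, 2067], dtype=object)
predictions = np.zeros(3)  # float64 accumulator, like the prediction arrays in the traceback

raised = False
try:
    predictions += y_obj   # object result cannot be cast back into float64 in place
except TypeError as err:   # NumPy raises a TypeError subclass here
    raised = True
    print("TypeError:", err)

y = np.asarray(y_obj, dtype=float)  # convert to a numeric dtype before fitting
predictions = np.zeros(3)
predictions += y
print(predictions)  # [2280. 2158. 2067.]
```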

question out of interest

Hello Tyler!

I hope you are fine. I was reading that the future of ARIMA is the SARIMAX model, as it is more dynamic. Is the reason it is not used in your program that it does not estimate its components by itself, like auto.arima?

Thanks and greetings.

Error to train using autofit

It's my first time creating an issue so I'm sorry if I'm wrong about something :)

I have problems when using the autofit function. When executing these lines:

boosted_model = tb.ThymeBoost(verbose=0)
output = boosted_model.autofit(al_train['value'], seasonal_period=288)
predicted_output = boosted_model.predict(output, len(al_test))
tb_mae = np.mean(np.abs(al_test - predicted_output['predictions']))

I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-c3af4c216539> in <module>()
      1 boosted_model = tb.ThymeBoost(verbose=0)
----> 2 output = boosted_model.autofit(al_train['value'], seasonal_period=288)
      3 predicted_output = boosted_model.predict(output, len(al_test))
      4 tb_mae = np.mean(np.abs(al_test - predicted_output['predictions']))
      5 tb_rmse = (np.mean((al_test - predicted_output['predictions'])**2))**.5

/usr/local/lib/python3.7/dist-packages/ThymeBoost/ThymeBoost.py in autofit(self, time_series, seasonal_period, optimization_type, optimization_strategy, optimization_steps, lag, optimization_metric, test_set, verbose)
    578             seasonal_sample_weights = []
    579             weight = 1
--> 580             for i in range(len(y)):
    581                 if (i) % max_seasonal_pulse == 0:
    582                     weight += 1

NameError: name 'y' is not defined

In the previous version (0.1.10) I could execute these lines.

environment:

al_train:

time value
2022-05-28 00:00:00 3108860000.0
2022-05-28 00:05:00 3406160000.0
2022-05-28 00:10:00 3535540000.0
2022-05-28 00:15:00 3544810000.0
2022-05-28 00:20:00 3336570000.0
2022-05-28 00:25:00 2994020000.0
2022-05-28 00:30:00 3130380000.0
2022-05-28 00:35:00 2953710000.0

[...]

INFO al_train:

shape:  (840, 1)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 840 entries, 2022-05-28 00:00:00 to 2022-05-30 21:55:00
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   value   840 non-null    float64
dtypes: float64(1)
memory usage: 13.1 KB

AttributeError: 'NoneType' object has no attribute 'model_obj'

Hello Tyler, I am only opening this issue because you encourage people to report bugs.
Sometimes when I change the data I get:

#boosted_model.plot_components(output, predicted_output)

predicted_output = boosted_model.predict(output, 20, future_exogenous=X_test)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [18], in <cell line: 3>()
      1 #boosted_model.plot_components(output, predicted_output)
----> 3 predicted_output = boosted_model.predict(output, 20, future_exogenous=X_test)

File ~\anaconda3\lib\site-packages\ThymeBoost\ThymeBoost.py:376, in ThymeBoost.predict(self, fitted_output, forecast_horizon, future_exogenous, damp_factor, trend_cap_target, trend_penalty, uncertainty)
    374     self.scale_type = self._online_learning_ignore_params['scale_type']
    375 if self.ensemble_boosters is None:
--> 376     trend, seas, exo, predictions = predict_rounds(self.booster_obj,
    377                                                    forecast_horizon,
    378                                                    trend_penalty,
    379                                                    future_exogenous,
    380                                                    self.online_learning
    381                                                    )
    382     fitted_output = copy.deepcopy(fitted_output)
    383     predicted_output = self.builder.build_predicted_df(fitted_output,
    384                                                        forecast_horizon,
    385                                                        trend,
   (...)
    390                                                        damp_factor,
    391                                                        uncertainty)

File ~\anaconda3\lib\site-packages\ThymeBoost\predict_functions.py:135, in predict_rounds(booster_obj, forecast_horizon, trend_penalty, future_exo, online_learning)
    127     trend_predictions += predict_trend(booster_obj,
    128                                        boosting_round,
    129                                        forecast_horizon,
    130                                        trend_penalty,
    131                                        online_learning)
    132     seasonal_predictions += predict_seasonality(booster_obj,
    133                                                 boosting_round,
    134                                                 forecast_horizon)
--> 135     exo_predictions += predict_exogenous(booster_obj,
    136                                          future_exo,
    137                                          boosting_round,
    138                                          forecast_horizon)
    139 predictions = (trend_predictions +
    140                seasonal_predictions +
    141                exo_predictions)
    142 return trend_predictions, seasonal_predictions, exo_predictions, predictions

File ~\anaconda3\lib\site-packages\ThymeBoost\predict_functions.py:92, in predict_exogenous(booster_obj, future_exo, boosting_round, forecast_horizon)
     90     exo_round = np.zeros(forecast_horizon)
     91 else:
---> 92     exo_model = booster_obj.exo_objs[boosting_round].model_obj
     93     exo_round = exo_model.predict(future_exo)
     94     exo_round = exo_round * booster_obj.exo_class.exogenous_lr

AttributeError: 'NoneType' object has no attribute 'model_obj'

I don't know if this is really something.

Best

Matthias

The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Hi,

I try to integrate Thymeboost into my automl framework e2eml.
However, I struggle with the implementation. I try to run ThymeBoost in a time series CV split and optimize the seasonal_period param with Optuna.

However I run into the error "The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().".

  • ThymeBoost v0.1.10

The code I use:
You can find the current implementation here (available only in development branch):
https://github.com/ThomasMeissnerDS/e2e_ml/blob/develop/e2eml/time_series/time_series_models.py

It starts in line 185.

I tested on the Airpassenger dataset.

Short code example:

from e2eml.full_processing import postprocessing
from e2eml.time_series import time_series_blueprints as tsb
from e2eml.timetravel import timetravel

import pandas as pd
import re
import numpy as np
pd.set_option('display.max_colwidth', None)

full_df = pd.read_csv("AirPassengers.csv")
full_df

target = "#Passengers"
forecast_lead = 20
full_len = len(full_df.index)
time_df = full_df.head(full_len-forecast_lead)
holdout_df = full_df.tail(forecast_lead)

val_df_target = holdout_df[target] 
del holdout_df[target]

ts = tsb.TimeSeriesBluePrint(datasource=time_df,
                                       target_variable=target,
                                       preferred_training_mode='auto',
                                       ml_task='time_series',
                                       rapids_acceleration=False,
                                       cat_encoder_model='target')

ts.ml_bp106_univariate_timeseries_full_processing_thymeboost()

Can you tell me where the implementation has the hiccup? I also appreciate any feedback on better utilizing the library within e2e (maybe optimizing for more than seasonal_period, etc.).

New model idea

Hello Tyler! I see that you added some nice examples, which I want to try in the near future.
I was thinking about the following model idea: an ARIMA-boosted decision tree that does the smoothing for all components but leaves out trend and seasonality to keep it simple, and then forecasts with a separate regressor for every step of the multivariate forecast, so that the predictive power does not vanish.

What do you think about this idea? Could it be integrated, or could you guide me in building it? Are you on Reddit, and if so, what is your username there? What I mean is: could you be a bit of a tutor and teacher for me, since you have a lot of skills?

Greetings

Matthias
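The "one regressor per forecast step" part of this idea is usually called the direct multi-step strategy. Below is a minimal numpy sketch of just that part, with stated simplifications: it fits ordinary least squares on lagged values instead of a decision tree, skips the ARIMA smoothing, and is univariate. The function names are hypothetical and are not part of ThymeBoost.

```python
import numpy as np


def make_direct_datasets(y, n_lags, horizon):
    """Build one (X, target) pair per forecast step h = 1..horizon."""
    y = np.asarray(y, dtype=float)
    datasets = []
    for h in range(1, horizon + 1):
        rows = len(y) - n_lags - h + 1
        # Each row holds n_lags consecutive values; the target is h steps after the last one.
        X = np.array([y[i:i + n_lags] for i in range(rows)])
        target = y[n_lags + h - 1:n_lags + h - 1 + rows]
        datasets.append((X, target))
    return datasets


def fit_direct(y, n_lags, horizon):
    """Fit a separate least-squares model (with intercept) for each step."""
    coefs = []
    for X, target in make_direct_datasets(y, n_lags, horizon):
        X1 = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(X1, target, rcond=None)
        coefs.append(beta)
    return coefs


def predict_direct(y, coefs, n_lags):
    """Every step's model sees the same last window; no recursive feeding of forecasts."""
    last = np.asarray(y, dtype=float)[-n_lags:]
    x1 = np.concatenate([[1.0], last])
    return np.array([x1 @ beta for beta in coefs])


# Usage on a simple linear series.
y = 2.0 * np.arange(30) + 1.0
coefs = fit_direct(y, n_lags=3, horizon=4)
forecast = predict_direct(y, coefs, n_lags=3)
```

Because each step has its own model, errors from step 1 are not fed into step 2, which is the usual argument for the direct strategy over recursive forecasting; the cost is training one model per horizon step.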

forecasting with future_exogenous values

I have a model specification like below:

from ThymeBoost import ThymeBoost as tb

future_exog = x_df[-1:]
boosted_model = tb.ThymeBoost(approximate_splits=True,
                              n_split_proposals=10,
                              verbose=1,
                              cost_penalty=.001)
output = boosted_model.fit(df['y'][:-1].values,
                           trend_estimator='linear',
                           seasonal_estimator='fourier',
                           seasonal_period=4,
                           split_cost='mae',
                           global_cost='maicc',
                           exogenous=x_df[:-1],
                           fit_type='global')
predicted_output = boosted_model.predict(output, forecast_horizon=1, future_exogenous=future_exog)

The model fit works fine with the exogenous variables, but the predict function does not pick up the future_exogenous values. The output DataFrame looks like:

    y      exogenous  yhat        yhat_upper  yhat_lower  seasonality
13  298.0  11.974441  276.813790  394.821765  158.805814   14.543604
14  283.0  57.310657  302.120516  420.128492  184.112541   -5.855722
15  196.0  62.654007  310.999059  429.007035  192.991083   -2.690368

but predicted_output looks like:

predictions  predicted_trend  predicted_seasonality  predicted_exogenous  predicted_upper  predicted_lower
NaN          252.144933       -5.855722              NaN                  NaN              NaN

But the future_exogenous is not NaN.

Did I miss anything?
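One frequent source of unexpected NaN when pandas objects are combined is index alignment: arithmetic between a Series indexed from 0 and a slice that still carries its original row labels matches on labels, not positions. Whether ThymeBoost's predict path does this internally is an assumption not confirmed here; the sketch only shows the pandas behavior with made-up numbers, and passing `future_exog.to_numpy()` instead of the DataFrame slice is one way to test that hypothesis.

```python
import numpy as np
import pandas as pd

# Synthetic exogenous frame; the last row keeps its original index label (16).
x_df = pd.DataFrame({"exogenous": [1.0, 2.0, 3.0, 4.0]}, index=[13, 14, 15, 16])
future_exog = x_df.iloc[-1:]

# Hypothetical one-step forecast component indexed from 0.
trend_forecast = pd.Series([10.0], index=[0])

# pandas aligns on labels: 0 and 16 never match, so both results are NaN.
misaligned = trend_forecast + future_exog["exogenous"]

# Plain NumPy arrays carry no index, so the values line up positionally.
aligned = trend_forecast.to_numpy() + future_exog["exogenous"].to_numpy()
```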

Type Error

After upgrading to the current version, I get the following error:


TypeError                                 Traceback (most recent call last)
Input In [10], in <cell line: 10>()
      1 boosted_model = tb.ThymeBoost(verbose=0)
      2 output = boosted_model.fit(y_train,
      3                            trend_estimator=['linear','ses'],
      4                            seasonal_estimator='fourier',
   (...)
      8                            exogenous=X_train,
      9                            )
---> 10 predicted_output = boosted_model.predict(output, 5, future_exogenous=X_test)

File ~\anaconda3\lib\site-packages\ThymeBoost\ThymeBoost.py:376, in ThymeBoost.predict(self, fitted_output, forecast_horizon, future_exogenous, damp_factor, trend_cap_target, trend_penalty, uncertainty)
    374 self.scale_type = self._online_learning_ignore_params['scale_type']
    375 if self.ensemble_boosters is None:
--> 376     trend, seas, exo, predictions = predict_rounds(self.booster_obj,
    377                                                    forecast_horizon,
    378                                                    trend_penalty,
    379                                                    future_exogenous,
    380                                                    self.online_learning
    381                                                    )
    382     fitted_output = copy.deepcopy(fitted_output)
    383     predicted_output = self.builder.build_predicted_df(fitted_output,
    384                                                        forecast_horizon,
    385                                                        trend,
   (...)
    390                                                        damp_factor,
    391                                                        uncertainty)

File ~\anaconda3\lib\site-packages\ThymeBoost\predict_functions.py:135, in predict_rounds(booster_obj, forecast_horizon, trend_penalty, future_exo, online_learning)
    127     trend_predictions += predict_trend(booster_obj,
    128                                        boosting_round,
    129                                        forecast_horizon,
    130                                        trend_penalty,
    131                                        online_learning)
    132     seasonal_predictions += predict_seasonality(booster_obj,
    133                                                 boosting_round,
    134                                                 forecast_horizon)
--> 135     exo_predictions += predict_exogenous(booster_obj,
    136                                          future_exo,
    137                                          boosting_round,
    138                                          forecast_horizon)
    139 predictions = (trend_predictions +
    140                seasonal_predictions +
    141                exo_predictions)
    142 return trend_predictions, seasonal_predictions, exo_predictions, predictions
TypeError: ufunc 'add' output (typecode 'O') could not be coerced to provided output parameter (typecode 'd') according to the casting rule ''same_kind''
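The failing line is an in-place `+=` into a float64 accumulator. NumPy raises this kind of TypeError when the right-hand side has object dtype, because the object result cannot be written back into a float array under the `same_kind` casting rule (newer NumPy versions word the message as "Cannot cast ufunc 'add' output from dtype('O') to dtype('float64')"). A likely but unconfirmed trigger is an `X_test` with a non-numeric column; the sketch below reproduces the mechanism with synthetic data and shows that casting the exogenous data to float beforehand avoids it.

```python
import numpy as np
import pandas as pd

# Float64 accumulator, like a running sum of per-round exogenous predictions.
exo_predictions = np.zeros(3)

# Synthetic exogenous data where one column holds strings, so .to_numpy() is object dtype.
X_test = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": ["0", "1", "0"]})
object_values = X_test.to_numpy()

failed_with_object = False
try:
    # In-place add would have to write object results into a float64 array: not allowed.
    exo_predictions += object_values[:, 0]
except TypeError:
    failed_with_object = True

# Converting the exogenous data to float first keeps the computation in float64.
clean_values = X_test.astype(float).to_numpy()
clean_total = np.zeros(3)
clean_total += clean_values[:, 0]
```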

Handling Outliers

Hey Tyler,

Loved the Towards Data Science article you wrote. Reading through the examples you provided there and on the repo's home page, it seems that much of the premise of this library is iterating on SARIMAX models, with added abilities to handle outliers, level shifts, and variance changes using XGBoost-style boosting principles, but feel free to correct me if I'm wrong.

That led me to a question. In the TDS article, much of the outlier handling focused on reducing the outlier's impact on the seasonal projections of a recurring seasonal pattern. However, SARIMAX models rely not only on the seasonal order but also on the non-seasonal order. Does ThymeBoost have a way to control the impact of an outlier on the non-seasonal part, i.e. on the p, d, or q of an ARIMA(p,d,q) model, either automatically or through a parameter similar to the seasonality_weights you mentioned in the article?

Keep up the great work, I'm really impressed by how much you've done on your own so far!

I also thought you might find this article I've been reading on outliers, level shifts, and variance changes interesting.
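The seasonal-weighting idea the question refers to can be illustrated with a small numpy sketch: a classic-style seasonal profile computed as a weighted average per seasonal position, where giving a suspected outlier weight 0 removes it from that position's estimate. This is not ThymeBoost's implementation, `weighted_seasonal_profile` is a hypothetical helper, and it does not address the ARIMA (p, d, q) side of the question.

```python
import numpy as np


def weighted_seasonal_profile(y, period, weights):
    """Weighted mean of each seasonal position, centered to sum to zero.

    A weight of 0 removes a point from its position's average entirely.
    """
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    profile = np.empty(period)
    for k in range(period):
        idx = np.arange(k, len(y), period)  # every observation at seasonal position k
        profile[k] = np.average(y[idx], weights=weights[idx])
    return profile - profile.mean()


# Three cycles of a period-4 pattern with one spike at position 1 of the second cycle.
y = np.tile([1.0, 3.0, 2.0, 0.0], 3)
y[5] = 30.0

unweighted = weighted_seasonal_profile(y, 4, np.ones_like(y))
weights = np.ones_like(y)
weights[5] = 0.0  # down-weight the suspected outlier
downweighted = weighted_seasonal_profile(y, 4, weights)
```

Without weights the spike inflates position 1 of every future cycle; with the outlier down-weighted, the profile matches the clean pattern.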

The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When using the ThymeBoost autofit method on a column of a DataFrame as follows:
output = boosted_model.autofit(train[column], seasonal_period=[3,4,6,12])
I get this error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

What is weird is that when I use this method in a Jupyter notebook with python=3.7.16 and thymeboost=0.1.15, everything works fine.
That is not ideal, though, since other libraries I use need Python 3.8 or higher.
However, as soon as I run it in VS Code with python=3.10.11 and thymeboost=0.1.15, it no longer works and I get this error. On Python 3.10.11 I also tried earlier ThymeBoost versions such as 0.1.12 and 0.1.13, but they did not work either.

The values are just regular integers.
What did work: I created a new venv as before, but this time with python=3.8 and thymeboost=0.1.15, and it works perfectly again. Maybe it is an issue with Python 3.10.11?
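NumPy raises this error when an array with more than one element is used where a single bool is expected, for example `if arr:` or `if arr == 12:`. The report does not show which line inside ThymeBoost hits it under Python 3.10, so the sketch below only reproduces the mechanism with the seasonal periods from the report, and shows the explicit `.any()` / `.all()` reductions the message asks for.

```python
import numpy as np

seasonal_period = np.array([3, 4, 6, 12])

raised_ambiguous = False
try:
    if seasonal_period == 12:  # elementwise comparison returns a boolean array
        pass
except ValueError:
    raised_ambiguous = True

# Reducing the boolean array explicitly is unambiguous.
contains_twelve = bool((seasonal_period == 12).any())
all_positive = bool((seasonal_period > 0).all())
```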

yahoo finance

Hey Hello!

You mentioned on Medium that you would add some examples as Jupyter notebooks.
Could you show a multivariate model that predicts stock prices from Yahoo Finance data and makes out-of-sample predictions with it?

Greetings M.

exogenous with fit_type = "local" does not work

Fitting with exogenous and changepoints doesn't seem to work

The code below works:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
np.random.seed(100)
trend = np.linspace(1, 50, 100) + 50
seasonality = ((np.cos(np.arange(1, 101))*10))
exogenous = np.random.randint(low=0, high=2, size=len(trend))
y = trend + seasonality + exogenous * 20

from ThymeBoost import ThymeBoost as tb
boosted_model = tb.ThymeBoost(verbose=1)

output = boosted_model.fit(y,
                           trend_estimator='linear',
                           seasonal_estimator='classic',
                           exogenous_estimator='ols',
                           seasonal_period=25,
                           global_cost='maicc',
                           fit_type='global',
                           exogenous=exogenous)

Using fit_type='local' doesn't:

output = boosted_model.fit(y,
                           trend_estimator='linear',
                           seasonal_estimator='classic',
                           exogenous_estimator='ols',
                           seasonal_period=25,
                           global_cost='maicc',
                           fit_type='local',
                           exogenous=exogenous)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_2866/1890556654.py in <module>
     12 boosted_model = tb.ThymeBoost(verbose=1)
     13 
---> 14 output = boosted_model.fit(y,
     15                            trend_estimator='linear',
     16                            seasonal_estimator='classic',

~/.local/lib/python3.8/site-packages/ThymeBoost/ThymeBoost.py in fit(self, time_series, seasonal_period, trend_estimator, seasonal_estimator, exogenous_estimator, l2, poly, arima_order, connectivity_constraint, fourier_order, fit_type, window_size, trend_weights, seasonality_weights, trend_lr, seasonality_lr, exogenous_lr, min_sample_pct, split_cost, global_cost, exogenous, damp_factor, ewm_alpha, alpha, beta, ransac_trials, ransac_min_samples, tree_depth, additive)
    295                                    smoothed_trend=self.smoothed_trend,
    296                                    **_params)
--> 297         booster_results = self.booster_obj.boost()
    298         fitted_trend = booster_results[0]
    299         fitted_seasonality = booster_results[1]

~/.local/lib/python3.8/site-packages/ThymeBoost/fitter/booster.py in boost(self)
     85             round_results = self.additive_boost_round(self.i)
     86             current_prediction, total_trend, total_seasonal, total_exo = round_results
---> 87             self.c = get_complexity(self.i,
     88                                     self.boosting_params['poly'],
     89                                     self.boosting_params['fit_type'],

~/.local/lib/python3.8/site-packages/ThymeBoost/utils/get_complexity.py in get_complexity(boosting_round, poly, fit_type, trend_estimator, arima_order, window_size, time_series, fourier_order, seasonal_period, exogenous)
     29             c = poly + fourier_order + boosting_round
     30         if exogenous is not None:
---> 31             c += np.shape(exogenous)[1]
     32     return c

IndexError: tuple index out of range

If I remove exogenous, fit_type='local' works:

output = boosted_model.fit(y,
                           trend_estimator='linear',
                           seasonal_estimator='classic',
                           exogenous_estimator='ols',
                           seasonal_period=25,
                           global_cost='maicc',
                           fit_type='local')
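The traceback ends in `get_complexity` at `np.shape(exogenous)[1]`, which assumes a 2-D exogenous array. The `exogenous` built above is 1-D with shape `(100,)`, so there is no index 1. A minimal sketch of the mechanism, plus reshaping to a single column:

```python
import numpy as np

np.random.seed(100)
exogenous = np.random.randint(low=0, high=2, size=100)

# A 1-D array has shape (100,), so asking for dimension [1] fails,
# matching the IndexError raised inside get_complexity.
shape_error = False
try:
    np.shape(exogenous)[1]
except IndexError:
    shape_error = True

# Reshaping to one column gives shape (100, 1), so [1] returns 1.
exogenous_2d = exogenous.reshape(-1, 1)
n_exog_columns = np.shape(exogenous_2d)[1]
```

Passing `exogenous=exogenous_2d` to `fit(..., fit_type='local')` may sidestep this particular IndexError; whether the rest of the local fit accepts a 2-D array was not checked here.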
