
Comments (10)

AlexMRuch commented on May 18, 2024

Hi @randomgitdude, check out #67 and the updated tutorial at https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/stallion.ipynb.


jdb78 commented on May 18, 2024

Sure.

  1. Calculating variance when most of the values are constant is likely to be unreliable (e.g. a price that is mostly constant). You can imagine the normalization changing vastly by moving just a few timesteps, which prevents the model from learning useful information.
  2. NNs have trouble outputting unnormalised numbers. It is possible, but you run into issues because the non-linearities are built for values between -2 and 2. Further, normalisation makes values comparable across time series, which facilitates transfer learning.
  3. Yes, because it means that you copy the pre-processors from training to abc (see the sketch below).

Hope this is helpful.
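A minimal sketch of point 3, using the names from this thread (training is the already-fitted TimeSeriesDataSet, df_valid the new data):

    from pytorch_forecasting import TimeSeriesDataSet

    # from_dataset() copies the fitted normalizers and encoders from `training`
    # instead of re-fitting them on df_valid, so the new data goes through
    # exactly the same pre-processing as the training data.
    abc = TimeSeriesDataSet.from_dataset(
        training, df_valid, predict=True, stop_randomization=True
    )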


randomgitdude commented on May 18, 2024

@AlexMRuch Thank you for pointing me to that updated tutorial. However, I have a few questions:

  1. encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]
    This indeed creates a new dataset - but what about scaling these features? AFAIK only the TimeSeriesDataSet class does that, and if we don't call the class on the new dataset first, we are feeding the NN unseen data "as-is", without any of the pre-processing it was originally trained with.

  2. I managed to get predictions with the following steps:
    -> Loading the new data
    -> Standard pre-processing (assigning categorical variables)
    -> Assigning a date index, the same way as for the training set:
    date_map = dict(
        zip(
            pd.bdate_range("2006-01-01", "2099-01-01"),
            np.arange(0, len(pd.bdate_range("2006-01-01", "2030-01-01"))) + 1,
        )
    )
    df_valid["Date_Time"] = df_valid["Date_Time"].map(date_map).astype(int)
    -> Creating abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True)
    -> Then testing_sample = abc.to_dataloader(train=False)
    -> Finally raw_predictions = best_tft.predict(testing_sample)

But I wonder: is this the correct approach, or am I missing something?


AlexMRuch commented on May 18, 2024

For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that the model was trained on? If you mean the latter, I'm not sure, as my time-series skills are not that great, and @jdb78 may have more thoughts. For historical data, I think you can still do all the scaling you need on the data DataFrame before this step, as the lambda is only slicing off a section of the DataFrame (see the sketch below).
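A minimal sketch of that idea (the slice is the one from the tutorial; "price" is an assumed covariate column, and max_encoder_length is assumed to be defined):

    # Scale covariates on the full DataFrame first...
    data["price"] = (data["price"] - data["price"].mean()) / data["price"].std()

    # ...then slice off the encoder window: the lambda only selects rows,
    # so the slice inherits whatever scaling was applied above.
    encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]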

For point 2, I'm glad to hear you got the predictions working. I implemented the forecasting methods just as @jdb78 did in the tutorial, and the results have face validity with what I'd expect (and they do differ from the evaluation plots), so that's about all I can say on whether the approach is correct.


randomgitdude commented on May 18, 2024

> For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that the model was trained on?

Future target data.

> For historical data, I think you can still do all the scaling you need on the data DataFrame before this step, ...

I was referring to future data. As for the historical data: it is actually scaled inside the training class, so my assumption is that the same scalers should be used to scale the future data. Why? Simply because you have to scale the future values according to the mean and std of the training data, not according to the mean and std of the future values.
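A minimal sketch of that argument with sklearn (train_df, future_df, and the column name are hypothetical):

    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on the training data only...
    scaler = StandardScaler().fit(train_df[["target"]])

    # ...then reuse the training mean/std to transform the future values,
    # rather than re-fitting on the future data itself.
    future_df[["target"]] = scaler.transform(future_df[["target"]])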

As for point no. 2, maybe @jdb78 can elaborate?


AlexMRuch commented on May 18, 2024

Ah, yeah, I definitely see your point and am curious to know what the best practice is as well! Thanks for clarifying!


jdb78 commented on May 18, 2024

Issue #51 sheds some light on this. There are basically two approaches, and both are implemented in PyTorch Forecasting.
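Presumably these are normalising with statistics fitted on the training data versus normalising on each encoder sequence; a sketch of how either target_normalizer would be selected when building the dataset (column and group names are assumed):

    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import EncoderNormalizer, GroupNormalizer

    # Approach 1 (assumed): statistics fitted per series on the training set.
    normalizer = GroupNormalizer(groups=["series_id"])
    # Approach 2 (assumed): statistics computed on each encoder sequence.
    normalizer = EncoderNormalizer()

    training = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="target",
        group_ids=["series_id"],
        max_encoder_length=60,
        max_prediction_length=20,
        target_normalizer=normalizer,
    )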


randomgitdude commented on May 18, 2024

The first option is off the table for various reasons - at least IMHO.
Now, the second: the EncoderNormalizer inherits from pytorch_forecasting.data.encoders.TorchNormalizer, which in turn inherits from sklearn's standard sklearn.base.BaseEstimator and sklearn.base.TransformerMixin. So far so good - but in that case, should I use it in the TimeSeriesDataSet class or before feeding the data into it? Because, if I understand correctly, TimeSeriesDataSet only normalizes the target with the EncoderNormalizer.


jdb78 commented on May 18, 2024

In practice there should be minimal leakage from normalising on the entire training set instead of the encoder, as long as the variable in question is not the target. Normalising something else on the encoder sequence only would probably not work, because the normalisation would not be stable. If you want to contribute this feature, feel invited to raise a PR!
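A sketch of that split (column and group names are assumed): non-target reals get scalers fitted on the entire training set via the scalers argument, while only the target uses the EncoderNormalizer:

    from sklearn.preprocessing import StandardScaler
    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import EncoderNormalizer

    training = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="target",
        group_ids=["series_id"],
        max_encoder_length=60,
        max_prediction_length=20,
        target_normalizer=EncoderNormalizer(),  # target: normalised per encoder sequence
        scalers={"price": StandardScaler()},    # covariate: fitted on the training set
    )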


randomgitdude commented on May 18, 2024

Ok - so a few questions:

  1. Why would the normalization not be stable?
  2. In the given case, does normalizing the target yield any benefit?
  3. Is abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True) a viable way of pre-processing an unseen dataset?

