
Comments (10)

AlexMRuch commented on May 18, 2024

Hi @randomgitdude, check out #67 and the updated tutorial at https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/stallion.ipynb.


jdb78 commented on May 18, 2024

Sure.

  1. Calculating variance when most of the values are constant is likely to be unreliable (e.g. a price that is mostly constant). You can imagine the normalization changing vastly by moving just a few timesteps, which prevents the model from learning useful information.
  2. NNs have trouble outputting unnormalised numbers. It is possible, but you run into issues because the non-linearities are built for values between -2 and 2. Further, normalisation makes values comparable across time series, which facilitates transfer learning.
  3. Yes, because it means that you copy the pre-processors from training to abc (see the sketch below).

Hope this is helpful.
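A minimal sketch of point 3, using the names from this thread (training is the already-fitted TimeSeriesDataSet, df_valid the new data):

    from pytorch_forecasting import TimeSeriesDataSet

    # from_dataset() copies the fitted normalizers and encoders from `training`
    # instead of re-fitting them on df_valid, so the new data goes through
    # exactly the same pre-processing as the training data.
    abc = TimeSeriesDataSet.from_dataset(
        training, df_valid, predict=True, stop_randomization=True
    )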


randomgitdude commented on May 18, 2024

@AlexMRuch Thank you for pointing me to that updated tutorial. However, I have a few questions:

  1. encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]
    This indeed creates a new dataset - but what about scaling these features? AFAIK only the TimeSeriesDataSet class does that, and if we don't call the class on the new dataset first, we are feeding the NN unseen data "as-is", without any of the pre-processing it was originally trained with.

  2. I managed to get predictions with the following steps:
    -> Loading the new data
    -> Standard pre-processing (assigning categorical variables)
    -> Assigning a date index, the same way as for the training set:
    date_map = dict(
        zip(
            pd.bdate_range("2006-01-01", "2099-01-01"),
            np.arange(0, len(pd.bdate_range("2006-01-01", "2030-01-01"))) + 1,
        )
    )
    df_valid["Date_Time"] = df_valid["Date_Time"].map(date_map).astype(int)
    -> Creating abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True)
    -> Then testing_sample = abc.to_dataloader(train=False)
    -> Finally raw_predictions = best_tft.predict(testing_sample)

But I wonder: is this the correct approach, or am I missing something?


AlexMRuch commented on May 18, 2024

For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that the model was trained on? If you mean the latter, I'm not sure, as my time-series skills are not that great, and @jdb78 may have more thoughts. For historical data, I think you can still do all the scaling you need on the data DataFrame before this step, as the lambda is only slicing off a section of the DataFrame (see the sketch below).
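A minimal sketch of that idea (the slice is the one from the tutorial; "price" is an assumed covariate column, and max_encoder_length is assumed to be defined):

    # Scale covariates on the full DataFrame first...
    data["price"] = (data["price"] - data["price"].mean()) / data["price"].std()

    # ...then slice off the encoder window: the lambda only selects rows,
    # so the slice inherits whatever scaling was applied above.
    encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]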

For point 2, I'm glad to hear you got the predictions working. I implemented the forecasting methods just as @jdb78 did in the tutorial, and the results have face validity with what I'd expect (and they do differ from the evaluation plots), so that's about all I can say on whether the approach is correct.


randomgitdude commented on May 18, 2024

> For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that the model was trained on?

Future target data.

> For historical data, I think you can still do all the scaling you need on the data DataFrame before this step, ...

I was referring to future data. As for the historical data: it is actually scaled inside the training class, so my assumption is that the same scalers should be used to scale the future data. Why? Simply because you have to scale the future values according to the mean and std of the training data, not according to the mean and std of the future values.
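A minimal sketch of that argument with sklearn (train_df, future_df, and the column name are hypothetical):

    from sklearn.preprocessing import StandardScaler

    # Fit the scaler on the training data only...
    scaler = StandardScaler().fit(train_df[["target"]])

    # ...then reuse the training mean/std to transform the future values,
    # rather than re-fitting on the future data itself.
    future_df[["target"]] = scaler.transform(future_df[["target"]])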

As for point no. 2, maybe @jdb78 can elaborate?


AlexMRuch commented on May 18, 2024

Ah, yeah, I definitely see your point and am curious to know what the best practice is as well! Thanks for clarifying!


jdb78 commented on May 18, 2024

Issue #51 sheds some light on this. There are basically two approaches, and both are implemented in PyTorch Forecasting.
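Presumably these are normalising with statistics fitted on the training data versus normalising on each encoder sequence; a sketch of how either target_normalizer would be selected when building the dataset (column and group names are assumed):

    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import EncoderNormalizer, GroupNormalizer

    # Approach 1 (assumed): statistics fitted per series on the training set.
    normalizer = GroupNormalizer(groups=["series_id"])
    # Approach 2 (assumed): statistics computed on each encoder sequence.
    normalizer = EncoderNormalizer()

    training = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="target",
        group_ids=["series_id"],
        max_encoder_length=60,
        max_prediction_length=20,
        target_normalizer=normalizer,
    )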


randomgitdude commented on May 18, 2024

The first option is off the table for various reasons - at least IMHO.
Now, the second: the EncoderNormalizer inherits from pytorch_forecasting.data.encoders.TorchNormalizer, which in turn inherits from sklearn's standard sklearn.base.BaseEstimator and sklearn.base.TransformerMixin. So far so good - but in that case, should I use it in the TimeSeriesDataSet class or before feeding the data into it? Because, if I understand correctly, TimeSeriesDataSet only normalizes the target with the EncoderNormalizer.


jdb78 commented on May 18, 2024

In practice there should be minimal leakage from normalising on the entire training set instead of the encoder, as long as the variable in question is not the target. Normalising something else on the encoder sequence only would probably not work, because the normalisation would not be stable. If you want to contribute this feature, feel invited to raise a PR!
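A sketch of that split (column and group names are assumed): non-target reals get scalers fitted on the entire training set via the scalers argument, while only the target uses the EncoderNormalizer:

    from sklearn.preprocessing import StandardScaler
    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import EncoderNormalizer

    training = TimeSeriesDataSet(
        data,
        time_idx="time_idx",
        target="target",
        group_ids=["series_id"],
        max_encoder_length=60,
        max_prediction_length=20,
        target_normalizer=EncoderNormalizer(),  # target: normalised per encoder sequence
        scalers={"price": StandardScaler()},    # covariate: fitted on the training set
    )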


randomgitdude commented on May 18, 2024

Ok - so a few questions:

  1. Why would the normalization not be stable?
  2. In the given case, does normalizing the target yield any benefit?
  3. Is abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True) a viable way of pre-processing an unseen dataset?

