Comments (10)
Hi @randomgitdude, check out #67 and the updated tutorial at https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/stallion.ipynb.
from pytorch-forecasting.
Sure.
- Calculating variance when most of the values are constant is likely to be difficult (e.g. price is mostly constant). You can imagine the normalization vastly changing by just moving a few timesteps. This prevents learning useful information.
- NNs have trouble outputting unnormalised numbers. It is possible, but you run into issues because all the non-linearities are built for values roughly between -2 and 2. Further, normalisation makes values across time series comparable, which facilitates transfer learning.
- Yes, because it means that you copy the pre-processors from `training` to `abc`.
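The point about non-linearities can be illustrated numerically: saturating activations such as tanh have near-zero gradients outside roughly [-2, 2], so unnormalised inputs kill the learning signal. A minimal sketch (illustrative only, not library code):

```python
import math

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

# An input in the "comfortable" range keeps a usable gradient ...
print(round(tanh_grad(0.5), 4))    # 0.7864
# ... while an unnormalised input saturates the activation and the
# gradient vanishes, so no learning signal flows through this unit
print(round(tanh_grad(100.0), 4))  # 0.0
```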
Hope this is helpful.
@AlexMRuch Thank you for pointing out that updated tutorial. However, I have a few questions:
- encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]
This indeed creates a new data set, but what about scaling these features? AFAIK only the TimeSeriesDataSet class does that, and without calling the class upfront on the new dataset we are feeding the NN unseen data as-is, without the pre-processing it was initially trained with.
- I managed to get the predictions with the following steps:
-> Loading the new data
-> Standard pre-processing (assigning categorical variables)
-> Assigning a date index - the same way as for the training set
date_map = dict(
    zip(
        pd.bdate_range("2006-01-01", "2099-01-01"),
        np.arange(0, len(pd.bdate_range("2006-01-01", "2030-01-01"))) + 1,
    )
)
df_valid["Date_Time"] = df_valid["Date_Time"].copy().map(date_map).astype(int)
-> Creating abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True)
-> Then testing_sample = abc.to_dataloader(train=False)
-> Finally raw_predictions = best_tft.predict(testing_sample)
But I wonder if this is the correct approach, or am I missing something?
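The date-to-index step above can be written more compactly; a minimal pandas sketch (keeping the two bdate_range calls consistent, since the mismatched end dates in the snippet look unintentional, and using a hypothetical df_valid):

```python
import numpy as np
import pandas as pd

# Map business days to a contiguous integer index starting at 1
# (assumed date range, mirroring the snippet above)
bdays = pd.bdate_range("2006-01-01", "2030-01-01")
date_map = dict(zip(bdays, np.arange(1, len(bdays) + 1)))

# Hypothetical validation frame with a datetime column
df_valid = pd.DataFrame({"Date_Time": pd.to_datetime(["2006-01-02", "2006-01-03"])})
df_valid["time_idx"] = df_valid["Date_Time"].map(date_map).astype(int)
print(df_valid["time_idx"].tolist())  # [1, 2] -- the first two business days of 2006
```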
For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that has been trained on? If you mean the latter, I'm not sure, as my time series skills are not that great and @jdb78 may have more thoughts. For historical data, I think you can still do all the scaling you need on the data DataFrame before this step, as the lambda only slices off a section of the DataFrame.
For point 2, I'm glad to hear you got the predictions working. I implemented the forecasting methods just as @jdb78 did in the tutorial and the results have face validity with what I'd expect (and they do differ from the evaluation plots), so that's about all I can speak to the question of whether the approach is correct.
For point 1, do you mean scaling the future target and covariate data or do you mean scaling the past historical data that has been trained on?
Future target data.
For historical data, I think you can still do all the scaling you need on the data DataFrame before this step,
I was referring to future data. As for the historical data, it is actually scaled inside the training class, so my assumption is that the same scaling should be applied to future data. Why? Simply because you have to scale the future values according to the mean and std of the training data, not according to the mean or std of the future values themselves.
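To make that concrete, here is a minimal sketch (hypothetical frame and column names) of fitting the statistics on the training slice only and reusing them verbatim on the future slice:

```python
import pandas as pd

df_train = pd.DataFrame({"price": [10.0, 12.0, 11.0, 13.0]})
df_future = pd.DataFrame({"price": [14.0, 15.0]})

# Statistics come from the training data only ...
mu = df_train["price"].mean()    # 11.5
sigma = df_train["price"].std()  # sample std (ddof=1)

# ... and are reused unchanged on the future data; never refit on it
df_future["price_scaled"] = (df_future["price"] - mu) / sigma
print(df_future["price_scaled"].round(3).tolist())
```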
As for point no. 2, maybe @jdb78 can elaborate?
Ah, yeah, I definitely see your point and am curious to know what is best-practice as well! Thanks for clarifying!
Issue #51 sheds some light on it. There are basically two approaches. Both are implemented in PyTorch Forecasting.
The first option is off the table for various reasons, at least IMHO.
Now, the second: EncoderNormalizer inherits from pytorch_forecasting.data.encoders.TorchNormalizer, which in turn inherits from the standard sklearn sklearn.base.BaseEstimator and sklearn.base.TransformerMixin. So far so good, but in that case should I use it inside the TimeSeriesDataSet class or before feeding data into it? Because if I understand correctly, TimeSeriesDataSet only normalizes the target with the EncoderNormalizer.
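For context on that inheritance chain: deriving from BaseEstimator and TransformerMixin just means the normalizer follows sklearn's fit/transform protocol. A stripped-down, pure-Python illustration of that pattern (not the library's actual implementation):

```python
class MiniNormalizer:
    """Sklearn-style transformer: fit() learns center/scale, transform() applies them."""

    def fit(self, values):
        n = len(values)
        self.center_ = sum(values) / n
        var = sum((v - self.center_) ** 2 for v in values) / n
        self.scale_ = var ** 0.5 or 1.0  # guard against a zero scale
        return self  # sklearn convention: fit returns self for chaining

    def transform(self, values):
        return [(v - self.center_) / self.scale_ for v in values]

    def fit_transform(self, values):  # TransformerMixin provides this in sklearn
        return self.fit(values).transform(values)

scaled = MiniNormalizer().fit_transform([2.0, 4.0, 6.0])
print(scaled)  # [-1.224..., 0.0, 1.224...]
```

The EncoderNormalizer applies this same protocol per encoder sequence rather than once over the whole training set.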
In practice there should be minimal leakage from normalising on the entire training set instead of the encoder, as long as the variable in question is not the target. Normalising something else on the encoder sequence only would probably not work, because the normalisation would not be stable. If you want to contribute this feature, feel invited to raise a PR!
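The stability point can be seen numerically: on a mostly constant series, the standard deviation of a short encoder window changes drastically depending on whether the window happens to include a rare jump. A small self-contained sketch:

```python
def window_std(values):
    """Population standard deviation of one encoder window."""
    mu = sum(values) / len(values)
    return (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5

# Mostly constant series with a single jump (synthetic example)
series = [5.0] * 10 + [50.0] + [5.0] * 10
window = 5

stds = [window_std(series[i:i + window]) for i in range(len(series) - window + 1)]
# Windows without the jump have std 0.0; shifting the window a few
# timesteps to include the jump changes the scale drastically.
print(min(stds), max(stds))  # 0.0 18.0
```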
OK, so a few questions:
- Why would the normalization not be stable?
- In the given case, does normalizing the target yield any benefit?
- Is abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True) a viable way of pre-processing an unseen dataset?