jdb78 / pytorch-forecasting
Time series forecasting with PyTorch
Home Page: https://pytorch-forecasting.readthedocs.io/
License: MIT License
Hi,
I have a question on the TimeSeriesDataSet class. In my case, I have a time series sequence with a sampling frequency of 15 minutes, and one target variable. Unfortunately, the sequence is extensively interrupted (too extensively to impute the np.nan's with values) at several points in the sequence; e.g. it looks like:
timestamp | target | input_0 |
---|---|---|
2004-10-31 23:00:00+00:00 | 90 | 3.3 |
2004-10-31 23:15:00+00:00 | 91.5 | 3.3 |
..... | 91.5 | 3.3 |
2004-11-30 23:15:00+00:00 | 91.5 | 3.3 |
2004-11-30 23:30:00+00:00 | np.nan | np.nan |
.... | np.nan | np.nan |
2004-12-01 23:30:00+00:00 | 89.5 | 3.25 |
2004-12-01 23:45:00+00:00 | 86 | 3.2 |
How to handle this? From the documentation, I have understood that I could handle this by assigning to each uninterrupted chunk of the time series a different 'timeseries' number:
timestamp | target | input_0 | timeseries |
---|---|---|---|
2004-10-31 23:00:00+00:00 | 90 | 3.3 | 0 |
2004-10-31 23:15:00+00:00 | 91.5 | 3.3 | 0 |
..... | 91.5 | 3.3 | 0 |
2004-11-30 23:15:00+00:00 | 91.5 | 3.3 | 0 |
2004-12-01 23:30:00+00:00 | 89.5 | 3.25 | 1 |
2004-12-01 23:45:00+00:00 | 86 | 3.2 | 1 |
Is this correct?
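The chunk labelling described above can be sketched with pandas. This is only an illustration under stated assumptions: the data has a DatetimeIndex at a fixed 15-minute frequency and a column named target. The helper name label_contiguous_chunks and the per-chunk time_idx column are hypothetical and not part of pytorch-forecasting.

```python
import numpy as np
import pandas as pd


def label_contiguous_chunks(df, target_col="target", freq="15min"):
    """Drop missing rows and number each uninterrupted run of timestamps."""
    clean = df.dropna(subset=[target_col]).copy()
    # A new chunk starts wherever the step from the previous row exceeds freq.
    new_chunk = clean.index.to_series().diff() > pd.Timedelta(freq)
    clean["timeseries"] = new_chunk.cumsum().astype(int).to_numpy()
    # Integer time index that restarts at 0 inside every chunk (an assumption
    # about how one might want to index the chunks).
    clean["time_idx"] = clean.groupby("timeseries").cumcount()
    return clean


index = pd.date_range("2004-10-31 23:00", periods=6, freq="15min", tz="UTC")
demo = pd.DataFrame({"target": [90, 91.5, np.nan, np.nan, 89.5, 86]}, index=index)
labelled = label_contiguous_chunks(demo)
print(labelled)
```

Each resulting chunk could then be used as its own group, provided it is long enough to cover the encoder and prediction lengths.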
In addition, I understand that to make predictions for 6 points in the future based on 24 points in the past I need to set max_prediction_length = 6 and max_encoder_length = 24 (is this correct?). However, suppose that I want to predict 6 points in the future at a frequency of 30 minutes, using 24 points in the past sampled at a different frequency (say, hourly). How can I achieve this? Should I build one dataframe with many 'time series', where each time series has only 30 points (the first 24 points being the input sampled every hour, and the final 6 points being the targets sampled every 30 minutes)? Would that work?
Many thanks in any case!
Tomas
The AR example loads a best model from a path on a personal computer. This best model is not provided, which causes errors when running the notebook.
Presently, the hyperparameter optimization returns only the following best parameters
{'gradient_clip_val': 1.028043566346387,
'hidden_size': 176,
'dropout': 0.22095396475678628,
'hidden_continuous_size': 36,
'attention_head_size': 1}
and omits the learning rate, even when use_learning_rate_finder=True. I expect this could be fixed easily just by adding the learning rate to the optuna parameters to track.
First of all, this library is fantastic- thanks so much for the work!
I am attempting to train TFT on a very large time series dataset, but am finding working with the TimeSeriesDataSet fairly confusing with the current documentation; it would be great if anyone can clarify the following:
Should group_ids identify samples of time series or whole time series? If I have some time series (call it 'PRICE') with a length of 1,000 but a sample length of 100 for my model, should I create e.g. 10 group identifiers ('PRICE0-PRICE9') to specify training examples, or should I keep it as a single series?
How are training examples sampled from a TimeSeriesDataSet? Assuming group_ids identify whole time series, how are training examples sampled from a series? Are random blocks taken? Are they taken in order? This is not very clear.
Are samples aligned by time_idx? If I have multiple time series which should be aligned (e.g. prices of multiple securities), will they be aligned by time_idx when sampled? If so, what happens when points are missing?
How are validation examples sampled?
Is there support for very large datasets? The dataset I am working with is >100GB of time series data; loading it all into a DataFrame isn't possible. Is there support for building custom DataLoaders which can load batches at train time?
I would be happy to update documentation if these are answered, thanks a lot!
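As a rough mental model for the sampling questions above, the following sketch enumerates every full window that fits into one series. It is not the library's actual sampling code, which may differ; the function name enumerate_windows and the split of the 100-point sample into 90 encoder and 10 prediction steps are assumptions made for illustration.

```python
def enumerate_windows(series_length, encoder_length, prediction_length):
    """Return (encoder_start, decoder_start, decoder_end) for every full window."""
    window = encoder_length + prediction_length
    return [
        (start, start + encoder_length, start + window)
        for start in range(series_length - window + 1)
    ]


windows = enumerate_windows(series_length=1000, encoder_length=90, prediction_length=10)
print(len(windows))              # number of candidate samples in one series
print(windows[0], windows[-1])   # first and last window boundaries
```

Under this model, a single series of length 1,000 kept under one group id yields many overlapping samples, whereas cutting it into 10 separate groups of 100 points would leave only one full window per group.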
When I initialize my TFT trainer to use multiple GPUs
import pytorch_lightning as pl
from pytorch_forecasting import TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# Configure network and trainer
# (`training` is the TimeSeriesDataSet built earlier)
pl.seed_everything(407)
trainer = pl.Trainer(
    gpus = [0],
    gradient_clip_val = 0.1  # hyperparam to prevent gradient divergence for RNNs
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate = 0.03,
    hidden_size = 16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size = 1,
    dropout = 0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size = 8,  # set to <= hidden_size
    output_size = 7,  # 7 quantiles by default
    loss = QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience = 4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
The library is able to recognize that I used both GPUs
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0,1]
Number of parameters in network: 23.4k
However, when I try to find the optimal learning rate
# Find optimal learning rate
res = trainer.lr_find(
    tft,
    train_dataloader = train_dataloader,
    val_dataloaders = val_dataloader,
    max_lr = 10.,
    min_lr = 1e-6,
)
print(f"Suggested learning rate: {res.suggestion()}")
fig = res.plot(show = True, suggest = True)
fig.show()
I get an AttributeError: Can't pickle local object '_apply_to_outputs.<locals>.decorator_fn.<locals>.new_func' error with the following trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-29-01060df08a43> in <module>
1 # Find optimal learning rate
----> 2 res = trainer.lr_find(
3 tft,
4 train_dataloader = train_dataloader,
5 val_dataloaders = val_dataloader,
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/lr_finder.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold)
198
199 # Fit, lr & loss logged in callback
--> 200 self.fit(model,
201 train_dataloader=train_dataloader,
202 val_dataloaders=val_dataloaders)
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1050 self.accelerator_backend = DDPSpawnBackend(self)
1051 self.accelerator_backend.setup()
-> 1052 self.accelerator_backend.train(model, nprocs=self.num_processes)
1053 results = self.accelerator_backend.teardown(model)
1054
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_spawn_backend.py in train(self, model, nprocs)
41
42 def train(self, model, nprocs):
---> 43 mp.spawn(self.ddp_train, nprocs=nprocs, args=(self.mp_queue, model,))
44
45 def teardown(self, model):
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/torch/multiprocessing/spawn.py in spawn(fn, args, nprocs, join, daemon)
160 daemon=daemon,
161 )
--> 162 process.start()
163 error_queues.append(error_queue)
164 processes.append(process)
~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/process.py in start(self)
119 'daemonic processes are not allowed to have children'
120 _cleanup()
--> 121 self._popen = self._Popen(self)
122 self._sentinel = self._popen.sentinel
123 # Avoid a refcycle if the target function holds an indirect
~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/context.py in _Popen(process_obj)
282 def _Popen(process_obj):
283 from .popen_spawn_posix import Popen
--> 284 return Popen(process_obj)
285
286 class ForkServerProcess(process.BaseProcess):
~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/popen_spawn_posix.py in __init__(self, process_obj)
30 def __init__(self, process_obj):
31 self._fds = []
---> 32 super().__init__(process_obj)
33
34 def duplicate_for_child(self, fd):
~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/popen_fork.py in __init__(self, process_obj)
17 self.returncode = None
18 self.finalizer = None
---> 19 self._launch(process_obj)
20
21 def duplicate_for_child(self, fd):
~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/popen_spawn_posix.py in _launch(self, process_obj)
45 try:
46 reduction.dump(prep_data, fp)
---> 47 reduction.dump(process_obj, fp)
48 finally:
49 set_spawning_popen(None)
~/anaconda3/envs/forecasting/lib/python3.8/multiprocessing/reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
AttributeError: Can't pickle local object '_apply_to_outputs.<locals>.decorator_fn.<locals>.new_func'
Any idea what may be triggering this? My guess is that because I'm not distributing across multiple machines, the pickle is getting messed up. That's fine and just indicates I misunderstood that setting for distributed_backend, but moving on, I hit errors with the other distributed_backend settings as well.
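The pickling failure itself can be reproduced with the standard library alone, independent of PyTorch Lightning. The sketch below only illustrates why a function defined inside another function (such as the new_func named in the error message) cannot be sent to a spawned process; the names make_wrapper and can_pickle are made up for this example.

```python
import math
import pickle


def make_wrapper():
    # new_func is created inside another function, so pickle sees it as a
    # "local object" that cannot be looked up again by its qualified name.
    def new_func(x):
        return x + 1

    return new_func


def can_pickle(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


print(can_pickle(math.sqrt))       # importable by name
print(can_pickle(make_wrapper()))  # local function
```

Spawn-based multiprocessing has to pickle everything it hands to the child process, which is where the trace above ends (reduction.dump).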
Following https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#distributed-modes, when I hard-code distributed_backend to ddp2, I get this trace:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp2_backend.py in _resolve_task_idx(self)
52 try:
---> 53 self.task_idx = int(os.environ['LOCAL_RANK'])
54 except Exception as e:
~/anaconda3/envs/forecasting/lib/python3.8/os.py in __getitem__(self, key)
674 # raise KeyError with the original key value
--> 675 raise KeyError(key) from None
676 return self.decodevalue(value)
KeyError: 'LOCAL_RANK'
During handling of the above exception, another exception occurred:
MisconfigurationException Traceback (most recent call last)
<ipython-input-29-01060df08a43> in <module>
1 # Find optimal learning rate
----> 2 res = trainer.lr_find(
3 tft,
4 train_dataloader = train_dataloader,
5 val_dataloaders = val_dataloader,
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/lr_finder.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold)
198
199 # Fit, lr & loss logged in callback
--> 200 self.fit(model,
201 train_dataloader=train_dataloader,
202 val_dataloaders=val_dataloaders)
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1033 if self.use_ddp2:
1034 self.accelerator_backend = DDP2Backend(self)
-> 1035 self.accelerator_backend.setup()
1036 self.accelerator_backend.train(model)
1037
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp2_backend.py in setup(self)
43
44 def setup(self):
---> 45 self._resolve_task_idx()
46
47 def _resolve_task_idx(self):
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp2_backend.py in _resolve_task_idx(self)
54 except Exception as e:
55 m = 'ddp2 only works in SLURM or via torchelastic with the WORLD_SIZE, LOCAL_RANK, GROUP_RANK flags'
---> 56 raise MisconfigurationException(m)
57
58 def train(self, model):
MisconfigurationException: ddp2 only works in SLURM or via torchelastic with the WORLD_SIZE, LOCAL_RANK, GROUP_RANK flags
and when I hard-code distributed_backend to dp (which is what I would expect to work most readily), I get
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-29-01060df08a43> in <module>
1 # Find optimal learning rate
----> 2 res = trainer.lr_find(
3 tft,
4 train_dataloader = train_dataloader,
5 val_dataloaders = val_dataloader,
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/lr_finder.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold)
198
199 # Fit, lr & loss logged in callback
--> 200 self.fit(model,
201 train_dataloader=train_dataloader,
202 val_dataloaders=val_dataloaders)
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1062 self.accelerator_backend = DataParallelBackend(self)
1063 self.accelerator_backend.setup(model)
-> 1064 results = self.accelerator_backend.train()
1065 self.accelerator_backend.teardown()
1066
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/dp_backend.py in train(self)
95 def train(self):
96 model = self.trainer.model
---> 97 results = self.trainer.run_pretrain_routine(model)
98 return results
99
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
1222
1223 # run a few val batches before training starts
-> 1224 self._run_sanity_check(ref_model, model)
1225
1226 # clear cache before training
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in _run_sanity_check(self, ref_model, model)
1255 num_loaders = len(self.val_dataloaders)
1256 max_batches = [self.num_sanity_val_steps] * num_loaders
-> 1257 eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
1258
1259 # allow no returns from eval
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py in _evaluate(self, model, dataloaders, max_batches, test_mode)
394 # ---------------------
395 using_eval_result = len(outputs) > 0 and len(outputs[0]) > 0 and isinstance(outputs[0][0], EvalResult)
--> 396 eval_results = self.__run_eval_epoch_end(test_mode, outputs, dataloaders, using_eval_result)
397
398 # log callback metrics
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py in __run_eval_epoch_end(self, test_mode, outputs, dataloaders, using_eval_result)
488 eval_results = self.__gather_epoch_end_eval_results(outputs)
489
--> 490 eval_results = model.validation_epoch_end(eval_results)
491 user_reduced = True
492
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py in validation_epoch_end(self, outputs)
142
143 def validation_epoch_end(self, outputs):
--> 144 log, _ = self.epoch_end(outputs, label="val")
145 return log
146
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in epoch_end(self, outputs, label)
611 run at epoch end for training or validation
612 """
--> 613 log, out = super().epoch_end(outputs, label=label)
614 if self.log_interval(label == "train") > 0:
615 self._log_interpretation(out, label=label)
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py in epoch_end(self, outputs, label)
245 outputs = [out["callback_metrics"] for out in outputs]
246 # log average loss and metrics
--> 247 n_samples = sum([x["n_samples"] for x in outputs])
248 avg_loss = torch.stack([x[f"{label}_loss"] * x["n_samples"] / n_samples for x in outputs]).sum()
249 log_keys = outputs[0]["log"].keys()
TypeError: unsupported operand type(s) for +: 'int' and 'list'
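The message at the end of this trace is what Python's built-in sum produces when the items being added are lists rather than numbers. The sketch below reproduces it with plain dictionaries. That n_samples arrives as one list per step under dp is an assumption made for illustration and is not confirmed by the trace; total_samples is a hypothetical helper, not library code.

```python
flat = [{"n_samples": 4}, {"n_samples": 4}]
print(sum(x["n_samples"] for x in flat))

# Assumed shape: one entry per device in every step output.
nested = [{"n_samples": [2, 2]}, {"n_samples": [2, 2]}]
try:
    sum(x["n_samples"] for x in nested)
except TypeError as err:
    message = str(err)
    print(message)


def total_samples(outputs):
    """Sum n_samples whether each entry is a number or a list of numbers."""
    total = 0
    for out in outputs:
        n = out["n_samples"]
        total += sum(n) if isinstance(n, (list, tuple)) else n
    return total


print(total_samples(flat), total_samples(nested))
```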
When I use ddp (as recommended for PyTorch, given the speedup), the pipeline freezes, and running watch nvidia-smi from the terminal shows that the GPUs are idle and are not loading any memory for processing.
These errors occur with the same setup as I had in #85, which I got working on a single GPU. Now that I'm doing multivariate time series across all 50 states, I'd really like to use both my GPUs to speed up the runtime.
Thanks!
Hi,
I would like to use the temporal fusion transformer(TFT) for time-series-classification. For our illustrative example we have 2 target classes (True, False) for the cancellation of a membership.
Illustrative example:
Suppose you have an e-commerce shop, and you want to find out whether a customer will cancel his premium membership (like Amazon-prime) based on his shopping behavior.
Parameters for TimeSeriesDataSet:
target = ["cancellation"]
group_ids = ["customerID"]
static_categoricals = ["zip", "gender"]
time_varying_known_reals = ["time_idx"]
time_varying_unknown_categoricals = ["shopping_event"]
time_varying_unknown_reals = ["age_at_shopping_event"]
max_prediction_length=1
Is the TFT suitable for this kind of task?
What does the shape of the target vector look like, e.g.
I have already implemented a BinaryCrossEntropyLoss metric analogous to the CrossEntropyLoss.
If you like, I can make a merge request for the BinaryCrossEntropyLoss.
Thanks for your ideas
First of all, amazing package!
Looks like tons of work.
Thanks so much.
I have spent a few days now getting to know it better, reading the documentation, and also running in debug to better understand what is going on.
I am trying to run N-BEATS on new data, and have a few questions regarding it:
1.a. Once I have a pre-trained model, what is the simplest way to predict on new data?
1.b. What is the minimal length of the data required?
1.c. What would be the behavior if the data is longer? Will the model "cut" just the required last samples and use them to predict?
with the following:
import numpy as np
import pandas as pd

# hypothetical wrapper name; the original function header was not part of the post
def to_variable_length_frame(series):
    # convert to dataframe, where the various series have different lengths
    frames = []
    for k in range(series.shape[0]):
        # randomly drop up to 19 trailing points from each series
        truncate = np.random.randint(0, 20)
        truncated_data = series[k, :-truncate] if truncate > 0 else series[k, :]
        frames.append(
            pd.DataFrame({'series': k, 'time_idx': np.arange(len(truncated_data)), 'value': truncated_data})
        )
    # concatenate once instead of growing the frame inside the loop
    return pd.concat(frames, axis=0).reset_index(drop=True)
I executed synthetic_data_tutorial, and got the following exception:
ValueError: Min encoder length and/or min prediction length is too large for 8 series/group
After some digging, I found out that it crashed in:
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training_cutoff+1)
Digging deeper, I found out that when the series (meaning the different group id's) have different lengths, then the longest one will determine the parameters influencing the sequence length, start and end (in the df_index).
Consequently, after the "filter too short sequences" section, some group id's are being filtered out (although they have sufficient length for prediction).
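Before touching the library internals, it can help to check which groups are genuinely too short for the chosen encoder and prediction lengths. The sketch below does this with pandas only; it is independent of how TimeSeriesDataSet builds its index, and the helper name groups_too_short and the example lengths are assumptions.

```python
import pandas as pd


def groups_too_short(df, group_col, min_length):
    """Return the ids of groups with fewer than min_length rows."""
    lengths = df.groupby(group_col).size()
    return lengths[lengths < min_length].index.tolist()


demo = pd.DataFrame(
    {
        "series": [0] * 10 + [1] * 4 + [2] * 7,
        "time_idx": list(range(10)) + list(range(4)) + list(range(7)),
    }
)
# Example requirement: 5 encoder steps plus 2 prediction steps.
short = groups_too_short(demo, "series", min_length=5 + 2)
print(short)
```

Any group that is not reported here but is still filtered out is a candidate for the behaviour described above.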
I managed to circumvent this issue after making the following changes:
In the __init__() of timeseries.py:
I replaced:
if min_prediction_idx is not None:
    # before my fix, this was the only line in the clause
    data = data[lambda x: data[self.time_idx] >= self.min_prediction_idx - self.max_encoder_length]
with:
delta_per_group = data.groupby(self.group_ids)["time_idx"].max().max() - data.groupby('series')["time_idx"].max()
inds_to_keep = np.zeros(shape=(data.shape[0],)).astype(bool)
for k in delta_per_group.index:
    inds_to_keep = np.logical_or(
        inds_to_keep,
        np.logical_and(
            data[self.time_idx] >= self.min_prediction_idx - self.max_encoder_length - delta_per_group[k],
            np.squeeze(data[self.group_ids] == k),
        ),
    )
data = data[inds_to_keep]
and in _construct_index() in the section of "#filter too short sequences":
I replaced:
(x["sequence_length"] + x["time"] >= self.min_prediction_idx + self.min_prediction_length)
with:
(x["sequence_length"] + x["time"] >= self.min_prediction_length + self.min_prediction_idx - (df_index['time_last'].max() - df_index['time_last']))
df_index now looks exactly as I thought it should, and indeed it passed:
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
and also:
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
but it crashed in the next line:
Traceback (most recent call last):
File "D:\Users\Lihu\Dropbox\Projects\Maytronics\pytorch_prediction\venv\lib\site-packages\IPython\core\interactiveshell.py", line 2731, in safe_execfile
self.compile if shell_futures else None)
File "D:\pytorch_prediction\venv\lib\site-packages\IPython\utils\py3compat.py", line 168, in execfile
exec(compiler(f.read(), fname, 'exec'), glob, loc)
File "D:\pytorch_prediction\synthetic_data_tutorial.py", line 63, in
actuals = torch.cat([y for x, y in iter(val_dataloader)])
File "D:\pytorch_prediction\synthetic_data_tutorial.py", line 63, in
actuals = torch.cat([y for x, y in iter(val_dataloader)])
File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data\dataloader.py", line 363, in next
data = self._next_data()
File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data\dataloader.py", line 403, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\pytorch_prediction\venv\lib\site-packages\torch\utils\data_utils\fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\pytorch_prediction\venv\lib\site-packages\pytorch_forecasting\data\timeseries.py", line 932, in getitem
), "Decoder length should be at least minimum prediction length"
AssertionError: Decoder length should be at least minimum prediction length
I'd definitely appreciate some help from someone who knows the code much better than me :)
Thanks,
Lihu
Hello Jan! I just read the article about this package. I was thrilled to find that it had the temporal fusion transformer, and also that it works with pytorch-lightning! I came straight to the repo to check the code and the implementation, only to find out that you improved a lot upon the code I had built in my initial implementation of the TFT! I'm just ecstatic! I would love to start contributing to this repo, adding more recipes from simpler PyTorch architectures, and also working on the implementation of newer ones!
I am a firm believer that the advances that have happened in image and NLP models can happen in time series too. We just need a package like this, one that helps unify usage for everybody and solves the common issues that come up when implementing time series models!
For my part, I'm going to start testing the models, because I'm currently working on a time series problem that I'm trying to solve with neural networks. I was thinking of continuing the work I started on the TFT, but I think this implementation is already way better!
Let's keep in touch!
Hi @jdb78,
After getting the TFT model working well for one group of data based on our last convo (and updating to the latest version of the library), I'm getting an odd tensor dimension error when I try to train my model across multiple groups on my data. Specifically on epoch 15 I get this error/trace:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-32-14fda4f79b4a> in <module>
1 # Train model
----> 2 trainer.fit(
3 tft,
4 train_dataloader = train_dataloader,
5 val_dataloaders = val_dataloader
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1071 self.accelerator_backend = GPUBackend(self)
1072 model = self.accelerator_backend.setup(model)
-> 1073 results = self.accelerator_backend.train(model)
1074
1075 elif self.use_tpu:
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_backend.py in train(self, model)
49
50 def train(self, model):
---> 51 results = self.trainer.run_pretrain_routine(model)
52 return results
53
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
1237
1238 # CORE TRAINING LOOP
-> 1239 self.train()
1240
1241 def _run_sanity_check(self, ref_model, model):
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py in train(self)
392 # RUN TNG EPOCH
393 # -----------------
--> 394 self.run_training_epoch()
395
396 if self.max_steps and self.max_steps <= self.global_step:
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
548
549 # process epoch outputs
--> 550 self.run_training_epoch_end(epoch_output, checkpoint_accumulator, early_stopping_accumulator, num_optimizers)
551
552 # checkpoint callback
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch_end(self, epoch_output, checkpoint_accumulator, early_stopping_accumulator, num_optimizers)
662 # run training_epoch_end
663 # a list with a result per optimizer index
--> 664 epoch_output = model.training_epoch_end(epoch_output)
665
666 if isinstance(epoch_output, Result):
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/base_model.py in training_epoch_end(self, outputs)
133
134 def training_epoch_end(self, outputs):
--> 135 log, _ = self.epoch_end(outputs, label="train")
136 return log
137
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in epoch_end(self, outputs, label)
613 log, out = super().epoch_end(outputs, label=label)
614 if self.log_interval(label == "train") > 0:
--> 615 self._log_interpretation(out, label=label)
616 return log, out
617
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in _log_interpretation(self, outputs, label)
820 """
821 # extract interpretations
--> 822 interpretation = {
823 name: torch.stack([x["interpretation"][name] for x in outputs]).sum(0)
824 for name in outputs[0]["interpretation"].keys()
~/anaconda3/envs/forecasting/lib/python3.8/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in <dictcomp>(.0)
821 # extract interpretations
822 interpretation = {
--> 823 name: torch.stack([x["interpretation"][name] for x in outputs]).sum(0)
824 for name in outputs[0]["interpretation"].keys()
825 }
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 6 and 7 in dimension 1 at /tmp/pip-req-build-8yht7tdu/aten/src/THC/generic/THCTensorMath.cu:71
At first I thought this could be because not every state has the same amount of data available. For example, for select states, the number of days for which data exist are:
state
CA 218
FL 218
GA 218
NY 218
TX 218
WA 260
So I tried just getting the last 218 observations for each state and resetting time_idx, but the same tensor error was raised.
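The trimming step described above (keeping the last N observations per state and resetting time_idx) could look roughly like this in pandas. This is a sketch with made-up demo data, not the notebook's code; keep_last_n and the column names are assumptions.

```python
import pandas as pd


def keep_last_n(df, group_col, n, time_col="time_idx"):
    """Keep the last n rows of every group and restart time_col at 0."""
    trimmed = (
        df.sort_values([group_col, time_col])
        .groupby(group_col)
        .tail(n)
        .copy()
    )
    # Renumber so that every group starts at 0 and increases by 1.
    trimmed[time_col] = trimmed.groupby(group_col).cumcount()
    return trimmed.reset_index(drop=True)


demo = pd.DataFrame(
    {
        "state": ["CA"] * 3 + ["WA"] * 5,
        "time_idx": [0, 1, 2, 0, 1, 2, 3, 4],
        "cases": [1, 2, 3, 10, 20, 30, 40, 50],
    }
)
trimmed = keep_last_n(demo, "state", n=3)
print(trimmed.groupby("state")["time_idx"].agg(["min", "max", "count"]))
```

Printing the per-group minimum, maximum, and count is a quick way to confirm that all groups really have identical lengths after trimming.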
Right now I'm only including the state as a group variable and am doing univariate time series modeling, so no other data are in the TFT.
Also, the learning rate finder works, this is just the training that's failing. Really odd.
I've made my code available here: https://drive.google.com/file/d/1r1w2tHZJrr8iXVqw7U5_qL1x4iOUsauk/view?usp=sharing. The data I'm playing with are on COVID, and the notebook includes a pd.read_csv() call that reads in all the data for you from an online source, so you should be able to run it on your own without any issues.
Any thoughts? Again, I greatly appreciate your feedback as I'm new to deep learning on time series data.
Thanks in advance!
Best,
Alex
First of all, I believe that this is a great initiative, especially that we have the TFT available + the Optuna optimisation! Great!
Amazon's GluonTS has some features automatically added to the models, most notably the time features: it adds day-of-the-week, month-of-the-year and many more to the model, see here. But it also has some other features available, see here.
So my question is: are you planning to also add these to the Temporal Fusion Transformer? They would be great additions.
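Until such features are built in, calendar features of this kind can be derived by hand with pandas before the dataset is constructed. The sketch below is a minimal example under the assumption that the frame has a datetime column named date; add_calendar_features is a hypothetical helper and does not reproduce GluonTS's feature set.

```python
import pandas as pd


def add_calendar_features(df, date_col="date"):
    """Add day-of-week and month as string-valued categorical columns."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # Categories are stored as strings so they are treated as labels, not numbers.
    out["day_of_week"] = dates.dt.dayofweek.astype(str).astype("category")
    out["month"] = dates.dt.month.astype(str).astype("category")
    return out


demo = pd.DataFrame({"date": pd.date_range("2020-01-01", periods=3, freq="D")})
features = add_calendar_features(demo)
print(features)
```

Since these values are known in advance for any future date, they would plausibly be listed under time_varying_known_categoricals when building the TimeSeriesDataSet.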
I made a model to predict some stock market features but the results seem to be very weird. I used the "Demand forecasting with the Temporal Fusion Transformer" example as a base but my model seems to be off, despite having a larger size, more training, more data etc.
learning_rate=0.013489628825916528, #I got the learning rate by tuner.
hidden_size=64,
attention_head_size=16,
dropout=0.1,
hidden_continuous_size=16,
output_size=7,
loss=QuantileLoss(),
log_interval=10,
reduce_on_plateau_patience=4,
Here are some prediction examples:
Since I'm using Colab for training, I ran around 100 epochs, but the results were no good. My dataset has 488468 rows and 6 time_varying_unknown_reals. I really don't understand why the network is misbehaving at this point.
Quick Note:
I also calculated the "Actuals vs predictions by variables" and the results in that part seem to be a little more promising but ended up confusing me even more.
Hi,
Firstly, thank you for the time, work and commitment that went into this package. It is all good stuff. One thing I'm struggling with, though, is how to check predictions on data that the trainer class has not seen (going by the documentation). I guess I should append it to the original data, but do you have any good practices that you can share?
Hello, I'm running stallion.py but I'm getting a few problems:
Using gpu = 1 in the trainer returns
models/temporal_fusion_transformer/__init__.py", line 793, in _log_interpretation dim=0
Using only the CPU it works, but returns this error:
AttributeError: module 'tensorflow._api.v1.io.gfile' has no attribute 'get_filesystem'
which is apparently due to a PyTorch and TensorBoard incompatibility. I managed to install pytorch-forecasting without problems, so could you tell me which versions of PyTorch/TensorBoard you are using?
Thanks
Following up on the discussion from issue #79, there is a need to extend the logging capabilities for keeping track of training figures (e.g. showing attention, forecast quantiles, etc.) for loggers beyond TensorBoard (e.g. W&B). The self.logger.experiment.add_figure() lines that exist in models.base_model seem to be the root of the issue, as not every logger platform has an add_figure() method for its experiment (or Run in W&B's case) objects.
This is a to-do item for now. It can currently be circumvented (at least in the case of using W&B) by setting log_interval=-1 when instantiating the TemporalFusionTransformer object.
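One possible direction, sketched here under assumptions, is to check whether the experiment object exposes add_figure() before calling it. The log_figure helper and the two stand-in experiment classes are hypothetical; they are not part of pytorch-forecasting, PyTorch Lightning, or W&B.

```python
class TensorBoardLikeExperiment:
    """Stand-in for an experiment object that supports add_figure()."""

    def __init__(self):
        self.figures = []

    def add_figure(self, tag, figure, global_step=None):
        self.figures.append((tag, figure, global_step))


class RunLikeExperiment:
    """Stand-in for an experiment object without add_figure()."""


def log_figure(experiment, tag, figure, global_step=None):
    """Log a figure if the backend supports it; return whether it was logged."""
    add_figure = getattr(experiment, "add_figure", None)
    if not callable(add_figure):
        return False  # skip instead of raising AttributeError
    add_figure(tag, figure, global_step=global_step)
    return True


tb_experiment = TensorBoardLikeExperiment()
print(log_figure(tb_experiment, "prediction", "fig-placeholder", global_step=3))
print(log_figure(RunLikeExperiment(), "prediction", "fig-placeholder", global_step=3))
```

A guard like this only avoids the crash. Getting the figures into other loggers would still need a backend-specific call.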
I'm trying to create an algorithmic trading system. In the first two months I have had only a 30%-40% win rate. It's not an ML system, but I would like to predict a possible target for any of the found items with PyTorch and the TFT model, based on previous results.
My normalized data looks like this (I removed some feature columns for simplicity; the best_target column is my target).
I got this and could not find the reason for the error.
TypeError Traceback (most recent call last)
<ipython-input-10-329c6f2b973b> in <module>()
16 add_target_scales=True,
17 add_encoder_length=True,
---> 18 allow_missings=True
19 )
20
1 frames
/usr/local/lib/python3.6/dist-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
809 f"First 10 removed groups: {list(missing_groups.iloc[:10].to_dict(orient='index').values())}",
810 ),
--> 811 UserWarning,
812 )
813 assert len(df_index) > 0, "filters should not remove entries"
TypeError: 'str' object cannot be interpreted as an integer
!pip install pytorch-lightning
!pip install pytorch-forecasting
import warnings
from pathlib import Path
import pandas as pd
import numpy as np
import torch
import copy
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer, Baseline
from pytorch_forecasting.data import GroupNormalizer
from pytorch_forecasting.metrics import PoissonLoss, QuantileLoss, SMAPE
from pytorch_forecasting.models.temporal_fusion_transformer.tuning import optimize_hyperparameters
data = pd.read_csv("http://www.sharecsv.com/dl/a08efd0677d449542fd98e2b390101a1/file.csv")
data["directory"] = data.directory.astype('str').astype('category')
data["strategy"] = data.strategy.astype('str').astype('category')
data["order_type"] = data.order_type.astype('str').astype('category')
data["new_york"] = data.new_york.astype('str').astype('category')
data["london"] = data.london.astype('str').astype('category')
data["tokyo"] = data.tokyo.astype('str').astype('category')
data["sydney"] = data.sydney.astype('str').astype('category')
data["wellington"] = data.wellington.astype('str').astype('category')
data["singapore"] = data.singapore.astype('str').astype('category')
data["hong_kong"] = data.hong_kong.astype('str').astype('category')
data["shanghai"] = data.shanghai.astype('str').astype('category')
data["pair"] = data.pair.astype('str').astype('category')
data["currency1"] = data.currency1.astype('str').astype('category')
data["currency2"] = data.currency2.astype('str').astype('category')
data["date"] = pd.to_datetime(data.date, utc=True)
max_prediction_length = 6
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length
training = TimeSeriesDataSet(
data[lambda x: x.time_idx <= training_cutoff],
time_idx="time_idx",
target="best_target",
group_ids=["time_idx"],
min_encoder_length=max_encoder_length // 2,
max_encoder_length=max_encoder_length,
min_prediction_length=1,
max_prediction_length=max_prediction_length,
time_varying_unknown_categoricals=[],
add_relative_time_idx=True,
add_target_scales=True,
add_encoder_length=True,
allow_missings=True
)
#I got stuck at this point and did not get to the point of using TFT.
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)
I have some questions:
time_idx for every new batch of results. Is that the right approach?
When I try to create a TimeSeriesDataSet for validation it shows me the error in the title. I tried:
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max() + 1, stop_randomization=True)
The training dataset is created successfully.
Any ideas what could be wrong?
I ran the code to create a TimeSeriesDataSet and expected the code to create the object in order to move on to the validation split.
max_prediction_length = 6
max_encoder_length = 3914
training_cutoff = data["time_idx"].max() - max_prediction_length
training = TimeSeriesDataSet(
data[lambda x: x.time_idx <= training_cutoff],
time_idx = col_def['time_idx'],
target = col_def['target'], #Value
group_ids = col_def['group_ids'], # the error stems from _construct_index in timeseries.py
min_encoder_length=max_encoder_length // 2, # keep encoder length long (as it is in the validation set)
max_encoder_length=max_encoder_length,
min_prediction_length=1,
max_prediction_length=max_prediction_length,
static_categoricals = col_def['static_categoricals'],
static_reals = col_def['static_reals'],
time_varying_known_categoricals = col_def['time_varying_known_categoricals'],
variable_groups = {}, # group of categorical variables can be treated as one variable
time_varying_known_reals = col_def['time_varying_known_reals'],
time_varying_unknown_categoricals = [],
time_varying_unknown_reals = col_def['time_varying_unknown_reals'],
target_normalizer=GroupNormalizer(
groups = col_def['group_ids'], coerce_positive=1.0
), # use softplus with beta=1.0 and normalize by group
add_relative_time_idx = True,
add_target_scales = True,
add_encoder_length = True,
)
Is there any other code you might need to see to be able to better understand where the issue might stem from?
Stallion Notebook does not finish running. I believe it has to do with having some specific version of tensorflow.
I'm using:
pytorch-lightning==0.9.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow-estimator==2.0.1
tensorflow-gpu==2.3.0
tensorflow-gpu-estimator==2.3.0
Here's the traceback:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-d89dc578d7fe> in <module>
1 # fit network
2 trainer.fit(
----> 3 tft, train_dataloader=train_dataloader, val_dataloaders=val_dataloader,
4 )
~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1082 self.accelerator_backend = CPUBackend(self)
1083 self.accelerator_backend.setup(model)
-> 1084 results = self.accelerator_backend.train(model)
1085
1086 # on fit end callback
~\Anaconda3\lib\site-packages\pytorch_lightning\accelerators\cpu_backend.py in train(self, model)
37
38 def train(self, model):
---> 39 results = self.trainer.run_pretrain_routine(model)
40 return results
~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py in run_pretrain_routine(self, model)
1237
1238 # CORE TRAINING LOOP
-> 1239 self.train()
1240
1241 def _run_sanity_check(self, ref_model, model):
~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\training_loop.py in train(self)
407 if self.should_stop:
408 if (met_min_epochs and met_min_steps):
--> 409 self.run_training_teardown()
410 return
411 else:
~\Anaconda3\lib\site-packages\pytorch_lightning\trainer\training_loop.py in run_training_teardown(self)
1143 # model hooks
1144 if self.is_function_implemented('on_train_end'):
-> 1145 self.get_model().on_train_end()
1146
1147 if self.logger is not None:
~\Documents\GitHub\pytorch-forecasting\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py in on_train_end(self)
584 def on_train_end(self):
585 if self.log_interval(train=True) > 0:
--> 586 self._log_embeddings()
587
588 def step(self, x, y, batch_idx, label="train"):
~\Documents\GitHub\pytorch-forecasting\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py in _log_embeddings(self)
868 labels = self.hparams.embedding_labels[name]
869 self.logger.experiment.add_embedding(
--> 870 emb.weight.data.cpu(), metadata=labels, tag=name, global_step=self.global_step
871 )
~\Anaconda3\lib\site-packages\torch\utils\tensorboard\writer.py in add_embedding(self, mat, metadata, label_img, global_step, tag, metadata_header)
786 save_path = os.path.join(self._get_file_writer().get_logdir(), subdir)
787
--> 788 fs = tf.io.gfile.get_filesystem(save_path)
789 if fs.exists(save_path):
790 if fs.isdir(save_path):
AttributeError: module 'tensorflow._api.v2.io.gfile' has no attribute 'get_filesystem'
Hi, can I please ask how you are handling the potential for look-ahead bias in the scaling of the features etc? This seems to be a common problem in timeseries prediction. I did search the docs but couldn't find any such information. Many thanks.
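To make the concern concrete, here is a minimal sketch that is not taken from the library. The scaling statistics are computed only from rows at or before a training cutoff and are then applied to all rows. The function names and the toy data are illustrative assumptions.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Return (center, scale) computed from the given values only."""
    center = mean(values)
    scale = pstdev(values) or 1.0  # fall back to 1.0 for a constant series
    return center, scale


def transform(values, center, scale):
    return [(v - center) / scale for v in values]


# toy series of (time_idx, value); the last two points lie after the cutoff
series = [(0, 10.0), (1, 12.0), (2, 14.0), (3, 100.0), (4, 120.0)]
training_cutoff = 2

train_values = [v for t, v in series if t <= training_cutoff]
center, scale = fit_scaler(train_values)  # future values never enter the statistics

scaled = transform([v for _, v in series], center, scale)
print(center, scaled[:3])
```

If the scaler were instead fitted on the whole series, the two large future values would shift the center and scale used for the training rows, which is the look-ahead bias in question.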
Hi. One of my columns has the unique values [c0, c1, c2, ..., c15, c16, h1, h2]. These values are categorical. I am trying to predict the rows denoted by h using the rows denoted by c. As the h values are not included in training, an error occurs when creating the validation set. I currently use a dataset that excludes this column. Is there a good way to include this column's values?
In https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/stallion.ipynb cell 9, should learning rate not be set to the result of the study in cell 8 or am I missing something?
# configure network and trainer
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateLogger() # log the learning rate
logger = TensorBoardLogger("lightning_logs") # logging results to a tensorboard
trainer = pl.Trainer(
max_epochs=30,
gpus=0,
weights_summary="top",
gradient_clip_val=0.1,
early_stop_callback=early_stop_callback,
limit_train_batches=30, # comment in for training, running validation every 30 batches
# fast_dev_run=True, # comment in to check that network or dataset has no serious bugs
callbacks=[lr_logger],
logger=logger,
)
tft = TemporalFusionTransformer.from_dataset(
training,
learning_rate=res.suggestion(), #<<<<<<<<<<<<<<<<<<<<<<<<<<<< this
hidden_size=16,
attention_head_size=1,
dropout=0.1,
hidden_continuous_size=8,
output_size=7, # 7 quantiles by default
loss=QuantileLoss(),
log_interval=30, # uncomment for learning rate finder and otherwise, e.g. to 10 for logging every 10 batches
reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
Hello @jdb78 ,
If possible, could you point me in the right direction for the following goal? I could write additional code myself if needed.
The values we want to predict (the "target") are grouped by a unique combination of group_ids; I call each such combination a "unique time-series".
My goal is to extract the predictions and actual values of the "target" for each "unique time-series", so that I can calculate sMAPE and make plots for each one.
More detail below.
Currently I tested out TFT on the M5 kaggle dataset: https://www.kaggle.com/c/m5-forecasting-accuracy/data
The google colab notebook is here : https://colab.research.google.com/drive/1OSSf7qgOeyZRSUbGgBektHSPSZ1Yambp?usp=sharing
I used 'shop_id' as single group_id creating time-series for 10 unique groups. I have 19130 samples.
We can make predictions and show them using the plot function (see code below).
This shows some samples of the validation dataset but group_id information can't be found.
My question is: How can val_dataloader be used to get the actual and predicted values of target for each unique time-series ?
# raw predictions are a dictionary from which all kind of information including quantiles can be extracted
raw_predictions, x = best_tft.predict(val_dataloader, mode="raw", return_x=True)
#we can see which information is in these raw predictions
display(raw_predictions.keys())
display(x.keys())
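# collect the actual target values of the validation set
# (assumes each batch is (x, y) with y a tensor, as in the tutorial at the time)
actuals = torch.cat([y for x, y in iter(val_dataloader)])
# calculate metric by which to display
predictions = best_tft.predict(val_dataloader)
mean_losses = SMAPE(reduction="none")(predictions, actuals).mean(1)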
indices = mean_losses.argsort(descending=False) # sort losses
for idx in range(10): # plot 10 examples
best_tft.plot_prediction(x, raw_predictions, idx=indices[idx], add_loss_to_title=SMAPE())
I have kept some data separate so that it is not included in the training procedure.
I can extract the real values of the target for each unique time-series.
My question is: how can I extract predictions of the target for each unique time-series for new data?
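Once each forecast window can be matched to the identifier of its series, the per-series metric itself is simple to compute. The sketch below assumes that alignment is already available as (series_id, actuals, predictions) tuples. How to obtain it from the dataloader is not shown and depends on the library version. The smape function is hand-rolled for illustration and is not the library's SMAPE class.

```python
from collections import defaultdict


def smape(actuals, predictions):
    """Symmetric MAPE over one forecast horizon, as a fraction between 0 and 2."""
    terms = []
    for a, p in zip(actuals, predictions):
        denom = abs(a) + abs(p)
        terms.append(0.0 if denom == 0 else 2.0 * abs(p - a) / denom)
    return sum(terms) / len(terms)


def smape_by_series(samples):
    """samples: iterable of (series_id, actuals, predictions) tuples."""
    per_series = defaultdict(list)
    for series_id, actuals, predictions in samples:
        per_series[series_id].append(smape(actuals, predictions))
    # average over all forecast windows that belong to the same series
    return {sid: sum(v) / len(v) for sid, v in per_series.items()}


samples = [
    ("shop_1", [10.0, 20.0], [10.0, 20.0]),
    ("shop_2", [10.0, 30.0], [30.0, 10.0]),
]
print(smape_by_series(samples))
```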
Hi. I thought TimeSeriesDataSet automatically groups the input data by group_id. In the stallion data there are 351 unique groups, but there are 20651 items in train_dataloader (when I set batch size = 1). What causes this difference?
And how is the size of the validation set determined? I checked to_dataloader but couldn't find the answer.
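One way to see why the number of samples can exceed the number of groups: if every valid (encoder, decoder) window counts as its own sample, each series contributes many samples. The sketch below is a simplified count with fixed encoder and prediction lengths. The actual indexing in TimeSeriesDataSet also handles variable lengths and may differ in detail, and the series lengths are made up.

```python
def count_windows(series_length, encoder_length, prediction_length):
    """Number of contiguous windows of encoder + prediction steps in one series."""
    window = encoder_length + prediction_length
    return max(0, series_length - window + 1)


# hypothetical series lengths keyed by group id
series_lengths = {"a": 60, "b": 60, "c": 20}
total = sum(count_windows(n, 24, 6) for n in series_lengths.values())
print(total)
```

Under this simplified view, three groups already yield far more than three samples, and a series shorter than one full window yields none.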
I noticed that the TimeSeriesDataSet class is designed to look at only one column for the target variable. In my use case, I'm trying to forecast traffic patterns and, as such, will need to forecast X and Y simultaneously. Given that these are expected to be somewhat correlated (as the presence of a road, for example, isn't equally probable at all latitudes and longitudes), I don't think it would work to build two separate models that forecast each in isolation. Is there a theoretical limitation in the models currently included in the package that makes it impossible to have more than one target variable?
I updated PyTorch-Forecasting from 0.5.2 to 0.6.0.
It seems that part of the code in timeseries.py around lines 804-812 has a mistake.
Here is how it looks:
warnings.warn(
(
"Min encoder length and/or min_prediction_idx and/or min prediction length is too large for "
f"{len(missing_groups)} series/groups which therefore are not present in the dataset index. "
"This means no predictions can be made for those series",
f"First 10 removed groups: {list(missing_groups.iloc[:10].to_dict(orient='index').values())}",
),
UserWarning,
)
which results in TypeError: expected string or bytes-like object
There are two redundant commas in the message part, which crash the code. It should look like:
warnings.warn(
(
"Min encoder length and/or min_prediction_idx and/or min prediction length is too large for "
f"{len(missing_groups)} series/groups which therefore are not present in the dataset index. "
"This means no predictions can be made for those series"<NO_COMMA>
f"First 10 removed groups: {list(missing_groups.iloc[:10].to_dict(orient='index').values())}"<NO_COMMA>
),
UserWarning,
)
Am I right?
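The underlying Python behavior can be checked in isolation. In the sketch below, commas inside the parentheses build a tuple, while adjacent string literals without commas are concatenated into a single str. The message text is shortened and n_missing is a made-up value.

```python
import warnings

n_missing = 3

# commas inside the parentheses build a tuple of strings
broken_message = (
    "Min encoder length is too large for "
    f"{n_missing} series/groups. ",
    "First removed groups: [...]",
)

# adjacent string literals without commas are concatenated into one str
fixed_message = (
    "Min encoder length is too large for "
    f"{n_missing} series/groups. "
    "First removed groups: [...]"
)

print(type(broken_message).__name__, type(fixed_message).__name__)

# the concatenated string is a valid warning message
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn(fixed_message, UserWarning)

print(str(caught[0].message))
```

When the commas are dropped, each fragment needs its own trailing space or punctuation so that the sentences do not run together.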
I executed the code to find the optimal learning rate and to fit the network, and expected to get the results shown on pytorch-forecasting.readthedocs.io. The only difference was gpus=1 in the pl.Trainer parameters.
# configure network and trainer
pl.seed_everything(42)
trainer = pl.Trainer(
gpus=1,
# clipping gradients is a hyperparameter and important to prevent divergence
# of the gradient for recurrent neural networks
gradient_clip_val=0.1,
)
tft = TemporalFusionTransformer.from_dataset(
training,
# not meaningful for finding the learning rate but otherwise very important
learning_rate=0.03,
hidden_size=16, # most important hyperparameter apart from learning rate
# number of attention heads. Set to up to 4 for large datasets
attention_head_size=1,
dropout=0.1, # between 0.1 and 0.3 are good values
hidden_continuous_size=8, # set to <= hidden_size
output_size=7, # 7 quantiles by default
loss=QuantileLoss(),
# reduce learning rate if no improvement in validation loss after x epochs
reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
# find optimal learning rate
res = trainer.tuner.lr_find(
tft,
train_dataloader=train_dataloader,
val_dataloaders=val_dataloader,
max_lr=10.0,
min_lr=1e-6,
)
print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()
However, it gives a RuntimeError like the one below:
RuntimeError Traceback (most recent call last)
<ipython-input-11-a92b5627800b> in <module>
5 val_dataloaders=val_dataloader,
6 max_lr=10.0,
----> 7 min_lr=1e-6,
8 )
9
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
128 mode,
129 early_stop_threshold,
--> 130 datamodule,
131 )
132
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in lr_find(trainer, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
173 train_dataloader=train_dataloader,
174 val_dataloaders=val_dataloaders,
--> 175 datamodule=datamodule)
176
177 # Prompt if we stopped early
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
437 self.call_hook('on_fit_start')
438
--> 439 results = self.accelerator_backend.train()
440 self.accelerator_backend.teardown()
441
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in train(self)
52
53 # train or test
---> 54 results = self.train_or_test()
55 return results
56
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
64 results = self.trainer.run_test()
65 else:
---> 66 results = self.trainer.train()
67 return results
68
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
459
460 def train(self):
--> 461 self.run_sanity_check(self.get_model())
462
463 # enable train mode
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
645
646 # run eval step
--> 647 _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
648
649 # allow no returns from eval
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
565
566 # lightning module methods
--> 567 output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
568 output = self.evaluation_loop.evaluation_step_end(output)
569
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, test_mode, batch, batch_idx, dataloader_idx)
169 output = self.trainer.accelerator_backend.test_step(args)
170 else:
--> 171 output = self.trainer.accelerator_backend.validation_step(args)
172
173 # track batch size for weighted average
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in validation_step(self, args)
76 output = self.__validation_step(args)
77 else:
---> 78 output = self.__validation_step(args)
79
80 return output
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in __validation_step(self, args)
84 batch = self.to_device(batch)
85 args[0] = batch
---> 86 output = self.trainer.model.validation_step(*args)
87 return output
88
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
138 def validation_step(self, batch, batch_idx):
139 x, y = batch
--> 140 log, _ = self.step(x, y, batch_idx, label="val") # log loss
141 self.log("val_loss", log["loss"], on_step=False, on_epoch=True, prog_bar=True)
142 return log
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in step(self, x, y, batch_idx, label)
566 """
567 # extract data and run model
--> 568 log, out = super().step(x, y, batch_idx, label=label)
569 # calculate interpretations etc for latter logging
570 if self.log_interval(label == "train") > 0:
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, label)
194 loss = loss * (1 + monotinicity_loss)
195 else:
--> 196 out = self(x)
197 out["prediction"] = self.transform_output(out)
198
~/repo/emart-promo/env/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/repo/emart-promo/env/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in forward(self, x)
489 encoder_output, (hidden, cell) = self.lstm_encoder(
490 rnn.pack_padded_sequence(
--> 491 embeddings_varying_encoder, lstm_encoder_lengths, enforce_sorted=False, batch_first=True
492 ),
493 (input_hidden, input_cell),
~/repo/emart-promo/env/lib/python3.7/site-packages/torch/nn/utils/rnn.py in pack_padded_sequence(input, lengths, batch_first, enforce_sorted)
242
243 data, batch_sizes = \
--> 244 _VF._pack_padded_sequence(input, lengths, batch_first)
245 return _packed_sequence_init(data, batch_sizes, sorted_indices, None)
246
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
Seems related to these issues:
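The error message states that the lengths argument of pack_padded_sequence must live on the CPU even when the input is on the GPU. A minimal sketch of that requirement, independent of pytorch-forecasting, is to call .cpu() on the lengths before packing. The tensor shapes are made up, and whether the library resolved the issue this way is not stated here.

```python
import torch
from torch.nn.utils import rnn

# batch of 3 padded sequences with 4 time steps and 2 features each
padded = torch.zeros(3, 4, 2)
lengths = torch.tensor([4, 2, 3])  # in the report above this tensor was on cuda:0

packed = rnn.pack_padded_sequence(
    padded,
    lengths.cpu(),  # no-op on the CPU; moves a GPU tensor back to the CPU
    batch_first=True,
    enforce_sorted=False,
)
print(packed.data.shape)
```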
I'm seeing issues trying to run the W&B logger when replicating the Towards Data Science example with the Stallion dataset (to be fair, switching to TensorBoard made it fail too, although for a completely different-sounding reason oddly). When I try to train the model, I get a single graphical output (attached) and then it errors out. I'm using:
- pytorch=1.4.0
- pytorch-forecasting=0.4.1
- pytorch-lightning=0.9.0
- wandb=0.10.4
I know the requirements.txt indicates (py)torch >= 1.6, but I can't get conda to find a good solution for that in my dependency tree, and this seems to be a logger issue anyhow. Here's the full traceback:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-16-562ea3edbba3> in <module>
25 tft,
26 train_dataloader=train_dataloader,
---> 27 val_dataloaders=val_dataloader
28 )
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1082 self.accelerator_backend = CPUBackend(self)
1083 self.accelerator_backend.setup(model)
-> 1084 results = self.accelerator_backend.train(model)
1085
1086 # on fit end callback
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_backend.py in train(self, model)
37
38 def train(self, model):
---> 39 results = self.trainer.run_pretrain_routine(model)
40 return results
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
1222
1223 # run a few val batches before training starts
-> 1224 self._run_sanity_check(ref_model, model)
1225
1226 # clear cache before training
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in _run_sanity_check(self, ref_model, model)
1255 num_loaders = len(self.val_dataloaders)
1256 max_batches = [self.num_sanity_val_steps] * num_loaders
-> 1257 eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
1258
1259 # allow no returns from eval
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in _evaluate(self, model, dataloaders, max_batches, test_mode)
331 output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
332 else:
--> 333 output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
334
335 is_result_obj = isinstance(output, Result)
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_forward(self, model, batch, batch_idx, dataloader_idx, test_mode)
685 output = model.test_step(*args)
686 else:
--> 687 output = model.validation_step(*args)
688
689 return output
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
138 def validation_step(self, batch, batch_idx):
139 x, y = batch
--> 140 log, _ = self.step(x, y, batch_idx, label="val")
141 return log
142
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in step(self, x, y, batch_idx, label)
595 """
596 # extract data and run model
--> 597 log, out = super().step(x, y, batch_idx, label=label)
598 # calculate interpretations etc for latter logging
599 if self.log_interval(label == "train") > 0:
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, label)
223 log["loss"] = loss
224 if self.log_interval(label == "train") > 0:
--> 225 self._log_prediction(x, out, batch_idx, label=label)
226 return log, out
227
/opt/conda/envs/DIU_NORAD/lib/python3.7/site-packages/pytorch_forecasting/models/base_model.py in _log_prediction(self, x, out, batch_idx, label)
281 else:
282 tag += f" of item {idx} in batch {batch_idx}"
--> 283 self.logger.experiment.add_figure(
284 tag,
285 fig,
AttributeError: 'Run' object has no attribute 'add_figure'
It seems as though pytorch_forecasting is assuming that all loggers have the same add_figure() method, but clearly that's not the case in this version of W&B/pytorch-lightning. Any thoughts on a way to rectify this? I'd also be game for a workaround to disable the default figure generation during training, although it is very nice to get those super-informative figures, so I'd rather get them working if I could!
Hi! Love the look of the package, it looks amazing!
I am getting an error for the N-BEATS example.
load data
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
Number of parameters in network: 1859.9k
num_workers argument (try 4 which is the number of cpus on this machine) in the DataLoader init to improve performance.
TypeError Traceback (most recent call last)
in
84 # net.hparams.learning_rate = res.suggestion()
85
---> 86 trainer.fit(
87 net,
88 train_dataloader=train_dataloader,
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
438 self.call_hook('on_fit_start')
439
--> 440 results = self.accelerator_backend.train()
441 self.accelerator_backend.teardown()
442
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py in train(self)
46
47 # train or test
---> 48 results = self.train_or_test()
49 return results
50
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
64 results = self.trainer.run_test()
65 else:
---> 66 results = self.trainer.train()
67 return results
68
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
460
461 def train(self):
--> 462 self.run_sanity_check(self.get_model())
463
464 # enable train mode
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
646
647 # run eval step
--> 648 _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
649
650 # allow no returns from eval
~/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
554 dl_max_batches = self.evaluation_loop.max_batches[dataloader_idx]
555
--> 556 for batch_idx, batch in enumerate(dataloader):
557 if batch is None:
558 continue
~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py in next(self)
343
344 def next(self):
--> 345 data = self._next_data()
346 self._num_yielded += 1
347 if self._dataset_kind == _DatasetKind.Iterable and \
~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
854 else:
855 del self._task_info[idx]
--> 856 return self._process_data(data)
857
858 def _try_put_index(self):
~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _process_data(self, data)
879 self._try_put_index()
880 if isinstance(data, ExceptionWrapper):
--> 881 data.reraise()
882 return data
883
~/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/_utils.py in reraise(self)
392 # (https://bugs.python.org/issue2651), so we work around it.
393 msg = KeyErrorMessage(msg)
--> 394 raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/andrewcz/miniconda3/envs/myenv/lib/python3.8/site-packages/pytorch_forecasting/data/timeseries.py", line 968, in getitem
self.target_normalizer.fit(target[:encoder_length])
TypeError: only integer tensors of a single element can be converted to an index
Looks like I've found a bug/unexpected behavior.
I'm making a prediction on a dataset with time-based features marked as 'categorical', namely month alongside day and year.
The start of the dataset is 2020-01-01, and the end is 2020-08-30. The date is parsed into 'year, 'month', and 'day' columns for each row.
Depending on the date of the last record in the dataset (if I cut it for some reason), pytorch-forecasting throws an error that looks like:
Traceback:
File "XXX/venv/lib/python3.8/site-packages/pytorch_forecasting/data/encoders.py", line 105, in
encoded = [self.classes_[v] for v in y]
KeyError: '8'
I've run some experiments and looked at stack traces, and this always happens when, for instance, a month (8, August) is present in the full set but not in the training set. That occurs when max_prediction_length is bigger than 31 (days), or when the last date and max_prediction_length combine like 2020-08-10 and 20, so that the last date of the training set is ~2020-07-20 and the training set does not contain month '8'.
In this case, going back to the code line shown in the traceback, the value (8) is in np.unique(y) (the iterator) but not in self.classes_.
It seems that self.classes_ is created from the training set only, and when you call TimeSeriesDataSet.from_dataset(trainingset, fullset, .....) you get this error for any additional categorical values that appear in the full dataset.
This logic makes the package hard to use in practice with any kind of categorically encoded date/time features.
Shouldn't any previously unseen categorical value be put into the special 'average' bin and treated as the average of all the known categories? As far as I remember, LightGBM exhibits this behavior for any new categorical values.
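To illustrate the general idea of a fallback for unseen categories, here is a toy encoder that reserves one index for unknown values instead of raising KeyError. It is a hypothetical sketch, not the encoder in pytorch_forecasting/data/encoders.py. It also uses a reserved "unknown" index rather than the averaging behavior suggested above.

```python
class LabelEncoderWithUnknown:
    """Toy label encoder that maps unseen categories to index 0."""

    UNKNOWN_INDEX = 0

    def fit(self, values):
        # reserve index 0 for unknowns; known classes start at 1
        unique = sorted(set(values))
        self.classes_ = {v: i + 1 for i, v in enumerate(unique)}
        return self

    def transform(self, values):
        # dict.get falls back to the reserved index for unseen categories
        return [self.classes_.get(v, self.UNKNOWN_INDEX) for v in values]


# months seen in the training set end at July; August appears only later
encoder = LabelEncoderWithUnknown().fit(["1", "2", "3", "4", "5", "6", "7"])
print(encoder.transform(["7", "8"]))
```

A model that consumes these indices would still need a sensible embedding for the reserved index, for example one that is trained on deliberately masked categories.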
I executed the notebook from the website https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/stallion.html.
When fitting the network, I get an unexpected error.
When I run the code below,
# fit network
trainer.fit(
tft,
train_dataloader=train_dataloader,
val_dataloaders=val_dataloader,
)
the error message below was displayed.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-13-263e8be26564> in <module>
3 tft,
4 train_dataloader=train_dataloader,
----> 5 val_dataloaders=val_dataloader,
6 )
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
442 self.call_hook('on_fit_start')
443
--> 444 results = self.accelerator_backend.train()
445 self.accelerator_backend.teardown()
446
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in train(self)
61
62 # train or test
---> 63 results = self.train_or_test()
64 return results
65
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
72 results = self.trainer.run_test()
73 else:
---> 74 results = self.trainer.train()
75 return results
76
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
464
465 def train(self):
--> 466 self.run_sanity_check(self.get_model())
467
468 self.checkpoint_connector.has_trained = False
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in run_sanity_check(self, ref_model)
656
657 # run eval step
--> 658 _, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
659
660 # allow no returns from eval
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py in run_evaluation(self, test_mode, max_batches)
576
577 # lightning module methods
--> 578 output = self.evaluation_loop.evaluation_step(test_mode, batch, batch_idx, dataloader_idx)
579 output = self.evaluation_loop.evaluation_step_end(output)
580
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/trainer/evaluation_loop.py in evaluation_step(self, test_mode, batch, batch_idx, dataloader_idx)
169 output = self.trainer.accelerator_backend.test_step(args)
170 else:
--> 171 output = self.trainer.accelerator_backend.validation_step(args)
172
173 # track batch size for weighted average
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in validation_step(self, args)
85 output = self.__validation_step(args)
86 else:
---> 87 output = self.__validation_step(args)
88
89 return output
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py in __validation_step(self, args)
93 batch = self.to_device(batch)
94 args[0] = batch
---> 95 output = self.trainer.model.validation_step(*args)
96 return output
97
/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in validation_step(self, batch, batch_idx)
161 def validation_step(self, batch, batch_idx):
162 x, y = batch
--> 163 log, _ = self.step(x, y, batch_idx, label="val") # log loss
164 self.log("val_loss", log["loss"], on_step=False, on_epoch=True, prog_bar=True)
165 return log
/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in step(self, x, y, batch_idx, label)
520 """
521 # extract data and run model
--> 522 log, out = super().step(x, y, batch_idx, label=label)
523 # calculate interpretations etc for latter logging
524 if self.log_interval(label == "train") > 0:
/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in step(self, x, y, batch_idx, label, **kwargs)
232 self._log_metrics(x, y, out, label=label)
233 if self.log_interval(label == "train") > 0:
--> 234 self._log_prediction(x, out, batch_idx, label=label)
235 log = {"loss": loss, "n_samples": x["decoder_lengths"].size(0)}
236
/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in _log_prediction(self, x, out, batch_idx, label)
301 log_indices = [0]
302 for idx in log_indices:
--> 303 fig = self.plot_prediction(x, out, idx=idx, add_loss_to_title=True)
304 tag = f"{label.capitalize()} prediction"
305 if label == "train":
/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in plot_prediction(self, x, out, idx, plot_attention, add_loss_to_title, show_future_observed, ax)
669 # plot prediction as normal
670 fig = super().plot_prediction(
--> 671 x, out, idx=idx, add_loss_to_title=add_loss_to_title, show_future_observed=show_future_observed, ax=ax
672 )
673
/opt/conda/lib/python3.6/site-packages/pytorch_forecasting/models/base_model.py in plot_prediction(self, x, out, idx, add_loss_to_title, show_future_observed, ax)
389 for i in range(y_quantiles.shape[1] // 2):
390 if len(x_pred) > 1:
--> 391 ax.fill_between(x_pred, y_quantiles[:, i], y_quantiles[:, -i - 1], alpha=0.15, fc=pred_color)
392 else:
393 quantiles = torch.tensor([[y_quantiles[0, i]], [y_quantiles[0, -i - 1]]])
/opt/conda/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
1808 "the Matplotlib list!)" % (label_namer, func.__name__),
1809 RuntimeWarning, stacklevel=2)
-> 1810 return func(ax, *args, **kwargs)
1811
1812 inner.__doc__ = _add_data_doc(inner.__doc__,
/opt/conda/lib/python3.6/site-packages/matplotlib/axes/_axes.py in fill_between(self, x, y1, y2, where, interpolate, step, **kwargs)
5116 polys.append(X)
5117
-> 5118 collection = mcoll.PolyCollection(polys, **kwargs)
5119
5120 # now update the datalim and autoscale
/opt/conda/lib/python3.6/site-packages/matplotlib/collections.py in __init__(self, verts, sizes, closed, **kwargs)
931 %(Collection)s
932 """
--> 933 Collection.__init__(self, **kwargs)
934 self.set_sizes(sizes)
935 self.set_verts(verts, closed)
/opt/conda/lib/python3.6/site-packages/matplotlib/collections.py in __init__(self, edgecolors, facecolors, linewidths, linestyles, capstyle, joinstyle, antialiaseds, offsets, transOffset, norm, cmap, pickradius, hatch, urls, offset_position, zorder, **kwargs)
164
165 self._path_effects = None
--> 166 self.update(kwargs)
167 self._paths = None
168
/opt/conda/lib/python3.6/site-packages/matplotlib/artist.py in update(self, props)
914
915 with cbook._setattr_cm(self, eventson=False):
--> 916 ret = [_update_property(self, k, v) for k, v in props.items()]
917
918 if len(ret):
/opt/conda/lib/python3.6/site-packages/matplotlib/artist.py in <listcomp>(.0)
914
915 with cbook._setattr_cm(self, eventson=False):
--> 916 ret = [_update_property(self, k, v) for k, v in props.items()]
917
918 if len(ret):
/opt/conda/lib/python3.6/site-packages/matplotlib/artist.py in _update_property(self, k, v)
910 func = getattr(self, 'set_' + k, None)
911 if not callable(func):
--> 912 raise AttributeError('Unknown property %s' % k)
913 return func(v)
914
AttributeError: Unknown property fc
I didn't change the notebook.
Please tell me the cause of this error.
Saving latest checkpoint..
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-14-d89dc578d7fe> in <module>
1 # fit network
2 trainer.fit(
----> 3 tft, train_dataloader=train_dataloader, val_dataloaders=val_dataloader,
4 )
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py in wrapped_fn(self, *args, **kwargs)
46 if entering is not None:
47 self.state = entering
---> 48 result = fn(self, *args, **kwargs)
49
50 # The INTERRUPTED state can be set inside the run function. To indicate that run was interrupted
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
1082 self.accelerator_backend = CPUBackend(self)
1083 self.accelerator_backend.setup(model)
-> 1084 results = self.accelerator_backend.train(model)
1085
1086 # on fit end callback
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_backend.py in train(self, model)
37
38 def train(self, model):
---> 39 results = self.trainer.run_pretrain_routine(model)
40 return results
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in run_pretrain_routine(self, model)
1237
1238 # CORE TRAINING LOOP
-> 1239 self.train()
1240
1241 def _run_sanity_check(self, ref_model, model):
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in train(self)
407 if self.should_stop:
408 if (met_min_epochs and met_min_steps):
--> 409 self.run_training_teardown()
410 return
411 else:
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_teardown(self)
1143 # model hooks
1144 if self.is_function_implemented('on_train_end'):
-> 1145 self.get_model().on_train_end()
1146
1147 if self.logger is not None:
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in on_train_end(self)
583 def on_train_end(self):
584 if self.log_interval(train=True) > 0:
--> 585 self._log_embeddings()
586
587 def step(self, x, y, batch_idx, label="train"):
~/anaconda3/envs/TF2/lib/python3.7/site-packages/pytorch_forecasting/models/temporal_fusion_transformer/__init__.py in _log_embeddings(self)
868 labels = self.hparams.embedding_labels[name]
869 self.logger.experiment.add_embedding(
--> 870 emb.weight.data.cpu(), metadata=labels, tag=name, global_step=self.global_step
871 )
~/anaconda3/envs/TF2/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py in add_embedding(self, mat, metadata, label_img, global_step, tag, metadata_header)
786 save_path = os.path.join(self._get_file_writer().get_logdir(), subdir)
787
--> 788 fs = tf.io.gfile.get_filesystem(save_path)
789 if fs.exists(save_path):
790 if fs.isdir(save_path):
AttributeError: module 'tensorflow._api.v2.io.gfile' has no attribute 'get_filesystem'
While following the tutorial with my own dataset, an error occurred.
The baseline model was OK, but TFT is not working.
pl.seed_everything(42)
trainer = pl.Trainer(
    gpus=0,
    # clipping gradients is a hyperparameter and important to prevent divergence
    # of the gradient for recurrent neural networks
    gradient_clip_val=0.1,
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=0.03,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size=1,
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
KeyError Traceback (most recent call last)
in
24 loss=QuantileLoss(),
25 # reduce learning rate if no improvement in validation loss after x epochs
---> 26 reduce_on_plateau_patience=4, ## patience after which learning rate is reduced by a factor of 10
27 )
28 print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
/kaggle/working/pytorch_forecasting/models/temporal_fusion_transformer/init.py in from_dataset(cls, dataset, allowed_encoder_known_variable_names, **kwargs)
341 # create class and return
342 return super().from_dataset(
--> 343 dataset, allowed_encoder_known_variable_names=allowed_encoder_known_variable_names, **new_kwargs
344 )
345
/kaggle/working/pytorch_forecasting/models/base_model.py in from_dataset(cls, dataset, allowed_encoder_known_variable_names, **kwargs)
887 )
888 new_kwargs.update(kwargs)
--> 889 return super().from_dataset(dataset, **new_kwargs)
890
891 def calculate_prediction_actual_by_variable(
/kaggle/working/pytorch_forecasting/models/base_model.py in from_dataset(cls, dataset, **kwargs)
534 if "output_transformer" not in kwargs:
535 kwargs["output_transformer"] = dataset.target_normalizer
--> 536 net = cls(**kwargs)
537 net.dataset_parameters = dataset.get_parameters()
538 return net
/kaggle/working/pytorch_forecasting/models/temporal_fusion_transformer/init.py in init(self, hidden_size, lstm_layers, dropout, output_size, loss, attention_head_size, max_encoder_length, static_categoricals, static_reals, time_varying_categoricals_encoder, time_varying_categoricals_decoder, categorical_groups, time_varying_reals_encoder, time_varying_reals_decoder, x_reals, x_categoricals, hidden_continuous_size, hidden_continuous_sizes, embedding_sizes, embedding_paddings, embedding_labels, learning_rate, log_interval, log_val_interval, log_gradient_flow, reduce_on_plateau_patience, monotone_constaints, share_single_variable_networks, logging_metrics, **kwargs)
144 embedding_paddings=self.hparams.embedding_paddings,
145 x_categoricals=self.hparams.x_categoricals,
--> 146 max_embedding_size=self.hparams.hidden_size,
147 )
148
/kaggle/working/pytorch_forecasting/models/nn/embeddings.py in init(self, embedding_sizes, categorical_groups, embedding_paddings, x_categoricals, max_embedding_size)
43 self.x_categoricals = x_categoricals
44
---> 45 self.init_embeddings()
46
47 def init_embeddings(self):
/kaggle/working/pytorch_forecasting/models/nn/embeddings.py in init_embeddings(self)
66 self.embedding_sizes[name][0],
67 embedding_size,
---> 68 padding_idx=padding_idx,
69 )
70
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py in setitem(self, key, module)
285
286 def setitem(self, key: str, module: Module) -> None:
--> 287 self.add_module(key, module)
288
289 def delitem(self, key: str) -> None:
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in add_module(self, name, module)
345 raise KeyError("attribute '{}' already exists".format(name))
346 elif '.' in name:
--> 347 raise KeyError("module name can't contain "."")
348 elif name == '':
349 raise KeyError("module name can't be empty string """)
KeyError: 'module name can't contain "."'
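The last frames show `torch.nn.Module.add_module` rejecting a key that contains a dot, and the key is set from `init_embeddings`. A plausible cause (an assumption, since the column names are not shown) is a categorical column whose name contains `.`. A minimal sketch of renaming such columns before building the dataset follows; `sanitize_names` and the example column names are hypothetical.

```python
def sanitize_names(names, replacement="_"):
    """Return a mapping old_name -> new_name with dots replaced.

    Raises ValueError if two names would collide after renaming.
    """
    mapping = {name: name.replace(".", replacement) for name in names}
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("renaming would produce duplicate column names")
    return mapping


columns = ["store.id", "item.category", "sales"]  # hypothetical column names
rename_map = sanitize_names(columns)
print(rename_map)
# With pandas, the mapping could be applied as: df = df.rename(columns=rename_map)
```

The renamed columns would then also have to be used in the variable lists passed to the dataset.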
The following is taken from Graph Deep Factors for Forecasting:
Deep probabilistic forecasting techniques have recently been proposed for modeling large collections of time-series. However, these techniques explicitly assume either complete independence (local model) or complete dependence (global model) between time-series in the collection. This corresponds to the two extreme cases where every time-series is disconnected from every other time-series in the collection or likewise, that every time-series is related to every other time-series resulting in a completely connected graph. In this work, we propose a deep hybrid probabilistic graph-based forecasting framework called Graph Deep Factors (GraphDF) that goes beyond these two extremes by allowing nodes and their time-series to be connected to others in an arbitrary fashion. GraphDF is a hybrid forecasting framework that consists of a relational global and relational local model. In particular, we propose a relational global model that learns complex non-linear time-series patterns globally using the structure of the graph to improve both forecasting accuracy and computational efficiency. Similarly, instead of modeling every time-series independently, we learn a relational local model that not only considers its individual time-series but also the time-series of nodes that are connected in the graph.
The idea is to have a global-local model that explicitly considers the local pattern of each time series, which is in contrast to purely global models, such as DeepAR, MQRNN, etc.
Hello @jdb78,
Is it possible to use the forecasting package to predict categorical time series? I would like to use the package for a multiclass classification problem, so in my case the target parameter in TimeSeriesDataSet is categorical and not numeric.
If this is not supported right now, can you estimate whether it is possible to integrate this feature, and where in the code you see the biggest adjustments/changes that would have to be made?
Hi,
I appreciate your work to provide a package for TFT using PyTorch. I just successfully ran the Google AI Hub implementation and would be happy to use a PyTorch version, which seems to be somewhat easier.
But, so far I am not able to run even a simple example without problems on Windows 10 without GPU.
It would be great if you could provide an easy starting point into your development. It would help a lot to contribute to your project.
I created an environment with Python version 3.7.8.
After pip install pytorch-forecasting I got this error:
ERROR: Could not find a version that satisfies the requirement torch<2.0,>=1.6 (from pytorch-forecasting) (from versions: 0.1.2, 0.1.2.post1, 0.1.2.post2)
ERROR: No matching distribution found for torch<2.0,>=1.6 (from pytorch-forecasting)
Nevertheless, when I install pytorch first (conda install pytorch torchvision cpuonly -c pytorch), pytorch-forecasting installs without problems and the required versions are installed, among others:
optuna 2.0.0
pandas 1.1.0
pytorch-forecasting 0.2.0
pytorch-lightning 0.8.5
pytorch-ranger 0.1.1
scikit-learn 0.23.2
scipy 1.5.2
statsmodels 0.11.1
torch 1.6.0
Your GitHub README.md does not provide a directly executable example (the data is missing), so I tried ar.py.
The first obstacle is that you need generate_ar_data from example/data/__init__.py, which is not installed by the pip install procedure above. A local copy in example_data.py with
#from data import generate_ar_data
from example_data import generate_ar_data
helped to overcome this.
TimeSeriesDataSet reported an error which led to following correction:
#data["static"] = 2
data["static"] = '2' #must be string
pl.Trainer
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
TemporalFusionTransformer.from_dataset
Number of parameters in network: 76.6k
trainer.fit results in following list:
n | Name | Type | Params |
---|---|---|---|
0 | loss | QuantileLoss | 0 |
1 | input_embeddings | ModuleDict | 1 |
2 | prescalers | ModuleDict | 256 |
3 | static_variable_selection | VariableSelectionNetwork | 9 K |
4 | encoder_variable_selection | VariableSelectionNetwork | 8 K |
5 | decoder_variable_selection | VariableSelectionNetwork | 4 K |
6 | static_context_variable_selection | GatedResidualNetwork | 4 K |
7 | static_context_initial_hidden_lstm | GatedResidualNetwork | 4 K |
8 | static_context_initial_cell_lstm | GatedResidualNetwork | 4 K |
9 | static_context_enrichment | GatedResidualNetwork | 4 K |
10 | lstm_encoder | LSTM | 8 K |
11 | lstm_decoder | LSTM | 8 K |
12 | post_lstm_gate_encoder | GatedLinearUnit | 2 K |
13 | post_lstm_add_norm_encoder | AddNorm | 64 |
14 | static_enrichment | GatedResidualNetwork | 5 K |
15 | multihead_attn | InterpretableMultiHeadAttention | 4 K |
16 | post_attn_gate_norm | GateAddNorm | 2 K |
17 | pos_wise_ff | GatedResidualNetwork | 4 K |
18 | pre_output_gate_norm | GateAddNorm | 2 K |
19 | output_layer | Linear | 99 |
Validation sanity check: 0it [00:00, ?it/s]
and the following RuntimeError:
RuntimeError Traceback (most recent call last)
in
12 torch.set_num_threads(10)
13 trainer.fit(
---> 14 tft, train_dataloader=train_dataloader, val_dataloaders=val_dataloader,
15 )
~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\trainer.py in fit(self, model, train_dataloader, val_dataloaders)
1042 self.optimizers, self.lr_schedulers, self.optimizer_frequencies = self.init_optimizers(model)
1043
-> 1044 results = self.run_pretrain_routine(model)
1045
1046 # callbacks
~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\trainer.py in run_pretrain_routine(self, model)
1194 self.val_dataloaders,
1195 max_batches,
-> 1196 False)
1197
1198 # allow no returns from eval
~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\evaluation_loop.py in _evaluate(self, model, dataloaders, max_batches, test_mode)
291 output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
292 else:
--> 293 output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
294
295 # on dp / ddp2 might still want to do something with the batch parts
~\miniconda3\envs\tft\lib\site-packages\pytorch_lightning\trainer\evaluation_loop.py in evaluation_forward(self, model, batch, batch_idx, dataloader_idx, test_mode)
468 output = model.test_step(*args)
469 else:
--> 470 output = model.validation_step(*args)
471
472 return output
~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\base_model.py in validation_step(self, batch, batch_idx)
143 def validation_step(self, batch, batch_idx):
144 x, y = batch
--> 145 log, _ = self.step(x, y, batch_idx, label="val")
146 return log
147
~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer_init_.py in step(self, x, y, batch_idx, label)
572 # extract data and run model
573 y = rnn.pack_padded_sequence(y, lengths=x["decoder_lengths"], batch_first=True, enforce_sorted=False)
--> 574 log, out = super().step(x, y, batch_idx, label=label)
575 # calculate interpretations etc for latter logging
576 if self.log_interval(label == "train") > 0:
~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\base_model.py in step(self, x, y, batch_idx, label)
185 loss = self.loss(prediction, y) * (1 + monotinicity_loss)
186 else:
--> 187 out = self(x)
188 out["prediction"] = self.transform_output(out)
189
~\miniconda3\envs\tft\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
~\miniconda3\envs\tft\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer_init_.py in forward(self, x)
445 )
446 else:
--> 447 input_vectors[name] = emb(x_cat[..., self.hparams.x_categoricals.index(name)])
448 input_vectors.update({name: x_cont[..., idx].unsqueeze(-1) for idx, name in enumerate(self.hparams.x_reals)})
449
~\miniconda3\envs\tft\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),
~\miniconda3\envs\tft\lib\site-packages\torch\nn\modules\sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:
~\miniconda3\envs\tft\lib\site-packages\torch\nn\functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
-> 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.IntTensor instead (while checking arguments for embedding)
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
I have been trying to test TFT for an extremely simple toy dataset, but always encounter a ValueError when initialising TimeSeriesDataSet.
I am trying to forecast a simple sine wave (again, this is just to get up and running); my DataFrame has two columns, time_idx (int 0 to 100), and price (float -1.0 to 1.0). The code for generating this dataset and initialising my dataset is as follows:
import math
import random

import pandas as pd
from pandas import DataFrame

# Simply sample a sin wave
def sample_sin(samples_per_cycle, n_cycles, noise=None):
    sampling_gap = 2 * math.pi / samples_per_cycle
    xs = [sample * sampling_gap for sample in range(samples_per_cycle * n_cycles)]
    ys = [math.sin(x) + ((noise * random.random()) if noise is not None else 0) for x in xs]
    return xs, ys

# Save sampled sin wave as csv
def save_sin_dataset(filename, samples_per_cycle, n_cycles, noise=None):
    _, ys = sample_sin(samples_per_cycle, n_cycles, noise=noise)
    df = DataFrame({'price': ys})
    df.index.name = 'time_idx'
    df.to_csv(filename)
    return range(len(ys)), ys

# Load csv
df = pd.read_csv('sin.csv')
max_encode_length = 36
max_prediction_length = 6
training_cutoff = 90
training = TimeSeriesDataSet(
    df[:training_cutoff],
    time_idx="time_idx",
    group_ids=["price"],
    target="price",
    min_encoder_length=max_encode_length,
    max_encoder_length=max_encode_length,
    min_prediction_length=1,
    static_categoricals=[],
    static_reals=[],
    time_varying_known_categoricals=[],
    max_prediction_length=max_prediction_length,
    time_varying_unknown_reals=[
        "price",
    ],
    target_normalizer=EncoderNormalizer(
        coerce_positive=1.0
    ),
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)
The error occurs at line 707 in timeseries.py:
df_index["count"] = (df_index["time_last"] - df_index["time_first"]).astype(int) + 1
Traceback:
Traceback (most recent call last):
File "/Users/fraser/Documents/Personal Projects/Kontrary/forecasting.py", line 36, in <module>
add_encoder_length=True,
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pytorch_forecasting/data/timeseries.py", line 284, in __init__
self.index = self._construct_index(data, predict_mode=predict_mode)
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pytorch_forecasting/data/timeseries.py", line 707, in _construct_index
df_index["count"] = (df_index["time_last"] - df_index["time_first"]).astype(int) + 1
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 5546, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors,)
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 595, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 406, in apply
applied = getattr(b, f)(**kwargs)
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/internals/blocks.py", line 595, in astype
values = astype_nansafe(vals1d, dtype, copy=True)
File "/Users/fraser/Documents/Personal Projects/Kontrary/venv/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 966, in astype_nansafe
raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
ValueError: Cannot convert non-finite values (NA or inf) to integer
When I debug, df_index_first and df_index_last contain NaN values, and I have no clue why; my DataFrame has no gaps or NaNs.
If someone could let me know what I'm doing wrong that would be great.
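One thing that stands out in the call above is `group_ids=["price"]`: group ids are meant to say which series a row belongs to, whereas `price` is the float target. Whether this is what produces the `NaN` values is an assumption, but a constant series identifier is the usual setup for a single series. The sketch below generates comparable data with such a column; the column name `series` and the helper `sine_rows` are hypothetical.

```python
import csv
import io
import math


def sine_rows(samples_per_cycle, n_cycles, series_id="sine"):
    """Build rows for a single sine-wave series with an explicit series id."""
    step = 2 * math.pi / samples_per_cycle
    return [
        {"time_idx": i, "series": series_id, "price": math.sin(i * step)}
        for i in range(samples_per_cycle * n_cycles)
    ]


rows = sine_rows(samples_per_cycle=20, n_cycles=5)

# Write to an in-memory CSV; a real script would write to 'sin.csv' instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["time_idx", "series", "price"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue().splitlines()[:3])
```

The dataset would then be constructed with `group_ids=["series"]`, while `price` stays in `target` and `time_varying_unknown_reals`.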
I executed code with the intention of creating the TFT model object from a TimeSeriesDataSet.
The expected result was a TFT model object that I would proceed to evaluate.
The result was the following: KeyError: 'module name can't contain "."'
I'm not sure what this is related to; I spent a while digging through the PyTorch source code, but the issue is rooted deep.
I'm not really sure which module is being added, where I can see the names of the modules, or why a module name would have a period in it.
# configure network and trainer
pl.seed_everything(42)
trainer = pl.Trainer(
    gpus=0,
    # clipping gradients is a hyperparameter and important to prevent divergence
    # of the gradient for recurrent neural networks
    gradient_clip_val=0.1,
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    # not meaningful for finding the learning rate but otherwise very important
    learning_rate=0.03,
    hidden_size=16,  # most important hyperparameter apart from learning rate
    # number of attention heads. Set to up to 4 for large datasets
    attention_head_size=1,
    dropout=0.1,  # between 0.1 and 0.3 are good values
    hidden_continuous_size=8,  # set to <= hidden_size
    output_size=7,  # 7 quantiles by default
    loss=QuantileLoss(),
    # reduce learning rate if no improvement in validation loss after x epochs
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")
Here is the condensed traceback:
Any help is greatly appreciated.
I have a data set which consists of many multivariate time series (i.e. time series with > 1 value per timestamp originating from many IoT devices).
How can I load such a dataset to pytorch using your https://pytorch-forecasting.readthedocs.io/en/latest/data.html data loader - or do I need to implement my own? I need to ensure that the data is interpreted in the right way to allow the LSTM to learn patterns from an individual time-series / window and include information from multiple devices / time windows in a batch.
I would want to use it for an LSTM-autoencoder to perform anomaly detection.
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'hour': {0: Timestamp('2020-01-01 00:00:00'), 1: Timestamp('2020-01-01 00:00:00'), 2: Timestamp('2020-01-01 00:00:00'), 3: Timestamp('2020-01-01 00:00:00'), 4: Timestamp('2020-01-01 00:00:00'), 5: Timestamp('2020-01-01 01:00:00'), 6: Timestamp('2020-01-01 01:00:00'), 7: Timestamp('2020-01-01 01:00:00'), 8: Timestamp('2020-01-01 01:00:00'), 9: Timestamp('2020-01-01 01:00:00')}, 'metrik_0': {0: 2.020883621337143, 1: 2.808770093182167, 2: 2.5267618429653402, 3: 3.2709845883575346, 4: 3.7984105853602235, 5: 4.0385160093937795, 6: 4.643267594258785, 7: 1.3012379179114388, 8: 3.509304898336378, 9: 2.8664748765561208}, 'metrik_1': {0: 4.580434685779621, 1: 2.933188328317023, 2: 3.999229120882797, 3: 2.9099857745449706, 4: 4.6302055552849, 5: 4.012670194672169, 6: 3.697352153313931, 7: 4.855210603371005, 8: 2.2197913449032254, 9: 2.393605868973481}, 'metrik_2': {0: 3.680527279150989, 1: 2.511065648719921, 2: 3.8350007982479113, 3: 2.4063786290320333, 4: 3.231433617897482, 5: 3.8505378854180115, 6: 5.359150077287063, 7: 2.8966469424805386, 8: 4.554080028058399, 9: 3.3319064764061914}, 'cohort_id': {0: 1, 1: 2, 2: 1, 3: 2, 4: 2, 5: 1, 6: 2, 7: 2, 8: 1, 9: 2}, 'device_id': {0: 1, 1: 3, 2: 4, 3: 2, 4: 5, 5: 4, 6: 3, 7: 2, 8: 1, 9: 5}})
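The `TimeSeriesDataSet` calls elsewhere on this page pass an integer `time_idx` column together with `group_ids`. A minimal sketch, assuming each `device_id` is one series and the sampling is hourly, of turning the `hour` timestamps into such an index is given below. Plain dictionaries stand in for DataFrame rows, the values are rounded from the sample above, and the helper name `add_time_idx` is hypothetical.

```python
from datetime import datetime


def add_time_idx(rows, freq_seconds=3600):
    """Add an integer 'time_idx' counting steps since the earliest timestamp."""
    start = min(row["hour"] for row in rows)
    for row in rows:
        elapsed = (row["hour"] - start).total_seconds()
        row["time_idx"] = int(elapsed // freq_seconds)
    return rows


rows = [
    {"hour": datetime(2020, 1, 1, 0), "device_id": 1, "metrik_0": 2.02},
    {"hour": datetime(2020, 1, 1, 0), "device_id": 3, "metrik_0": 2.81},
    {"hour": datetime(2020, 1, 1, 1), "device_id": 1, "metrik_0": 3.51},
    {"hour": datetime(2020, 1, 1, 1), "device_id": 3, "metrik_0": 4.64},
]
rows = add_time_idx(rows)

# Group the rows per device to check that each series has consecutive steps.
series = {}
for row in rows:
    series.setdefault(row["device_id"], []).append(row["time_idx"])
print(series)
```

With such a column in place, `device_id` could serve as the group identifier and the `metrik_*` columns as time-varying real inputs.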
Hi, I really appreciate your work on the TFT.
I am trying to use my own dataset with the code, but there seems to be a bug that prevents the dataset from being loaded properly for validation.
The train dataloader is fine, but the validation dataloader has only one batch, and the validation TimeSeriesDataSet has only 1 entry.
Below is my complete code:
data = load_csv()
data['date']= pd.to_datetime(data['date'])
data.reset_index(inplace=True, drop=True)
data.reset_index(inplace=True)
data.rename(columns={'index':'time_idx'}, inplace=True) # I use index as time_idx since my data is of minute frequency
validation_len = int(len(data) * 0.1)
training_cutoff = int(len(data)) - validation_len
max_encode_length = 36
max_prediction_length = 6
print('Len of training data is : ',len(data[:training_cutoff]))
print('Len of val data is : ',len(data[training_cutoff:]))
training = TimeSeriesDataSet(
    data[:training_cutoff],
    time_idx="time_idx",
    target="T",
    group_ids=["Symbol"],
    max_encoder_length=max_encode_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=["Symbol"],
    static_reals=[],
    time_varying_known_categoricals=[
        "hour_of_day",
        "day_of_week",
    ],
    time_varying_known_reals=[
        "time_idx",
    ],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=["V1", "V2", "V3", "T", "V4"],
    constant_fill_strategy={"T": 0},
    dropout_categoricals=[],
)
print('Max Prediction Index : ',training.index.time.max())
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max()+1)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=1)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=1)
print(len(training), len(validation))
print(len(train_dataloader), len(val_dataloader))
This is what gets printed by the code :
Len of training data is : 25920
Len of val data is : 2880
Min Prediction Index Value is 25919
25920 1
202 1
You can see that the training dataset and its batches are fine, but the validation dataset has length 1 and yields only 1 batch.
One more thing: if I use predict=False, the validation data is generated correctly, but another bug arises because of that.
If I use predict=True, only 1 batch with 1 sequence is produced.
If I use predict_mode=True on the training dataset, it also generates only 1 batch.
Here is a sample of my CSV
sample_data.csv.zip
Please Help
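A single validation sample is what one would expect when prediction mode is active, because in that mode each series contributes only its last window. The sketch below is a simplified model of the sample count, not the library's actual index logic (which can also admit shorter windows depending on min_encoder_length and min_prediction_length). `count_windows` is a hypothetical helper, and the series length of 2880 is taken from the validation split printed above.

```python
def count_windows(series_length, encoder_length, prediction_length, predict_mode=False):
    """Count full-length samples for ONE series under a simplified model."""
    window = encoder_length + prediction_length
    if series_length < window:
        return 0  # series too short for even one full window
    if predict_mode:
        return 1  # prediction mode: only the last window of the series
    return series_length - window + 1  # sliding window, one sample per start position


sliding = count_windows(2880, encoder_length=36, prediction_length=6)
last_only = count_windows(2880, encoder_length=36, prediction_length=6, predict_mode=True)
print(sliding, last_only)  # 2839 1
```

With a single group in group_ids, prediction mode therefore gives a dataset of length 1 and one batch, regardless of batch size.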
What would be the best way to get the tft prediction in the same format as my input dataframe?
Instead of just having tensors with predicted values and encoded group_ids I want to have a dataframe with my original group_ids (single value prediction or all quantiles).
I want to go from:
tensor([[0, 0], [0, 1]])
tensor([[234, 375], [73, 70]])
To:
region | sku | time_idx | prediction |
---|---|---|---|
reg1 | sku1 | 100 | 234 |
reg1 | sku1 | 101 | 375 |
reg1 | sku2 | 100 | 73 |
reg1 | sku2 | 101 | 70 |
Thank you for a fantastic library!
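One way to get from the two tensors to the table above is to decode each encoded group id and expand every prediction row into one record per forecast step. This is a dependency-free sketch under stated assumptions: `predictions_to_rows` and the `decoders` mapping are hypothetical (the mapping would have to come from whatever encoded the group ids), tensors are assumed to have been converted with `.tolist()`, and the first forecast step is assumed to be time_idx 100 as in the example.

```python
def predictions_to_rows(index, predictions, decoders, first_time_idx):
    """Turn encoded group ids plus per-step predictions into flat records.

    index:        one list of encoded group ids per series, e.g. [[0, 0], [0, 1]]
    predictions:  one list of forecast values per series
    decoders:     {column_name: {code: original_label}}, in the same column order as index
    """
    rows = []
    for codes, values in zip(index, predictions):
        # decode every group id column back to its original label
        labels = {name: decoders[name][code] for name, code in zip(decoders, codes)}
        for step, value in enumerate(values):
            rows.append({**labels, "time_idx": first_time_idx + step, "prediction": value})
    return rows


index = [[0, 0], [0, 1]]
predictions = [[234, 375], [73, 70]]
decoders = {"region": {0: "reg1"}, "sku": {0: "sku1", 1: "sku2"}}  # hypothetical mapping

rows = predictions_to_rows(index, predictions, decoders, first_time_idx=100)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pandas.DataFrame(rows)` to obtain a DataFrame; for quantile output, the same loop could emit one column per quantile instead of a single prediction value.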
Hey,
I'm trying to run N-Beats from the tutorial on M4 dataset.
For example, using the hourly data:
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | ... | V952 | V953 | V954 | V955 | V956 | V957 | V958 | V959 | V960 | V961 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
H1 | 605.0 | 586.0 | 586.0 | 559.0 | 511.0 | 443.0 | 422.0 | 395.0 | 382.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
H2 | 3124.0 | 2990.0 | 2862.0 | 2809.0 | 2544.0 | 2201.0 | 1996.0 | 1861.0 | 1735.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
I've melted the dataset into the TimeSeriesDataSet format and followed the tutorial.
The difference between the M4 data and the toy dataset in the tutorial is that the series are of unequal length, so there are NaNs at the end. I've used the allow_missings=True option.
The baseline predictions are filled with NaNs, which causes a warning and gives a very high SMAPE loss.
baseline_predictions
>>> tensor([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...
SMAPE()(baseline_predictions, actuals)
>>>UserWarning: Loss is not finite. Resetting it to 1e9
warnings.warn("Loss is not finite. Resetting it to 1e9")
The above is "solved" by dropping NA's from the dataset - the net is training as expected. But this leaves me with very little data.
How can I still work around the NAs?
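One possible workaround, assuming the NaNs are only trailing padding from the wide layout, is to trim each series individually while melting, so that every series keeps its full observed length. As far as the documentation describes it, allow_missings covers gaps in time_idx rather than NaN values in the data. The helper names below are hypothetical, and the sample rows are shortened from the table above.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def wide_to_long(wide_rows):
    """Melt rows of [series_id, v1, v2, ...] into long records.

    Only the trailing padding of each series is removed, so a short series
    does not force longer series to be truncated.
    """
    records = []
    for series_id, *values in wide_rows:
        end = len(values)
        while end > 0 and is_missing(values[end - 1]):
            end -= 1  # walk back over the trailing NaN padding
        for time_idx, value in enumerate(values[:end]):
            records.append({"series": series_id, "time_idx": time_idx, "value": value})
    return records


nan = float("nan")
wide = [
    ["H1", 605.0, 586.0, 586.0, nan, nan],            # shorter series, padded
    ["H2", 3124.0, 2990.0, 2862.0, 2809.0, 2544.0],   # full-length series
]
long_rows = wide_to_long(wide)
print(len(long_rows))  # 8
```

Series that end up shorter than the encoder plus prediction length would still need separate handling, for example a smaller min_encoder_length.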
Hi @jdb78! Love the library :).
I noticed that the time series in the N-BEATS example are stacked on top of each other.
How do I format a pandas DataFrame to look the same?
Would I need to normalise my data first?
```python
import yfinance as yf
data = yf.download("SPY IBM AMZN AAPL", start="2017-01-01", end="2017-04-30")
data['Close']
```
My data looks something like this. As you can see, there is no data for some dates. So what I did was to find unique dates and sort them. Then I just find the time index based on the date's position in the sorted date list.
sorted_dates = sorted(df.index.unique())
# map each distinct date to its position in the sorted list
DATE_TO_INDEX = {date: i for i, date in enumerate(sorted_dates)}
df['time_idx'] = df.index.map(DATE_TO_INDEX)
And this is what I got afterward
When I executed code
# categories have to be strings
df['month'] = df.index.month.astype(str).astype('category')
df['weekday'] = df.weekday.astype(str).astype('category')
df['sentiment_binary'] = df.sentiment_binary.astype(str).astype('category')
df['Instrument'] = df.Instrument.astype(str).astype('category')
df['log_close'] = np.log(df.Close + 1e-8)
# Add holidays
us_holidays = holidays.UnitedStates()
df['holiday'] = df.index.map(lambda date: us_holidays[date] if date in us_holidays else '-').astype('category')
#sentiment per instrument for each time index
df['avg_close_by_instrument'] = df.groupby(['time_idx', 'Instrument'], observed=True).Close.transform('mean')
df.reset_index(inplace=True)
df.rename(columns={'index': 'published'}, inplace=True)
train_percentage = 0.1
train_size = int((1 - train_percentage) * len(df))
train = df.iloc[0:train_size]
test = df.iloc[train_size:]
max_prediction_length = test['time_idx'].min()
max_encoder_length = 24
training_cutoff = df["time_idx"].max() - max_prediction_length
training = TimeSeriesDataSet(
df[lambda x: x.time_idx <= training_cutoff],
time_idx='time_idx',
target='Close',
group_ids=['Instrument'],
)
I got this error
------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-216-a26698f913ea> in <module>
7 time_idx='time_idx',
8 target='Close',
----> 9 group_ids=['Instrument'],
10 )
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in __init__(self, data, time_idx, target, group_ids, weight, max_encoder_length, min_encoder_length, min_prediction_idx, min_prediction_length, max_prediction_length, static_categoricals, static_reals, time_varying_known_categoricals, time_varying_known_reals, time_varying_unknown_categoricals, time_varying_unknown_reals, variable_groups, dropout_categoricals, constant_fill_strategy, allow_missings, add_relative_time_idx, add_target_scales, add_encoder_length, target_normalizer, categorical_encoders, scalers, randomize_length, predict_mode)
282
283 # create index
--> 284 self.index = self._construct_index(data, predict_mode=predict_mode)
285
286 # convert to torch tensor for high performance data loading later
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py in _construct_index(self, data, predict_mode)
718 assert (
719 self.allow_missings
--> 720 ), "Time difference between steps has been idenfied as larger than 1 - set allow_missings=True"
721
722 df_index["index_end"], missing_sequences = _find_end_indices(
AssertionError: Time difference between steps has been idenfied as larger than 1 - set allow_missings=True
I have also tried to extract time_idx based on the difference between the date and the min date
def map_date_index(date, min_date):
return (date - min_date).days
min_date = df.index.min()
df['time_idx'] = df.index.map(lambda date: map_date_index(date, min_date))
but I encountered the same error. I wonder what could be the reason for this?
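Going by the error message, the assertion fires when consecutive time_idx values within a single group differ by more than 1. Neither approach above guarantees that: ranking dates across all instruments still leaves a hole for any instrument that lacks a date others have, and calendar-day differences skip weekends and holidays. The sketch below, with hypothetical helper names and made-up observations, shows how such per-group gaps can be located.

```python
from collections import defaultdict
from datetime import date


def dense_time_idx(dates):
    """Map each distinct date to its rank among all sorted distinct dates."""
    return {d: i for i, d in enumerate(sorted(set(dates)))}


def find_gaps(rows):
    """rows: iterable of (group, time_idx). Returns {group: [(prev, next), ...]}
    for every pair of consecutive steps that are more than 1 apart."""
    by_group = defaultdict(list)
    for group, idx in rows:
        by_group[group].append(idx)
    gaps = {}
    for group, idxs in by_group.items():
        idxs = sorted(idxs)
        jumps = [(a, b) for a, b in zip(idxs, idxs[1:]) if b - a > 1]
        if jumps:
            gaps[group] = jumps
    return gaps


# made-up observations: IBM has no row for 2017-01-04
observations = [
    ("AAPL", date(2017, 1, 3)), ("AAPL", date(2017, 1, 4)), ("AAPL", date(2017, 1, 5)),
    ("IBM", date(2017, 1, 3)), ("IBM", date(2017, 1, 5)),
]
idx = dense_time_idx(d for _, d in observations)
rows = [(group, idx[d]) for group, d in observations]
print(find_gaps(rows))  # {'IBM': [(0, 2)]}
```

If the gaps are genuine, passing allow_missings=True, as the message suggests, is the direct remedy; otherwise the missing rows would have to be filled before building the dataset.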
In the stallion notebook, I executed the following code block:
# find optimal learning rate
res = trainer.tuner.lr_find(
tft,
train_dataloader=train_dataloader,
val_dataloaders=val_dataloader,
max_lr=10.0,
min_lr=1e-6,
)
print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()
and I expect to see the learning rate as well as the plot like shown in the notebook.
However, the result was
------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-a92b5627800b> in <module>
5 val_dataloaders=val_dataloader,
6 max_lr=10.0,
----> 7 min_lr=1e-6,
8 )
9
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/tuner/tuning.py in lr_find(self, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
128 mode,
129 early_stop_threshold,
--> 130 datamodule,
131 )
132
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in lr_find(trainer, model, train_dataloader, val_dataloaders, min_lr, max_lr, num_training, mode, early_stop_threshold, datamodule)
173 train_dataloader=train_dataloader,
174 val_dataloaders=val_dataloaders,
--> 175 datamodule=datamodule)
176
177 # Prompt if we stopped early
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
438 self.call_hook('on_fit_start')
439
--> 440 results = self.accelerator_backend.train()
441 self.accelerator_backend.teardown()
442
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/accelerators/cpu_accelerator.py in train(self)
46
47 # train or test
---> 48 results = self.train_or_test()
49 return results
50
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py in train_or_test(self)
66 results = self.trainer.run_test()
67 else:
---> 68 results = self.trainer.train()
69 return results
70
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in train(self)
483
484 # run train epoch
--> 485 self.train_loop.run_training_epoch()
486
487 if self.max_steps and self.max_steps <= self.global_step:
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in run_training_epoch(self)
558 # hook
559 # TODO: add outputs to batches
--> 560 self.on_train_batch_end(epoch_output, epoch_end_outputs, batch, batch_idx, dataloader_idx)
561
562 # -----------------------------------------
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py in on_train_batch_end(self, epoch_output, epoch_end_outputs, batch, batch_idx, dataloader_idx)
248 # hook
249 self.trainer.call_hook("on_batch_end")
--> 250 self.trainer.call_hook("on_train_batch_end", epoch_end_outputs, batch, batch_idx, dataloader_idx)
251
252 def reset_train_val_dataloaders(self, model):
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py in call_hook(self, hook_name, *args, **kwargs)
823 if hasattr(self, hook_name):
824 trainer_hook = getattr(self, hook_name)
--> 825 trainer_hook(*args, **kwargs)
826
827 # next call hook in lightningModule
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py in on_train_batch_end(self, outputs, batch, batch_idx, dataloader_idx)
145 """Called when the training batch ends."""
146 for callback in self.callbacks:
--> 147 callback.on_train_batch_end(self, self.get_model(), outputs, batch, batch_idx, dataloader_idx)
148
149 def on_validation_batch_start(self, batch, batch_idx, dataloader_idx):
~/conda/envs/sygnals/lib/python3.7/site-packages/pytorch_lightning/tuner/lr_finder.py in on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx)
399 self.progress_bar.update()
400
--> 401 current_loss = trainer.train_loop.running_loss.last().item()
402 current_step = trainer.global_step + 1 # remove the +1 in 1.0
403
AttributeError: 'NoneType' object has no attribute 'item'
I think this is related to my pytorch_lightning version, because the error seems to originate in pytorch_lightning. Which version of pytorch_lightning did you use in the notebook?
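When comparing against the notebook, it helps to state the installed versions explicitly. A minimal sketch using the standard library follows; `installed_version` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or newer (on Python 3.7, as in the traceback above, the `importlib_metadata` backport offers the same API).

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("pytorch-forecasting", "pytorch-lightning", "torch"):
    print(name, installed_version(name))
```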
Hello! First of all, thank you for this wonderful and easy to use library. It's been a joy to work with, and I am really impressed with all of the features you've so kindly added to it for things like easy visualization and evaluation of feature importance!
I am new to deep learning for time series (working on graphs and NLP has been my focus area), and was wondering how I can get the TFT model to generate forecasts ahead to days that are not in the training/validation data? For example, in your example with the stallion data, you produce graphs showing predicted vs observed on the validation set, but how can I get the model to show me predicted trends for data beyond this (e.g., if one wanted to use the model to predict what volume would be in the future)?
Thanks so much for your insight and advice! Really appreciate it!
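The stallion tutorial handles this by appending future rows to the most recent history: the last known row of each series is repeated, time_idx is incremented, and the known covariates are recomputed for each future step. A dependency-free sketch of that idea follows; `make_future_rows`, `month_of` and the sample values are hypothetical, and the carried-forward target is only a placeholder.

```python
def make_future_rows(history, horizon, known_covariates):
    """Build decoder rows for ONE series.

    history:           list of dict rows sorted by time_idx
    horizon:           number of future steps to create
    known_covariates:  function time_idx -> dict of covariates known in advance
    """
    last = history[-1]
    future = []
    for step in range(1, horizon + 1):
        row = dict(last)  # copy static columns; the target value is a placeholder
        row["time_idx"] = last["time_idx"] + step
        row.update(known_covariates(row["time_idx"]))
        future.append(row)
    return future


def month_of(time_idx):
    # hypothetical calendar feature: time_idx 0 corresponds to January
    return {"month": str(time_idx % 12 + 1)}


history = [
    {"agency": "Agency_01", "time_idx": 58, "month": "11", "volume": 52.0},
    {"agency": "Agency_01", "time_idx": 59, "month": "12", "volume": 48.5},
]
future = make_future_rows(history, horizon=2, known_covariates=month_of)
prediction_input = history + future  # encoder rows followed by decoder rows
print([row["time_idx"] for row in prediction_input])  # [58, 59, 60, 61]
```

The combined rows, converted to a DataFrame and limited to the last max_encoder_length steps of history, would then be passed to the model's predict method as in the tutorial.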
df_train is a pandas DataFrame with the columns 'timestamp', 'user_id', 'content_id', 'content_type_id', 'prior_question_elapsed_time', 'prior_question_had_explanation_False', 'prior_question_had_explanation_True', 'question_cluster', 'answered_correctly'.
A minimal version of the code is:
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
max_prediction_length = 1 # forecast 1 question
max_encoder_length = 5 # use 5 question of history
training = TimeSeriesDataSet(data=df_train,
time_idx="timestamp",
target="answered_correctly",
group_ids=["user_id"],
min_encoder_length=0, # allow predictions without history
max_encoder_length=max_encoder_length,
min_prediction_length=1,
max_prediction_length=max_prediction_length,
static_categoricals=["user_id"],
allow_missings=True,
static_reals=[],
time_varying_known_categoricals=[],
time_varying_known_reals=[],
time_varying_unknown_categoricals=[],
time_varying_unknown_reals=['content_id',
'content_type_id',
'prior_question_elapsed_time',
'prior_question_had_explanation_False',
'prior_question_had_explanation_True',
'question_cluster']
)
print(training.get_parameters())
I would expect to get a dict with the parameters on the std output
However, I get the following:
Traceback (most recent call last):
  File "/Users/silvio/Desktop/working/train_TS_nn.py", line 144, in <module>
    print(training.get_parameters())
  File "/Users/silvio/Desktop/working/venv/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py", line 615, in get_parameters
    name: getattr(self, name) for name in inspect.signature(self.__class__).parameters.keys() if name != "data"
  File "/Users/silvio/Desktop/working/venv/lib/python3.7/site-packages/pytorch_forecasting/data/timeseries.py", line 615, in <dictcomp>
    name: getattr(self, name) for name in inspect.signature(self.__class__).parameters.keys() if name != "data"
AttributeError: 'TimeSeriesDataSet' object has no attribute 'args'
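The traceback shows a pattern that looks up one attribute per constructor parameter, which fails as soon as the signature contains a parameter that was never stored under its own name. The sketch below reproduces that mechanism with a hypothetical `Dataset` class; it does not establish why the installed TimeSeriesDataSet signature contains a parameter called args.

```python
import inspect


class Dataset:
    """Hypothetical stand-in, not the library class."""

    def __init__(self, data, target, *args):
        self.target = target  # nothing is ever stored under the name "args"

    def get_parameters(self):
        # same pattern as in the traceback: one getattr per signature parameter
        names = inspect.signature(self.__class__).parameters.keys()
        return {name: getattr(self, name) for name in names if name != "data"}


try:
    Dataset([1, 2, 3], "y").get_parameters()
    message = None
except AttributeError as err:
    message = str(err)

print(message)  # 'Dataset' object has no attribute 'args'
```

Printing `inspect.signature(TimeSeriesDataSet)` in the affected environment would show whether the installed constructor really exposes such a parameter.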