torchspatiotemporal / tsl
tsl: a PyTorch library for processing spatiotemporal data.
Home Page: https://torch-spatiotemporal.readthedocs.io/
License: MIT License
Hi,
I'm very interested in the library you've created (kudos on your documentation!), but I am working on a platform (Dataiku) that won't let me install an arbitrary pandas version (the restriction is pandas>=1.3,<1.4).
Therefore I can't test your library.
Is there an important incompatibility with earlier versions of pandas that forced you to specifically require pandas>=1.4 in the requirements?
Hello,
I am trying to reconstruct the final time series after imputation, as is done in Figure 3 of https://arxiv.org/pdf/2108.00298.pdf.
Is there a utility function or method in the tsl package for this?
Cheers,
Guillaume
Hi creators of TSL,
I would like to know the sensor locations of the pems08 dataset, but those are not available.
However, I think if I know which sensors were used, I can find them in the 'master' dataset: https://www.kaggle.com/datasets/liuxu77/largest
Do you have the names of the sensors available somewhere?
Kind regards,
Stefan
tsl/tsl/metrics/torch/pinball_loss.py
Line 30 in 9a5cee9
In pinball_loss, the deprecated compute_on_step parameter of the PyTorch Lightning Metric is used.
See Lightning-AI/torchmetrics#789
It should be removed accordingly.
In general, it may make sense to update compatibility to Lightning 2.0.
This issue serves as a reminder to discuss that.
Line 31 in cd37c5a
Tested with the gatedgn model
As far as I understand reading Table 5 of Appendix B from the original GRIN article, this dataset has the structure of an undirected graph with 2699 edges.
However, I have executed the following code in a Jupyter Notebook:
from tsl.datasets import AirQuality
x = AirQuality()
index, weight = x.get_connectivity()
print(index.shape, weight.shape)
And I saw this output:
(2, 66661) (66661,)
I don't quite understand how the edge index can have this shape. At first I thought that you might have implemented this dataset as a directed graph, but shouldn't the shape then be at most 2*undirected_edges?
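One way to investigate a count like this is to summarise the edge list directly. The following is a minimal sketch on a toy edge list, not on AirQuality itself. `summarize_edges` is a hypothetical helper (it is not part of tsl) that counts directed entries, self-loops, and distinct undirected pairs, so that the three numbers can be compared with the edge count reported in the paper.

```python
def summarize_edges(sources, targets):
    """Count directed entries, self-loops and distinct undirected pairs."""
    pairs = list(zip(sources, targets))
    directed = len(pairs)
    self_loops = sum(1 for s, t in pairs if s == t)
    # frozenset({s, t}) treats (s, t) and (t, s) as the same undirected edge
    undirected = len({frozenset((s, t)) for s, t in pairs if s != t})
    return {"directed": directed, "self_loops": self_loops, "undirected": undirected}


# Toy edge list: 0<->1 and 1<->2 stored in both directions, plus one self-loop on 2
toy_sources = [0, 1, 1, 2, 2]
toy_targets = [1, 0, 2, 1, 2]
summary = summarize_edges(toy_sources, toy_targets)
print(summary)
```

On the real dataset, the two rows of the index returned by get_connectivity() could presumably be passed as `sources` and `targets`; that usage is an assumption and has not been checked against tsl.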
Hello!
Could you please clarify the reason for transposing the adjacency matrix in the get_connectivity method with the layout="edge_index" option?
For directed graphs, this method returns reflected edges:
import tsl
import torch
import numpy as np
import pandas as pd
from tsl.datasets import LargeST
dataset = LargeST(root='./data/')
connectivity_dense = dataset.get_connectivity(layout="dense")
edges, weights = dataset.get_connectivity()
connectivity_sparse = np.zeros_like(connectivity_dense)
connectivity_sparse[edges[0], edges[1]] = weights
print(np.allclose(connectivity_sparse, connectivity_dense)) # returns False
print(np.allclose(connectivity_sparse, connectivity_dense.T)) # returns True
Also, connectivity_dense is equal to the adjacency matrix from the original work. This behaviour of layout="edge_index" seems misleading to me.
Looking forward to your answer, thank you!
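The observation above can be reproduced without tsl. The following is a minimal pure-Python sketch, assuming a toy 3-node directed graph. `to_edge_list` and `to_dense` are hypothetical helpers written for this illustration. It only shows what "transposed before conversion" means for the resulting edges; it does not explain why tsl chooses this convention.

```python
# dense[i][j] is the weight of the directed edge i -> j
dense = [
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
]


def to_edge_list(adj, transpose=False):
    """Return (sources, targets, weights) for the nonzero entries of adj."""
    src, dst, weights = [], [], []
    for i, row in enumerate(adj):
        for j, value in enumerate(row):
            if value != 0:
                # With transpose=True, row and column swap roles
                src.append(j if transpose else i)
                dst.append(i if transpose else j)
                weights.append(value)
    return src, dst, weights


def to_dense(src, dst, weights, n):
    """Rebuild an n x n dense matrix with out[source][target] = weight."""
    out = [[0.0] * n for _ in range(n)]
    for s, d, value in zip(src, dst, weights):
        out[s][d] = value
    return out


plain = to_dense(*to_edge_list(dense), n=3)
flipped = to_dense(*to_edge_list(dense, transpose=True), n=3)
dense_t = [list(col) for col in zip(*dense)]

print(plain == dense)      # edge list built without transposing
print(flipped == dense_t)  # edge list built from the transposed matrix
```

If the edge list comes from the transposed matrix, writing it back with `out[source][target]` yields the transpose of the original, which matches the two np.allclose results reported above.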
Hello,
Is it possible to change the number of nodes after training?
For example, I have trained an Imputer with 180 channels and 4 nodes.
Can I perform inference on data that has 180 channels and 5 nodes?
Hi Ivan,
Sorry to bother you. I am confused by training_mask and eval_mask. Am I right that training_mask is the mask of the input, marking the missing values in the input, and that eval_mask is the mask of the target, marking the visible ground truth? If I want to run a forecasting task, is it enough to change only the parameters horizon and delay in Imputation_dataset.py? Or could you please give some advice on how to build a forecasting dataset? Thank you for your help!
Hi,
Thanks for your work on this package, this really helps out a lot!
I am trying to run the examples/forecasting/run_traffic_experiment.py file, but I get the following error:
C:\Users\sdblo\miniconda3\envs\tsl\lib\site-packages\pytorch_lightning\utilities\seed.py:55: UserWarning: No seed found, seed set to 1866288922
rank_zero_warn(f"No seed found, seed set to {seed}")
Global seed set to 1866288922
[2022-12-02 12:10:51,148][tsl][INFO] -
**** Experiment config ****
run:
seed: 1866288922
dir: C:\....\experiments\tsl\examples\forecasting\outputs\2022-12-02\12-10-51
name: 2022-12-02_12-10-51_1866288922
Error executing job with overrides: []
Traceback (most recent call last):
File "C:\Users\sdblo\miniconda3\envs\tsl\lib\site-packages\tsl\experiment\experiment.py", line 156, in decorated_run_fn
self.run_output = func(cfg)
File "run_traffic_experiment.py", line 73, in run_traffic
dataset = get_dataset(cfg.dataset.name)
omegaconf.errors.ConfigAttributeError: Key 'dataset' is not in struct
full_key: dataset
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
So I thought maybe I have to change line 215 from:
exp = Experiment(run_fn=run_traffic, config_path='config/traffic')
to
exp = Experiment(run_fn=run_traffic, config_path='config/traffic/stcn.yaml')
but that gave the following error:
(tsl) sdblo@DLMACHINE C:\Users\sdblo\...\experiments\tsl\examples\forecasting>python run_traffic_experiment.py
Traceback (most recent call last):
File "run_traffic_experiment.py", line 216, in <module>
res = exp.run()
File "C:\Users\sdblo\miniconda3\envs\tsl\lib\site-packages\tsl\experiment\experiment.py", line 189, in run
self.run_fn()
File "C:\Users\sdblo\miniconda3\envs\tsl\lib\site-packages\hydra\main.py", line 90, in decorated_main
_run_hydra(
File "C:\Users\sdblo\miniconda3\envs\tsl\lib\site-packages\hydra\_internal\utils.py", line 330, in _run_hydra
validate_config_path(config_path)
File "C:\Users\sdblo\miniconda3\envs\tsl\lib\site-packages\hydra\core\utils.py", line 293, in validate_config_path
raise ValueError(msg)
ValueError: Using config_path to specify the config name is not supported, specify the config name via config_name.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/config_path_changes
So it seems there is some information missing in the yaml files. How could this be fixed?
Kind regards
Issue:
The convenience method SpatioTemporalDataset.from_dataset does not accept any transform parameter, so it either has to be set manually after instantiation or the user must use the standard constructor.
Proposed solution:
Add transform as parameter to from_dataset or move to a kwargs based implementation.
I can implement this based on an agreed upon solution.
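The kwargs-based option could look roughly like the sketch below. `ToyDataset` is a hypothetical stand-in written for this illustration; it is not tsl's SpatioTemporalDataset, and its constructor arguments are assumptions chosen only to show the forwarding pattern.

```python
class ToyDataset:
    """Hypothetical stand-in for a windowed dataset (not the tsl class)."""

    def __init__(self, target, window=12, horizon=1, transform=None):
        self.target = target
        self.window = window
        self.horizon = horizon
        self.transform = transform

    @classmethod
    def from_dataset(cls, dataset, **kwargs):
        # Every extra keyword (including transform) is forwarded to the
        # constructor, so new constructor options need no change here.
        return cls(target=dataset["target"], **kwargs)


source = {"target": [1.0, 2.0, 3.0]}
ds = ToyDataset.from_dataset(source, window=4, transform=abs)
print(ds.window, ds.transform)
```

The trade-off is that a kwargs-based signature is less discoverable than an explicit transform parameter, since the accepted options no longer appear in the signature of from_dataset.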
Hi TSL team,
I have been playing around with your package a lot lately, and it works great!
However, I do experience issues when trying to alter your code to my specific needs.
For example, I would like to run a loop through several different settings (e.g., with a .yaml file).
However, after the first loop is done, and a new iteration is started, I get the following error:
Could not override 'dataset.name'.
To append to your config use +dataset.name=bay
Key 'dataset' is not in struct
full_key: dataset
object_type=dict
I think it has something to do with the get_hydra_cli_arg function
def get_hydra_cli_arg(key: str, delete: bool = False):
    try:
        key_idx = [arg.split("=")[0] for arg in sys.argv].index(key)
        arg = sys.argv[key_idx].split("=")[1]
        if delete:
            del sys.argv[key_idx]
        return arg
    except ValueError:
        return None
which seems to remove some of the sys.argv arguments.
My question therefore is, could I prevent this behavior easily? If not, could I then circumvent using Hydra entirely?
I have seen the "A Gentle Introduction to tsl" Jupyter Notebook with a different way of running it, but there I do not get all the important settings of the experiment anymore, and I want to be sure that the settings are correct.
In an ideal scenario, it would be possible to just have a script that runs from line 0 to n in sequential order (such as in the A gentle introduction to tsl notebook) but with the information of the run_traffic_experiment.py and their configs. This information is there in the notebook for the timethenspace model:
model_kwargs = {
    'input_size': dm.n_channels,  # 1 channel
    'horizon': dm.horizon,  # 12, the number of steps ahead to forecast
    'hidden_size': 16,
    'rnn_layers': 1,
    'gcn_layers': 2
}
But not for the other models, right?
I hope I explained my problems clearly; otherwise, please tell me if something is not clear! If you could guide me in the right direction, that would be really great!
Thanks in advance!
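Regarding the deletion of sys.argv entries between iterations, one possible workaround is to give each iteration its own copy of sys.argv and restore the original afterwards. The sketch below assumes that the mutated sys.argv is the only state carried over. Whether this is enough to avoid the Hydra error quoted above has not been verified. `run_with_args` and `fake_experiment` are hypothetical names, and get_hydra_cli_arg is copied from the snippet in the issue.

```python
import sys


def get_hydra_cli_arg(key, delete=False):
    """Copy of the helper quoted in the issue: read `key=value` from sys.argv."""
    try:
        key_idx = [arg.split("=")[0] for arg in sys.argv].index(key)
        arg = sys.argv[key_idx].split("=")[1]
        if delete:
            del sys.argv[key_idx]
        return arg
    except ValueError:
        return None


def run_with_args(run_fn, extra_args):
    """Call run_fn with extra_args appended to sys.argv, then restore sys.argv."""
    saved = list(sys.argv)
    try:
        # A fresh list, so deletions inside run_fn never touch `saved`
        sys.argv = saved + list(extra_args)
        return run_fn()
    finally:
        sys.argv = saved


def fake_experiment():
    # Hypothetical stand-in for an entry point that consumes its argument
    return get_hydra_cli_arg("dataset.name", delete=True)


argv_before = list(sys.argv)
results = [
    run_with_args(fake_experiment, [f"dataset.name={name}"])
    for name in ("la", "bay")
]
print(results)
```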
The issue:
The current ScalerModule implementation inherits the bias and scale parameter from the given Scaler and then uses its own transform implementation. This is not transparent to the user, as, even if one overrides the base implementation of Scaler.transform, the ScalerModule will resort to its own way of scaling the input.
This may lead to unexpected results; in particular considering that the SpatioTemporalDataset wraps every given Scaler into a ScalerModule.
Proposed Solution:
The ScalerModule should also inherit the way in which the original Scaler implements the transform.
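The difference between copying parameters and delegating the transform can be illustrated with plain Python classes. All classes below are hypothetical stand-ins written for this sketch. The real Scaler and ScalerModule operate on arrays and tensors, so an actual fix may need more care than shown here.

```python
class Scaler:
    """Toy scaler: transform(x) = (x - bias) / scale."""

    def __init__(self, bias=0.0, scale=1.0):
        self.bias = bias
        self.scale = scale

    def transform(self, x):
        return (x - self.bias) / self.scale


class ClippedScaler(Scaler):
    """User subclass that overrides transform to clip the result to [-1, 1]."""

    def transform(self, x):
        return max(-1.0, min(1.0, super().transform(x)))


class CopyingModule:
    """Mimics the behaviour described in the issue: copy bias and scale only."""

    def __init__(self, scaler):
        self.bias, self.scale = scaler.bias, scaler.scale

    def transform(self, x):
        return (x - self.bias) / self.scale  # the user's override is ignored


class DelegatingModule:
    """Mimics the proposal: reuse the transform of the wrapped scaler."""

    def __init__(self, scaler):
        self.scaler = scaler

    def transform(self, x):
        return self.scaler.transform(x)


scaler = ClippedScaler(bias=0.0, scale=2.0)
copied = CopyingModule(scaler).transform(10.0)
delegated = DelegatingModule(scaler).transform(10.0)
print(copied, delegated)
```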
Hi, it is my understanding that tsl has no support for heterogeneous graphs. Is that correct?
Hello,
I am trying to run a variant of run_imputation.py with multiple GPUs, but I get the following error when using the dp strategy:
pytorch_lightning.utilities.exceptions.MisconfigurationException: Overriding `on_after_batch_transfer` is not supported in DP mode.
Do you know whether it is possible to fix this, or whether it is possible to take advantage of multiple GPUs for imputation in some other way?
Missing "keys" method call. Easily fixed by adding parentheses.
The error message:
FAILED test_example_imputation.py::test_example_imputation - TypeError: on_train_batch_start() takes 3 positional arguments but 4 were given
Can you please fix the error.
Many thanks
Hello everyone,
I am currently using TorchSpatiotemporal to conduct experiments for my Master's thesis in Data Science and Engineering under the supervision of Professor Paolo Garza.
The dataset I am working with is the SDPWF dataset, which was the main subject of the Baidu KDD competition in 2022. This dataset comprises data from over 100 sensors (wind turbines), recording approximately 10 different channels every ten minutes for 245 days. My task involves performing forecasting on this data. The objective is to compare various spatial-temporal deep learning architectures to understand how incorporating spatial information can improve prediction accuracy.
I have set up the necessary features and initialized the SpatioTemporalDataset and SpatioTemporalDataModule classes. Additionally, I have configured the Predictor and Trainer environment (see my Colab notebook here). I successfully trained a GraphWaveNetModel on this data by creating an SDPWFDataset class extending DatetimeDataset. The input dataframes are formatted correctly, with a datetime Pandas index representing the temporal dimension and a multi-column index mapping each wind turbine to its recorded channels. I also generate a dataset mask, a boolean dataframe indicating data availability for specific timeslots and wind turbines.
I am seeking clarification on the dataset mask, as I couldn't find much information in the documentation or GitHub repository. My specific questions are:
I have summarized the issue here, but please feel free to ask for additional details if needed. Feel also free to correct any misunderstanding here. Thank you for your support!
Best regards,
Hello,
Thanks for your great contribution in neural spatiotemporal data processing community!
I found that some works use an old version of the tsl package, i.e. v0.1.1, and that some arguments differ from the latest version. For example, in tsl-0.1.1 the ImputationDataset class has training_mask and eval_mask arguments, but in the latest version training_mask has been dropped. Could you please tell me what changed, or what mechanism allows training_mask to be discarded?
Looking forward to your reply, thanks!
Hello TSL team,
Is it possible to add the functionality of specifying the save directory of a dataset?
Such as:
dataset = MetrLA('current_path or folder')
because I am trying to run TSL on an external GPU cluster where I do not have root access/all permissions. Therefore, I think I am getting the following error when trying to run the MetrLA() command:
import tsl
import torch
import numpy as np
np.set_printoptions(suppress=True)
print(f"tsl version : {tsl.__version__}")
print(f"torch version: {torch.__version__}")
from tsl.datasets import MetrLA
dataset = MetrLA(root='data')
gives the error:
tsl version : 0.9.0
torch version: 1.10.1+cu111
Segmentation fault (core dumped)
which seems to mean: "Segmentation fault" means that you tried to access memory that you do not have access to.
Thanks in advance!
I attempted to install following the directions on the quickstart page. libmamba reported that it was unable to resolve the environment due to pytorch-cuda>=11.7. I relaxed the restriction to pytorch-cuda>=11.6 and was able to install the environment and execute the gentle introduction notebook (I did have to pip install tensorboard as well).
I'm not sure if this is particular to my machine (Windows 10, Version 22H2), processor (NVIDIA RTX A2000) and driver (537.58) or not, but the modified env spec is:
name: tsl
channels:
  - pytorch
  - pyg
  - nvidia  # remove for cpu installation
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pytorch
  - pytorch-cuda>=11.6  # remove for cpu installation
  - pyg
  - pytorch-scatter
  - pytorch-sparse
  - lightning
  - pip
  - pip:
      - einops
      - hydra-core
      - numpy>1.20.3
      - omegaconf
      - pandas>=1.4
      - PyYAML
      - scikit-learn
      - scipy
      - tables
      - tensorboard
      - torchmetrics>=0.7
      - tqdm
Hi,
I just installed tsl, having been really interested in your code and documentation.
However, I realize that I don't know where to start to create my own dataset. The only example code I see uses samples already included in your library.
If I have these three pandas DataFrames (details of their columns below), how would I go about creating my SpatioTemporalDataset object?
Hello,
I am facing some challenges in predicting time-series classification for stock data.
Some are resolvable, though I'm not sure whether there is a better solution.
After thinking about these problems, I'm confused about how to get started.
Thank you for this amazing resource! Like others have raised in other issues, it seems:
tsl
In addition to those two points, I've also noticed that:
As a complete outsider to GNNs, I am wondering if I could get the authors' feedback on creating an example for beginners. In this way, I am hoping to contribute to the documentation, so that even a complete novice (such as me) can get started using tsl.
For instance, I have been thinking: say there is a dataset of car trajectories, collected over time. How can we go from a dataframe (shown below) to training a model in tsl to predict the missing positions x, y, z?
import numpy as np
import pandas as pd
# Define number of trajectories and time points
num_traj = 5
num_timepoints = 10
# Generate random trajectories
data = pd.DataFrame(np.random.randn(num_traj*num_timepoints, 4), columns=['x', 'y', 'z', 't'])
# Assign trajectory ID for each time point
data['trajectory'] = np.repeat(np.arange(num_traj), num_timepoints)
# Set some values to NaN to represent missing positions
data.iloc[np.random.choice(data.index, size=10, replace=False), :3] = np.nan
# Set timepoints to positive integers and the same for all instances of each trajectory
for traj_id in range(num_traj):
    # Work on a copy so the assignment below does not write into a slice of `data`
    traj_data = data.loc[data['trajectory'] == traj_id].copy()
    traj_data['t'] = np.arange(num_timepoints)
    data.loc[data['trajectory'] == traj_id] = traj_data
Hi everyone :)
I am having difficulty understanding whether it is possible and if so how to include future covariates, such as weather forecasts.
I would need to include this information in the training, aligned with the time horizon of the target.
What I am doing is something like this:
dataset = AirQuality(small=True, impute_nans=True)
df = dataset.dataframe()
fake_past_cov_df = pd.DataFrame(np.random.randn(*df.shape), columns=df.columns, index=df.index)
fake_fut_cov_df = pd.DataFrame(np.random.randn(*df.shape), columns=df.columns, index=df.index)
day_sin_cos = dataset.datetime_encoded('day').values
weekdays = dataset.datetime_onehot('weekday').values
torch_dataset = SpatioTemporalDataset(target=dataset.dataframe(),
                                      mask=dataset.mask,
                                      horizon=12,
                                      window=12,
                                      stride=1,
                                      precision=16,
                                      name='AQI')
torch_dataset.add_covariate(name='fake_past_cov', value=fake_past_cov_df, add_to_input_map=True, synch_mode=SynchMode.WINDOW, pattern='tnf')
torch_dataset.add_covariate(name='fake_fut_cov', value=fake_fut_cov_df, add_to_input_map=False, synch_mode=SynchMode.HORIZON, pattern='tnf')
torch_dataset.add_exogenous(name='global_u', value=np.concatenate([day_sin_cos, weekdays], axis=-1), add_to_input_map=True, synch_mode=SynchMode.WINDOW)
But when training, for example, an AGCRNModel, I receive this:
Sanity Checking DataLoader 0: 0%| | 0/2 [00:00<?, ?it/s]Arguments ['fake_past_cov'] are filtered out. Only args ['u', 'x'] are forwarded to the model (AGCRNModel).
Does anyone know how to overcome this problem, for both past and future covariates?
Unfortunately, I couldn't find anything about it in the documentation.
Any help is appreciated ;) Thanks
Hello,
Is there a dataset with multiple channels among the pre-configured ones in tsl?
In tsl.metrics.torch.metric_base.MaskedMetric.__init__ we have the following snippet:
if metric_fn is None:
    self.metric_fn = None
else:
    self.metric_fn = partial(metric_fn, **metric_fn_kwargs)
Wouldn't this mean that passing metric_fn=None raises an error only when the metric is computed, rather than when the class is constructed?
Is this the desired behavior?
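For comparison, a constructor that fails at construction time could look like the sketch below. `EagerMetric` is a hypothetical class written for this illustration, not tsl's MaskedMetric. Whether metric_fn=None is deliberately allowed in tsl (for example, so that subclasses can supply their own computation) is not stated in the issue, so this shows only the alternative behaviour.

```python
from functools import partial


class EagerMetric:
    """Hypothetical metric wrapper that validates metric_fn when constructed."""

    def __init__(self, metric_fn, metric_fn_kwargs=None):
        if metric_fn is None:
            # Fail here instead of later, when the metric is first computed
            raise ValueError("metric_fn must be a callable, got None")
        self.metric_fn = partial(metric_fn, **(metric_fn_kwargs or {}))

    def compute(self, y_hat, y):
        return self.metric_fn(y_hat, y)


def scaled_abs_error(y_hat, y, factor=1.0):
    return factor * abs(y_hat - y)


metric = EagerMetric(scaled_abs_error, {"factor": 2.0})
value = metric.compute(3.0, 1.0)

try:
    EagerMetric(None)
    construction_error = None
except ValueError as err:
    construction_error = str(err)

print(value, construction_error)
```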
tsl/tsl/metrics/torch/metric_base.py
Lines 112 to 122 in 154cccf
In the referenced snippet, MaskedMetric's update function assumes the time dimension is the second.
However, this leads to adding an unnecessary dummy batch dimension when we represent a batch of graphs as a single big graph.
I suggest two possible solutions to avoid this:
Add a t_dim parameter and use x.select(t_dim, self.at) instead of x[:, self.at].
In both cases this can either be a class attribute or a method parameter, depending on whether one prefers to set it once or to allow it to change each time the metric is updated.
The first solution is the easiest to implement; however, the second one may make further aggregations that depend on dimension semantics easier to implement down the road.
If this is deemed useful I can implement this behavior with an agreed solution.
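The effect of a configurable time dimension can be shown on nested lists. The sketch below is a pure-Python stand-in, where `select` is a hypothetical helper that mimics the x.select(t_dim, self.at) call mentioned above. It shows that a fixed `x[:, at]`-style access needs a batch dimension in front, while a t_dim parameter works for both layouts.

```python
def select(x, dim, index):
    """Pick `index` along dimension `dim` of a nested list and drop that dimension."""
    if dim == 0:
        return x[index]
    return [select(sub, dim - 1, index) for sub in x]


# Layout [batch, time, nodes]: the time dimension is the second one
batched = [[[1, 2], [3, 4], [5, 6]]]
# Layout [time, nodes]: a batch of graphs merged into one big graph, no batch dimension
unbatched = [[1, 2], [3, 4], [5, 6]]

last_step_batched = select(batched, dim=1, index=-1)
last_step_unbatched = select(unbatched, dim=0, index=-1)
print(last_step_batched, last_step_unbatched)
```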
More specifically this error: ModuleNotFoundError: No module named 'tsl.ops.connectivity'
I believe that file was added about a week ago, and the tsl library might not have been redeployed on pip since then. Is it possible to make a new release on pip?
I'm having some difficulty setting it up directly from the git repo; perhaps the pip version would help.
Thank you!
72 out = out.stores_as(elem)
74 pattern = elem.pattern
---> 76 for key in elem.keys:
77 if key == 'transform':
78 out[key] = static_scaler_collate([data[key] for data in data_list])
Putting () after elem.keys fixes the problem.
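The failure mode can be reproduced in isolation. The sketch below assumes, as the report suggests, that `keys` is a method on the object being iterated. `Elem` is a hypothetical stand-in, not the actual class from the traceback.

```python
class Elem:
    """Hypothetical container whose keys are exposed through a method."""

    def __init__(self):
        self._store = {"x": 1, "transform": 2}

    def keys(self):
        return self._store.keys()


elem = Elem()

try:
    for key in elem.keys:  # missing parentheses: iterates over the method object
        pass
    message = None
except TypeError as err:
    message = str(err)

found = [key for key in elem.keys()]  # with parentheses: iterates over the keys
print(message, found)
```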