servicenow / n-beats
N-BEATS is a neural-network based model for univariate timeseries forecasting. N-BEATS is a ServiceNow Research project that was started at Element AI.
License: Other
I can successfully run the tourism experiment with make run on CPU. However, when I use
make run command=storage/experiments/tourism_interpretable/repeat=0,lookback=2,loss=MAPE/command gpu=0
I get the following output:
the input device is not a TTY
Makefile:30: recipe for target 'run' failed
make: *** [run] Error 1
In experiments/m4/generic.gin,
```
instance.history_size = {
    'Yearly': 1.5,
    'Quarterly': 1.5,
    'Monthly': 1.5,
    'Weekly': 10,
    'Daily': 10,
    'Hourly': 10
}
instance.iterations = {
    'Yearly': 15000,
    'Quarterly': 15000,
    'Monthly': 15000,
    'Weekly': 5000,
    'Daily': 5000,
    'Hourly': 5000
}
```
How are the above parameters determined?
I am having trouble reproducing the results on the M4 dataset. I get the following error when running the M4 notebook:
ModuleNotFoundError Traceback (most recent call last)
in
1 import pandas as pd
2
----> 3 from summary.m4 import M4Summary
4 from summary.utils import median_ensemble
5
ModuleNotFoundError: No module named 'summary'
When I follow the reproduction steps I run into this error.
Here is my build script.
#!/bin/bash
make init
make dataset
make build config=experiments/m4/interpretable.gin
make run command=storage/experiments/m4_interpretable/repeat=0,lookback=2,loss=MAPE/command
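One possible cause of the ModuleNotFoundError above is that the notebook's working directory is not the repository root, so the summary package cannot be found on sys.path. The sketch below is a workaround under that assumption only. The helper name add_repo_root and the idea of searching parent directories for a summary folder are illustrative choices made here, not part of the repository.

```python
import sys
from pathlib import Path


def add_repo_root(start: Path, marker: str = "summary") -> Path:
    """Walk up from `start` until a directory containing `marker` is found,
    then put that directory at the front of sys.path and return it."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains a '{marker}' directory")


# Assumed usage in the first notebook cell, before the failing import:
#   add_repo_root(Path.cwd())
#   from summary.m4 import M4Summary
```

If the notebook is already started from the repository root, this changes nothing and the cause lies elsewhere.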
Hi,
In the 'nbeats.py', the class NBeats:
import torch as t

class NBeats(t.nn.Module):
    def __init__(self, blocks: t.nn.ModuleList):
        super().__init__()
        self.blocks = blocks

    def forward(self, x: t.Tensor, input_mask: t.Tensor) -> t.Tensor:
        # Reverse the time axis of the input window and of its mask.
        residuals = x.flip(dims=(1,))
        input_mask = input_mask.flip(dims=(1,))
        # Start the forecast from the last observed value.
        forecast = x[:, -1:]
        for i, block in enumerate(self.blocks):
            backcast, block_forecast = block(residuals)
            residuals = (residuals - backcast) * input_mask
            forecast = forecast + block_forecast
        return forecast
In the forward function, why is this operation performed: 'residuals = x.flip(dims=(1,))'?
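To make the operation concrete, here is a minimal sketch of what the flip does to one input window. Plain Python lists stand in for t.Tensor, the left-padded batch is an assumed example rather than data from the repository, and flip_time is a hypothetical helper name. It only shows the effect of the operation (the most recent observation moves to index 0, and the mask is reversed in the same way); it does not settle why this ordering was chosen.

```python
def flip_time(batch):
    """Reverse each row along the time axis.

    List-based stand-in for x.flip(dims=(1,)) on a 2-D (batch, time) tensor.
    """
    return [row[::-1] for row in batch]


# Hypothetical batch: one series of length 5, left-padded with two zeros.
x = [[0.0, 0.0, 1.0, 2.0, 3.0]]
input_mask = [[0.0, 0.0, 1.0, 1.0, 1.0]]

residuals = flip_time(x)
flipped_mask = flip_time(input_mask)
forecast = [row[-1:] for row in x]  # counterpart of x[:, -1:]

print(residuals)     # [[3.0, 2.0, 1.0, 0.0, 0.0]]
print(flipped_mask)  # [[1.0, 1.0, 1.0, 0.0, 0.0]]
print(forecast)      # [[3.0]]
```

Note that forecast is taken from x before the flip, so it starts from the last observed value of the window.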
Hi,
I propose that the point forecasts of N-BEATS be included in this repo for all the datasets evaluated in the original paper, as has been done in the M4 competition GitHub repository. This would make it easier to compare N-BEATS with other models and would increase the visibility of the N-BEATS paper.
Besides reproducing the experimental results presented in the paper, it is often more convenient to rely on precomputed forecasts to compare different models on the same dataset. For instance, the M4 competition's GitHub repository provides the point forecasts of all submitted models as compressed CSV files, which permits comparing the models on individual series. We can therefore evaluate ES-RNN and FFORMA using different loss functions, but not N-BEATS. This is fortunate for ES-RNN and FFORMA, given the execution time needed to reproduce their forecasts. It would be great if N-BEATS did not fall under the reproducibility exception. The same argument holds for the other datasets evaluated.
Great repo btw!
import torch as t

class GenericBasis(t.nn.Module):
    """
    Generic basis function.
    """
    def __init__(self, backcast_size: int, forecast_size: int):
        super().__init__()
        self.backcast_size = backcast_size
        self.forecast_size = forecast_size

    def forward(self, theta: t.Tensor):
        # The first backcast_size columns are the backcast, the last forecast_size columns the forecast.
        return theta[:, :self.backcast_size], theta[:, -self.forecast_size:]

Would it be more reasonable to return a function of theta, just like TrendBasis and SeasonalityBasis?
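To illustrate the difference the question is about, here is a minimal sketch that contrasts the slicing above with a basis that maps theta through fixed functions. Lists stand in for one row of a tensor. polynomial_forecast is a hypothetical trend-style basis written for this example, not the repository's TrendBasis.

```python
def generic_basis(theta, backcast_size, forecast_size):
    """Split one row of theta into (backcast, forecast) purely by slicing."""
    return theta[:backcast_size], theta[-forecast_size:]


def polynomial_forecast(theta, forecast_size):
    """Hypothetical trend-style basis.

    forecast[h] = sum_p theta[p] * (h / forecast_size) ** p
    """
    return [
        sum(coef * (h / forecast_size) ** power for power, coef in enumerate(theta))
        for h in range(forecast_size)
    ]


theta = [0.1, 0.2, 0.3, 0.4, 0.5]
backcast, forecast = generic_basis(theta, backcast_size=3, forecast_size=2)
print(backcast, forecast)  # [0.1, 0.2, 0.3] [0.4, 0.5]

# Two coefficients (level and slope) expanded over a horizon of 4 steps.
trend = polynomial_forecast([1.0, 2.0], forecast_size=4)
print(trend)  # [1.0, 1.5, 2.0, 2.5]
```

In the sliced version every output point has its own entry in theta, while in the polynomial version a few coefficients are expanded over the whole horizon by a fixed basis.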
My data set has the columns time_idx, Date, Ticker, Open, High, Low, Close, Stock Splits and Dividends.
My TimeSeriesDataSet is:
training = TimeSeriesDataSet(
    combined_data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="Close",
    group_ids=["Ticker"],
    max_encoder_length=60,
    max_prediction_length=7,
    static_categoricals=["Ticker"],
    time_varying_known_reals=["time_idx", "Open", "High", "Low", "Volume", "Stock Splits", "Dividends"],
    time_varying_unknown_reals=["Close"],
    allow_missing_timesteps=True,
    target_normalizer=GroupNormalizer(
        groups=["Ticker"], transformation="softplus"
    ),
    add_relative_time_idx=False,
    add_target_scales=True,
    add_encoder_length=True,
)
I get an error when I run this:
model = NBeats.from_dataset(
    training,
    learning_rate=learning_rate,
    hidden_size=hidden_size,
    widths=widths,
    backcast_loss_ratio=backcast_loss_ratio,
    dropout=dropout
)
I see that the data set has reals such as encoder_length, Close_min and close_scaled, plus all the time-varying known reals I passed to the data set.
I tried removing all the reals, but I still get the error because the remaining reals are the encoder length and the Close-related variables. I cannot avoid the encoder length. The issue arises from this piece of code:
assert (
    len(dataset.flat_categoricals) == 0
    and len(dataset.reals) == 1
    and len(dataset.time_varying_unknown_reals) == 1
    and dataset.time_varying_unknown_reals[0] == dataset.target
), "The only variable as input should be the target which is part of time_varying_unknown_reals"
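To see which part of the quoted assertion fails, its four conditions can be checked one at a time. The sketch below does this on plain lists. failed_conditions is a hypothetical helper, and the list contents are made-up values loosely modelled on the data set in the question, not values read from a real TimeSeriesDataSet.

```python
def failed_conditions(flat_categoricals, reals, time_varying_unknown_reals, target):
    """Return the sub-conditions of the quoted assertion that do not hold."""
    checks = {
        "len(flat_categoricals) == 0": len(flat_categoricals) == 0,
        "len(reals) == 1": len(reals) == 1,
        "len(time_varying_unknown_reals) == 1": len(time_varying_unknown_reals) == 1,
        "time_varying_unknown_reals[0] == target": (
            len(time_varying_unknown_reals) > 0
            and time_varying_unknown_reals[0] == target
        ),
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical values: one categorical and several reals besides the target.
as_posted = failed_conditions(
    flat_categoricals=["Ticker"],
    reals=["encoder_length", "time_idx", "Open", "High", "Low", "Volume", "Close"],
    time_varying_unknown_reals=["Close"],
    target="Close",
)
# Hypothetical values: the target is the only input variable.
target_only = failed_conditions([], ["Close"], ["Close"], "Close")

print(as_posted)    # ['len(flat_categoricals) == 0', 'len(reals) == 1']
print(target_only)  # []
```

Read this way, the assertion passes only when the data set has no categoricals at all and the target is its single real, which suggests dropping static_categoricals, time_varying_known_reals and any option that adds extra reals, such as add_encoder_length or add_target_scales.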
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
In the MASE, I find that the shapes of in_sample, out_sample, and forecasting should be time_o or time_in, instead of batch, time_i/o.
Moreover, does the MASE follow the formulation in Section 2 of the paper? The current implementation seems to use all historical data (instead of the data from 1~T), and the denominator in the current implementation does not include future data, i.e. data from T~T+H?
Hi. Does the model normalize the data and labels to the range [-1, 1] for training?
Thank you!
The M4 dataset URL in the script is no longer available. Although I manually downloaded the datasets from M4Competition, the make run command still fails.
Hi, In N-BEATS, inappropriate dependency versioning constraints can cause risks.
Below are the dependencies and version constraints that the project is using
gin-config
fire
matplotlib
numpy
pandas
patool
torch
tqdm
xlrd
The version constraint == introduces a risk of dependency conflicts because the allowed range is too strict.
The version constraints No Upper Bound and * introduce a risk of missing-API errors because the latest version of a dependency may have removed some APIs.
After further analysis, in this project,
The version constraint of dependency pandas can be changed to >=0.13.0,<=0.23.4.
The version constraint of dependency tqdm can be changed to >=4.42.0,<=4.64.0.
These suggested changes reduce dependency conflicts as much as possible
while allowing the newest versions that do not cause errors in the project.
The current project invokes all of the following methods.
pandas.DataFrame.to_csv pandas.read_csv pandas.read_excel pandas.concat
itertools.product tqdm.tqdm
dates.s.datetime.strptime.strftime.map.list.np.unique.dump layer collections.OrderedDict numpy.cos pandas.concat numpy.array datasets.tourism.TourismDataset.download i.permutations.np.where.raw_data.rstrip os.stat tqdm.tqdm training_values.extend group_by.forecast_file.summary_filter.experiment_path.os.path.join.glob.tqdm.file.file.pd.read_csv.pd.concat.set_index.groupby self.build join x_mask.x.model.cpu.detach cmd.write experiments.model.interpretable x_mask.x.model.cpu torch.abs self.snapshot values.extend models.nbeats.GenericBasis min os.fsync numpy.sin optimizer.state_dict numpy.where iter summary.utils.group_values enumerate torch.no_grad __loss_fn URL_TEMPLATE.format test.reset_index.reset_index i.i.data.sum model.to.parameters numpy.mean permutations.rstrip.split.np.array.astype.rstrip os.chmod patoolib.extract_archive torch.device value.str.replace left_indices.append torch.load weighted_score.values torch.save super.__init__ datasets.tourism.TourismDataset.load pandas.DataFrame.to_csv experiments.trainer.trainer snapshot_manager.register float common.sampler.TimeseriesSampler dir_path.Path.mkdir pandas.DataFrame str os.getenv groups.extend torch.mean row_vector.split.np.array.astype i.permutations.np.where.raw_data.rstrip.split x.flip models.nbeats.TrendBasis numpy.random.randint group.lower datasets.traffic.TrafficDataset.download common.metrics.mase shutil.copy torch.cuda.is_available training_loss_fn.backward metric d.items model.load_state_dict datasets.m4.NAIVE2_FORECAST_FILE_PATH.pd.read_csv.values.astype range common.torch.losses.smape_2_loss parsed_values.np.array.astype numpy.array.dump datetime.timedelta i.timedelta.current_date.strftime itertools.product os.path.dirname os.walk time.time torch.nn.ModuleList optimizer.load_state_dict row_vector.split os.rename common.http_utils.download dataset.dump urllib.request.urlretrieve numpy.isnan numpy.load snapshot_manager.restore Exception self.basis_parameters training_loss_fn super url.split 
numpy.abs snapshot_manager.enable_time_tracking int numpy.power forecasts.extend test_values.extend datasets.m4.M4Dataset.download pandas.read_csv.iterrows common.sampler.TimeseriesSampler.last_insample_window list common.metrics.mape success_flag.Path.touch dict.items pandas.read_csv.set_index cfg.write ids.extend TourismDataset model.to.to datasets.traffic.TrafficDataset.load.split_by_date file_path.os.path.dirname.pathlib.Path.mkdir torch.nn.Linear zip numpy.concatenate models.nbeats.NBeats right_indices.append fire.Fire torch.optim.Adam M3Dataset logging.root.setLevel raw_line.replace.strip.split timeseries_dict.values.list.np.array.dump collections.OrderedDict.values gin.configurable models.nbeats.NBeatsBlock max s.datetime.strptime.strftime permutations.rstrip.split.np.array.astype numpy.append datasets.m3.M3Dataset.load pandas.read_csv len pandas.read_excel default_device TrafficDataset numpy.prod test.iloc.astype dict datasets.electricity.ElectricityDataset.load common.torch.ops.default_device common.torch.ops.divide_no_nan models.nbeats.SeasonalityBasis numpy.zeros datasets.electricity.ElectricityDataset.load.split_by_date torch.load.items torch.nn.Parameter isinstance torch.nn.utils.clip_grad_norm_ numpy.transpose common.torch.snapshots.SnapshotManager sys.stdout.flush torch.load.keys numpy.max os.path.isdir numpy.sum input_mask.flip.flip tempfile.NamedTemporaryFile M4Dataset torch.optim.Adam.zero_grad datasets.m3.M3Dataset.download numpy.round train_meta.iloc.astype numpy.unique torch.tensor dataclasses.dataclass dates.np.array.dump tempfile.NamedTemporaryFile.flush self.instance f.readlines os.path.basename open permutations.rstrip.split common.torch.losses.mape_loss common.metrics.smape_1 forecast_file.summary_filter.experiment_path.os.path.join.glob.tqdm.file.file.pd.read_csv.pd.concat.set_index raw_line.replace.strip urllib.request.install_opener sys.stdout.write horizons.extend 
group_by.group_by.forecast_file.summary_filter.experiment_path.os.path.join.glob.tqdm.file.file.pd.read_csv.pd.concat.set_index.groupby.median round group_count logging.info gin.parse_config_file sorted common.torch.losses.mase_loss urllib.request.build_opener torch.relu common.http_utils.url_file_name glob.glob x_mask.x.model.cpu.detach.numpy model.state_dict round_all torch.float32.array.t.tensor.to format datetime.datetime.strptime os.path.join numpy.save datasets.m4.M4Dataset.load self.summarize_groups.keys dates.extend torch.optim.Adam.step self.summarize_groups pathlib.Path torch.einsum instance_path.Path.mkdir self.basis_function numpy.ceil map datasets.traffic.TrafficDataset.load os.path.isfile model.to.train block experiments.model.generic datasets.electricity.ElectricityDataset.download experiments.trainer.trainer.eval model raw_line.replace build_cache splits.items tempfile.NamedTemporaryFile.fileno train.reset_index.reset_index numpy.array.append ElectricityDataset common.metrics.smape_2 numpy.arange next numpy.sqrt
@developer
Could you please help me check this issue?
May I open a pull request to fix it?
Thank you very much.
For certain datasets, e.g. the Yearly/Quarterly/Monthly M4 datasets, the quantity history_size is set to 1.5, so window_sampling_limit is 1.5 times the horizon length. Yet the input size can be up to 7 times the horizon length, meaning that during the training phase the model mostly observes padding. Is this an issue, and could it lead to degraded performance on these datasets?
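The two quantities in this question can be put side by side with a little arithmetic. The sketch below assumes that window_sampling_limit is history_size times the horizon, truncated to an integer, and that the input size is lookback times the horizon. The horizons are the M4 competition forecast horizons. How the sampler actually uses the limit is not shown in this thread, so the numbers only illustrate the gap being asked about.

```python
# M4 forecast horizons for the three groups mentioned in the question.
M4_HORIZONS = {"Yearly": 6, "Quarterly": 8, "Monthly": 18}


def window_sizes(horizon, history_size, lookback):
    """Return (window_sampling_limit, input_size) under the stated assumptions."""
    window_sampling_limit = int(history_size * horizon)  # truncation is assumed
    input_size = lookback * horizon
    return window_sampling_limit, input_size


sizes = {
    name: window_sizes(horizon, history_size=1.5, lookback=7)
    for name, horizon in M4_HORIZONS.items()
}
for name, (limit, input_size) in sizes.items():
    print(f"{name}: window_sampling_limit={limit}, input_size={input_size}")
# Yearly: window_sampling_limit=9, input_size=42
# Quarterly: window_sampling_limit=12, input_size=56
# Monthly: window_sampling_limit=27, input_size=126
```

Whether the gap translates into padded inputs depends on whether the limit bounds the whole input window or only the positions from which training windows are drawn.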