nixtla / neuralforecast Goto Github PK

View Code? Open in Web Editor NEW

2.4K 36.0 282.0 87.78 MB

Scalable and user friendly neural :brain: forecasting algorithms.

Home Page: https://nixtlaverse.nixtla.io/neuralforecast

License: Apache License 2.0

Python 100.00%

deep-learning forecasting esrnn nbeats nbeatsx time-series pytorch transformer nhits neural-network

neuralforecast's Introduction

Nixtla

Forecast using TimeGPT

Nixtla offers a collection of classes and methods to interact with the API of TimeGPT.

🕰️ TimeGPT: Revolutionizing Time-Series Analysis

Developed by Nixtla, TimeGPT is a cutting-edge generative pre-trained transformer model dedicated to prediction tasks. 🚀 By leveraging the most extensive dataset ever – financial, weather, energy, and sales data – TimeGPT brings unparalleled time-series analysis right to your terminal! 👩‍💻👨‍💻

In seconds, TimeGPT can discern complex patterns and predict future data points, transforming the landscape of data science and predictive analytics.

⚙️ Fine-Tuning: For Precision Prediction

In addition to its core capabilities, TimeGPT supports fine-tuning, enhancing its specialization for specific prediction tasks. 🎯 This feature is like training a machine learning model on a targeted data subset to improve its task-specific performance, making TimeGPT an even more versatile tool for your predictive needs.

🔄 `Nixtla`: Your Gateway to TimeGPT

With Nixtla, you can easily interact with TimeGPT through simple API calls, making the power of TimeGPT readily accessible in your projects.

💻 Installation

Get Nixtla up and running with a simple pip command:

pip install nixtla>=0.4.0

🎈 Quick Start

Get started with TimeGPT now:

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')

from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key = 'my_api_key_provided_by_nixtla'
)
fcst_df = nixtla_client.forecast(df, h=24, level=[80, 90])

neuralforecast's People

Contributors

Stargazers

Watchers

Forkers

statmixedml uumami wodole dhutexas funson overfittingstudyroom valeman mergenthaler eltociear surajitdb elephann pitmonticone ys610zz fzingler ravigurnatham mpofukelvintafadzwa mhdella yunxileo manikant92 chetanmehra rishirelan benman1 srikanthramappa varunia tradeforfood vfamim forward-wyd djoguns benjamesbabala akousdn donjon86 shuowang-ai creative-research-project-v1-1 gcostaneto yousefazizi1982 fasladodo uberstig gongyibei webclinic017 yangkedc1984 taenikim sampath215 deanofthewebb jeongwhanchoi bragattemas karlheinzniebuhr amirhossini ranganadhkodali mtaufeeq seanahmad euijunglee01 nlebang enertel manu87ds murilocamargos techthiyanes rojinva tanglespace bakhtiaris rickarko osmanatam rohitpandey13 mahii6991 albenoit flyfie troublemaker1994 tenfnan yandeen regulus1492 zhangmazi1 benwaldner srigv timberj99 markhng525 jafarijason mshtelma krzjoa objectin richmojo jfogelberg rxhem immusferr pebbleshx lukasfrankenq javiervicho libiya2000 shoaibkhanz hornacat ajunlonglive oguiza konabuta rmallof theblackcoathunt kamenialexnea tjaplok sinaabbasi hoanglam-novobi outfox330 leexaing hangzhang10

neuralforecast's Issues

Missing import pandas forecast method

Is your feature request related to a problem? Please describe.
The nhits' forecast method needs pandas to run properly. I think this is a silent bug because in the nbs , pandas is imported to test the method but it is not exported.

Describe the solution you'd like
Importing pandas from the beginning should solve the problem.

data/ vs nbs/data/ folders

The notebooks generate heterogeneous testing data folders.
I suggest we unify them into the nbs/data folder by default.

The .gitignore file already contemplates the nbs/data folder.

get_default_mask returns dataframe with different index

The get_default_mask function from the data.tsdataset module returns a dataframe with a different index than the original Y_df. I think the index should be the same.

Anti-nan protection in MASS Loss is masking all-nan forecast tensors

There is a difference in the anti-nan protection for the MASE Loss function used by ElementAI and the one we are using:

ElementAI only uses the divide_no_nan function over the scale factor (https://github.com/ElementAI/N-BEATS/blob/04f56c4ca4c144071b94089f7195b1dd606072b0/common/torch/losses.py#L61):

masep = t.mean(t.abs(insample[:, freq:] - insample[:, :-freq]), dim=1)
masked_masep_inv = divide_no_nan(mask, masep[:, None]) <--- Anti-nan protection only used over the scale factor
return t.mean(t.abs(target - forecast) * masked_masep_inv) <--- No anti-nan protection for the forecast

Our MASELoss function is hiding the nans from the forecast (https://github.com/Nixtla/nixtlats/blob/a3c7442a4c16c255685e158c9347d045f87ffa3b/nixtlats/losses/pytorch.py#L160):

delta_y = t.abs(y - y_hat)
scale = t.mean(t.abs(y_insample[:, seasonality:] -
y_insample[:, :-seasonality]), axis=1)
mase = divide_no_nan(delta_y, scale[:, None]) <--- Anti-nan protection masks nans coming from the scale and the forecast
mase = mase * mask
mase = t.mean(mase)

This difference causes a silent bug by setting the loss to zero during training/validation when the forecasts are meaningless.

Datasets: unrar dependency on M4 dataset

It seems that some Linux versions have problems with the unrar dependency.
A potential solution is to change the link to a previously unzipped dataset.

Datasets: Use sklearn LabelEncoder to declare S matrix

It seems convenient to use sklearn built-in functions to declare the static variables on all the datasets.
Time considerations and code convenience should be taken into account.

Missing imports forecast method

**Is your feature request related to a problem? Please describe.
The nhits' forecast method needs TimeSeriesLoader to run properly. I think this is a silent bug because in the nbs , TimeSeriesLoader is imported to test the method but it is not exported.

**Describe the solution you'd like
Importing needed imports from the beginning should solve the problem.

Related to #134.

Features to speed up training

Recently, we found a couple of tricks to speed up the training processes:

Data Parallelism: torch.nn.DataParallel(module) (https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html). This function automatically splits the input in the forward pass among the available GPUs and automatically collects and adds up the gradients in the backward pass.
Automatic Mixed Precision: torch.cuda.amp (https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html). This package provides functions to automatically cast the datatype of the tensors to optimize the threshold between precision and computing speed.

Datasets: evaluation benchmark example within the dataset

I think it would be extremely handy to have a simple evaluation example for the dataset tasks within the notebook.
Where we import the respective metric and try a simple benchmark, the measurement could be the one at the bottom of the evaluation table in each dataset.

Open AI gym library has a tiny wiki for each RL where they explicitly state what is to "solve" a task.
Example https://github.com/openai/gym/wiki/BipedalWalker-v2.

env.yml incompatibility: ‘HTMLExporter’ object has no attribute ‘template_path’

It seems that fastai has some dependency problems from the nbconvert library from jupyter
downgrading nbconvert solved the issue, seems to me yaml environment needs version protections.
https://forums.fast.ai/t/htmlexporter-object-has-no-attribute-template-path/80775/15

downgrading the nbconvert library solved the issue
conda install nbconvert=5.6.1

M5 Dataset: Missing raw data and references

Missing sample_submission.csv from raw data
Missing reference to original drive link and Kaggle competition.

NBEATS: Zero inflated validation loss with shared Loss class.

The reason why we used to have the numpy validation loss was to deal with the zeros in the mask.
In the numpy loss zeros are not counted as zero they are just excluded, while the pytorch loss just masks the gradients by multiplying the entries.

forecast method exogenous variables

what should be the format of the exogenous variables using the forecast method? does the method use the future?

Datasets: openpyxl >> xlrd xlrd cannot parse GEFCom

To avoid making a very complex environment, we could try to keep only the needed dependencies.
The openpyxl seems more complete to deal with excel sheets we could avoid xlrd.

Datasets: Inherit sklearn.preprocessing scalers?

Should we check the preprocessing scalers to see if we can use the predefined sklearn class and inherit it?

.yml environment channels

Pytorch and pytorch lightning downloads take a lot of time depending on the download channel, and some HTTP issues arise:

Pytorch: CondaHTTPError: HTTP 000 CONNECTION FAILED for URL https://conda.anaconda.org/pytorch/linux-64/pytorch-1.10.2-py3.7_cuda11.3_cudnn8.2.0_0.tar.bz2
Pytorch lightning: socket.timeout: The read operation timed out

In the case of pytorch, specifying the channel (-pytorch::pytorch, -pytorch::torchvision, etc.) solved the problem; with pytorch lightning, setting the timeout to a large number (pip install -U --timeout 2000 pytorch-lightning) did the trick.

NBEATS forward is returning all-nan forecasts

The forward method from the NBEATS model (https://github.com/Nixtla/nixtlats/blob/a3c7442a4c16c255685e158c9347d045f87ffa3b/nixtlats/models/nbeats/nbeats.py#L402) is returning all-nan forecast tensors. The real issue comes from the forecast method (https://github.com/Nixtla/nixtlats/blob/a3c7442a4c16c255685e158c9347d045f87ffa3b/nixtlats/models/nbeats/nbeats.py#L431) when the argument return_decomposition is set to False. (https://github.com/Nixtla/nixtlats/blob/a3c7442a4c16c255685e158c9347d045f87ffa3b/nixtlats/models/nbeats/nbeats.py#L424)

This issue has only been detected when using MASE Loss as the training loss function.

Missing tests for get_filtered_ts_tensor.

The method get_filtered_ts_tensor from the class TimeSeriesDataset is not tested.

Datasets: Use feather for faster load times on M4

Feather is more than 10 times faster at loading than csvs.

forecast method: support for time series with different datestamps

Is your feature request related to a problem? Please describe.
Right now the forecast method assumes:

All series end with the same timestamp.
The datestamp column (ds) of the input dataframe (Y_df) is a well-formatted string.

To train a model these assumptions are not necessary. (A user can train using ints or floats as the ds column).

Describe the solution you'd like
Consider these scenarios in the forecast method.

ESRNN: RNNs predict should filter predictions with sample_mask

ESRNN and RNNs, in general, need a warmup period before sample_mask==1.

The predictions should filter using the sample_mask ex-post. Dataloader should send all the windows so that all sampleable_mask gets predictions.

Add forecast function for transformer-based models

Is your feature request related to a problem? Please describe.
Missing forecast function for transformer-based models.

Describe the solution you'd like
A forecast function for transformer-based models.

Auto correlation test in statistical tests notebook?

Rather than having the autocorrelation test on the Naive2 model, it would be convenient to declare a test folder and specialized notebooks.

TCN model is missing from documentation

Is your feature request related to a problem? Please describe.
The TCN model should be listed under model the docs.

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Datasets: Tests for dataframe balance

From the experience of the other ESRNN repo, a test for the dataframe balance can save users a lot of debugging efforts.

ERROR downloading M4 data

The logs printed by M4.load claim an error, but the downloaded data is correct.

Missing assert statement: Nbeats 'n_hidden' argument

When instantiating a Nbeats model, the init() argument 'n_hidden' must be a list of lists, one per 'stack_type,' so each corresponding set of NBeatsBlock is initialized passing a list as the 'theta_n_hidden' argument.

An assert statement in Nbeats.init() would help the user identify if this argument is properly stated, particularly when running grid routines (hyper optimization or ensembling), as the 'n_hidden' item in the grid needs to be defined as a list of lists of lists.

EVALUATE: Weighted metrics have numerical error?

During hyperopt evaluation breaks with:
np.average weights sum to zero can't be normalized

The error does not seem to come from a mask full of zeros.

lecun_normal vs uniform initializer NBEATS

Similar issue as the issue from the original NBEATSx repo:
cchallu/nbeatsx#1
The PyTorch default seems to be uniform and not LeCun normal.

upload demo preview

demo.mp4

Merge nbeats and nhits PyTorch code

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Example notebooks

Is your feature request related to a problem? Please describe.
I see some cool work here, but it's hard to understand how to use the project without some examples.

Describe the solution you'd like
Add some example notebooks showing off how to use the different features in the project.

Describe alternatives you've considered
any other form of documentation

Additional context

Datasets: Homogeneize outputs of the load method

All datasets should give Y_df, X_df, S_df

DataLoader: unstable batch_size

batch_size depends on the windows available at each time series, which makes it unstable:
e.g. if series per batch is 1 and one time series is of lenght 19 or 12 each batch size will have
19 or 12 gradient signals.

Add logSparse as an attention option for Informer

Is your feature request related to a problem? Please describe.
add logSparse attention as an option for the informer model. I've experienced improved results over prob attention. It's a drop-in replacement

Describe the solution you'd like

class LogSparseAttention(nn.Module):
""" https://arxiv.org/abs/1907.00235
"""
    def __init__(self, mask_flag=True, factor=5, scale=None, attention_dropout=0.1, output_attention=False):
        super(LogSparceAttention, self).__init__()
        self.scale = scale
        self.mask_flag = mask_flag
        self.output_attention = output_attention
        self.dropout = nn.Dropout(attention_dropout)

    def log_mask(self, win_len, sub_len):
        mask = torch.zeros((win_len, win_len), dtype=torch.float)
        for i in range(win_len):
            mask[i] = self.row_mask(i, sub_len, win_len)
        return mask.view(1, 1, mask.size(0), mask.size(1))

    def row_mask(self, index, sub_len, win_len):
        log_l = math.ceil(np.log2(sub_len))

        mask = torch.zeros((win_len), dtype=torch.float)
        if((win_len // sub_len) * 2 * (log_l) > index):
            mask[:(index + 1)] = 1
        else:
            while(index >= 0):
                if((index - log_l + 1) < 0):
                    mask[:index] = 1
                    break
                mask[index - log_l + 1:(index + 1)] = 1  # Local attention
                for i in range(0, log_l):
                    new_index = index - log_l + 1 - 2**i
                    if((index - new_index) <= sub_len and new_index >= 0):
                        mask[new_index] = 1
                index -= sub_len
        return mask

    def forward(self, queries, keys, values, attn_mask):
        B, L, H, E = queries.shape
        _, S, _, D = values.shape
        scale = self.scale or 1./sqrt(E)

        scores = torch.einsum("blhe,bshe->bhls", queries, keys)
    
        mask = self.log_mask(L, S)
        mask_tri = mask[:, :, :scores.size(-2), :scores.size(-1)]
        scores = scores.to(queries.device)
        mask_tri = mask_tri.to(queries.device)
        scores = scores * mask_tri + -1e9 * (1 - mask_tri)

        A = self.dropout(torch.softmax(scale * scores, dim=-1))
        V = torch.einsum("bhls,bshd->blhd", A, values)

        if self.output_attention:
            return (V.contiguous(), A)
        else:
            return (V.contiguous(), None)

Describe alternatives you've considered

Additional context
https://arxiv.org/abs/1907.00235
Li, S., Jin, X., Xuan, Y., Zhou, X., Chen, W., Wang, Y. X., & Yan, X. (2019). Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. arXiv preprint arXiv:1907.00235.

DL: unbalanced panel and available_mask interactions

Partial work was done on unbalanced panels and protections of the dataloader with the available_mask
It would be beneficial to finish the experiment utility to balance the panel and make the needed tests of the available_mask use within the dataloader.

Installation issues

Due PyTorch being a requirement, I had to downgrade out of python3.10 somehow, so I tried with conda

I tried this to no avail:

conda create -n nixtla python=3.7  # And also 3.6
conda activate nixtla
conda install -c nixtla neuralforecast

getting library errors such as:

This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

I finally was able to install it after using python=3.7.11 as shown in the documentation (not generic 3.7 as shown above)

GPU usage in inference/forecast phase

Is your feature request related to a problem? Please describe.
It would be a great feature to have gpu usage in inference/forecast phase to reduce processing time in production

Describe the solution you'd like
I think a possible solution could be:

@patch
def forecast(self: model, Y_df, X_df, S_df, batch_size, **trainer_kwargs):
       ...
       trainer = pl.Trainer(**trainer_kwargs)
       ....

So the user could simply use:

model.forecast(Y_df, X_df, S_df, batch_size, gpus=4)

Problem with downloading gefcom datasets on Windows

When running data_datasets__gefcom2012.ipynb and data_datasets__gefcom2014.ipynb following error appear.
When downloading gefcom2012 dataset this error occured.

When downloading gefcom2014 dataset this error occured.

I'm using python in anaconda enviroment on Windows.

We could rather define new window_size to guarantee that the rolled windows will be of window_size
output_size will be only be needed for protection purposes and sampleable_conditions

The only thing that len_sample_chunk is doing is padding in a different way, (0s or input + output).

nixtla / neuralforecast Goto Github PK

neuralforecast's Introduction

Nixtla

Nixtla

Forecast using TimeGPT

🕰️ TimeGPT: Revolutionizing Time-Series Analysis

⚙️ Fine-Tuning: For Precision Prediction

🔄 Nixtla: Your Gateway to TimeGPT

💻 Installation

🎈 Quick Start

neuralforecast's People

Contributors

Stargazers

Watchers

Forkers

neuralforecast's Issues

Recommend Projects

Recommend Topics

Recommend Org

🔄 `Nixtla`: Your Gateway to TimeGPT