Description: When loading the FOOD_3 subset of the M5 competition (


Performance regression in negative binomial from 0.12 to 0.13 and onwards (at least for DeepAR in PyTorch)

timoschowski commented on June 9, 2024

Performance regression in negative binomial from 0.12 to 0.13 and onwards (at least for DeepAR in PyTorch)

from gluonts.

Comments (7)

lostella commented on June 9, 2024

@timoschowski inspecting the diff, one thing that changed is the dependency on PyTorch Lightning from 1.5 to >= 1.5. It seems like 1.7 introduced the MPS backend https://lightning.ai/pages/community/lightning-releases/pytorch-lightning-1-7-release/ which is one thing that might be causing trouble.

What version of lightning do you use?

Two options to check if this MPS thing is to be blamed:

pin lightning to 1.5 and see if it works better
on whatever version of lightning you have, set trainer_kwargs = dict(accelerator="cpu") when constructing the estimator, see if it's better
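The second option could look roughly like the sketch below. It is not taken from the notebook discussed in this thread: `build_trainer_kwargs` and `make_deepar` are hypothetical helper names, the `freq`, `prediction_length` and `max_epochs` values are placeholders, and the import paths are assumed to match a GluonTS 0.13-style layout.

```python
def build_trainer_kwargs(force_cpu=True, max_epochs=50):
    """Keyword arguments forwarded to the Lightning Trainer (hypothetical helper)."""
    kwargs = {"max_epochs": max_epochs}  # placeholder epoch count
    if force_cpu:
        # Request the CPU accelerator explicitly, so MPS cannot be selected
        kwargs["accelerator"] = "cpu"
    return kwargs


def make_deepar(freq="D", prediction_length=28, force_cpu=True):
    """Build a DeepAR estimator with a negative binomial output (placeholder values)."""
    # Imported lazily so build_trainer_kwargs works without GluonTS installed.
    # Import paths are an assumption for GluonTS 0.13+; other versions may differ.
    from gluonts.torch.model.deepar import DeepAREstimator
    from gluonts.torch.distributions import NegativeBinomialOutput

    return DeepAREstimator(
        freq=freq,
        prediction_length=prediction_length,
        distr_output=NegativeBinomialOutput(),
        trainer_kwargs=build_trainer_kwargs(force_cpu=force_cpu),
    )
```

Calling `make_deepar(force_cpu=False)` on the same data gives the comparison run without the accelerator override.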

I don’t see other changes between the versions that could explain this.


timoschowski commented on June 9, 2024

thanks @lostella, you're a wizard.

I have

import pytorch_lightning as pl
pl.__version__
'1.9.5'
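For comparing environments across runs, it may help to record all relevant package versions at once. The following is a small sketch using only the standard library; `installed_versions` is a hypothetical helper, and the list of distribution names is an assumption about which packages matter here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("gluonts", "pytorch-lightning", "lightning", "torch")):
    """Return {distribution name: version string, or None if not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Package is not present in this environment
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting this output next to each set of results makes it easier to tell which combination of versions produced them.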

when I do
"accelerator": "cpu"

the resulting output is still this:

however, when running the notebook with
!pip install -U "gluonts[torch]==0.13.0" matplotlib orjson tensorboard optuna datasets "pytorch-lightning==1.5"

results are like this for neg binomial, so indeed improved:

and performance is in line also after more epochs (500 here for v0.13 with lightning 1.5)

compare with (500 here for v0.12 with lightning 1.5)

For the moment I have a workaround by pinning the lightning version, so that's great. Huge thanks.

A couple of interesting things remain:

for v0.14 of GluonTS, a lightning version larger than 1.5 is required, so I'm stuck on v0.13... any idea here?
One thing that stands out for me is that all the distribution code shifted around, and imports are different. Did we change anything in the neg binomial implementation? Performance with student_t is exactly the same between v0.12 and v0.13 independent of lightning, so I find that curious. It doesn't really show up in the diff, so I'm wondering if you had any intuition here (I remember discussions with @kashif about this in the past)
why doesn't the notebook work on Colab? Seems like the model loading doesn't work.

Of course the overall performance isn't there yet (e.g. peaks aren't aligned), but this is because I don't have any dynamic features included; I will bring those back next.


timoschowski commented on June 9, 2024

@kashif @lostella I mentioned this to you some time ago and @jgasthaus FYI


timoschowski commented on June 9, 2024

adding some thoughts here. After a suggestion by @kashif I also tried running the notebook with

!pip install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

which gives me torch version:
'2.3.0.dev20240219'

however results are the same.


timoschowski commented on June 9, 2024

one thing I noted is that changing context_length from its default (prediction_length) to 2*prediction_length has a substantial benefit here.
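As a rough illustration of that setting: `deepar_lengths` below is a hypothetical helper, and the horizon of 28 steps is a placeholder rather than the value used in the notebook.

```python
def deepar_lengths(prediction_length, context_multiple=2):
    """Return the length-related estimator arguments (hypothetical helper)."""
    if prediction_length <= 0 or context_multiple <= 0:
        raise ValueError("prediction_length and context_multiple must be positive")
    return {
        "prediction_length": prediction_length,
        # The default context_length equals prediction_length (per the comment
        # above); here it is scaled by context_multiple instead.
        "context_length": context_multiple * prediction_length,
    }


# Example with a placeholder horizon of 28 steps
lengths = deepar_lengths(28)
print(lengths)  # {'prediction_length': 28, 'context_length': 56}
```

The resulting dictionary could then be unpacked into the estimator constructor alongside the other arguments.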


lostella commented on June 9, 2024

> for v0.14 of GluonTS, a lightning version larger than 1.5 is required, so I'm stuck on v0.13... any idea here?

No, this is an issue; we'll have to figure out what's wrong with recent lightning versions and make sure that everything runs smoothly. Also, the fact that setting accelerator="cpu" did not help makes me think this may not be a problem on Apple silicon only. Running the same on Linux with recent versions of lightning would answer that.
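When comparing a run on Apple silicon with one on Linux, a short report of the platform and of whether PyTorch sees an MPS device could be attached to each result. This is only a sketch; `describe_backend` is a hypothetical helper and is not part of GluonTS or Lightning.

```python
import platform


def describe_backend():
    """Summarize the platform and whether PyTorch reports an MPS device."""
    info = {
        "system": platform.system(),    # e.g. "Darwin" or "Linux"
        "machine": platform.machine(),  # e.g. "arm64" on Apple silicon
        "mps_available": None,          # stays None if torch is not installed
    }
    try:
        import torch
    except ImportError:
        return info
    # torch.backends.mps is absent in older torch releases
    mps = getattr(torch.backends, "mps", None)
    info["mps_available"] = bool(mps is not None and mps.is_available())
    return info


print(describe_backend())
```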

> Did we change anything with the neg binomial implementation?

I don't think so: this is the history of changes, and @kashif's change is the only thing that happened. It's #2749 and was part of 0.13.0 already. It really seems like something weird is going on with training.


timoschowski commented on June 9, 2024

OK, I have trouble running the notebook in Colab. @kashif, is this something that you could take a look at? This is about loading the models; something seems to be broken there.

