Comments (7)

lostella commented on June 9, 2024

@timoschowski inspecting the diff, one thing that changed is the PyTorch Lightning dependency, which went from 1.5 to >= 1.5. It seems that 1.7 introduced the MPS backend (https://lightning.ai/pages/community/lightning-releases/pytorch-lightning-1-7-release/), which might be what is causing trouble.

What version of lightning do you use?

Two ways to check whether the MPS backend is to blame:

  1. pin lightning to 1.5 and see if it works better
  2. on whatever version of lightning you have, set trainer_kwargs = dict(accelerator="cpu") when constructing the estimator and see if it's better
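
The second option can be sketched as follows. This is a minimal sketch, assuming the estimator accepts a trainer_kwargs dictionary and forwards it to the Lightning Trainer, as described above; the helper name build_trainer_kwargs and the max_epochs value are hypothetical and only there for illustration.

```python
from typing import Any, Dict


def build_trainer_kwargs(force_cpu: bool = True, max_epochs: int = 15) -> Dict[str, Any]:
    """Build keyword arguments intended for the Lightning Trainer.

    Hypothetical helper: with force_cpu=True the accelerator is pinned
    explicitly instead of being left to the Trainer's default.
    """
    kwargs: Dict[str, Any] = {"max_epochs": max_epochs}
    if force_cpu:
        kwargs["accelerator"] = "cpu"
    return kwargs


trainer_kwargs = build_trainer_kwargs(force_cpu=True)
print(trainer_kwargs)

# Assumed usage; the estimator class and its other arguments are placeholders:
# estimator = SomeTorchEstimator(..., trainer_kwargs=trainer_kwargs)
```

Keeping the dictionary in one place makes it easy to run the same notebook twice, once with and once without the explicit accelerator, and compare the results.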

I don’t see other changes between the versions that could explain this.

from gluonts.

timoschowski commented on June 9, 2024

thanks @lostella, you're a wizard.

I have

import pytorch_lightning as pl
pl.__version__
'1.9.5'
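
To tell quickly whether an installed version falls in the range suspected above, a small comparison helper can be used. This is a sketch only: the threshold (1, 7) is taken from the suggestion earlier in the thread that 1.7 introduced the MPS backend, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumption from the thread: MPS support arrived in PyTorch Lightning 1.7.
MPS_INTRODUCED: Tuple[int, int] = (1, 7)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.9.5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_at_least(version: str, minimum: Tuple[int, int]) -> bool:
    """Return True if the version's (major, minor) is >= minimum."""
    return parse_major_minor(version) >= minimum


for candidate in ("1.5.0", "1.9.5"):
    print(candidate, is_at_least(candidate, MPS_INTRODUCED))
```

In a notebook, the string passed in would be pl.__version__.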

when I do
"accelerator": "cpu"

the resulting output is still this:
[Figure: M5_FOOD_3_v0.12_cpu_accelerator]

however, when running the notebook with
!pip install -U "gluonts[torch]==0.13.0" matplotlib orjson tensorboard optuna datasets "pytorch-lightning==1.5"

results are like this for neg binomial, so indeed improved:
[Figure: M5_FOOD_3_v0.13_lighting1.5_15epochs]

and performance is also in line after more epochs (500 here for v0.13 with lightning 1.5)
[Figure: M5_FOOD_3_v0.13_lighting1.5_500epochs]

compare with (500 here for v0.12 with lightning 1.5)
[Figure: M5_FOOD_3_v0.12_lighting1.5_500epochs]

For the moment I have a workaround by pinning the lightning version, so that's great. Huge thanks.

A couple of interesting things remain:

  • for v0.14 of GluonTS, a lightning version larger than 1.5 is required, so I'm stuck on v0.13... any idea here?
  • One thing that stands out for me is that all the distribution code shifted around, and the imports are different. Did we change anything in the neg binomial implementation? Performance with student_t is exactly the same between v0.12 and v0.13 independent of lightning, so I find that curious. It doesn't really show up in the diff, so I'm wondering if you have any intuition here (I remember discussions with @kashif about this in the past)
  • why doesn't the notebook work on Colab? It seems like the model loading doesn't work.

Of course the overall performance isn't there yet (e.g. the peaks aren't aligned), but this is because I don't have any dynamic features included; I will bring those back next.

timoschowski commented on June 9, 2024

@kashif @lostella I mentioned this to you some time ago and @jgasthaus FYI

timoschowski commented on June 9, 2024

adding some thoughts here. After a suggestion by @kashif I also tried running the notebook with

!pip install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

which gives me torch version:
'2.3.0.dev20240219'

however results are the same.

timoschowski commented on June 9, 2024

one thing I noted is that changing context_length from its default (prediction_length) to 2*prediction_length has a substantial benefit here.
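
The change described above amounts to passing an explicit context_length instead of relying on the default. A minimal sketch, assuming an estimator that takes prediction_length and context_length keyword arguments; the helper name and the value 28 are illustrative assumptions, not taken from the notebook.

```python
from typing import Any, Dict


def context_length_for(prediction_length: int, multiple: int = 2) -> int:
    """Return a context length that is a multiple of the prediction length."""
    if prediction_length <= 0 or multiple <= 0:
        raise ValueError("prediction_length and multiple must be positive")
    return multiple * prediction_length


prediction_length = 28  # illustrative value, an assumption

estimator_kwargs: Dict[str, Any] = {
    "prediction_length": prediction_length,
    # Explicit override instead of the default (equal to prediction_length).
    "context_length": context_length_for(prediction_length),
}
print(estimator_kwargs)
```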

lostella commented on June 9, 2024

> for v0.14 of GluonTS, a lightning version larger than 1.5 is required, so I'm stuck on v0.13... any idea here?

No, this is an issue; we'll have to figure out what's wrong with recent lightning versions and make sure that everything runs smoothly. Also, the fact that setting accelerator="cpu" did not help makes me think this may not be a problem on Apple silicon only. Running the same on Linux with recent versions of lightning would answer that.
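
When comparing runs across machines, it helps to record the platform and library versions alongside each result. The following is a sketch under the assumption that torch and pytorch_lightning may or may not be installed; the function name describe_environment is hypothetical.

```python
import platform
from typing import Any, Dict


def describe_environment() -> Dict[str, Any]:
    """Collect platform and library details relevant to this comparison."""
    info: Dict[str, Any] = {
        "system": platform.system(),    # e.g. 'Darwin' or 'Linux'
        "machine": platform.machine(),  # e.g. 'arm64' or 'x86_64'
        "torch": None,
        "mps_available": False,
        "pytorch_lightning": None,
    }
    try:
        import torch

        info["torch"] = torch.__version__
        # torch.backends.mps is absent in older torch releases.
        mps = getattr(torch.backends, "mps", None)
        info["mps_available"] = bool(mps is not None and mps.is_available())
    except ImportError:
        pass
    try:
        import pytorch_lightning as pl

        info["pytorch_lightning"] = pl.__version__
    except ImportError:
        pass
    return info


print(describe_environment())
```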

> Did we change anything with the neg binomial implementation?

I don't think so: this is the history of changes, and @kashif's change is the only thing that happened. It's #2749 and was part of 0.13.0 already. It really seems like something weird is going on with training.

timoschowski commented on June 9, 2024

OK, I'm having trouble running the notebook in Colab. @kashif, is this something that you could take a look at? It's about loading the models; something seems to be broken there.
