Comments (6)
Thanks for reporting, I'll have a look.
TPA-LSTM works fine for me for bigger datasets, but I've had similar issues with DSANet occasionally. The normalization layers sometimes lead to NaNs in the training loop, which makes the whole model output useless. Unfortunately, I've never gotten to the bottom of when and why exactly this happens.
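To illustrate how a normalization layer can silently produce NaNs: if an input window is constant, its standard deviation is zero, and z-score normalization divides 0 by 0. A plain-Julia sketch (illustrative only, not DSANet's actual code):

```julia
# A constant input window has zero standard deviation, so z-score
# normalization computes 0f0 / 0f0 == NaN32, which then poisons
# every downstream value, including the loss.
x = Float32[5, 5, 5, 5]
μ = sum(x) / length(x)
σ = sqrt(sum(abs2, x .- μ) / length(x))   # σ == 0f0 here
x_norm = (x .- μ) ./ σ                     # all NaN32

loss = sum(abs2, x_norm)                   # NaN propagates into the loss
isfinite(loss) || @warn "non-finite loss detected; model output is unusable"
```

A cheap `isfinite` guard like the last line, placed inside the training loop, at least catches the corruption early instead of letting training continue on garbage.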
from fluxarchitectures.jl.

The other models give me problems as well. Could it be because of a different Flux version? (I'm using Flux v0.11.1.)
I can't properly install the version used in your Manifest (errors during precompilation).
I'm not getting errors though, just straight-line predictions or NaN32 values.
As far as I know, Flux 0.10 only works on Julia up to 1.4.2, which is why you can't use the version from the Manifest files. When I update to Flux v0.11 and Julia 1.5.1, I can run all the files, but DSAnet gives NaNs as you describe, and the training for the other models does not produce any usable results.
As for the latter, I think this might be related to some changes in recent Flux versions. There was an issue where training of recurrent neural networks was not handled properly, and there still seem to be some remaining bugs to be ironed out (see e.g. FluxML/Flux.jl#1209 or FluxML/Flux.jl#1324). I would guess that the hyperparameters in the example files (number of hidden layers etc.) are currently way off.
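One classic pitfall with recurrent models, related to the state-handling issues linked above, is that hidden state persists across sequences unless it is explicitly reset. A plain-Julia sketch of the idea with a toy RNN cell (illustrative only; Flux's own layers use `Flux.reset!` for this):

```julia
# Toy recurrent cell: the hidden state h carries over between calls,
# so two independent sequences only see identical conditions if the
# state is reset in between.
mutable struct TinyRNN
    Wx::Float32
    Wh::Float32
    h::Float32
end
step!(m::TinyRNN, x::Float32) = (m.h = tanh(m.Wx * x + m.Wh * m.h))
reset!(m::TinyRNN) = (m.h = 0f0)

m = TinyRNN(0.5f0, 0.9f0, 0f0)
out1 = [step!(m, x) for x in Float32[1, 1, 1]]
reset!(m)   # without this, the second sequence starts from stale state
out2 = [step!(m, x) for x in Float32[1, 1, 1]]
# out1 == out2 only because the state was cleared in between
```

If the framework mishandles this state (or its gradient), training can quietly converge to useless solutions, which matches the "no usable results" symptom.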
[Edit:] OK, now I'm having weird problems as well, with LSTnet throwing an error and DARNN running in some infinite loop. I seriously don't know yet what is causing this - I am using the same code in a bigger project where all models train fine...
I'm trying with Julia 1.4.2. I had some issues compiling Flux, but it should be fine now.
Using `@show` with LSTnet returns this:
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.6141138f30
Flux.mse(pred, target) = 1.589704f30
...
Flux.mse(pred, target) = 1.589704f30 (after some minutes)
and the resulting chart is way off.
DSAnet returns NaN32.
DARNN may be working (I'm still running it, but it is very slow). Edit: its prediction is turning into a straight line. TPA-LSTM works, though its prediction also turns into a line, as noted in the repo.
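An MSE stuck around 1.6f30 usually means the optimization is diverging rather than the data genuinely being that large; clipping the gradient norm is one common remedy. A minimal sketch in plain Julia (the gradient vector here is a stand-in, not taken from LSTnet; recent Flux versions provide this built in as the `ClipNorm` optimiser):

```julia
# Rescale a gradient vector to a maximum L2 norm, a standard hedge
# against exploding losses like the 1.6f30 values in the log above.
function clip_norm!(g::AbstractVector{Float32}, maxnorm::Float32)
    n = sqrt(sum(abs2, g))
    if n > maxnorm
        g .*= maxnorm / n
    end
    return g
end

g = Float32[3f15, 4f15]   # huge gradient, L2 norm 5f15
clip_norm!(g, 1.0f0)      # rescaled in place to unit norm
```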
@lorrp1 This is how far I got with fixing things. DSAnet is still broken, and I fear that the issue there is either rather complex or well hidden.
Hello @sdobber, I have tested the latest update:
DSAnet works sometimes.
LSTnet/TPA/DARNN work now.