
Comments (8)

sdobber avatar sdobber commented on June 20, 2024

Hi @lorrp1 ,

Sorry for the late reply.

The model gets as input a number of features from a time series, say from time t - poollength to time t. The task is to predict a future value t + horizon from these features. Following the performance guide of Flux, all features get assembled in a matrix, where the last dimension corresponds to the different timesteps.

To give the model the past time series up to some point that is currently observable for a certain point in time, the poollength parameter provides a window of poollength steps back in time for all the features. All this happens in the second dimension of input:
input[:,1,1,60] is a vector of all the feature values at the 60th timestep, and input[:,2,1,60] would be a vector of the feature values one step back from the 60th timestep. (So, for example, input[:,2,1,61] == input[:,1,1,60] holds true.) Whether you include the time series from which the target is derived as a feature is up to you, but I see no reason not to supply the model with what is known up to the current point in time.
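
To make the indexing concrete, here is a toy sketch (made-up univariate data; the actual FluxArchitectures data preparation may differ) that builds such an input tensor with dimensions (features, poollength, 1, timesteps) and checks the shift relation:

```julia
series = collect(1.0:100.0)              # one hypothetical feature
nfeatures, poollength = 1, 5
ntime = length(series) - poollength + 1
input = zeros(nfeatures, poollength, 1, ntime)
for t in 1:ntime, p in 1:poollength
    # lag p - 1: the feature value p - 1 steps back from timestep t
    input[1, p, 1, t] = series[t + poollength - p]
end
input[:, 2, 1, 61] == input[:, 1, 1, 60]    # true
```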

m.ConvLayer operates only on the first two dimensions, so for a fixed timepoint it should only be able to access the features for the current timestep and poollength steps back in time. Its output is then a time series with convlayersize new "features". This way, the model should not have access to future points in time, though I admit that I never checked that thoroughly.
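
The shape logic can be sketched with a plain Flux Conv layer (sizes here are made up, and the actual LSTNet ConvLayer may be wired differently): a filter spanning the full (features, poollength) window collapses those two dimensions, so each batch element, i.e. each timestep, only ever sees its own window.

```julia
using Flux

nfeatures, poollength, convlayersize, ntime = 4, 5, 6, 60
input = randn(Float32, nfeatures, poollength, 1, ntime)
# one filter spanning the whole (features, poollength) window per output channel
conv = Conv((nfeatures, poollength), 1 => convlayersize, relu)
out = conv(input)
size(out)    # (1, 1, convlayersize, ntime)
```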

The training now tries to optimize the model parameters so that the model output (seen as a time series) matches the target, which means that at time t, it should predict the target variable at time t + horizon.
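
The alignment between input windows and targets can be illustrated with a toy example (made-up numbers; the actual data preparation may differ):

```julia
series = collect(1.0:100.0)
poollength, horizon = 5, 3
ntime = length(series) - poollength + 1
# at batch index t the model sees data up to series[t + poollength - 1],
# so it should predict the value `horizon` steps further on:
target = [series[t + poollength - 1 + horizon] for t in 1:(ntime - horizon)]
target[1] == series[poollength + horizon]    # true
```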

The issue you mention, that the model could basically just output (more or less) the current point in time to get an almost perfect fit, is a general problem in time-series forecasting and applies to all methods, which makes it a difficult problem.

Concerning your question number 3, I am not an expert on jingw2's forecasting setup, but it looks very different from what I am doing in my code. I am only interested in forecasting a short time ahead, and for each new forecast I have new data available that I can feed to the model. So I basically use everything available up to a certain time to come up with a prediction, until the data is exhausted. On https://github.com/jingw2/demand_forecast, it looks like they train the model on a certain dataset and then let it output forecasts for a longer period of time. I would inspect their code to see if one can get some hints about how they train, and how they forecast.

from fluxarchitectures.jl.

lorrp1 avatar lorrp1 commented on June 20, 2024

Thank you for the explanation.
I think I now understand how the model works.

But is there no easy way then to make the model accept a smaller input of size a:b:c:x, for a model that was initially trained on a:d:c:y with y > x? (I mean using pred = model(input) on a trained model.)

When I try pred = model(input) with a smaller data length than the one used to initially train the model, I get:
DimensionMismatch("arrays could not be broadcast to a common size")

The Conv((in, poolsize)) layer should be fine, assuming there is enough "input data" for the pool size, since the data length is not fixed in the model's initialization.
The error occurs in m.RecurLayer(a) (I used a = m.ConvLayer(x) to check whether the error was in the conv or the recurrent layer).

The convlayersize is also the same, so I don't really understand the error in the recurrent layer.

Edit: another issue would be how to know whether it isn't just overfitting, without a test or validation sample.


sdobber avatar sdobber commented on June 20, 2024

With the way Flux treats recurrent layers, their hidden state gets initialized to the correct size for the input data the first time you call a layer. When the size changes (e.g. by changing from training to test data), you get the DimensionMismatch("arrays could not be broadcast to a common size") error. The solution is to call Flux.reset!(model) before changing the size of the input. I normally include that in the loss function (that was mentioned in the documentation at one point, but now it seems to have been removed):

    loss(x, y) = begin
      l = Flux.mse(model(x), y)
      Flux.reset!(model)
      return l
    end
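
The mechanism can be mimicked in plain Julia without Flux (a toy sketch; ToyRecur and reset! are made-up names standing in for Flux's Recur and Flux.reset!):

```julia
mutable struct ToyRecur
    state::Union{Nothing,Vector{Float64}}
end

function (m::ToyRecur)(x::Vector{Float64})
    # lazily initialize the hidden state to the size of the first input
    m.state === nothing && (m.state = zeros(length(x)))
    m.state = m.state .+ x    # broadcasting fails if the length changes
end

reset!(m::ToyRecur) = (m.state = nothing)

m = ToyRecur(nothing)
m([1.0, 2.0, 3.0])    # hidden state now has length 3
# calling m([1.0, 2.0]) here would throw DimensionMismatch
reset!(m)             # clear the state first...
m([1.0, 2.0])         # ...then the smaller input works
```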

And of course it is a good idea to have a training, validation, and test data set. I just wanted to keep my code simple to focus on the networks, and not build a whole data handling structure around everything 😁


lorrp1 avatar lorrp1 commented on June 20, 2024

Thank you again,
I'm going to add test/validation data, and maybe even change the ADAM settings while training.


lorrp1 avatar lorrp1 commented on June 20, 2024

It seems the model returns NaN every time the pool length is higher than 2/3.


sdobber avatar sdobber commented on June 20, 2024

I tried LSTNet with poollength = 1, 2, 5, 10, 15, 20, 50, 100, and that all worked fine. The variable defines a number of timesteps, so non-integer values don't make sense.


lorrp1 avatar lorrp1 commented on June 20, 2024

By 2/3 I meant 2 or 3, but it's now working. I had changed how the dataloader reads the CSV and made a mistake there.
Do you think it would be enough to change the output size of the last dense layer (and the loss) to turn LSTNet into a classifier?


sdobber avatar sdobber commented on June 20, 2024

Might be worth a try. For my use case, classification never really worked out, so my experience with it is limited.
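
One possible sketch of that change (sizes and the 3-class setup are made up, and the actual LSTNet output layer may be wired differently): swap the size-1 output layer for one with nclasses outputs, and swap the MSE loss for a cross-entropy loss.

```julia
using Flux

recurlayersize, nclasses = 32, 3
head = Dense(recurlayersize, nclasses)    # replaces the size-1 output layer
x = randn(Float32, recurlayersize, 8)     # fake batch of 8 recurrent outputs
y = Flux.onehotbatch(rand(1:nclasses, 8), 1:nclasses)
l = Flux.logitcrossentropy(head(x), y)    # replaces Flux.mse
```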

