
Comments (8)

sdobber avatar sdobber commented on June 20, 2024

Hi @lorrp1 ,

Sorry for the late reply.

The model gets as input a number of features from a time series, say from time t - poollength to time t. The task is to predict a future value t + horizon from these features. Following the performance guide of Flux, all features get assembled in a matrix, where the last dimension corresponds to the different timesteps.

To give the model the past time series up to some point that is currently observable for a certain point in time, the poollength parameter provides a window of poollength steps back in time for all the features. All this happens in the second dimension of input:
input[:,1,1,60] is a vector of all the feature values at the 60th timestep, and input[:,2,1,60] would be a vector of the feature values one step back from the 60th timestep. (So, for example, input[:,2,1,61] == input[:,1,1,60] holds true.) Whether you include the time series from which the target is derived as a feature is up to you, but I see no reason not to supply the model with what is known up to the current point in time.
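
To make the indexing concrete, here is a toy sketch (made-up univariate data; the actual FluxArchitectures data preparation may differ) that builds such an input tensor with dimensions (features, poollength, 1, timesteps) and checks the shift relation:

```julia
series = collect(1.0:100.0)              # one hypothetical feature
nfeatures, poollength = 1, 5
ntime = length(series) - poollength + 1
input = zeros(nfeatures, poollength, 1, ntime)
for t in 1:ntime, p in 1:poollength
    # lag p - 1: the feature value p - 1 steps back from timestep t
    input[1, p, 1, t] = series[t + poollength - p]
end
input[:, 2, 1, 61] == input[:, 1, 1, 60]    # true
```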

m.ConvLayer operates only on the first two dimensions, so for a fixed timepoint it should only be able to access the features for the current timestep and poollength steps back in time. Its output is then a time series with convlayersize new "features". This way, the model should not have access to future points in time, though I admit that I never checked that thoroughly.
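
The shape logic can be sketched with a plain Flux Conv layer (sizes here are made up, and the actual LSTNet ConvLayer may be wired differently): a filter spanning the full (features, poollength) window collapses those two dimensions, so each batch element, i.e. each timestep, only ever sees its own window.

```julia
using Flux

nfeatures, poollength, convlayersize, ntime = 4, 5, 6, 60
input = randn(Float32, nfeatures, poollength, 1, ntime)
# one filter spanning the whole (features, poollength) window per output channel
conv = Conv((nfeatures, poollength), 1 => convlayersize, relu)
out = conv(input)
size(out)    # (1, 1, convlayersize, ntime)
```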

The training now tries to optimize the model parameters so that the model output (seen as a time series) matches the target, which means that at time t, it should predict the target variable at time t + horizon.
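
The alignment between input windows and targets can be illustrated with a toy example (made-up numbers; the actual data preparation may differ):

```julia
series = collect(1.0:100.0)
poollength, horizon = 5, 3
ntime = length(series) - poollength + 1
# at batch index t the model sees data up to series[t + poollength - 1],
# so it should predict the value `horizon` steps further on:
target = [series[t + poollength - 1 + horizon] for t in 1:(ntime - horizon)]
target[1] == series[poollength + horizon]    # true
```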

The issue you mention, that the model could basically just output (more or less) the current point in time to get an almost perfect fit, is a general problem in time-series forecasting and applies to all methods, which makes it a difficult problem.

Concerning your question number 3, I am not an expert on jingw2's forecasting setup, but it looks very different from what I am doing in my code. I am only interested in forecasting a short time ahead, and for each new forecast I have new data available that I can feed to the model. So I basically use everything available up to a certain time to come up with a prediction, until the data is exhausted. On https://github.com/jingw2/demand_forecast, it looks like they train the model on a certain dataset and then let it output forecasts for a longer period of time. I would inspect their code to see if one can get some hints about how they train, and how they forecast.

from fluxarchitectures.jl.

lorrp1 avatar lorrp1 commented on June 20, 2024

Thank you for the explanation.
I think I now understand how the model works.

But is there no easy way then to make the model accept a smaller input of size a:b:c:x, for a model that was initially trained on a:d:c:y with y > x? (I mean using pred = model(input) on a trained model.)

When I try pred = model(input) with a smaller data length than the one used to initially train the model, I get:
DimensionMismatch("arrays could not be broadcast to a common size")

The Conv((in, poolsize)) layer should be fine, assuming there is enough "input data" for the pool size, since the data length is not fixed in the model's initialization.
The error occurs in m.RecurLayer(a) (I used a = m.ConvLayer(x) to check whether the error was in the conv or the recurrent layer).

The convlayersize is also the same, so I don't really understand the error in the recurrent layer.

Edit: another issue would be how to know whether it isn't just overfitting, without a test or validation sample.


sdobber avatar sdobber commented on June 20, 2024

With the way Flux treats recurrent layers, their hidden state gets initialized to the correct size for the input data the first time you call a layer. When the size changes (e.g. by changing from training to test data), you get the DimensionMismatch("arrays could not be broadcast to a common size") error. The solution is to call Flux.reset!(model) before changing the size of the input. I normally include that in the loss function (that was mentioned in the documentation at one point, but now it seems to have been removed):

    loss(x, y) = begin
      l = Flux.mse(model(x), y)
      Flux.reset!(model)
      return l
    end
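
The mechanism can be mimicked in plain Julia without Flux (a toy sketch; ToyRecur and reset! are made-up names standing in for Flux's Recur and Flux.reset!):

```julia
mutable struct ToyRecur
    state::Union{Nothing,Vector{Float64}}
end

function (m::ToyRecur)(x::Vector{Float64})
    # lazily initialize the hidden state to the size of the first input
    m.state === nothing && (m.state = zeros(length(x)))
    m.state = m.state .+ x    # broadcasting fails if the length changes
end

reset!(m::ToyRecur) = (m.state = nothing)

m = ToyRecur(nothing)
m([1.0, 2.0, 3.0])    # hidden state now has length 3
# calling m([1.0, 2.0]) here would throw DimensionMismatch
reset!(m)             # clear the state first...
m([1.0, 2.0])         # ...then the smaller input works
```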

And of course it is a good idea to have a training, validation, and test data set. I just wanted to keep my code simple to focus on the networks, and not build a whole data handling structure around everything 😁


lorrp1 avatar lorrp1 commented on June 20, 2024

Thank you again,
I'm going to add test/validation data, and maybe even change the ADAM settings while training.


lorrp1 avatar lorrp1 commented on June 20, 2024

It seems the model returns NaN every time the pool length is higher than 2/3.


sdobber avatar sdobber commented on June 20, 2024

I tried LSTNet with poollength = 1, 2, 5, 10, 15, 20, 50, 100, and that all worked fine. The variable defines a number of timesteps, so non-integer values don't make sense.


lorrp1 avatar lorrp1 commented on June 20, 2024

By 2/3 I meant 2 or 3, but it's now working. I had changed how the dataloader reads the CSV and made a mistake there.
Do you think it would be enough to change the output size of the last dense layer (and the loss) to turn LSTNet into a classifier?


sdobber avatar sdobber commented on June 20, 2024

Might be worth a try. For my use case, classification never really worked out, so my experience with it is limited.
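
One possible sketch of that change (sizes and the 3-class setup are made up, and the actual LSTNet output layer may be wired differently): swap the size-1 output layer for one with nclasses outputs, and swap the MSE loss for a cross-entropy loss.

```julia
using Flux

recurlayersize, nclasses = 32, 3
head = Dense(recurlayersize, nclasses)    # replaces the size-1 output layer
x = randn(Float32, recurlayersize, 8)     # fake batch of 8 recurrent outputs
y = Flux.onehotbatch(rand(1:nclasses, 8), 1:nclasses)
l = Flux.logitcrossentropy(head(x), y)    # replaces Flux.mse
```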

