
Comments (7)

os1a commented on September 26, 2024

Question 1: yes.

Question 2: yes.

Training the first stage is the most important part and requires most of the training time. In your case, you could train step 2 for 30k iterations and step 3 for 10k. It is best to watch the training and validation loss behavior. Note that you need to lower the learning rate when training step 3.
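
For illustration, such a staged schedule could be written down as follows (the stage names, the step 1 iteration count, and the learning rates are placeholders, not prescribed values):

    # Hypothetical staged training schedule; names and numbers are illustrative only.
    schedule = [
        ("step1_sampling_ewta", 50000, 1e-4),   # sampling network, EWTA loss (longest stage)
        ("step2_sampling_ewtad", 30000, 1e-4),  # sampling network with sigma heads (EWTAD)
        ("step3_fitting", 10000, 1e-5),         # fitting network, with a lowered learning rate
    ]

    for stage, iterations, lr in schedule:
        print(f"train {stage} for {iterations} iterations at learning rate {lr}")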

You mean the tanh function on top of the first layer of the fitting network. Yes, I would use it.
I do not think I observe a negative loss.


droneRL2020 commented on September 26, 2024

Thanks for your answers! I have a few more questions.

First, do you recommend replacing the sampling network with a pre-trained network like ResNet18? If not, is there a particular reason?

Second, can I use batch normalization and ReLU instead of tanh and dropout in the fitting network? If not, is there a particular reason?

Third, if I set the sampling network's number of hypotheses to 40, does that mean 40 * 3 modes or 40 modes?

Last, after I train the sampling network for 10000 * 5 iterations, the loss goes down to around 17, but when it starts training the fitting network the loss jumps to a very high value (around 8000) and doesn't decrease. Do you think I should train the sampling network more?


os1a commented on September 26, 2024

Hi,

  • In our task, using a pretrained ResNet would not help because the inputs are not single images. If your task could benefit from such a pretrained model, then yes, you can use it.

  • Of course, using batch normalization and ReLU will work, but I am not sure it brings improvements. You can try it and share the conclusion with us.

  • Setting the number of hypotheses to 40 means the first network generates 40 different outputs. The final number of modes of your mixture model is then determined by the number of modes you set in the fitting network, i.e. by num_output at:

    predicted = tf_full_conn(intermediate_drop, name='predict_fc1', num_output=20 * 4)

    For example, if your sampling network generates 40 hypotheses and you want to fit them into 4 modes, then you set num_output=40*4 (see the shape sketch after this list).

  • The magnitude of the NLL loss (used in the fitting) differs from the sampling loss (EWTA), so it is fine that they are not comparable. However, a very high NLL value does not make sense. Also, the loss of the fitting network only decreases at the beginning of training and then stays stable, because training the fitting network while keeping the sampling network fixed is not a hard task. Double-check the assignment outputs of the fitting network to see if they are meaningful.
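
To make the shapes concrete, here is a small bookkeeping sketch (plain Python for illustration, not code from the repository):

    # Illustrative shape bookkeeping only.
    num_hypotheses = 40   # outputs generated by the sampling network
    num_modes = 4         # mixture components produced by the fitting stage

    # Sampling network: per hypothesis 2 means + 2 log-sigmas -> 40 * 4 values
    sampling_output_dim = num_hypotheses * 4

    # Fitting network: one soft-assignment vector of length num_modes per hypothesis
    fitting_num_output = num_hypotheses * num_modes   # i.e. num_output=40*4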


droneRL2020 commented on September 26, 2024

Thank you for your explanation.
I want to confirm the difference between modes and hypotheses in your paper.
The two pictures below are from one run of your test.py code.
In the first picture, there seem to be 20 hypotheses * 2 outputs (the x, y coordinates).
In the second picture I can see 4 distributions (seemingly 4 modes).
However, the sampling network output shape seems to be (batch, 20 * 4), as below:

input_2 = tf.concat([hyps_concat, log_scales_concat], axis=1) # (batch, 20*4, 1, 1)

Also, the fitting network output shape seems to be the same as the sampling network output shape, (batch, 20 * 4):
predicted = tf_full_conn(intermediate_drop, name='predict_fc1', num_output=20 * 4)

My original understanding was:
sampling network output shape: 20 hypotheses * 2 outputs (x, y) --> the 20 green boxes in the first picture
fitting network output shape: 4 modes (building 4 distributions from the 20 hypotheses) * 2 outputs (x, y) --> 4 sets of 2D (x, y) distributions

Can you please let me know which part I'm missing?

[attached images: 0-hyps]


os1a commented on September 26, 2024

Hi,

The sampling network outputs 20 hypotheses, each being a unimodal distribution (2 values for the mean and 2 for the sigma), hence the shape (20*4). This is what we call EWTAD in the paper. Note that the first picture shows only the means; we draw a set of equally sized bounding boxes centered at the predicted means.

The fitting network outputs assignment vectors of total shape (20*4) (referred to as z_k in Eq. 6 of the paper). It takes as input the set of hypotheses generated by the sampling network and outputs, for each hypothesis, an assignment vector of shape (4) (the number of modes). In other words, the fitting network computes the assignment of each hypothesis to the final modes. For example, if the first hypothesis should be assigned to the third mode, its assignment will be (0, 0, 1, 0), and so on. Note that these assignment vectors have entries between 0 and 1 that sum to 1.

Then the function tf_assemble_lmm_parameters_independent_dists:

means, bounded_log_sigmas, mixture_weights = tf_assemble_lmm_parameters_independent_dists(samples_means=out_hyps,

takes these assignments and the input hypotheses (a set of independent unimodal distributions in the form of means and sigmas) and outputs the final multimodal mixture model.
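
For intuition, here is a rough NumPy sketch of how soft assignments could be combined with the hypotheses into mixture parameters (this is an assumption about the general idea; the actual tf_assemble_lmm_parameters_independent_dists may differ in detail, e.g. in how the mode sigmas account for the spread of the assigned hypotheses):

    import numpy as np

    def assemble_mixture_sketch(hyp_means, hyp_sigmas, assignments):
        # hyp_means:   (K, 2) per-hypothesis means
        # hyp_sigmas:  (K, 2) per-hypothesis sigmas
        # assignments: (K, M) soft assignments, each row sums to 1
        mass = assignments.sum(axis=0) + 1e-8                   # (M,) soft mass per mode
        weights = mass / mass.sum()                             # mixture weights
        means = (assignments.T @ hyp_means) / mass[:, None]     # (M, 2) assignment-weighted means
        sigmas = (assignments.T @ hyp_sigmas) / mass[:, None]   # (M, 2) assignment-weighted sigmas
        return means, sigmas, weights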


droneRL2020 commented on September 26, 2024

Thank you for your reply. Also, I really appreciate your quick response!

I tried to understand your code and came up with a few more questions:

  1. Here, when using tf.fill(), shouldn't the first parameter be a shape? You pass diff2, which does not seem to be a shape:

    diff2 = tf.add(diff2, tf.fill(diff2, eps))

  2. Shouldn't this be means[i] instead of hyps[i]?

    diff2 = tf.square(gt - hyps[i]) # (batch,2,1,1)

  3. What is nd.ops.mul? I couldn't find that operation:

    sxsy = nd.ops.mul(sigma[:, 0:1, :, :], sigma[:, 1:2, :, :])

  4. Should I use out_hyps and out_log_sigmas as the inputs to make_sampling_loss?
    In net.py there is bounded_log_sigmas, which comes out after the fitting network, and the sampling loss function takes bounded_log_sigmas, so I wasn't sure which are the correct inputs for the sampling loss function:

    out_hyps, out_log_sigmas = self.disassembling(output)

    def make_sampling_loss(self, hyps, bounded_log_sigmas, gt, mode='epe', top_n=1):

  5. I keep getting a negative loss. As "b" gets smaller, the loss becomes negative. Do you think this is fine?
    The graph below shows the training steps from "iul" (5000 iterations) to the fitting net (5000 iterations).
    Before this, I trained the sampling network (5 stages * 5000 = 25000 iterations).
    [attached image: training loss curve]


os1a commented on September 26, 2024

Hi @droneRL2020

Sorry for the late response; we were quite busy with a deadline.

1- I think tf.fill can also take a tensor as its first argument and will only use its dimensions (shape). Of course, you can pass the shape directly.
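
For reference, a shape-explicit equivalent of that line would pass the shape rather than the tensor itself (a small sketch; diff2 and eps as in the snippet above):

    # tf.fill expects a shape (dims) as its first argument, so passing
    # tf.shape(diff2) makes the intent explicit.
    diff2 = tf.add(diff2, tf.fill(tf.shape(diff2), eps))
    # which is the same as simply adding the constant:
    diff2 = diff2 + eps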

2- Yes, you are right. I will update it accordingly.

3- You can simply replace it with tf.multiply. I will update it as well.
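
A drop-in replacement using standard TensorFlow would then be (assuming sigma has shape (batch, 2, 1, 1) as in the quoted line):

    # Element-wise product of sigma_x and sigma_y via tf.multiply instead of nd.ops.mul.
    sxsy = tf.multiply(sigma[:, 0:1, :, :], sigma[:, 1:2, :, :])   # (batch, 1, 1, 1)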

4- Yes, the output of self.disassembling(output) is out_hyps and out_log_sigmas, where out_log_sigmas are already the bounded_log_sigmas (see

bounded_log_sigmas = [tf_adjusted_sigmoid(log_sigmas[i], -6, 6) for i in range(len(log_sigmas))]
).
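
So, following the quoted signature, the call would look like this (a sketch; the mode and top_n values are just those shown in the signature):

    # out_hyps and out_log_sigmas from self.disassembling(output) go straight into the loss:
    sampling_loss = self.make_sampling_loss(out_hyps, out_log_sigmas, gt, mode='epe', top_n=1)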

5- In my experiments, I do not get negative loss values, but I think it is not wrong to have negative ones. Just check the final mixture model parameters and see if they make sense (e.g., plot the mixture model distribution).
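
As a quick sanity check on why a negative value can still be valid: the NLL of a continuous density drops below zero whenever the predicted density at the ground truth exceeds 1, e.g. when the scale parameter is small. A tiny worked example, assuming a Laplace term with scale b as mentioned in the question:

    import numpy as np

    # Laplace negative log-likelihood: -log p(x) = log(2*b) + |x - mu| / b
    mu, b, x = 0.0, 0.05, 0.01
    nll = np.log(2 * b) + abs(x - mu) / b
    print(nll)   # about -2.10: negative, yet the density is perfectly valid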

