
Comments (4)

os1a avatar os1a commented on September 27, 2024

Hi,
Just to clarify something, figure 3 is only used to explain how the optimization of our loss function works.

The three ground truths are not available at the same time. You will see only one ground truth at every iteration, but over the course of training you will see all of them.

Figure 3 illustrates only the sampling framework, where we use EWTA. For simplicity, you can assume the simpler version of our approach: the sampling network generates multiple hypotheses (a set of (x, y) points), and during fitting you fit those hypotheses into your final mixture model.

In practice, we train the sampling network to generate 20 hypotheses and then fit them into 4 modes (as mentioned in section 6.1).

To get an idea about the EWTA loss implementation, we have already provided the code for the loss function.

def make_sampling_loss(self, hyps, bounded_log_sigmas, gt, mode='epe', top_n=1):
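The repository implements this loss in TensorFlow; as a rough illustration only, here is a minimal NumPy sketch of the winner-takes-all idea behind make_sampling_loss, assuming 2-D hypothesis points, a single ground truth, and Euclidean (EPE) distance. The function name and array shapes are illustrative, not the repository's API.

```python
import numpy as np

def ewta_loss(hyps, gt, top_n=1):
    """Minimal sketch of an (evolving) winner-takes-all loss.

    hyps:  (K, 2) array of K hypothesis points (x, y)
    gt:    (2,) single ground-truth point
    top_n: number of "winner" hypotheses that receive the penalty;
           in EWTA this is annealed from K down to 1 during training
    """
    # endpoint error (EPE) of each hypothesis w.r.t. the single ground truth
    dists = np.linalg.norm(hyps - gt, axis=1)
    # only the top_n closest hypotheses are penalized; the rest get no gradient
    winners = np.argsort(dists)[:top_n]
    return dists[winners].mean()
```

Because only the closest hypotheses are updated for each training sample, different heads end up specializing on different ground truths seen across iterations.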

We also provided the loss function used in the fitting network (nll) at:

def make_fitting_loss(self, means, bounded_log_sigmas, mixture_weights, gt):
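Again as a rough sketch rather than the repository's TensorFlow code: the fitting loss is a negative log-likelihood of the ground truth under the predicted mixture. The version below assumes isotropic 2-D Gaussian modes, which is a simplification for illustration.

```python
import numpy as np

def fitting_nll(means, log_sigmas, weights, gt):
    """NLL of gt under an isotropic 2-D Gaussian mixture (illustrative sketch).

    means:      (M, 2) mode centers
    log_sigmas: (M,) log standard deviation per mode (isotropic assumption)
    weights:    (M,) mixture weights, summing to 1
    gt:         (2,) ground-truth point
    """
    sigmas = np.exp(log_sigmas)
    sq = np.sum((means - gt) ** 2, axis=1)
    # log density of each 2-D isotropic Gaussian at gt
    log_p = -sq / (2 * sigmas ** 2) - 2 * np.log(sigmas) - np.log(2 * np.pi)
    # mixture likelihood (a log-sum-exp would be used for numerical stability)
    mix = np.sum(weights * np.exp(log_p))
    return -np.log(mix)
```

In the paper's pipeline, the 20 sampled hypotheses are first clustered into 4 modes, and a loss of this form then trains the fitting network's means, sigmas, and mixture weights jointly.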

Feel free to raise more questions if you still need help.

from multimodal-future-prediction.

droneRL2020 avatar droneRL2020 commented on September 27, 2024

Thank you for your explanation!

To confirm, could you please elaborate on this sentence?
"The three ground truths are not available at the same time. You will see only one ground truth at every iteration, but during training you will see all of them."

Does that mean we use 3 ground-truth labels (3 future trajectories, i.e. (x, y) positions on the image) paired with one image during training?


os1a avatar os1a commented on September 27, 2024

No, we use only one ground truth. Every training sample has an input (e.g., an image) and a single ground truth. We generate multiple hypotheses (e.g., 8 or 20) and use the EWTA loss function (make_sampling_loss() in our repository), which takes a set of hypotheses (hyps) and a single ground truth (gt).

What we mean in Figure 3 is that during training, at some iteration the network sees an image with its single ground truth, and at another iteration (perhaps much later) it sees a similar input image with a different ground truth. The EWTA loss encourages the network to use one head in the first case and another head in the latter.


droneRL2020 avatar droneRL2020 commented on September 27, 2024

Thank you for the explanation!

