Comments (6)

zcxu-eric avatar zcxu-eric commented on September 8, 2024

Hello,

Ego4D does not provide the same labels as the AVA Active Speaker Detection dataset. The active speaker labels used here are generated from the VAD labels: a speaker who is speaking and has a visible face is labeled as the active speaker. This may differ from AVA's annotation guidelines. VAD results do not always align with the faces in the field of view, which is why we interpolate the faces.

These labels are also not strictly correct, because we cannot verify each frame manually. Our motivation is to check whether training on the Ego4D dataset can improve the final diarization results on the Ego4D AVD benchmark.
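The labeling rule described above (speaking and face visible means active) could be sketched roughly as follows. This is only an illustration, not the repository's actual preprocessing: the function name `label_face_track`, the inclusive `(start, end)` frame segments, and the sample numbers are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def label_face_track(
    face_frames: Iterable[int],
    vad_segments: List[Tuple[int, int]],
) -> Dict[int, int]:
    """Label each frame of one person's face track as active (1) or not (0).

    face_frames:  frame numbers where this person's face is visible
                  (hypothetical input, e.g. after interpolation).
    vad_segments: inclusive (start_frame, end_frame) ranges in which the
                  same person is speaking according to the VAD labels.
    """
    labels = {}
    for frame in face_frames:
        # Active only if the visible face falls inside a speech segment.
        speaking = any(start <= frame <= end for start, end in vad_segments)
        labels[frame] = 1 if speaking else 0
    return labels


# Made-up example: face visible on frames 10-15, speech on 12-13 and 15-20.
track_labels = label_face_track(range(10, 16), [(12, 13), (15, 20)])
print(track_labels)
```

Frames where the person speaks but no face is visible (16-20 in the made-up example) get no label at all, which is one way VAD results and face tracks can end up covering different frame ranges.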

from ego4d_talknet_asd.

SJTUwxz avatar SJTUwxz commented on September 8, 2024

Sorry for the late reply, and thank you for the quick response! Sorry, I may have caused some confusion with my question. After reading your code, I found that in some cases your dataloader loads the video frames of one clip but uses the annotations of another clip.

For the video clip with trackid a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0, the frames that your dataloader loads (including interpolation) are 1619-1694, but the labels that the dataloader uses for this clip belong to frames 1619-1677 and 1694-1711. So the video frames and the labels do not match.
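To make the mismatch concrete, here is a small sketch that compares the two sets of frame numbers for this track. It assumes the ranges quoted above are inclusive, and `compare_frame_coverage` is a hypothetical helper, not something from the repository.

```python
def compare_frame_coverage(video_frames, label_frames):
    """Report which frame numbers are shared and which appear on one side only."""
    video, labels = set(video_frames), set(label_frames)
    return {
        "matched": sorted(video & labels),
        "video_without_label": sorted(video - labels),
        "label_without_video": sorted(labels - video),
    }


# Frame ranges from the example above, treated as inclusive (an assumption).
video_frames = range(1619, 1695)                                  # 1619-1694
label_frames = list(range(1619, 1678)) + list(range(1694, 1712))  # 1619-1677, 1694-1711

report = compare_frame_coverage(video_frames, label_frames)
print({name: len(frames) for name, frames in report.items()})
# {'matched': 60, 'video_without_label': 16, 'label_without_video': 17}
```

Under that assumption there are 76 video frames and 77 labels, so the two lists are almost the same length even though only 60 frame numbers actually coincide.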

zcxu-eric avatar zcxu-eric commented on September 8, 2024

Hello, I'm not sure what causes this inconsistency. Is the error in the code or in the preprocessed labels?

SJTUwxz avatar SJTUwxz commented on September 8, 2024

Hey! The preprocessed labels in the annotated bbox JSON file have inconsistent timestamps for some video clips. I don't think it's an error, but the dataloader code seems to assume that the timestamps are consistent. Thanks!
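One way to avoid relying on that assumption would be to pair frames and labels by frame number instead of by position. The sketch below is only an illustration with made-up data: `align_labels_to_frames` and the `{frame: label}` dictionary layout are assumptions, not the repository's dataloader or file format.

```python
from typing import Dict, List, Sequence, Tuple


def align_labels_to_frames(
    video_frames: Sequence[int],
    labels_by_frame: Dict[int, int],
) -> Tuple[List[int], List[int]]:
    """Keep only the video frames that have a label for the same frame number."""
    kept_frames, kept_labels = [], []
    for frame in video_frames:
        if frame in labels_by_frame:
            kept_frames.append(frame)
            kept_labels.append(labels_by_frame[frame])
    return kept_frames, kept_labels


# Made-up example: frame 7 has no label, and the label for frame 9 has no frame.
frames, labels = align_labels_to_frames([5, 6, 7, 8], {5: 1, 6: 0, 8: 1, 9: 1})
print(frames, labels)
```

Dropping unlabeled frames is just one possible policy. Another would be to keep every frame and mark the missing labels with an ignore value.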

zcxu-eric avatar zcxu-eric commented on September 8, 2024

There are errors in the preprocessed labels. We generated them ourselves because Ego4D doesn't provide labels for active speaker detection. If this issue affects your research, my suggestion is to regenerate these labels for ASD.

SJTUwxz avatar SJTUwxz commented on September 8, 2024

Hi! Thanks for your quick response and suggestion! I will check their paper and try to regenerate the labels.
