
ego4d_talknet_asd's People

Contributors

zcxu-eric


ego4d_talknet_asd's Issues

Possible bug in dataloader

In the line

audioSet[dataName] = normalize(audio)

the audio is downscaled to [-1, 1] and then normalized to a target RMS. However, when

return audio.astype(numpy.int16)

in the overlap function is probabilistically executed, the downscaled audio is cast from float32 to int16, which could lead to most of the audio samples being mapped to very small integers, like [0, 1, -1, ...]. I don't see the need to change the type from float32 to int16 in the overlap function. The official TalkNet code (https://github.com/TaoRuijie/TalkNet-ASD/blob/main/dataLoader.py) does it, but it doesn't call any normalize function and works directly with the original int16 audio.

Face crop augmentation

Hi @zcxu-eric ,

Could you provide some intuition behind the following code (see screenshot) for face crop augmentation in dataLoader.py? Specifically, I don't understand what lines 111 and 114 achieve. I couldn't find any such step in the original TalkNet repo (https://github.com/TaoRuijie/TalkNet-ASD) or any mention of it in the TalkNet/Ego4D paper.

[Screenshot, 2023-02-24: face crop augmentation code in dataLoader.py]

Misalignment of audio-visual frames and labels

Hi,

There is a chance of misalignment between the AV frames and the labels in dataloader.py due to the interpolation triggered by the check

if gt_frames > framenum.shape[0]:

I think there are two ways to handle this: 1) use contiguous AV frames and interpolate the labels, or 2) use discontiguous AV frames and skip label interpolation. Do you expect either of these options to work better?
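As an illustration of option 1, here is a minimal sketch of nearest-neighbour label interpolation onto a contiguous frame range. The function name `labels_on_dense_timeline` and the choice of nearest-neighbour filling are assumptions made for this example; the repository does not necessarily do this, and labels inside a gap remain a guess under any interpolation scheme.

```python
import numpy as np

def labels_on_dense_timeline(timestamps, labels):
    """Spread per-frame labels from sorted, possibly gapped timestamps onto
    every frame between the first and last timestamp (nearest neighbour)."""
    timestamps = np.asarray(timestamps)
    labels = np.asarray(labels)
    dense = np.arange(timestamps[0], timestamps[-1] + 1)
    # Index of the first annotated timestamp >= each dense timestamp.
    right = np.clip(np.searchsorted(timestamps, dense, side="left"),
                    0, len(timestamps) - 1)
    left = np.clip(right - 1, 0, len(timestamps) - 1)
    # Use the left neighbour only when it is strictly closer in time.
    use_left = (dense - timestamps[left]) < (timestamps[right] - dense)
    nearest = np.where(use_left, left, right)
    return dense, labels[nearest]

# Toy track with a gap between frames 11 and 15.
dense, dense_labels = labels_on_dense_timeline([10, 11, 15, 16], [0, 0, 1, 1])
print(dense.tolist())
print(dense_labels.tolist())
```

Option 2 would instead keep only the annotated timestamps and their original labels, so no label is invented, at the cost of feeding temporally discontiguous frames to the model.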

P.S. This is similar to #1 (comment).

Thanks,
Sagnik

Tracker Results

Hi,

I was wondering if the tracking results are available somewhere to run inference on EGO4D.

Thanks!

About video frames and labels correspondence

Thanks for the detailed code!
I have a question regarding how you process the video frames and labels of a given trackid. For example, take trackid a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0. Its entry in active_speaker_train.csv is

a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0 77 30.0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1619

But the timestamps of the video frames are not continuous in the json file. The 77 frames correspond to timestamps: 1619-1677 & 1694-1711. So I assume the labels also correspond to these timestamps. However, in the dataloader,

track = [bbox[i] for i in range(int(data[-1]), int(data[-1])+int(data[1])) if i in bbox]

first retrieves 1619-1677 and 1694, 1695. Then interpolation is used to add the missing bboxes. So timestamps of these 77 frames are 1619-1694. But the labels don't match these timestamps. So I'd like to know if this might be an issue or not.
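The selection step for this track can be checked with a small standalone sketch. The timestamps are the ones quoted above; the `bbox` dictionary with placeholder values and the variable names are assumptions for illustration, and the comparison assumes the loader ends up treating the track as 77 consecutive frames starting at 1619.

```python
start, num_frames = 1619, 77

# Timestamps that actually carry the 77 labels (from the json file).
annotated = list(range(1619, 1678)) + list(range(1694, 1712))
assert len(annotated) == num_frames

# Placeholder for the per-timestamp bounding boxes (values are irrelevant here).
bbox = {t: None for t in annotated}

# Same selection logic as the dataloader line quoted above.
selected = [t for t in range(start, start + num_frames) if t in bbox]
print(len(selected), selected[-2:])  # frames kept before interpolation

# Timestamps implied by treating the track as 77 consecutive frames.
assumed = list(range(start, start + num_frames))

# Label positions whose true timestamp differs from the assumed one.
mismatched = [i for i, (a, b) in enumerate(zip(annotated, assumed)) if a != b]
print(len(mismatched), "of", num_frames, "labels point at a different frame")
```

Under that assumption, the labels for the first segment line up with their frames, while every label from the second segment is paired with a frame from an earlier timestamp.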
Thanks!
