zcxu-eric / ego4d_talknet_asd

Python 99.22% Shell 0.78%


ego4d_talknet_asd's Issues

Possible bug in dataloader

In

Ego4d_TalkNet_ASD/dataLoader.py, line 28 in ab9f345:

audioSet[dataName] = normalize(audio)

the audio is downscaled to [-1, 1] and then normalized to a target RMS. However, the overlap function probabilistically executes

Ego4d_TalkNet_ASD/dataLoader.py, line 45 in ab9f345:

return audio.astype(numpy.int16)

When that happens, the dtype of the downscaled audio changes from float32 to int16, so most audio samples are mapped to very small integers such as 0, 1, and -1. I don't see the need to cast from float32 to int16 in the overlap function. The official TalkNet code (https://github.com/TaoRuijie/TalkNet-ASD/blob/main/dataLoader.py) does perform this cast, but it never calls a normalize function and works directly with the original int16 audio.
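The effect can be reproduced in isolation. The sketch below is only an illustration: the repository's normalize function is not shown here, so uniformly random float32 samples stand in for the normalized audio, and rescale_to_int16 is a hypothetical helper, not code from either repository.

```python
import numpy as np


def simulate_cast(num_samples=16000, seed=0):
    """Cast float32 audio in [-1, 1) straight to int16, as the issue describes."""
    rng = np.random.default_rng(seed)
    # Stand-in for normalized audio (assumption): float32 samples in [-1, 1).
    audio = rng.uniform(-1.0, 1.0, size=num_samples).astype(np.float32)
    # The cast truncates toward zero, so nearly every sample becomes 0.
    return audio, audio.astype(np.int16)


def rescale_to_int16(audio):
    """Hypothetical alternative: scale back to the int16 range before casting."""
    scaled = np.clip(audio, -1.0, 1.0) * 32767.0
    return scaled.astype(np.int16)


audio, collapsed = simulate_cast()
zero_fraction = float(np.mean(collapsed == 0))
rescaled = rescale_to_int16(audio)
print("distinct values after direct cast:", np.unique(collapsed))
print("fraction of zeros:", zero_fraction)
print("peak magnitude after rescaling:", int(np.abs(rescaled).max()))
```

If the int16 return type has to stay, rescaling before the cast is one possible way to keep the waveform; dropping the cast and staying in float32 would be another.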

Face crop augmentation

Hi @zcxu-eric,

Could you provide some intuition behind the following code (see screenshot) for face crop augmentation in dataLoader.py? Specifically, I don't understand what lines 111 and 114 achieve. I couldn't find any such step in the original TalkNet repo (https://github.com/TaoRuijie/TalkNet-ASD) or any mention of it in the TalkNet/Ego4D paper.

Misalignment of audio-visual frames and labels

Hi,

There is a chance of misalignment between the AV frames and the labels in dataLoader.py due to the interpolation at

Ego4d_TalkNet_ASD/dataLoader.py, line 158 in ab9f345:

if gt_frames > framenum.shape[0]:

I think there are two ways to handle this: 1) keep the AV frames contiguous and interpolate the labels, or 2) keep the AV frames discontiguous and skip label interpolation. Do you expect either of these options to work better?

P.S. This is similar to #1 (comment).

Thanks,
Sagnik

Code crashes due to missing frames

Hi @zcxu-eric,

The training code crashes because some frames listed in the JSON files in data/ego4d/bbox are missing from data/video_imgs (#2 (comment)). See screenshot attached. Does the training complete one epoch on your end?

Tracker Results

Hi,

I was wondering if the tracking results are available somewhere to run inference on EGO4D.

Thanks!

About video frames and labels correspondence

Thanks for the detailed code!
I have a question about how you process the video frames and labels of a given trackid. For example, take trackid a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0. Its entry in active_speaker_train.csv is

a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0 77 30.0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1619

But the timestamps of the video frames are not continuous in the json file. The 77 frames correspond to timestamps: 1619-1677 & 1694-1711. So I assume the labels also correspond to these timestamps. However, in the dataloader,

track = [bbox[i] for i in range(int(data[-1]), int(data[-1])+int(data[1])) if i in bbox]

first retrieves timestamps 1619-1677 plus 1694 and 1695. Interpolation is then used to add the missing bboxes, so the timestamps of these 77 frames become 1619-1694. The labels don't match these timestamps, so I'd like to know whether this might be an issue.
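The mismatch can be reproduced with a toy bbox table. This is a minimal sketch under stated assumptions: the dictionary is keyed by timestamp and each value just records its own timestamp, which is not the real JSON layout. select_by_range wraps the comprehension quoted above; select_first_available is a hypothetical alternative that relies on the assumption (made in this issue, not confirmed by the authors) that the labels follow the frames actually present in the JSON file.

```python
def select_by_range(bbox, start, num_frames):
    """Selection quoted in the issue: keep timestamps in [start, start + num_frames)."""
    return [bbox[i] for i in range(start, start + num_frames) if i in bbox]


def select_first_available(bbox, start, num_frames):
    """Hypothetical alternative: first num_frames timestamps present at or after start."""
    keys = sorted(k for k in bbox if k >= start)
    return [bbox[k] for k in keys[:num_frames]]


# Toy bbox table with the gap described above: 1619-1677 and 1694-1711.
timestamps = list(range(1619, 1678)) + list(range(1694, 1712))
bbox = {t: t for t in timestamps}

by_range = select_by_range(bbox, 1619, 77)
by_order = select_first_available(bbox, 1619, 77)
print(len(by_range), by_range[-1])
print(len(by_order), by_order[-1])
```

On this toy table the range-based selection finds only 61 of the 77 frames and stops at 1695, while the order-based selection returns all 77 frames ending at 1711 and needs no interpolation.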
Thanks!
