ego4d_talknet_asd's Issues
Possible bug in dataloader
In Ego4d_TalkNet_ASD/dataLoader.py, see line 28 and line 45 (both at commit ab9f345).
Face crop augmentation
Hi @zcxu-eric,
Could you provide some intuition behind the face crop augmentation code in dataLoader.py (see screenshot)? Specifically, I don't understand what lines 111 and 114 achieve. I couldn't find any such step in the original TalkNet repo (https://github.com/TaoRuijie/TalkNet-ASD) or any mention of it in the TalkNet or Ego4D papers.
Misalignment of audio-visual frames and labels
Hi,
There is a chance of misalignment between the AV frames and the labels in dataLoader.py due to the interpolation in Ego4d_TalkNet_ASD/dataLoader.py, line 158 (commit ab9f345).
P.S. This is similar to #1 (comment).
Thanks,
Sagnik
Code crashes due to missing frames
Hi @zcxu-eric,
The training code crashes because some frames listed in the json files in data/ego4d/bbox are missing from data/video_imgs (#2 (comment)). See the attached screenshot. Does training complete one epoch on your end?
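A quick existence check can show which frames are affected before training starts. The sketch below is not from the repository: the function name `find_missing_frames` and the `img_{:05d}.jpg` file-name pattern are assumptions for illustration, and the expected frame indices would have to be read from the bbox json files according to their actual layout.

```python
import os


def find_missing_frames(expected_frames, frame_dir, pattern="img_{:05d}.jpg"):
    """Return the sorted frame indices whose image file is absent from frame_dir.

    `pattern` is a hypothetical file-name template; adjust it to match the
    names actually used under data/video_imgs.
    """
    # A missing directory is treated as "no frames present".
    present = set(os.listdir(frame_dir)) if os.path.isdir(frame_dir) else set()
    return sorted(i for i in set(expected_frames) if pattern.format(i) not in present)
```

Calling it once per clip with the frame indices referenced in that clip's json would list the gaps up front, so they can be skipped or re-extracted instead of crashing mid-epoch.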
Tracker Results
Hi,
I was wondering if the tracking results are available somewhere to run inference on EGO4D.
Thanks!
About video frames and labels correspondence
Thanks for the detailed code!
I have a question about how you process the video frames and labels of a given track ID. For example, take a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0. Its entry in active_speaker_train.csv is
a1055434-9e9b-4d69-bac3-374a39f801da:track_85:0 77 30.0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1619
But the timestamps of the video frames are not continuous in the json file. The 77 frames correspond to timestamps: 1619-1677 & 1694-1711. So I assume the labels also correspond to these timestamps. However, in the dataloader,
track = [bbox[i] for i in range(int(data[-1]), int(data[-1])+int(data[1])) if i in bbox]
first retrieves 1619-1677 and 1694, 1695. Then interpolation is used to add the missing bboxes. So timestamps of these 77 frames are 1619-1694. But the labels don't match these timestamps. So I'd like to know if this might be an issue or not.
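A minimal sketch of the mismatch, using the numbers from this track. It assumes `bbox` is a dict keyed by integer frame index (the real json layout may differ), that the labels are aligned with the 77 annotated frames, and that interpolation fills a contiguous window starting at 1619. The interpolation itself is not reproduced here.

```python
start, num_frames = 1619, 77

# Frames that actually have a bbox: 1619-1677 and 1694-1711 (59 + 18 = 77).
annotated = list(range(1619, 1678)) + list(range(1694, 1712))

# Stand-in for the bbox json; the dict values are placeholders (assumption).
bbox = {i: {"frame": i} for i in annotated}

# Same selection logic as the dataloader line quoted above.
track = [bbox[i] for i in range(start, start + num_frames) if i in bbox]
selected = [b["frame"] for b in track]

# Only 61 of the 77 annotated frames fall inside the window.
print(len(selected), selected[-2:])

# If label k belongs to annotated[k] but the frame at position k comes from a
# contiguous window, every position after the gap is paired with the wrong label.
window = list(range(start, start + num_frames))
mismatched = [k for k in range(num_frames) if annotated[k] != window[k]]
print(len(mismatched), mismatched[0])
```

Under these assumptions the last 18 labels (positions 59 to 76) would be paired with frames from the wrong timestamps.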
Thanks!