
Comments (10)

yixuanli98 commented on August 13, 2024

Yes, the Center Branch uses the focal loss and can handle multi-label classification.
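
For context, here is a minimal sketch of a CenterNet-style focal loss over per-class center heatmaps, which is how such a branch can mark the same location as positive for several classes at once. The tensor names and the alpha/beta values are illustrative assumptions, not the exact MOC-detector code:

    import torch

    def center_focal_loss(pred, gt, alpha=2, beta=4):
        # pred, gt: (batch, num_classes, H, W); gt is a Gaussian heatmap with 1
        # at every annotated center. Each class has its own channel, so one
        # location can be positive for several classes (multi-label).
        pred = pred.clamp(1e-4, 1 - 1e-4)      # avoid log(0)
        pos_mask = gt.eq(1).float()            # exact center pixels
        neg_mask = gt.lt(1).float()            # all other pixels

        pos_loss = torch.log(pred) * (1 - pred) ** alpha * pos_mask
        neg_loss = torch.log(1 - pred) * pred ** alpha * (1 - gt) ** beta * neg_mask

        num_pos = pos_mask.sum().clamp(min=1)  # normalize by number of centers
        return -(pos_loss.sum() + neg_loss.sum()) / num_pos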


nthhiep commented on August 13, 2024

I have some questions related to flip_test mode.

  1. In "normal_moc_det.py"/preprocess(), line 62, why do you convert the red channel of "flip_data". What does this mean?
    temp[:, :, 2] = 255 - temp[:, :, 2]

  2. In "normal_moc_det.py"/process() function, why don't you take the average of rgb_mov and rgb_mov_f (as well as flow_mov and flow_mov_f) like heatmap and wh output (lines 88,89, 100,101) ?

  3. So rgb_output[1]['mov'] and flow_output[1]['mov'] are computed for nothing?

The same applies to stream_moc_det.py. I hope to get your explanation. Thank you for your reply.


ArchiZX commented on August 13, 2024
  1. This is a specific channel for brox-flow rather than the red channel.

  2. I tried that, but it did not help.

  3. Yes.
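
For readers following along, a rough sketch of the flip-test fusion described in points 2 and 3: only the hm and wh outputs are averaged with their flipped counterparts, while the mov output of the flipped pass is left unused. The variable names are illustrative assumptions, not the exact code in normal_moc_det.py:

    import torch

    def fuse_flip_test(output, output_f):
        # output, output_f: dicts with 'hm', 'wh', 'mov' tensors of shape
        # (batch, C, H, W); output_f comes from the horizontally flipped input.
        hm = (output['hm'] + torch.flip(output_f['hm'], dims=[3])) / 2
        wh = (output['wh'] + torch.flip(output_f['wh'], dims=[3])) / 2

        # The movement branch from the flipped pass is not fused: flipping it
        # back would also require negating its x-offsets, and averaging it was
        # reportedly not helpful, so output_f['mov'] goes unused.
        return {'hm': hm, 'wh': wh, 'mov': output['mov']}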


nthhiep commented on August 13, 2024

Thank you for your response. I have another question. In fact, the format of ground-truth tubes in "UCF101v2-GT.pkl" is as follows:


gttubes  = { 
         'parentfolder/videoname': {class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])      ]}

         ...

         'parentfolder/videoname': {class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])      ]}
}

So the datasets here are single-object? Does each video contain only one action? And is the class/identification of the tubes the class of the video (i.e., the index of the parent folder's name)?
In theory, your model does multi-object tracking, but it is trained on single-object data?

What about the general problem where there are multiple objects or multiple actions of different types in a video? For example:

  1. a video with two people jumping -> we need to identify, or separate, the tube boxes of each person
  2. a video with one person jumping and one walking -> we need to classify each one as normal

In this case, the class/identification exists only for the tubes, not for the video? And _gttubes[label] must be a dictionary with multiple entries, as follows?

gttubes  = { 
         'parentfolder/videoname': {  class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]]) ]

                                    class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])]      
                                    ...}
         ...

         'parentfolder/videoname': {  class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]]) ]

                                    class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])]      
                                    ...}
}

Many thanks,


nthhiep commented on August 13, 2024

Oh, so flow images are represented in HSV format, where channel 0 encodes the direction and channel 2 encodes the magnitude of the movement? So when we flip images, we also have to flip the direction of the object's movement. Thanks for the information; I had forgotten that.
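
A small sketch of that flip handling, assuming the motion component that reverses under a horizontal flip is stored in channel 2 on a [0, 255] scale (the exact encoding depends on how the Brox flow was rendered to images):

    import numpy as np

    def flip_flow_image(flow_img):
        # flow_img: (H, W, 3) uint8 array rendered from Brox optical flow.
        # Mirroring the frame reverses the horizontal direction of motion, so
        # the channel encoding it must be inverted as well, which is what
        # temp[:, :, 2] = 255 - temp[:, :, 2] does in normal_moc_det.py.
        flipped = flow_img[:, ::-1, :].copy()        # mirror left-right
        flipped[:, :, 2] = 255 - flipped[:, :, 2]    # reverse horizontal motion
        return flipped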


ArchiZX commented on August 13, 2024

UCF101-24 is a multi-object dataset, but JHMDB-21 is a single-object dataset (see our GIFs).

According to my observation, both datasets are single-action, as you say.

I don't know the generalization performance for multiple actions. And indeed, the community needs a new large-scale, non-atomic, multi-action/multi-object action detection dataset.


nthhiep commented on August 13, 2024

I checked UCF101v2-GT.pkl and found that UCF101-24 is not only a single-action but also a single-object dataset. In every video, only one object is annotated with a box throughout the video (even though the video may contain many objects). So UCF101-24 is a single-object tracking dataset.

We have len(self._gttubes[v]) = 1 for every video v in self._gttubes.

The action tube can be interrupted, i.e., divided into several segments. For example:


'Basketball/v_Basketball_g18_c02': {0: [array([
	   [  1., 161., 137., 222., 235.],
       [  2., 161., 137., 222., 235.],
       [  3., 161., 137., 222., 235.],
       [  4., 161., 137., 222., 235.],
       [  5., 161., 137., 222., 235.],
       [  6., 161., 137., 222., 235.],
       [  7., 161., 137., 222., 235.],
       [  8., 161., 137., 222., 235.],
       [  9., 161., 137., 222., 235.],
       [ 10., 162., 137., 223., 235.],
       [ 11., 162., 137., 223., 235.],
       [ 12., 163., 137., 224., 235.],
       [ 13., 163., 137., 224., 235.],
       [ 14., 163., 137., 224., 235.],
       [ 15., 163., 137., 224., 235.],
       [ 16., 163., 137., 224., 235.],
       [ 17., 163., 137., 224., 235.],
       [ 18., 163., 137., 224., 235.],
       [ 19., 163., 137., 224., 235.],
       [ 20., 163., 137., 224., 235.]], dtype=float32), array([[ 72., 163., 146., 219., 238.],
       [ 73., 163., 146., 219., 238.],
       [ 74., 163., 146., 219., 238.],
       [ 75., 163., 146., 219., 238.],
       [ 76., 163., 146., 219., 238.],
       [ 77., 163., 146., 219., 238.],
       [ 78., 163., 146., 219., 238.],
       [ 79., 163., 146., 219., 238.],
       [ 80., 163., 146., 219., 238.],
       [ 81., 163., 146., 219., 238.],
       [ 82., 163., 146., 219., 238.],
       [ 83., 163., 146., 219., 238.],
       [ 84., 163., 146., 219., 238.],
       [ 85., 163., 146., 219., 238.],
       [ 86., 163., 146., 219., 238.],
       [ 87., 163., 146., 219., 238.],
       [ 88., 163., 146., 219., 238.],
       [ 89., 163., 146., 219., 238.],
       [ 90., 163., 146., 219., 238.],
       [ 91., 163., 146., 219., 238.],
       [ 92., 163., 146., 219., 238.],
       [ 93., 163., 146., 219., 238.],
       [ 94., 163., 146., 219., 238.],
       [ 95., 163., 146., 219., 238.],
       [ 96., 163., 146., 219., 238.],
       [ 97., 163., 146., 219., 238.],
       [ 98., 163., 146., 219., 238.],
       [ 99., 163., 146., 219., 238.],
       [100., 163., 146., 219., 238.],
       [101., 163., 146., 219., 238.],
       [102., 163., 146., 219., 238.]], dtype=float32)]}

There are two tube segments in the "Basketball/v_Basketball_g18_c02" video; however, the object in both tubes is the same.
So UCF101-24 is a single-object tracking dataset.
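
If anyone wants to reproduce this check, a short sketch that prints the frame range of each tube segment (the pickle path and the 'latin1' encoding for loading a Python 2 pickle under Python 3 are assumptions):

    import pickle

    with open('UCF101v2-GT.pkl', 'rb') as f:          # path is an assumption
        gt = pickle.load(f, encoding='latin1')

    video = 'Basketball/v_Basketball_g18_c02'
    for label, tubes in gt['gttubes'][video].items():
        for i, tube in enumerate(tubes):
            # column 0 is the frame index, columns 1-4 are x1, y1, x2, y2
            first, last = int(tube[0, 0]), int(tube[-1, 0])
            print('class %d, segment %d: frames %d-%d' % (label, i, first, last))

For the video above this prints two segments, frames 1-20 and 72-102.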


ArchiZX commented on August 13, 2024

gttubes: a dictionary that contains the ground-truth tubes for each video.
Each entry is itself a dictionary that associates each label index with a list of tubes.
A tube is a numpy array with nframes rows and 5 columns.

len(self._gttubes[v]) = 1 represents single-action rather than single-object.

And try to check len(self._gttubes[v][class_index])

For example, len(pkl['gttubes']['Fencing/v_Fencing_g04_c03'][6]) ---> 4
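
A hedged sketch of that check (again assuming the pickle loads with encoding='latin1' under Python 3):

    import pickle

    with open('UCF101v2-GT.pkl', 'rb') as f:
        gt = pickle.load(f, encoding='latin1')

    video = 'Fencing/v_Fencing_g04_c03'
    for class_index, tubes in gt['gttubes'][video].items():
        # Several tubes under one class index means several annotated actors
        # performing that action: multi-object, but still single-action.
        print(class_index, len(tubes))    # should print 6 4, per the example above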



nthhiep commented on August 13, 2024

Thank you very much for this example. I was wrong: I drew the boxes for Basketball/v_Basketball_g18_c02 and assumed it was the same for the other videos. Thanks again.


xjsxujingsong commented on August 13, 2024

Hi, I just found this issue. Can the proposed method support multiple persons with multiple actions in one frame, such as in the AVA dataset?

