
Comments (10)

yixuanli98 commented on August 13, 2024

Yes, the Center Branch uses the focal loss and can handle multi-label classification.
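
For context, here is a minimal sketch of a CenterNet-style focal loss over per-class center heatmaps, which is how such a branch can mark the same location as positive for several classes at once. The tensor names and the alpha/beta values are illustrative assumptions, not the exact MOC-detector code:

    import torch

    def center_focal_loss(pred, gt, alpha=2, beta=4):
        # pred, gt: (batch, num_classes, H, W); gt is a Gaussian heatmap with 1
        # at every annotated center. Each class has its own channel, so one
        # location can be positive for several classes (multi-label).
        pred = pred.clamp(1e-4, 1 - 1e-4)      # avoid log(0)
        pos_mask = gt.eq(1).float()            # exact center pixels
        neg_mask = gt.lt(1).float()            # all other pixels

        pos_loss = torch.log(pred) * (1 - pred) ** alpha * pos_mask
        neg_loss = torch.log(1 - pred) * pred ** alpha * (1 - gt) ** beta * neg_mask

        num_pos = pos_mask.sum().clamp(min=1)  # normalize by number of centers
        return -(pos_loss.sum() + neg_loss.sum()) / num_pos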


nthhiep commented on August 13, 2024

I have some questions related to flip_test mode.

  1. In "normal_moc_det.py"/preprocess(), line 62, why do you convert the red channel of "flip_data". What does this mean?
    temp[:, :, 2] = 255 - temp[:, :, 2]

  2. In "normal_moc_det.py"/process() function, why don't you take the average of rgb_mov and rgb_mov_f (as well as flow_mov and flow_mov_f) like heatmap and wh output (lines 88,89, 100,101) ?

  3. So rgb_output[1]['mov'] and flow_output[1]['mov'] are computed for nothing?

The same applies to stream_moc_det.py. I hope to get your explanation. Thank you for your reply.


ArchiZX commented on August 13, 2024
  1. This is a specific channel for brox-flow rather than the red channel.

  2. I tried that, but it did not help.

  3. Yes.
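
For readers following along, a rough sketch of the flip-test fusion described in points 2 and 3: only the hm and wh outputs are averaged with their flipped counterparts, while the mov output of the flipped pass is left unused. The variable names are illustrative assumptions, not the exact code in normal_moc_det.py:

    import torch

    def fuse_flip_test(output, output_f):
        # output, output_f: dicts with 'hm', 'wh', 'mov' tensors of shape
        # (batch, C, H, W); output_f comes from the horizontally flipped input.
        hm = (output['hm'] + torch.flip(output_f['hm'], dims=[3])) / 2
        wh = (output['wh'] + torch.flip(output_f['wh'], dims=[3])) / 2

        # The movement branch from the flipped pass is not fused: flipping it
        # back would also require negating its x-offsets, and averaging it was
        # reportedly not helpful, so output_f['mov'] goes unused.
        return {'hm': hm, 'wh': wh, 'mov': output['mov']}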


nthhiep commented on August 13, 2024

Thank you for your response. I have another question. In fact, the format of ground-truth tubes in "UCF101v2-GT.pkl" is as follows:


gttubes  = { 
         'parentfolder/videoname': {class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])      ]}

         ...

         'parentfolder/videoname': {class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])      ]}
}

So the datasets here are single-object? Does each video contain only one action? And is the class/identification of the tubes the class of the video (i.e., the index of the parent folder's name)?
In theory, your model does multi-object tracking, but it is trained on single-object data?

What about the general problem where there are multiple objects or multiple actions of different types in a video? For example:

  1. a video with two people jumping -> we need to identify, or separate, the tube boxes of each person
  2. a video with one person jumping and one walking -> we need to classify each one as normal

In this case, the class/identification exists only for the tubes, not for the video? And _gttubes[label] must be a dictionary with multiple entries, as follows?

gttubes  = { 
         'parentfolder/videoname': {  class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]]) ]

                                    class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])]      
                                    ...}
         ...

         'parentfolder/videoname': {  class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]]) ]

                                    class: [
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])
                  ...
                  array([[frame,x1,y1,x2,y2],...,[frame,x1,y1,x2,y2]])]      
                                    ...}
}

Many thanks,


nthhiep commented on August 13, 2024

Oh, so flow images are represented in HSV format, where channel 0 encodes the direction and channel 2 encodes the magnitude of the movement? So when we flip images, we also have to flip the direction of the object's movement. Thanks for the information; I had forgotten that.
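
A small sketch of that flip handling, assuming the motion component that reverses under a horizontal flip is stored in channel 2 on a [0, 255] scale (the exact encoding depends on how the Brox flow was rendered to images):

    import numpy as np

    def flip_flow_image(flow_img):
        # flow_img: (H, W, 3) uint8 array rendered from Brox optical flow.
        # Mirroring the frame reverses the horizontal direction of motion, so
        # the channel encoding it must be inverted as well, which is what
        # temp[:, :, 2] = 255 - temp[:, :, 2] does in normal_moc_det.py.
        flipped = flow_img[:, ::-1, :].copy()        # mirror left-right
        flipped[:, :, 2] = 255 - flipped[:, :, 2]    # reverse horizontal motion
        return flipped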


ArchiZX commented on August 13, 2024

UCF101-24 is a multi-object dataset, but JHMDB-21 is a single-object dataset (see our GIFs).

According to my observation, both datasets are single-action, as you say.

I don't know the generalization performance for multiple actions. And indeed, the community needs a new large-scale, non-atomic, multi-action/multi-object action detection dataset.


nthhiep commented on August 13, 2024

I checked UCF101v2-GT.pkl and found that UCF101-24 is not only a single-action but also a single-object dataset. In every video, only one object is annotated with a box throughout the video (even though the video may contain many objects). So UCF101-24 is a single-object tracking dataset.

We have len(self._gttubes[v]) = 1 for every video v in self._gttubes.

The action tube can be interrupted, i.e., divided into several segments. For example:


'Basketball/v_Basketball_g18_c02': {0: [array([
	   [  1., 161., 137., 222., 235.],
       [  2., 161., 137., 222., 235.],
       [  3., 161., 137., 222., 235.],
       [  4., 161., 137., 222., 235.],
       [  5., 161., 137., 222., 235.],
       [  6., 161., 137., 222., 235.],
       [  7., 161., 137., 222., 235.],
       [  8., 161., 137., 222., 235.],
       [  9., 161., 137., 222., 235.],
       [ 10., 162., 137., 223., 235.],
       [ 11., 162., 137., 223., 235.],
       [ 12., 163., 137., 224., 235.],
       [ 13., 163., 137., 224., 235.],
       [ 14., 163., 137., 224., 235.],
       [ 15., 163., 137., 224., 235.],
       [ 16., 163., 137., 224., 235.],
       [ 17., 163., 137., 224., 235.],
       [ 18., 163., 137., 224., 235.],
       [ 19., 163., 137., 224., 235.],
       [ 20., 163., 137., 224., 235.]], dtype=float32), array([[ 72., 163., 146., 219., 238.],
       [ 73., 163., 146., 219., 238.],
       [ 74., 163., 146., 219., 238.],
       [ 75., 163., 146., 219., 238.],
       [ 76., 163., 146., 219., 238.],
       [ 77., 163., 146., 219., 238.],
       [ 78., 163., 146., 219., 238.],
       [ 79., 163., 146., 219., 238.],
       [ 80., 163., 146., 219., 238.],
       [ 81., 163., 146., 219., 238.],
       [ 82., 163., 146., 219., 238.],
       [ 83., 163., 146., 219., 238.],
       [ 84., 163., 146., 219., 238.],
       [ 85., 163., 146., 219., 238.],
       [ 86., 163., 146., 219., 238.],
       [ 87., 163., 146., 219., 238.],
       [ 88., 163., 146., 219., 238.],
       [ 89., 163., 146., 219., 238.],
       [ 90., 163., 146., 219., 238.],
       [ 91., 163., 146., 219., 238.],
       [ 92., 163., 146., 219., 238.],
       [ 93., 163., 146., 219., 238.],
       [ 94., 163., 146., 219., 238.],
       [ 95., 163., 146., 219., 238.],
       [ 96., 163., 146., 219., 238.],
       [ 97., 163., 146., 219., 238.],
       [ 98., 163., 146., 219., 238.],
       [ 99., 163., 146., 219., 238.],
       [100., 163., 146., 219., 238.],
       [101., 163., 146., 219., 238.],
       [102., 163., 146., 219., 238.]], dtype=float32)]}

There are two tube segments in the "Basketball/v_Basketball_g18_c02" video; however, the object in both tubes is the same.
So UCF101-24 is a single-object tracking dataset.
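
If anyone wants to reproduce this check, a short sketch that prints the frame range of each tube segment (the pickle path and the 'latin1' encoding for loading a Python 2 pickle under Python 3 are assumptions):

    import pickle

    with open('UCF101v2-GT.pkl', 'rb') as f:          # path is an assumption
        gt = pickle.load(f, encoding='latin1')

    video = 'Basketball/v_Basketball_g18_c02'
    for label, tubes in gt['gttubes'][video].items():
        for i, tube in enumerate(tubes):
            # column 0 is the frame index, columns 1-4 are x1, y1, x2, y2
            first, last = int(tube[0, 0]), int(tube[-1, 0])
            print('class %d, segment %d: frames %d-%d' % (label, i, first, last))

For the video above this prints two segments, frames 1-20 and 72-102.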


ArchiZX commented on August 13, 2024

gttubes: a dictionary that contains the ground-truth tubes for each video.
Each entry is itself a dictionary that associates each label index with a list of tubes.
A tube is a numpy array with nframes rows and 5 columns.

len(self._gttubes[v]) = 1 represents single-action rather than single-object.

And try to check len(self._gttubes[v][class_index])

For example, len(pkl['gttubes']['Fencing/v_Fencing_g04_c03'][6]) ---> 4
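
A hedged sketch of that check (again assuming the pickle loads with encoding='latin1' under Python 3):

    import pickle

    with open('UCF101v2-GT.pkl', 'rb') as f:
        gt = pickle.load(f, encoding='latin1')

    video = 'Fencing/v_Fencing_g04_c03'
    for class_index, tubes in gt['gttubes'][video].items():
        # Several tubes under one class index means several annotated actors
        # performing that action: multi-object, but still single-action.
        print(class_index, len(tubes))    # should print 6 4, per the example above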



nthhiep commented on August 13, 2024

Thank you very much for this example. I was wrong: I drew the boxes for Basketball/v_Basketball_g18_c02 and assumed it was the same for the other videos. Thanks again.


xjsxujingsong commented on August 13, 2024

Hi, I just found this issue. Can the proposed method support multiple persons with multiple actions in one frame, such as in the AVA dataset?

