
Comments (7)

tianyu0207 commented on July 17, 2024

Can you kindly explain the process you followed for generating the I3D features of the ShanghaiTech dataset, so that we can follow the same process for other datasets and videos as well?

video frames from non-overlapping sliding windows (16 frames each) are passed through the I3D network; features
are extracted from the ‘Mix 5c’ network layer, that are then reshaped to 2048-D vectors
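For anyone trying to reproduce this, the windowing step described in the quote can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the I3D forward pass itself is omitted, trailing frames that do not fill a whole 16-frame window are dropped (a common convention; RTFM's exact handling of the remainder is not specified here), and the name `make_clips` is hypothetical, not from the RTFM code.

```python
import numpy as np

def make_clips(frames: np.ndarray, clip_len: int = 16) -> np.ndarray:
    """Split a video into non-overlapping clips of `clip_len` frames.

    frames: array of shape (T, H, W, C). Trailing frames that do not
    fill a complete clip are dropped (assumption, see note above).
    Returns an array of shape (T // clip_len, clip_len, H, W, C),
    ready to be fed clip-by-clip through an I3D network.
    """
    n_clips = frames.shape[0] // clip_len
    return frames[: n_clips * clip_len].reshape(
        n_clips, clip_len, *frames.shape[1:]
    )

# Example: a 100-frame video yields 6 complete 16-frame clips.
video = np.zeros((100, 240, 320, 3), dtype=np.uint8)
clips = make_clips(video)
print(clips.shape)  # (6, 16, 240, 320, 3)
```

Each clip would then be passed through I3D and the 'Mix 5c' output pooled into a single feature vector per clip.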

from rtfm.

GowthamGottimukkala commented on July 17, 2024

First of all, I want to thank you for your work.
If I'm not wrong, the 'Mix 5c' layer of the I3D network outputs a 1024-D vector for every 16 frames.

  1. How do I reshape that to 2048-D? (Concatenate the features generated from RGB and flow -> 1024 + 1024?)
  2. Also, your .npy files for the ShanghaiTech dataset have shape (k, 10, 2048). What does each dimension indicate? The paper says the proposed RTFM receives a T×D feature matrix (2 dimensions) per video, so I don't understand why the uploaded features have 3 dimensions.
  3. Were the features for the ShanghaiTech dataset in the given OneDrive link generated using only RGB frames, without optical-flow images?

Kindly explain these so that more of us can implement this. Thanks in advance


tianyu0207 commented on July 17, 2024


  1. Hi, please use the I3D network with ResNet-50 as the backbone to extract the features.
  2. To be consistent with previous works, we use 10-crop augmentation; hence, 10 corresponds to the ten crops of each frame, and k is the number of 16-frame clips.
  3. The provided features use only the RGB frames.
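For reference, "10-crop" augmentation is conventionally the four corner crops plus the centre crop, each together with its horizontal flip. The sketch below assumes RTFM follows this standard recipe; the function name `ten_crop` and the 224-pixel crop size are illustrative assumptions, not taken from the repository.

```python
import numpy as np

def ten_crop(frame: np.ndarray, size: int) -> np.ndarray:
    """Return the 4 corner crops + centre crop of `frame`, plus the
    horizontal flip of each: output shape (10, size, size, C).

    frame: (H, W, C) with H, W >= size.
    """
    h, w = frame.shape[:2]
    ch, cw = (h - size) // 2, (w - size) // 2
    origins = [(0, 0), (0, w - size), (h - size, 0),
               (h - size, w - size), (ch, cw)]
    crops = [frame[y:y + size, x:x + size] for y, x in origins]
    crops += [c[:, ::-1] for c in crops]   # horizontal flips
    return np.stack(crops)

crops = ten_crop(np.zeros((240, 320, 3), dtype=np.uint8), 224)
print(crops.shape)  # (10, 224, 224, 3)
```

Applying this to every frame of every 16-frame clip, and running I3D on each crop stream separately, is what produces the (k, 10, 2048) feature shape described above.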


GowthamGottimukkala commented on July 17, 2024

Thank you very much for clarifying this. It really helps.
In another issue in this repo, I found that we need to divide each video into 32 snippets, which means that for any given video I'll get a 32×2048 feature matrix. So wouldn't k be fixed at 32, rather than variable as you mentioned in the second point? Sorry if I misunderstood.


tianyu0207 commented on July 17, 2024


Hi, the features are first extracted with I3D over every 16 frames, so k = total_frames / 16. Then, during training, we process each video into 32 segments using the process_feat function in util.py. This is the same as in the paper 'Real-World Anomaly Detection in Surveillance Videos'.
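The resampling such a process_feat function performs can be sketched as below: a variable-length (k, D) clip-feature matrix is mapped onto a fixed number of segments by averaging the clip features that fall into each span. This is an assumption about its behaviour based on the description above, not the repository's exact code; the name `to_fixed_segments` and the edge-case handling are illustrative.

```python
import numpy as np

def to_fixed_segments(feat: np.ndarray, n_seg: int = 32) -> np.ndarray:
    """Resample a (k, D) clip-feature matrix to a fixed (n_seg, D) one.

    The k clips are partitioned into n_seg roughly equal spans and the
    features within each span are averaged. If the video has fewer
    clips than segments, some spans are empty and the nearest clip
    feature is repeated (an assumed convention).
    """
    k, d = feat.shape
    out = np.zeros((n_seg, d), dtype=feat.dtype)
    bounds = np.linspace(0, k, n_seg + 1, dtype=int)
    for i in range(n_seg):
        lo, hi = bounds[i], bounds[i + 1]
        if lo == hi:                      # fewer clips than segments
            out[i] = feat[min(lo, k - 1)]
        else:
            out[i] = feat[lo:hi].mean(axis=0)
    return out

# A video with k = 100 clips becomes a fixed 32x2048 matrix.
seg = to_fixed_segments(np.arange(100.0 * 2048).reshape(100, 2048))
print(seg.shape)  # (32, 2048)
```

This explains why k is variable in the stored .npy files but every video is 32×2048 by the time it reaches the model during training.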


DungVo1507 commented on July 17, 2024

Hi @GowthamGottimukkala
I am also looking for a way to extract I3D features on my own dataset. Have you succeeded in extracting them? If so, could you guide me?
Thank you so much!


ro1406 commented on July 17, 2024

Hey @GowthamGottimukkala @DungVo1507, did either of you find a way to successfully extract the features for your own datasets? If so, could you guide me, or point me to any links that would make it easy to do the same?


