video_feature_extractor's Introduction

Fast and Easy to use video feature extractor

This repo aims at providing easy-to-use and efficient code for extracting video features using deep CNNs (2D or 3D).

It was originally designed to extract video features for the large-scale video dataset HowTo100M (https://www.di.ens.fr/willow/research/howto100m/) in an efficient manner.

Most of the time, extracting CNN features from video is cumbersome. It usually requires dumping video frames to disk, loading the dumped frames one by one, pre-processing them, and using a CNN to extract features on chunks of video. This process is inefficient because dumping frames to disk is slow and can use a lot of inodes when working with a large dataset of videos.

To avoid all that, this repo provides a simple Python script for the task: just provide a list of raw videos, and the script will take care of on-the-fly video decoding (with ffmpeg) and feature extraction using state-of-the-art models. Besides being fast, it also happens to be very convenient.

This script is also optimized for multi-process GPU feature extraction.

Requirements

How To Use ?

First of all, you need to generate a CSV containing the list of videos you want to process. For instance, if you have video1.mp4 and video2.webm to process, you will need to generate a CSV of this form:

video_path,feature_path
absolute_path_video1.mp4,absolute_path_of_video1_features.npy
absolute_path_video2.webm,absolute_path_of_video2_features.npy
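The CSV can also be generated programmatically; a minimal sketch, assuming the hypothetical directory paths below (adjust them to your own layout):

```python
import csv
import os

# Assumed locations -- replace with your own directories.
video_dir = "/data/videos"
feature_dir = "/data/features"
videos = ["video1.mp4", "video2.webm"]  # e.g. from os.listdir(video_dir)

# Build one (video_path, feature_path) pair per video.
rows = []
for name in videos:
    stem = os.path.splitext(name)[0]
    rows.append((os.path.join(video_dir, name),
                 os.path.join(feature_dir, stem + "_features.npy")))

# Write the CSV expected by extract.py.
with open("input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["video_path", "feature_path"])
    writer.writerows(rows)
```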

And then just simply run:

python extract.py --csv=input.csv --type=2d --batch_size=64 --num_decoding_thread=4

This command will extract 2D video features for video1.mp4 (resp. video2.webm) at absolute_path_of_video1_features.npy (resp. absolute_path_of_video2_features.npy) in the form of a numpy array. To get features from the 3D model instead, just change the --type argument from 2d to 3d. The --num_decoding_thread parameter sets how many parallel CPU threads are used for decoding the videos.
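Each output file is a plain numpy array with one feature vector per row. A self-contained sketch of inspecting such a file, using a dummy array in place of a real extraction output:

```python
import numpy as np

# Dummy stand-in for a real extraction output: 66 one-per-second
# features from the 2D ResNet-152 model (feature dimension 2048).
np.save("video1_features.npy", np.zeros((66, 2048), dtype=np.float32))

features = np.load("video1_features.npy")
print(features.shape)  # (66, 2048): one 2048-d vector per second of video
```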

Please note that the script is intended to be run on ONE single GPU only. If multiple GPUs are available, please make sure that only one free GPU is made visible to the script, for example via the CUDA_VISIBLE_DEVICES environment variable.

Can I use multiple GPUs to speed up feature extraction?

Yes! Just run the same script with the same input CSV on another GPU (which can be on a different machine, provided that the disk where the features are written is shared between the machines). The script will start a new feature-extraction process that only handles videos that have not been processed yet, without overlapping with the extraction processes already running.
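This resume-from-partial-output behaviour can be mimicked in your own tooling by filtering the CSV rows against feature files already on disk; a minimal sketch (not the repo's exact implementation):

```python
import os

def remaining_rows(rows):
    """Keep only the (video_path, feature_path) pairs whose feature file
    does not exist yet, so concurrent workers skip finished videos."""
    return [(v, f) for v, f in rows if not os.path.exists(f)]

rows = [("a.mp4", "a_features.npy"), ("b.mp4", "b_features.npy")]
print(remaining_rows(rows))  # pairs still left to process
```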

What models are implemented ?

So far, only one 2D model and one 3D model can be used.

  • The 2D model is the PyTorch model zoo ResNet-152 pretrained on ImageNet. The 2D features are extracted at 1 feature per second at a resolution of 224.
  • The 3D model is a ResNeXt-101 16-frames model (https://github.com/kenshohara/3D-ResNets-PyTorch) pretrained on Kinetics. The 3D features are extracted at 1.5 features per second at a resolution of 112.

Downloading pretrained models

This will download the pretrained 3D ResNeXt-101 model used here, from https://github.com/kenshohara/3D-ResNets-PyTorch:

mkdir model
cd model
wget https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/models/resnext101.pth

Acknowledgements

The 3D CNN code is re-used from https://github.com/kenshohara/3D-ResNets-PyTorch.

video_feature_extractor's People

Contributors

antoine77340

video_feature_extractor's Issues

Can I extract 3D features per frame?

Thanks for the great work!
Regarding the question: I did try to change the frame rate within the files, and it somehow worked for the 2D features, but the 3D features are still produced per second. Can you show me which value I should change? Thanks!

I3D or C3D?

Can I ask whether the 3D feature in your repo is C3D or I3D?
Thank you very much!!

Convert numpy to classes (label)

Sorry, how can I convert my numpy element to classes? Example: "dog", "cat".

Question about feature dimension

Great to find your repo, @antoine77340!

I have 3 questions for you to answer:

  1. The shape of my .npy file is (66, 2048); 66 is calculated as the total number of frames in the video divided by the batch size, right?
  2. The output shape I want is (10, 66, 2048), where 10 is 10-crop augmentation; can you tell me how to do it?
  3. If I want to change the resnet101 model to the resnet50 model, which components do I need to change in your repo?

Thank you so much!

Please advise on the feature

Please advise on the features extracted. I got (x, 2048) and I pad them to the same length.
To run classification, do I need to change the dimension by adding a channel, e.g. (x, 2048, 1)? Or is that no longer necessary?
(I'm still learning on this)

regarding requirements

Hey @antoine77340, thanks for this work, it's really interesting!
I am trying to run the code, but it's not really working for me, and I guess the problem is with the CSV file I provided.
I think I have some issues with its content.
Can you provide an example of the CSV file?

thank you

Decoding video - ffprobe failed at:

ffprobe failed at: /home/machine21/Desktop/video_code/video_feature_extractor/video/input.mp4

Do I need to resize the video to a 112:112 ratio, or is there another reason for this failure?

Question about input

Hello, what do the inputs look like? Are they patches from frames of the same target, or single frames from different targets?


Running fine but feature file not there

Dear antoine,

I have run 2 sample mp4 files, and the run completed without error. But I don't see where the extracted files are; they are not in the feature folder or anywhere else. Please advise.

The feature folder is empty. CSV file:

video_path,feature_path
/content/drive/MyDrive/video_sample/ABI - 3.mp4,/content/drive/MyDrive/video_feature/ABI - 3_features.npy
/content/drive/MyDrive/video_sample/ABI - 5.mp4,/content/drive/MyDrive/video_feature/ABI - 5_features.npy

extract.py gives errors

Hello, thank you for your amazing work.
I keep getting these errors; if you can enlighten me on how to solve them:

Traceback (most recent call last):
  File "/path/to/extract.py", line 7, in <module>
    from model import get_model
  File "/path/to/model.py", line 4, in <module>
    from videocnn.models import resnext
ModuleNotFoundError: No module named 'videocnn'

multi view feature extraction

Hi @antoine77340

Thank you so much for uploading this code; it works well and is really helpful for me.

I need to extract features from videos based on time; how can I do that? Could you help me with this problem?

Problem statement:
Given 3 videos as input, I want to extract frames based on time; how can features be extracted from the videos this way?

Thank you

Features translation

Dear friend,

I have two questions:

  1. Is it possible to translate the feature values and scores to classes, as output JSON?
  2. Is it possible to obtain these features at, or together with, timestamps?

Please guide and help if this is possible.

Data doesn't store properly

Hi,
What is the shape of the data in the .npy file of each video?
I tried, but it doesn't seem to be stored in a proper format.
Can you guide me on this?

Question on output dimension

Hi, great work there!
I noticed that the dimension of the output is not fixed and depends on the video length (e.g. some are 33x2048, some are 12x2048, etc.).

What's the best way to get them to a single dimension (e.g. 1x2048)?

Thanks!
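One common way to collapse a variable-length (T, 2048) feature matrix into a fixed (1, 2048) vector is temporal mean pooling (max pooling over time is another option); a numpy sketch:

```python
import numpy as np

# Variable-length per-second features for one video (T = 33 here).
features = np.random.rand(33, 2048)

# Average over the time axis to get a single fixed-size descriptor.
pooled = features.mean(axis=0, keepdims=True)
print(pooled.shape)  # (1, 2048)
```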
