Data-efficient-video-transformer

This repo hosts menovideo, the package that accompanies the paper 'Data Efficient Video Transformer for Violence Detection' (DeVTr).

One of the big challenges facing computer vision researchers who work with transformers, especially on video tasks, is the need for large datasets and high computational resources. Our method, DeVTr (Data Efficient Video Transformer for Violence Detection), is designed to overcome both challenges.

In this work, we propose a data-efficient video transformer (DeVTr) based on the transformer network as a spatio-temporal learning method, with a pre-trained 2D convolutional neural network (2D-CNN) as an embedding layer for the input data. The model was trained and tested on the Real-life violence dataset (RLVS) and achieved an accuracy of 96.25%. A comparison with previous techniques showed that the proposed method gives the best result among the compared studies for violence event detection.
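The idea of a 2D-CNN embedding layer feeding a transformer can be sketched in a few lines of PyTorch. This is a minimal illustration and not the DeVTr implementation: the class name `TinyVideoTransformer`, the small untrained CNN (standing in for a pre-trained backbone), the one-token-per-frame layout, the learned positional embedding, and the mean pooling before the classifier are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class TinyVideoTransformer(nn.Module):
    """Hypothetical stand-in: per-frame CNN embedding + transformer encoder."""

    def __init__(self, embed_dim=64, num_heads=4, num_layers=2,
                 num_classes=2, max_frames=64):
        super().__init__()
        # Small untrained CNN, used here in place of a pre-trained 2D backbone.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # (N, embed_dim, 1, 1)
            nn.Flatten(),             # (N, embed_dim)
        )
        # Learned positional embedding (assumption) so frames keep their order.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_frames, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=128, batch_first=True,
        )
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, video):
        # video: (batch, frames, channels, height, width)
        b, t, c, h, w = video.shape
        # Fold time into the batch axis so the 2D CNN sees single frames.
        tokens = self.frame_encoder(video.reshape(b * t, c, h, w))
        tokens = tokens.reshape(b, t, -1) + self.pos_embed[:, :t]
        encoded = self.temporal_encoder(tokens)   # (b, t, embed_dim)
        return self.head(encoded.mean(dim=1))     # (b, num_classes)


model = TinyVideoTransformer()
clip = torch.randn(2, 8, 3, 32, 32)  # 2 clips, 8 frames each
logits = model(clip)                 # one score per class for each clip
```

With one token per frame, the transformer attends over a sequence as long as the number of frames, which keeps the attention cost small in this sketch.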

Results and benchmarking

The model achieved 96.25% accuracy on the RLVS dataset. It is also worth mentioning that it did better than TimeSformer in memory efficiency, convergence speed, and accuracy.

Comparison of DeVTr with other methods on the RLVS dataset

Saliency map for a random video of a violent action

menovideo package

The menovideo package helps you build video action recognition / video understanding models. It provides:

  1. Our novel DeVTr model, with full customization.
  2. A video dataset reader and pre-processing helpers to easily read videos and turn them into PyTorch-ready dataloaders.
  3. A TimeDistributed wrapper, similar to the Keras TimeDistributed wrapper, which helps you easily build classical CNN+LSTM models.

DeVTr is a novel transformer network combined with a convolutional network, meant for building highly accurate video action recognition models with limited data and hardware resources.
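For readers who have not used the Keras TimeDistributed wrapper mentioned in the feature list, the idea can be sketched as follows. This is an illustrative re-implementation, not menovideo's own class: the name `TimeDistributedSketch` and the tiny CNN and LSTM sizes are assumptions. The wrapper folds the time axis into the batch axis, applies a per-frame module, and restores the time axis.

```python
import torch
import torch.nn as nn


class TimeDistributedSketch(nn.Module):
    """Apply `module` independently to every time step of a batch."""

    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # x: (batch, time, *feature_dims)
        b, t = x.shape[:2]
        # Merge batch and time so the wrapped module sees plain 2D inputs.
        y = self.module(x.reshape(b * t, *x.shape[2:]))
        # Split batch and time again, keeping whatever the module produced.
        return y.reshape(b, t, *y.shape[1:])


# Per-frame CNN that maps one RGB frame to an 8-dimensional feature vector.
frame_cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

video = torch.randn(2, 5, 3, 16, 16)              # 2 clips, 5 frames each
features = TimeDistributedSketch(frame_cnn)(video)  # (2, 5, 8)

# Classical CNN+LSTM: the LSTM consumes the per-frame features in order.
lstm = nn.LSTM(input_size=8, hidden_size=4, batch_first=True)
outputs, _ = lstm(features)                        # (2, 5, 4)
```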

simple usage

install

pip install menovideo
 

import it

import menovideo.menovideo as menoformer
import menovideo.videopre as vide_reader 

Initialize the DeVTr model without pre-trained weights:

model = menoformer.DeVTr()


Initialize DeVTr with pre-trained weights (the trained weights can be downloaded from this url):

# path to the downloaded weights file
wight = 'drive/MyDrive/Colab Notebooks/transformers/violance-detaction-myresearch/vg19bn40convtransformer-ep-0.pth'
model2 = menoformer.DeVTr(w=wight, base='default')

Using the video reader and pre-processing helpers. The parameters are:

  1. a pandas dataframe containing the path and label of each video
  2. timesep is the number of frames taken from a single video
  3. rgb is the number of color channels
  4. h is the height of each video frame
  5. w is the width of each video frame

valid_dataset = vide_reader.TaskDataset(valid_df,timesep=time_stp,rgb=RGB,h=H,w=W)
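The two inputs above that need preparation are the table of paths and labels and the number of frames per video. The sketch below shows one way to prepare them with the standard library only. Everything in it is an assumption for illustration: the helper names `list_videos` and `sample_frame_indices`, the one-folder-per-class layout, and the `path` / `label` column names. Check package_test.ipynb for the columns that `TaskDataset` actually expects.

```python
import tempfile
from pathlib import Path


def list_videos(root, extensions=(".mp4", ".avi")):
    """Collect {"path", "label"} records from a one-folder-per-class layout."""
    records = []
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for label, class_dir in enumerate(class_dirs):
        for video in sorted(class_dir.iterdir()):
            if video.suffix.lower() in extensions:
                records.append({"path": str(video), "label": label})
    return records


def sample_frame_indices(total_frames, timesep):
    """Pick `timesep` evenly spaced frame indices from a video."""
    if total_frames <= 0 or timesep <= 0:
        raise ValueError("total_frames and timesep must be positive")
    step = total_frames / timesep
    # Short videos repeat frames instead of running past the last index.
    return [min(int(i * step), total_frames - 1) for i in range(timesep)]


# Demo on an empty placeholder layout (no real videos are needed here).
with tempfile.TemporaryDirectory() as root:
    for name in ("NonViolence", "Violence"):
        (Path(root) / name).mkdir()
        (Path(root) / name / "clip_0.mp4").touch()
    records = list_videos(root)

labels = [r["label"] for r in records]
indices = sample_frame_indices(total_frames=100, timesep=5)
```

The list of records can be turned into a dataframe with `pandas.DataFrame(records)` before it is passed to the dataset reader.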

For a detailed example of using the library, see package_test.ipynb.

Please use PyTorch 1.9 for the pre-trained model.

To cite our paper/code:


@INPROCEEDINGS{9530829,
  author={Abdali, Almamon Rasool},
  booktitle={2021 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT)},
  title={Data Efficient Video Transformer for Violence Detection},
  year={2021},
  volume={},
  number={},
  pages={195-199},
  doi={10.1109/COMNETSAT53002.2021.9530829}
}
