SCDM

Code for the paper: "Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos"

Requirements

  • python 2.7
  • tensorflow 1.14.0
  • keras 1.2.1

Introduction

Temporal sentence grounding (TSG) in videos aims to detect and localize the one video segment that semantically corresponds to a given sentence query. We propose a semantic conditioned dynamic modulation (SCDM) mechanism for the TSG problem. It uses the sentence semantics to modulate the temporal convolution operations, so that sentence-related video content is better correlated and composed over time.
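To make the idea of sentence-conditioned modulation more concrete, here is a minimal sketch in plain Python. It is not the implementation in this repository: it assumes a single pooled sentence vector, and the names `modulate`, `w_gamma`, and `w_beta` are hypothetical. Each feature channel is normalized over time, then rescaled and shifted by values predicted from the sentence. The formulation in the paper may differ in detail.

```python
import math


def modulate(video_feats, sentence_vec, w_gamma, w_beta, eps=1e-5):
    """Scale and shift video features using values predicted from a sentence.

    video_feats:  T x C list of lists (T time steps, C channels)
    sentence_vec: list of D floats summarizing the query (assumed pooled)
    w_gamma, w_beta: C x D weight matrices (hypothetical learned parameters)
    """
    num_steps, num_channels = len(video_feats), len(video_feats[0])

    # Sentence-conditioned scale (gamma) and shift (beta), one per channel.
    gamma = [math.tanh(sum(w * s for w, s in zip(row, sentence_vec)))
             for row in w_gamma]
    beta = [math.tanh(sum(w * s for w, s in zip(row, sentence_vec)))
            for row in w_beta]

    out = [[0.0] * num_channels for _ in range(num_steps)]
    for c in range(num_channels):
        column = [video_feats[t][c] for t in range(num_steps)]
        mean = sum(column) / float(num_steps)
        var = sum((x - mean) ** 2 for x in column) / float(num_steps)
        std = math.sqrt(var + eps)
        for t in range(num_steps):
            # Normalize over time, then apply the sentence-dependent affine map.
            out[t][c] = gamma[c] * (column[t] - mean) / std + beta[c]
    return out
```

In a full model, a step like this would sit before each temporal convolution layer, so the sentence can influence the features at every temporal scale.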

Download Features and Example Preprocessed Data

First, download the following files into the './data' folder:

  • Extracted video features: charades_i3d_rgb.hdf5, activitynet_c3d_fc6_stride_1s.hdf5, and tacos_c3d_fc6_nonoverlap.hdf5. The ActivityNet feature file is too big to upload as a single file, so it is divided into 10 parts; download all 10 parts and merge them into one feature file.
  • GloVe word embeddings: download glove.840B.300d.zip, then convert the word embedding .txt file into glove.840B.300d_dict.npy, a dict whose keys are words and whose values are the corresponding 300-d word embeddings.
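The GloVe conversion described above can be sketched as follows. This is one possible way to do it, not a script shipped with the repository: the function names `parse_glove_lines` and `convert_glove` are hypothetical, and the sketch assumes the unzipped file is named glove.840B.300d.txt and sits in './data'. The last 300 fields of each line are taken as the vector, because a few tokens in this GloVe release contain spaces themselves.

```python
import io


def parse_glove_lines(lines, dim=300):
    """Return a dict mapping each word to its embedding (a list of floats)."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the vector; whatever precedes them is
        # the token, which may itself contain spaces.
        word = " ".join(parts[:-dim])
        embeddings[word] = [float(x) for x in parts[-dim:]]
    return embeddings


def convert_glove(txt_path, npy_path, dim=300):
    """Read a GloVe .txt file and save it as a dict inside a .npy file."""
    import numpy as np  # imported here so the parser alone needs no NumPy

    with io.open(txt_path, encoding="utf-8") as f:
        table = parse_glove_lines(f, dim)
    table = {w: np.asarray(v, dtype=np.float32) for w, v in table.items()}
    # np.save pickles the dict into a 0-d object array.
    np.save(npy_path, table)
```

A call such as `convert_glove("./data/glove.840B.300d.txt", "./data/glove.840B.300d_dict.npy")` would then write the file. With recent NumPy versions, reading it back requires `np.load(path, allow_pickle=True).item()`. Whether the training scripts expect exactly this key and value type is an assumption to check against the code.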

Data Preprocessing

As described in our paper, we perform the temporal sentence grounding task on three datasets: Charades-STA, ActivityNet Captions, and TACoS. Before training and testing a model on any of these datasets, preprocess the data first.

  • Go to the './grounding/Charades-STA/data_preparation/' folder, and run:
python generate_charades_data.py

Preprocessed data for the Charades-STA dataset will be put into the './data/Charades/h5py/' folder.

  • Go to the './grounding/TACOS/data_preparation/' folder, and run:
python generate_tacos_data.py

Preprocessed data for the TACoS dataset will be put into the './data/TACOS/h5py/' folder.

  • Go to the './grounding/ActivityNet/data_preparation/' folder, and run:
python generate_anet_data.py

Preprocessed data for the ActivityNet Captions dataset will be put into the './data/ActivityNet/h5py/' folder.
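The three preprocessing steps above can also be run in one go. The following is a convenience sketch, not part of the repository; the helper names `build_commands` and `run_all` are hypothetical, and it assumes it is launched from the repository root with the feature files already in './data'.

```python
import os
import subprocess

# (folder, script) pairs taken from the preprocessing steps above.
PREP_STEPS = [
    ("./grounding/Charades-STA/data_preparation", "generate_charades_data.py"),
    ("./grounding/TACOS/data_preparation", "generate_tacos_data.py"),
    ("./grounding/ActivityNet/data_preparation", "generate_anet_data.py"),
]


def build_commands(python_exe="python"):
    """Return (working directory, argv) pairs, one per dataset."""
    return [(folder, [python_exe, script]) for folder, script in PREP_STEPS]


def run_all(python_exe="python"):
    """Run each preprocessing script inside its own folder, in order."""
    for folder, argv in build_commands(python_exe):
        if not os.path.isdir(folder):
            print("skipping missing folder: " + folder)
            continue
        # check_call raises CalledProcessError if a script exits non-zero.
        subprocess.check_call(argv, cwd=folder)
```

Calling `run_all()` stops at the first script that fails, which makes it easier to see which dataset needs attention. Pass a different `python_exe` if the Python 2.7 interpreter is not the default `python` on your system.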

Model Training and Testing

  • For the Charades-STA dataset, the proposed model and all its variant models are provided. For example, the proposed SCDM model implementation is in the './grounding/Charades-STA/src_SCDM' folder. To train the model, run:
python run_charades_scdm.py --task train

To test the model, run:

python run_charades_scdm.py --task test

The variant models are trained and tested in the same way.

  • For the TACoS and ActivityNet Captions datasets, we provide only the proposed SCDM model implementation, in the './grounding/xxx/src_SCDM' folder. Training and testing follow the same process as for the Charades-STA dataset.
  • Train the provided models from scratch to reproduce the results in the paper (not exactly the same, but close).

Citation

@inproceedings{yuan2019semantic,
  title={Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos},
  author={Yuan, Yitian and Ma, Lin and Wang, Jingwen and Liu, Wei and Zhu, Wenwu},
  booktitle={Advances in Neural Information Processing Systems},
  pages={534--544},
  year={2019}
}

scdm's Issues

I3D extractor

Hi Yitian,

I need more precise I3D features (higher fps). Could you please share the code for the I3D extractor?
Thanks

Training equipment

Hello, very meaningful work. How many GPUs are needed for training, and how long does training take altogether?

cannot download h5 data

Hi Yitian, downloading your extracted features requires permission. Could you make them publicly available? Thanks.

Something about the data generation

Hi,
I am trying to use your code to generate the training and testing data for TACoS, but the "timestamps" and "duration" fields in the dictionaries of the three JSON files confuse me. They are given in video frames, not time, right? Then, in the function get_ground_truth_position(ground_postion) in generate_tacos_data.py, you calculate left_position and right_position as integers. If left_position and right_position mean time, they are not necessarily integers. So how should I understand this?

Thanks!

cannot download the extracted video features

Hello, when I tried to download the video features from the Google Drive link you provided, I was told that the download requires access permission. Could you fix the download of the video features? I would be very grateful!

Questions about the pretrained model for ActivityNet

Hi Yitian, I tried to load the pretrained checkpoint for the ActivityNet dataset, but it failed. The vocab size in the checkpoint is about 14000, while the word2ix.npy generated by train() has about 21000 entries.
Do you know what the problem is, or how to get the corresponding vocabulary?
Thanks.
