
Video_Summary_using_FCSN

A PyTorch reimplementation of FCSN in paper "Video Summarization Using Fully Convolutional Sequence Networks".

Paper Publication Info

Mrigank Rochan, Linwei Ye, and Yang Wang.
Video Summarization Using Fully Convolutional Sequence Networks.
European Conference on Computer Vision (ECCV), 2018

pdf link: http://openaccess.thecvf.com/content_ECCV_2018/papers/Mrigank_Rochan_Video_Summarization_Using_ECCV_2018_paper.pdf

Dataset

A TVSum dataset (downsampled to 320 frames per video) preprocessed by make_dataset.py is available here. This hdf5 file contains 50 groups, named video_1 to video_50. The datasets in each group are as follows:

name	description
`length`	scalar, number of video frames
`feature`	shape (320, 1024)
`label`	shape (320, )
`change_points`	shape (n_segments, 2) stores begin and end of each segment
`n_frame_per_seg`	shape (n_segments, ) number of frames in each segment
`user_summary`	shape (20, length) summary from 20 users, each row is a binary vector
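
As a rough illustration of the layout in the table above, the sketch below writes a small synthetic file with the same dataset names and shapes and reads one group back with h5py. The helper names (write_toy_dataset, describe_group), the random values, and the toy values for length and n_segments are assumptions made for this example; it does not reproduce make_dataset.py.

```python
import h5py
import numpy as np


def write_toy_dataset(path, n_videos=2, length=500, n_segments=4):
    """Write a synthetic file that mimics the group layout in the table."""
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        for i in range(1, n_videos + 1):
            g = f.create_group(f"video_{i}")
            g["length"] = length  # scalar dataset
            g["feature"] = rng.random((320, 1024), dtype=np.float32)
            g["label"] = rng.integers(0, 2, size=320)
            # Evenly spaced, made-up segment boundaries (begin, end).
            edges = np.linspace(0, length, n_segments + 1).astype(int)
            g["change_points"] = np.stack([edges[:-1], edges[1:] - 1], axis=1)
            g["n_frame_per_seg"] = edges[1:] - edges[:-1]
            g["user_summary"] = rng.integers(0, 2, size=(20, length))


def describe_group(path, key):
    """Return {dataset name: shape} for one video group."""
    with h5py.File(path, "r") as f:
        return {name: ds.shape for name, ds in f[key].items()}
```

Calling describe_group(path, "video_1") on such a file returns the shape of every dataset in the group, with an empty tuple for the scalar length.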

Train

First change the data_path in config.py to your own hdf5 data path. Then run

python train.py

Every 5 epochs, the model parameters and the predicted summaries are saved in save_dir and score_dir respectively. The evaluation results (precision, recall and f-score) are printed at the same time.
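
For readers unfamiliar with these metrics, here is a minimal sketch of how precision, recall and f-score can be computed between a predicted binary summary and one user's binary summary. The function name summary_scores and the toy vectors are assumptions for illustration; the repository's own evaluation code may differ, for example in how it aggregates over the 20 users.

```python
import numpy as np


def summary_scores(pred, target):
    """Precision, recall and F-score between two binary frame-selection vectors."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    overlap = np.logical_and(pred, target).sum()
    if overlap == 0:
        # No shared frames: all three metrics are defined as zero here.
        return 0.0, 0.0, 0.0
    precision = overlap / pred.sum()
    recall = overlap / target.sum()
    f_score = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f_score)


# Two of four frames are selected by each side, one of them in common.
print(summary_scores([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```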

Generate Summary

To generate keyframes (images) and keyshots (video), run

python gen_summary.py --h5_path {your hdf5 dataset path} --json_path {json file path saved in score_dir} --data_root {root dir of tvsum dataset} --save_dir {where to save the generated summary} --bar

Preview of the score bar (x-axis: frame index, y-axis: user score; the orange columns are the selected keyshots):


fcsn's Issues

How to do this in Colab

Can you explain, step by step, how to run this project in Google Colab?
I cannot work out how to do it there.

googlenet eval mode

https://github.com/weirme/Video_Summary_using_FCSN/blob/0895cccbb2a488369b1bfc7d2c087b3050250898/make_dataset.py#L53-L56

Because googlenet is only for feature extraction, it should be in eval mode.

Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
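
A minimal sketch of the point made above, using a tiny stand-in network rather than GoogLeNet (the stand-in model, the tensor sizes and the variable names are assumptions for illustration). In eval mode the dropout layer is disabled, so two passes over the same input give identical features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the feature extractor; the repository uses GoogLeNet instead.
extractor = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
frames = torch.randn(4, 8)

extractor.eval()          # switch dropout / batch norm to inference behaviour
with torch.no_grad():     # no gradients are needed for feature extraction
    first = extractor(frames)
    second = extractor(frames)

print(torch.equal(first, second))
```

Without the call to eval(), the dropout layer would zero a different random subset of activations on each pass, so the extracted features would generally differ between runs.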

train on my own dataset

When I use features of different dimensions, the model reports a dimension error. Could the model be changed so that it supports more than the fixed 320-frame features?

fcsn_dataset.h5

Can someone tell me what the content format of fcsn_dataset.h5 is like?

data shape problem

The original frame feature shape is [320,1024]

But the code https://github.com/weirme/Video_Summary_using_FCSN/blob/96b40851b7805afd1f1fc69f2beb5143d5727b4e/data_loader.py#L18
wants to reshape to [1024,320] directly.

Should it use transpose instead of reshape?

Thank you.
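
The difference this issue asks about can be seen on a toy array (4 frames by 3 channels instead of 320 by 1024; the array and variable names are made up for the example). reshape reinterprets the same flat buffer, while a transpose swaps the axes and keeps each frame's values together.

```python
import numpy as np

# Toy "feature" array: row i holds the 3 channel values of frame i.
feature = np.arange(12).reshape(4, 3)

reshaped = feature.reshape(3, 4)   # same flat order, new row length
transposed = feature.T             # axes swapped: (channels, frames)

# Channel 0 across the four frames is 0, 3, 6, 9.
print(reshaped[0])     # [0 1 2 3]
print(transposed[0])   # [0 3 6 9]
```

Only the transposed array has one row per channel; the reshaped one mixes values from different frames and channels in each row.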

Pre-trained model

Could you please provide a pre-trained model for the same?

IndexError: index 202 is out of bounds for axis 0 with size 0

I tried running gen_summary.py and got the following error:

Traceback (most recent call last):
  File "C:/Users/Sachin/Documents/MTech Dissertation/Video_Summary_using_FCSN/gen_summary.py", line 112, in <module>
    gen_summary()
  File "C:/Users/Sachin/Documents/MTech Dissertation/Video_Summary_using_FCSN/gen_summary.py", line 104, in gen_summary
    get_keys(id)
  File "C:/Users/Sachin/Documents/MTech Dissertation/Video_Summary_using_FCSN/gen_summary.py", line 50, in get_keys
    keyshots.append(frames[i])
IndexError: index 202 is out of bounds for axis 0 with size 0

Please help me solve it.

Normalized value problem

https://github.com/weirme/Video_Summary_using_FCSN/blob/0895cccbb2a488369b1bfc7d2c087b3050250898/make_dataset.py#L49

These values are for the ImageNet dataset. Do they also fit the dataset we use here?

Is it able to summarize a custom video?

As described, is it able to summarize a custom video?

Any ideas about the structure of unsupervised SUM-FCN

After reading Section 3.3 of the FCSN paper several times, I still cannot figure out the exact structure of the unsupervised part. Does it mean the following?

1. Select Y frames: choose the features with the top Y scores, with dimension batch * 2 * Y.
2. Apply a 1*1 conv to decode the features above and reconstruct their original feature representations: batch * 2 * Y -> batch * 10 * Y (the shape of the output of conv8).
3. Merge the input frame-level feature vectors of these selected Y frames using a skip connection: batch * 1024 * Y -> batch * 10 * Y, then add the output of step 2.
4. Obtain the final reconstructed features of the Y frames: batch * 10 * Y -> batch * 1024 * Y.

datasets

Why can't the dataset be downloaded?

The architecture of the FCSN is different from the paper

Hello, are you the author of the paper?

Importance scores in the TVSum50 tsv file

Hello,
How should I interpret the importance scores in the tsv file of the original TVSum50 dataset?
Are they for each frame? If yes, what is the frame rate used?
What is the significance of a shot being of 2 seconds?

The data annotation file has importance scores for each video. The readme says that each shot is 2 seconds long. However, for the first video (length 5 min 54 sec), more than 10000 annotations are provided. I cannot see how the length of the video relates to the number of annotations. Multiplying each video's duration by commonly used frame rates (24-30) doesn't help either.

frame-level scores to keyframes problem

As in FCSN Table 1, they use this paper 1.3 to convert frame-level scores to keyframes.

But you use this method to get keyframes, which does not seem identical to the FCSN paper.

Interval problem

https://github.com/weirme/Video_Summary_using_FCSN/blob/0895cccbb2a488369b1bfc7d2c087b3050250898/eval.py#L24

The [start:end] operator excludes the end element

should it be
pred_value = np.array([pred_score[cp[0]:(cp[1]+1)].mean() for cp in cps]) ?

Also
https://github.com/weirme/Video_Summary_using_FCSN/blob/0895cccbb2a488369b1bfc7d2c087b3050250898/eval.py#L29
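
To make the slicing question concrete, the sketch below compares the two variants on toy data. It assumes that each change point is an inclusive [begin, end] pair, which is what the proposed fix implies but which is not confirmed here; the scores and change points are made up.

```python
import numpy as np

pred_score = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
cps = np.array([[0, 2], [3, 5]])  # assumed inclusive [begin, end] pairs

# Python slices exclude the end index, so the last frame of each segment is dropped.
exclusive = np.array([pred_score[b:e].mean() for b, e in cps])
# Adding 1 to the end index covers the whole segment.
inclusive = np.array([pred_score[b:e + 1].mean() for b, e in cps])

print(exclusive)  # [0.5 3.5]
print(inclusive)  # [1. 4.]
```

If the stored end index is instead already exclusive, the original slice is the correct one, so it is worth checking how the change points were generated before changing eval.py.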

Can you provide '.h5' files for the three dataset settings?

Is the model same as discussed in the paper??

In the paper, the final layer outputs a tensor of shape 1 x T x C, where C is the number of classes, so shouldn't the output tensor be 320 x 2 for 2 classes? Can you clarify this?

A mistake in train.py line 48

the code is:

log_p = torch.log_softmax(pred_score, dim=1).reshape(-1, n_class)

where "pred_score" is a (n_batch, n_class, n_frame) tensor. The code applies log_softmax to it and then reshapes the result to (-1, n_class). However, "reshape" works in row-major order by default, so the class axis must be moved to the last position before reshaping. The correct code is:

log_p = torch.log_softmax(pred_score, dim=1).permute(0,2,1).reshape(-1, n_class)
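
The effect can be checked on a tiny array. The sketch uses NumPy as a stand-in for the tensor operations (transpose plays the role of permute, and reshape is row-major in both libraries); the values are chosen so that each number encodes its class and frame.

```python
import numpy as np

n_class = 2
# Shape (n_batch=1, n_class=2, n_frame=3); value = 10 * class + frame.
log_p = np.array([[[0, 1, 2],
                   [10, 11, 12]]])

wrong = log_p.reshape(-1, n_class)
right = log_p.transpose(0, 2, 1).reshape(-1, n_class)

print(wrong.tolist())  # [[0, 1], [2, 10], [11, 12]]
print(right.tolist())  # [[0, 10], [1, 11], [2, 12]]
```

Only in the second result does each row hold the two class values of a single frame, which is the layout a per-frame loss over (n_frames, n_class) expects.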

How can get change points using KTS?

I tried to get change points using the KTS code,
but I could not get proper change points.

If someone has obtained change points using KTS, could you please help me?

training_set and testing_set of tvsum

As mentioned in the paper, the training and testing sets should be 80% and 20% of the data.
But in
https://github.com/weirme/Video_Summary_using_FCSN/blob/96b40851b7805afd1f1fc69f2beb5143d5727b4e/data_loader.py#L25

should it be train_dataset, test_dataset = torch.utils.data.random_split(dataset, [int(len(dataset)*0.8), int(len(dataset)*0.2)])?

Thank you.
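
One detail worth noting about the suggested split: when integer lengths are passed to torch.utils.data.random_split, they must add up to the dataset size, and truncating both fractions independently can lose an item. The sketch below, with a hypothetical helper named split_lengths, derives the test size from the remainder instead.

```python
def split_lengths(n_items, train_fraction=0.8):
    """Return (n_train, n_test) that always sum to n_items."""
    n_train = int(n_items * train_fraction)
    return n_train, n_items - n_train


for n in (50, 51):
    naive = (int(n * 0.8), int(n * 0.2))  # both parts truncated separately
    print(n, naive, sum(naive), split_lengths(n))
```

For the 50 TVSum videos both approaches give 40 and 10; for 51 items the separately truncated lengths sum to only 50, while the remainder-based split gives 40 and 11.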

provide data/files (ydata-tvsum50-v1_1)

Can you please provide this data file (ydata-tvsum50-v1_1)?
I could not find it on the internet or in your data files.
Can you also provide the data root, which I think is the folder named TVSum? I could not find it either. Please post a link to the data here in a comment, or send it to me by email: [email protected]
Thank you very much for your help.

Could you share the SumMe dataset on your google drive?

Could you share the SumMe dataset on your Google Drive? I need the original videos, not the h5 file.
I can't find the SumMe dataset online. Thank you very much!

Broken Link in Readme

This link doesn't seem to work:

http://www.cs.umanitoba.ca/~mrochan/projects/eccv18/fcsn.html

F-score

Hello, your code is not complete: the test code does not report the accuracy rate or the recall rate.

index 5866 is out of bounds for axis 0 with size 5846

Traceback (most recent call last):
  File "gen_summary.py", line 112, in <module>
    gen_summary()
  File "gen_summary.py", line 104, in gen_summary
    get_keys(id)
  File "gen_summary.py", line 50, in get_keys
    keyshots.append(frames[i])
IndexError: index 5866 is out of bounds for axis 0 with size 5846

Any idea how to solve this?

How to test this on a single video?

Hi.
Thanks for sharing your code.
Could you help me test this code on a single video?
I appreciate your help in advance.

gen_summary failed with IndexError

Hello @weirme ,

Thank you for the great implementation.

I tried to use gen_summary.py to generate summaries for the TVSum videos but failed. I used the default settings in the code and laid out the dataset accordingly, but an IndexError is thrown.

I found that this happens because the video IDs do not match. The IDs in the original TVSum dataset are random names, but in your dataset the IDs are 1-50. Do you have a mapping between the two?

Thanks.

How can I summarize a custom video?

As described, how can I test on a custom video and get the summary?

Variable-length input experiment

Has anyone tried testing with variable-length input?

The input feature is (1 x T x 1024), where
T is 4494, 1234 or any other value (the number of frames in each video).

I tried this setting, but the NLL loss does not decrease.

Unable to open object (object 'video_-esJrBWj2d8' doesn't exist)

I get this error when running make_dataset.py, what should I do?

Hello, can you send me the dataset?

Hello,
I am in China and cannot open your Google link. Can you send the dataset to my email?
My email is [email protected]
Thank you very much!

implementation of get_oracle_summary function

https://github.com/weirme/Video_Summary_using_FCSN/blob/0895cccbb2a488369b1bfc7d2c087b3050250898/make_dataset.py#L70

What is the meaning of this function?

Hello, can you tell me the concrete structure of the unsupervised FCSN?

KeyError: "Unable to open object (object 'video_tensor(41)' doesn't exist)"

I only modified the address of the dataset

Traceback (most recent call last):
  File "train.py", line 142, in <module>
    solver.train()
  File "train.py", line 60, in train
    for batch_i, (feature, label, ) in enumerate(tqdm(self.train_loader, desc='Batch', ncols=80, leave=False)):
  File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/home/student/maruidi/Frames/Video_Summary_using_FCSN-master/data_loader.py", line 17, in __getitem__
    video = self.data_file['video_'+str(index)]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/h5py/_hl/group.py", line 264, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'video_tensor(41)' doesn't exist)"
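
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the index that reaches the dataset's item lookup is a tensor rather than a plain int, so converting it with str() produces 'tensor(41)' inside the group name. The sketch below illustrates the idea with a hypothetical stand-in class (FakeTensor) and a hypothetical helper (group_key) that converts the index to int before formatting.

```python
class FakeTensor:
    """Hypothetical stand-in for a 0-d tensor index; not a real torch class."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

    def __repr__(self):
        return f"tensor({self.value})"


def group_key(index):
    """Build the hdf5 group name from any int-like index."""
    return f"video_{int(index)}"


idx = FakeTensor(41)
print("video_" + str(idx))  # video_tensor(41)
print(group_key(idx))       # video_41
```

Whether an offset is also needed (the groups are described as video_1 to video_50) depends on how the caller numbers its indices, which is not shown in the traceback.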

Implementation of Reconstruction and Diversity loss

Could you please point me to the implementation of Reconstruction and Diversity losses? Is there an option to reproduce the scores for your unsupervised model?