weirme / fcsn
A PyTorch reimplementation of FCSN from the paper "Video Summarization Using Fully Convolutional Sequence Networks"
Why can't the dataset be downloaded?
I tried running the gen_summary.py module and got this traceback:
Traceback (most recent call last):
File "C:/Users/Sachin/Documents/MTech Dissertation/Video_Summary_using_FCSN/gen_summary.py", line 112, in
gen_summary()
File "C:/Users/Sachin/Documents/MTech Dissertation/Video_Summary_using_FCSN/gen_summary.py", line 104, in gen_summary
get_keys(id)
File "C:/Users/Sachin/Documents/MTech Dissertation/Video_Summary_using_FCSN/gen_summary.py", line 50, in get_keys
keyshots.append(frames[i])
IndexError: index 202 is out of bounds for axis 0 with size 0
I am getting the above error. Please help me solve it.
As mentioned in the paper, the training and testing sets should be 80% and 20%.
But in
https://github.com/weirme/Video_Summary_using_FCSN/blob/96b40851b7805afd1f1fc69f2beb5143d5727b4e/data_loader.py#L25
should it be train_dataset, test_dataset = torch.utils.data.random_split(dataset, [int(len(dataset)*0.8), int(len(dataset)*0.2)])
?
Thank you.
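One caveat with that suggestion: `int(len(dataset)*0.8) + int(len(dataset)*0.2)` can fall short of `len(dataset)` due to truncation, and `torch.utils.data.random_split` requires the lengths to sum exactly to the dataset size. A small sketch of a safe way to compute the lengths (the helper name is mine, not from the repo):

```python
# Sketch of an 80/20 split whose lengths always sum to the dataset size.
# int(n * 0.8) + int(n * 0.2) can be less than n (e.g. n = 27 gives
# 21 + 5 = 26), which makes torch.utils.data.random_split raise an error.
# Computing the test length as the remainder avoids that.

def split_lengths(n, train_frac=0.8):
    """Return (train_len, test_len) that always sum to n."""
    train_len = int(n * train_frac)
    return train_len, n - train_len

# Usage with random_split:
#   train_set, test_set = torch.utils.data.random_split(
#       dataset, split_lengths(len(dataset)))
print(split_lengths(50))  # -> (40, 10)
print(split_lengths(27))  # -> (21, 6)
```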
In the code I use features of different dimensions, and the model reports a dimension error. I was wondering if the model could be changed so that it supports inputs beyond the fixed 320-frame features.
I get this error when running make_dataset.py. What should I do?
Could you share the SumMe dataset on your Google Drive? I need the original videos, not the h5 file.
I can't find the SumMe dataset online. Thank you very much!
Could you please provide a pre-trained model for the same?
Hi.
Thanks for sharing your code.
Could you help me with testing this code on single video?
I appreciate your help in advance.
Because GoogLeNet is only used for feature extraction, it should be in eval mode.
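As a toy illustration (using a small stand-in module rather than GoogLeNet, so it runs self-contained), layers like Dropout and BatchNorm behave differently in train and eval mode, which is why a feature extractor should be switched to `eval()`:

```python
import torch
import torch.nn as nn

# Toy stand-in for a feature extractor (GoogLeNet also contains
# BatchNorm and Dropout layers, which behave differently in train mode).
extractor = nn.Sequential(
    nn.Linear(8, 8),
    nn.BatchNorm1d(8),
    nn.Dropout(p=0.5),
)

x = torch.randn(4, 8)

# In train mode, Dropout randomly zeroes activations and BatchNorm uses
# batch statistics, so repeated calls generally give different outputs.
extractor.train()
y1, y2 = extractor(x), extractor(x)

# In eval mode, Dropout is the identity and BatchNorm uses running
# statistics, so feature extraction is deterministic.
extractor.eval()
with torch.no_grad():  # also skip autograd bookkeeping during extraction
    z1, z2 = extractor(x), extractor(x)

print(torch.equal(z1, z2))  # -> True
```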
Can you provide the '.h5' files for the three dataset settings?
Could you please point me to the implementation of Reconstruction and Diversity losses? Is there an option to reproduce the scores for your unsupervised model?
The original frame feature shape is [320, 1024],
but the code https://github.com/weirme/Video_Summary_using_FCSN/blob/96b40851b7805afd1f1fc69f2beb5143d5727b4e/data_loader.py#L18
reshapes it directly to [1024, 320].
Should it use transpose instead of reshape?
Thank you.
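A tiny NumPy example (stand-in shapes, [2, 3] instead of [320, 1024]) shows why reshape and transpose are not interchangeable here:

```python
import numpy as np

# Toy feature matrix shaped [frames, dims] = [2, 3],
# standing in for [320, 1024].
feat = np.array([[0, 1, 2],
                 [3, 4, 5]])

# reshape just reinterprets the same row-major buffer: rows get cut up,
# so each frame's feature vector ends up scrambled across the new axes.
reshaped = feat.reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

# transpose actually swaps the axes, keeping each frame's features intact.
transposed = feat.T             # [[0, 3], [1, 4], [2, 5]]

print(np.array_equal(reshaped, transposed))  # -> False
```

The same distinction applies to PyTorch tensors (`reshape`/`view` vs `permute`/`t()`).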
Can you tell me how to run this project in Google Colab, step by step? Please help.
I don't understand how to do it in Google Colab.
As in FCSN Table 1, they use the method from this paper (Section 1.3) to convert frame-level scores to keyframes.
But you use a different method to get keyframes, which does not seem identical to the FCSN paper.
After reading Section 3.3 of the FCSN paper several times, I cannot figure out the exact structure of the unsupervised part. Does it mean:
batch * 2 * Y
batch * 2 * Y -> batch * 10 * Y (shape of the output of conv8)
batch * 1024 * Y -> batch * 10 * Y
batch * 10 * Y -> batch * 1024 * Y
Hello, your code is not complete: the test code does not report the accuracy and recall rates.
The [start:end] slice excludes the end element.
Should it be
pred_value = np.array([pred_score[cp[0]:(cp[1]+1)].mean() for cp in cps])
?
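A small sketch with made-up scores and change points confirms why the +1 is needed when the change points are inclusive ranges:

```python
import numpy as np

# Hypothetical per-frame scores and change points, where each cp = [start, end]
# gives an INCLUSIVE frame range for a shot.
pred_score = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
cps = [[0, 2], [3, 5]]

# pred_score[cp[0]:cp[1]] would drop the last frame of every shot;
# the +1 makes Python's half-open slice cover the inclusive range.
pred_value = np.array([pred_score[cp[0]:(cp[1] + 1)].mean() for cp in cps])

print(pred_value)  # -> [0.2 0.5]
```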
Has anyone tried testing with variable-length inputs?
The input feature is (1 x T x 1024), where
T is 4494 or 1234 or whatever (the number of frames in each video).
I tried this setting but the NLL loss does not decrease...
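For what it's worth, a common workaround (an assumption on my part, not something the repo necessarily does) is to resample each variable-length feature sequence to the fixed training length before the forward pass, e.g. with linear interpolation:

```python
import torch
import torch.nn.functional as F

# Sketch: FCSN-style models here are trained with a fixed temporal length
# (320), so a variable-length feature sequence can be resampled to that
# length before the forward pass.
T = 4494                              # original frame count (any value)
feats = torch.randn(1, 1024, T)       # (batch, channels, time)

resampled = F.interpolate(feats, size=320, mode='linear',
                          align_corners=False)
print(resampled.shape)                # -> torch.Size([1, 1024, 320])
```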
Can someone tell me what the content format of fcsn_dataset.h5 is?
In the paper, the final layer outputs a tensor of shape 1 x T x C, where C is the number of classes, so the output tensor should be 320 x 2 for 2 classes? Can you clarify this?
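If helpful, here is a minimal sketch (my own toy example, not the repo's model) of how a final 1x1 temporal convolution with 2 output channels produces that 1 x T x C shape for T = 320 and C = 2:

```python
import torch
import torch.nn as nn

# A 1x1 temporal conv with 2 output channels over T = 320 frames yields
# (batch, C, T) = (1, 2, 320); a permute gives the (1, T, C) = (1, 320, 2)
# layout described in the paper.
head = nn.Conv1d(in_channels=1024, out_channels=2, kernel_size=1)
feats = torch.randn(1, 1024, 320)     # (batch, channels, frames)

scores = head(feats)                  # (1, 2, 320)
scores_tc = scores.permute(0, 2, 1)   # (1, 320, 2): per-frame 2-class scores

print(scores.shape, scores_tc.shape)
```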
Hello,
I am from China and cannot open your Google link. Could you send the dataset to my email?
my email is [email protected]
Thank you very much!
the code is:
log_p = torch.log_softmax(pred_score, dim=1).reshape(-1, n_class)
where "pred_score" is an (n_batch, n_class, n_frame) tensor: log_softmax is applied and the result is reshaped to (-1, n_class). However, reshape works in row-major order, while column-major order is needed here. The correct code is:
log_p = torch.log_softmax(pred_score, dim=1).permute(0,2,1).reshape(-1, n_class)
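A tiny tensor makes the axis issue visible (log_softmax omitted so the raw values stay easy to compare; the ordering problem is identical):

```python
import torch

n_batch, n_class, n_frame = 1, 2, 3
# Distinct values so the two orderings are easy to compare.
pred_score = torch.arange(n_batch * n_class * n_frame, dtype=torch.float32)
pred_score = pred_score.reshape(n_batch, n_class, n_frame)

# Wrong: reshape reads the row-major buffer, so each output row mixes
# scores of different frames instead of the classes of one frame.
wrong = pred_score.reshape(-1, n_class)

# Right: move the class axis last, then flatten; row f now holds
# the n_class scores of frame f.
right = pred_score.permute(0, 2, 1).reshape(-1, n_class)

print(torch.equal(right[0], pred_score[0, :, 0]))  # -> True
print(torch.equal(wrong[0], pred_score[0, :, 0]))  # -> False
```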
As described, How can I test on a custom video and get the summary?
Can you please provide me with this data file
(ydata-tvsum50-v1_1)?
I could not find it on the internet or in your data files.
Also, can you please provide the data root, which I think is the folder named (TVSum)? I did not find it either. Can you post the link here in a comment or send it to me by email: [email protected]?
I am very thankful for your help.
I tried to get change points using the KTS code,
but I couldn't get proper change points.
If someone has obtained change points using KTS, please help me.
These values are for the ImageNet dataset. Do they also fit the dataset we use here?
Traceback (most recent call last):
File "gen_summary.py", line 112, in
gen_summary()
File "gen_summary.py", line 104, in gen_summary
get_keys(id)
File "gen_summary.py", line 50, in get_keys
keyshots.append(frames[i])
IndexError: index 5866 is out of bounds for axis 0 with size 5846
Any idea how to solve this?
Hello @weirme ,
Thank you for the great implementation.
I tried to use gen_summary.py to generate summaries for TVSum videos but failed. I used the default settings in the code and laid out the dataset accordingly, but an IndexError is thrown.
I found that this is because the video IDs are not matched correctly. The IDs in the original TVSum dataset are random names, but in your case the IDs are 1-50. Do you have a mapping between the two?
Thanks.
Hello,
How should I interpret the importance scores in the tsv file of the original TVSum50 dataset?
Are they for each frame? If yes, what is the frame rate used?
What is the significance of a shot being of 2 seconds?
The data annotation file has importance scores for each video. The readme said that each shot is 2 seconds. Hence while going through the data for the 1st video (length 5 min 54 sec), the number of annotations provided was over 10000. I am not able to understand how the length of the video is related to the number of annotations. Multiplying each video duration with commonly used frame rates (24-30) doesn't help as well.
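Assuming the scores in the tsv are per frame, per annotator, at the video's native frame rate (commonly 30 fps for TVSum; this is my reading, not confirmed here), the count works out:

```python
# Rough arithmetic under the assumption that TVSum importance scores are
# per frame, per annotator, at the video's native frame rate.
duration_s = 5 * 60 + 54          # first video: 5 min 54 s
fps = 30                          # assumed native frame rate
frames = duration_s * fps
print(frames)                     # -> 10620, consistent with "over 10000" rows

# The 2-second shots matter mainly for evaluation: frame scores are
# averaged within each 2 s segment, i.e. roughly fps * 2 = 60 frames/shot.
print(fps * 2)                    # -> 60
```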
What is the meaning of this function?
As described, is it able to summarize a custom video?
This link doesn't seem to work:
http://www.cs.umanitoba.ca/~mrochan/projects/eccv18/fcsn.html
I only modified the path of the dataset:
Traceback (most recent call last):
File "train.py", line 142, in
solver.train()
File "train.py", line 60, in train
for batch_i, (feature, label, ) in enumerate(tqdm(self.train_loader, desc='Batch', ncols=80, leave=False)):
File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/tqdm/std.py", line 1129, in iter
for obj in iterable:
File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in next
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 615, in
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in getitem
return self.dataset[self.indices[idx]]
File "/home/student/maruidi/Frames/Video_Summary_using_FCSN-master/data_loader.py", line 17, in getitem
video = self.data_file['video'+str(index)]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/student/maruidi/anaconda2/envs/FCSN/lib/python3.6/site-packages/h5py/_hl/group.py", line 264, in getitem
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'video_tensor(41)' doesn't exist)"