
Comments (6)

StevRamos commented on September 24, 2024

@mpalaourg I almost made a big mistake with the SumMe dataset. Thanks! Oh, I get it. Did you try using other datasets? I am trying to use "CoSum" and "VSUMM", which I have read are also widely used. However, I don't know why they aren't used in this repository.

I will follow you to see your results when you finish it. Again, thank you very much.


mpalaourg commented on September 24, 2024

@StevRamos Our methods for replicating the dataset are very similar (almost identical). I think the extracted features don't match the given dataset (or KaiyangZhou's, from which I also think they were taken), either because of the different library versions used (PyTorch etc.) or because the initial dataset (KaiyangZhou's) wasn't produced with GoogleNet from PyTorch.

@xxhiao I didn't deal with the optical-flow features, so I can't be of any help with the implementation. If I remember correctly, the paper for the architecture used (the pretrained I3D, Inflated 3D ConvNet) was this one, where they say in Section 2.5:

We computed optical flow with a TV-L1 algorithm.

I think this is a good enough assumption for the algorithm used.


StevRamos commented on September 24, 2024

Hi, I have the same question, but I found this code: https://github.com/KaiyangZhou/pytorch-vsumm-reinforce#readme. I think they took the processed videos from that repository. I found another repository, https://github.com/SinDongHwan/pytorch-vsumm-reinforce/blob/master/utils/generate_dataset.py, in which they tried to replicate the processed dataset (h5 files). I relied on the latter to try to replicate the datasets; you can find my code here: https://github.com/StevRamos/video_summarization/tree/main/src. I also compared the shapes of the given dataset and the ones I made: they seem to be the same, but the extracted features are not (I also used GoogleNet from PyTorch). Let me know if there is another idea that can help us replicate the dataset.
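
Roughly, the per-frame extraction I tried looks like this. It's only a minimal sketch, assuming GoogleNet from torchvision with the classifier replaced by an identity (so the forward pass ends at the 1024-d pooled features) and 2 fps sampling; the exact preprocessing is a guess on my part and could itself explain the mismatch.

  # Sketch (assumption): 1024-d GoogleNet features per sampled frame,
  # similar in shape to what the KaiyangZhou h5 files contain.
  import cv2
  import torch
  import torch.nn as nn
  from torchvision import models, transforms

  device = "cuda" if torch.cuda.is_available() else "cpu"

  googlenet = models.googlenet(weights="IMAGENET1K_V1")  # needs torchvision >= 0.13
  googlenet.fc = nn.Identity()  # stop at the 1024-d pooled features
  googlenet.eval().to(device)

  preprocess = transforms.Compose([
      transforms.ToPILImage(),
      transforms.Resize((224, 224)),
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406],
                           std=[0.229, 0.224, 0.225]),
  ])

  @torch.no_grad()
  def extract_features(video_path, step=15):
      """Return an (n_sampled_frames, 1024) tensor; step=15 is ~2 fps for a 30 fps video."""
      cap = cv2.VideoCapture(video_path)
      feats, idx = [], 0
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          if idx % step == 0:
              rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
              x = preprocess(rgb).unsqueeze(0).to(device)
              feats.append(googlenet(x).squeeze(0).cpu())
          idx += 1
      cap.release()
      return torch.stack(feats)

If the shapes match but the values don't, different pretrained weights or a different resize/crop choice (plain 224×224 resize vs. resize to 256 then center-crop 224) would be my first suspects.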


xxhiao commented on September 24, 2024

Thank you very much, @StevRamos. That's very helpful.

By the way, does anyone know which optical flow algorithm the authors of MSVA used to extract the I3D features?


StevRamos commented on September 24, 2024

@mpalaourg It is good to know that I am on the right track. How did you deal with the ground truth score? I averaged the user scores, but I think there is one more step I am missing, and knowing it would help me a lot. If you have any ideas, I would be grateful if you shared them with me. By the way, is your code public? Thanks in advance!

@xxhiao they used the TV-L1 algorithm. I think they took 16 frames per group, but it depends on how much RAM you have. In my case, I couldn't replicate this directly, but after resampling I was able to do it.
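
For what it's worth, here is a rough sketch of how the flow itself can be computed with the TV-L1 implementation in opencv-contrib-python. The clipping to [-20, 20] and rescaling follow what the kinetics-i3d code describes, as far as I remember, and the 16-frame grouping is only my guess, not something the authors confirmed.

  # Sketch (assumption): TV-L1 flow per consecutive frame pair, using the
  # DualTVL1 implementation shipped with opencv-contrib-python.
  import cv2
  import numpy as np

  tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()

  def tvl1_flow(gray_frames):
      """gray_frames: list of uint8 grayscale frames -> (N-1, H, W, 2) float32 flow."""
      flows = []
      for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
          flow = tvl1.calc(prev, curr, None)
          # I3D-style preprocessing (to the best of my memory): clip to [-20, 20], rescale to [-1, 1].
          flows.append(np.clip(flow, -20, 20) / 20.0)
      return np.stack(flows)

  # Feeding the flow stream in snippets of 16 frames is my assumption;
  # use smaller groups (or resample the video first) if RAM is the bottleneck.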


mpalaourg commented on September 24, 2024

@StevRamos the pipeline for the gtscore calculation is a bit different for each dataset.

For SumMe, given you have downloaded the files from here, you should have access to a folder named GT with some *.mat files. Each of these files has a user_score matrix with shape (frames, annotators) and a gt_score vector with shape (frames, 1). Somewhere in their paper they say that they present the videos in random order (to avoid users selecting only frames at the beginning of the video), so the different numbers in user_score are just the order of the selected frames/shots, not importance. In MATLAB code, for a single video you would want something like this:

  % Binarize the user selections: any positive value just means "selected".
  idxs = find(user_score > 0);
  my_user_score = user_score;
  my_user_score(idxs) = 1;
  % Average over annotators (2nd dimension) to get the per-frame score.
  my_gtscore = mean(my_user_score, 2);
  % Sanity check against the gt_score stored in the .mat file.
  all(my_gtscore == gt_score)

Then, to get the right shape, you have to sub-sample each video to 2 fps; note that they wrongly assumed every video is 30 fps.
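
In code, that sub-sampling is just a fixed step (a sketch, using the 30 fps they assumed for every video, i.e. a step of 15 frames):

  import numpy as np

  def subsample_to_2fps(gtscore, assumed_fps=30):
      """Keep every (assumed_fps / 2)-th frame score, matching the 2 fps features."""
      step = assumed_fps // 2  # 15, because every video is treated as 30 fps
      picks = np.arange(0, len(gtscore), step)
      return np.asarray(gtscore)[picks], picks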

For TVSum, given you have downloaded the files from here, you should have access to a folder named ../ydata-tvsum50-data/data with an anno.tsv file. Here, the 3rd column indeed contains importance scores with a max value of 5. Normalize this 3rd column for each video, and again take the average over annotators.
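
A sketch of that in Python, assuming anno.tsv has one row per annotator with tab-separated columns (video id, category, comma-separated per-frame scores); the column names are mine, and dividing by the maximum value of 5 is one reading of "normalize":

  import numpy as np
  import pandas as pd

  anno = pd.read_csv("anno.tsv", sep="\t", header=None,
                     names=["video_id", "category", "scores"])

  gtscore = {}
  for video_id, group in anno.groupby("video_id"):
      # (annotators, frames) matrix of importance scores in 1..5
      user_scores = np.array([np.array(s.split(","), dtype=float)
                              for s in group["scores"]])
      user_scores = user_scores / 5.0          # normalize by the max value of 5
      gtscore[video_id] = user_scores.mean(0)  # average over annotators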

Our code isn't public yet; we are waiting for a double-blind review of our work, and then we will release it. Although, to be honest, the data preparation code won't be released, because we also used the same data as this repo (just different splits). The only reason I was playing with the data was to fully understand it and use it correctly!

PS. If I didn't understand your question, feel free to ask again 😅

