
Comments (6)

StevRamos commented on September 24, 2024

@mpalaourg I almost made a big mistake with the SumMe dataset. Thanks! Oh, I get it. Did you try using other datasets? I am trying to use "CoSum" and "VSUMM", which I have read are also widely used. However, I don't know why they aren't used in this repository.

I will follow you to see your results when you finish it. Again, thank you very much.


mpalaourg commented on September 24, 2024

@StevRamos Our methods for replicating the dataset are very similar (almost identical). I think the extracted features don't match the given dataset (or KaiyangZhou's, from which I also think they were taken), either because of the different library versions used (PyTorch etc.) or because the initial dataset (KaiyangZhou's) wasn't produced with GoogleNet from PyTorch.

@xxhiao I didn't deal with the optical-flow features, so I can't be of any help with the implementation. If I remember correctly, the paper for the architecture used (the pretrained I3D, Inflated 3D ConvNet) was this one, where they say in Section 2.5:

We computed optical flow with a TV-L1 algorithm.

I think this is a good enough assumption for the algorithm used.


StevRamos commented on September 24, 2024

Hi, I have the same question, but I found this code: https://github.com/KaiyangZhou/pytorch-vsumm-reinforce#readme. I think they took the processed videos from that repository. I found another repository, https://github.com/SinDongHwan/pytorch-vsumm-reinforce/blob/master/utils/generate_dataset.py, in which they tried to replicate the processed dataset (h5 files). I relied on the latter to try to replicate the datasets; you can find my code here: https://github.com/StevRamos/video_summarization/tree/main/src. I also compared the shapes of the given dataset and the ones I made: they seem to be the same, but the extracted features are not (I also used GoogleNet from PyTorch). Let me know if there is another idea that can help us replicate the dataset.
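
Roughly, the per-frame extraction I tried looks like this. It's only a minimal sketch, assuming GoogleNet from torchvision with the classifier replaced by an identity (so the forward pass ends at the 1024-d pooled features) and 2 fps sampling; the exact preprocessing is a guess on my part and could itself explain the mismatch.

  # Sketch (assumption): 1024-d GoogleNet features per sampled frame,
  # similar in shape to what the KaiyangZhou h5 files contain.
  import cv2
  import torch
  import torch.nn as nn
  from torchvision import models, transforms

  device = "cuda" if torch.cuda.is_available() else "cpu"

  googlenet = models.googlenet(weights="IMAGENET1K_V1")  # needs torchvision >= 0.13
  googlenet.fc = nn.Identity()  # stop at the 1024-d pooled features
  googlenet.eval().to(device)

  preprocess = transforms.Compose([
      transforms.ToPILImage(),
      transforms.Resize((224, 224)),
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406],
                           std=[0.229, 0.224, 0.225]),
  ])

  @torch.no_grad()
  def extract_features(video_path, step=15):
      """Return an (n_sampled_frames, 1024) tensor; step=15 is ~2 fps for a 30 fps video."""
      cap = cv2.VideoCapture(video_path)
      feats, idx = [], 0
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          if idx % step == 0:
              rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
              x = preprocess(rgb).unsqueeze(0).to(device)
              feats.append(googlenet(x).squeeze(0).cpu())
          idx += 1
      cap.release()
      return torch.stack(feats)

If the shapes match but the values don't, different pretrained weights or a different resize/crop choice (plain 224×224 resize vs. resize to 256 then center-crop 224) would be my first suspects.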


xxhiao commented on September 24, 2024

Thank you very much, @StevRamos. That's very helpful.

By the way, does anyone know which optical flow algorithm the authors of MSVA used to extract the I3D features?


StevRamos commented on September 24, 2024

@mpalaourg It is good to know that I am on the right track. How did you deal with the ground truth score? I averaged the user scores, but I think there is one more step I am missing, and knowing it would help me a lot. If you have any ideas, I would be grateful if you shared them with me. By the way, is your code public? Thanks in advance!

@xxhiao they used the TV-L1 algorithm. I think they took 16 frames per group, but it depends on how much RAM you have. In my case, I couldn't replicate this directly, but after resampling I was able to do it.
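
For what it's worth, here is a rough sketch of how the flow itself can be computed with the TV-L1 implementation in opencv-contrib-python. The clipping to [-20, 20] and rescaling follow what the kinetics-i3d code describes, as far as I remember, and the 16-frame grouping is only my guess, not something the authors confirmed.

  # Sketch (assumption): TV-L1 flow per consecutive frame pair, using the
  # DualTVL1 implementation shipped with opencv-contrib-python.
  import cv2
  import numpy as np

  tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()

  def tvl1_flow(gray_frames):
      """gray_frames: list of uint8 grayscale frames -> (N-1, H, W, 2) float32 flow."""
      flows = []
      for prev, curr in zip(gray_frames[:-1], gray_frames[1:]):
          flow = tvl1.calc(prev, curr, None)
          # I3D-style preprocessing (to the best of my memory): clip to [-20, 20], rescale to [-1, 1].
          flows.append(np.clip(flow, -20, 20) / 20.0)
      return np.stack(flows)

  # Feeding the flow stream in snippets of 16 frames is my assumption;
  # use smaller groups (or resample the video first) if RAM is the bottleneck.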


mpalaourg commented on September 24, 2024

@StevRamos the pipeline for the gtscore calculation is a bit different for each dataset.

For SumMe, given you have downloaded the files from here, you should have access to a folder named GT with some *.mat files. Each of these files has a user_score matrix with shape (frames, annotators) and a gt_score vector with shape (frames, 1). Somewhere in their paper they say that they present the videos in random order (to avoid users selecting only frames at the beginning of the video), so the different numbers in user_score are just the order of the selected frames/shots, not importance. In MATLAB code, for a single video you would want something like this:

  % Binarize the user selections: any positive value just means "selected".
  idxs = find(user_score > 0);
  my_user_score = user_score;
  my_user_score(idxs) = 1;
  % Average over annotators (2nd dimension) to get the per-frame score.
  my_gtscore = mean(my_user_score, 2);
  % Sanity check against the gt_score stored in the .mat file.
  all(my_gtscore == gt_score)

Then, to get the right shape, you have to sub-sample each video to 2 fps; note that they wrongly assumed every video is 30 fps.
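
In code, that sub-sampling is just a fixed step (a sketch, using the 30 fps they assumed for every video, i.e. a step of 15 frames):

  import numpy as np

  def subsample_to_2fps(gtscore, assumed_fps=30):
      """Keep every (assumed_fps / 2)-th frame score, matching the 2 fps features."""
      step = assumed_fps // 2  # 15, because every video is treated as 30 fps
      picks = np.arange(0, len(gtscore), step)
      return np.asarray(gtscore)[picks], picks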

For TVSum, given you have downloaded the files from here, you should have access to a folder named ../ydata-tvsum50-data/data with an anno.tsv file. Here, the 3rd column indeed contains importance scores with a max value of 5. Normalize this 3rd column for each video, and again take the average over annotators.
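
A sketch of that in Python, assuming anno.tsv has one row per annotator with tab-separated columns (video id, category, comma-separated per-frame scores); the column names are mine, and dividing by the maximum value of 5 is one reading of "normalize":

  import numpy as np
  import pandas as pd

  anno = pd.read_csv("anno.tsv", sep="\t", header=None,
                     names=["video_id", "category", "scores"])

  gtscore = {}
  for video_id, group in anno.groupby("video_id"):
      # (annotators, frames) matrix of importance scores in 1..5
      user_scores = np.array([np.array(s.split(","), dtype=float)
                              for s in group["scores"]])
      user_scores = user_scores / 5.0          # normalize by the max value of 5
      gtscore[video_id] = user_scores.mean(0)  # average over annotators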

Our code isn't public yet; we are waiting for a double-blind review of our work, and then we will release it. Although, to be honest, the data preparation code won't be released, because we also used the same data as this repo (just different splits). The only reason I was playing with the data was to fully understand it and use it correctly!

PS. If I didn't understand your question, feel free to ask again 😅

