Near-Duplicate-Video-Detection (NDVD)

With the explosion of social networks, the volume of video content has grown rapidly over the last few years. YouTube currently reports 500 hours of video being uploaded every minute. A Cisco forecast estimated that video would constitute 80% of internet traffic by 2019. According to a recent study, 31.7% of videos on YouTube are duplicates, and these duplicates occupy 24% of storage space. It has become essential to build robust systems that detect and purge these duplicates.

Near-duplicate videos form a broader class that includes exact duplicates. They are videos that share the same visual content but differ in format (scale, transformation, encoding, etc.) or contain small content modifications (color, lighting, small superimposed text, etc.).

Problem Statement

Given a corpus of videos, determine whether a query video is a near-duplicate of an existing video in the corpus.

Dataset

The standard benchmark for NDVD tasks is the CC_WEB_VIDEO dataset. It contains a total of 13,129 videos and 24 query videos. The dataset also provides 398,008 keyframes, extracted from the videos by a shot boundary detection method.

Approach

The problem is tackled by building a bag-of-visual-words model for each video. This is done in the following phases:

  • Feature Extraction:

Convolutional Neural Networks (CNNs) are used to extract features from the video keyframes. Pretrained CNNs have been shown to work well on many vision tasks such as classification and segmentation. Here, the pretrained weights of AlexNet are used to extract feature vectors. Each video keyframe is forward-passed through AlexNet, and the feature maps of its intermediate layers are collected. Max-pooling is applied to each intermediate feature map to reduce it to a single value. Each frame is then represented by a feature vector of 1376 dimensions. The video-level feature vector is calculated by concatenating the individual keyframe feature vectors.
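The pooling step can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes the features come from the five convolutional layers of the original AlexNet (96, 256, 384, 384 and 256 channels, which sum to 1376), and it uses random arrays in place of real activations. The names `CONV_CHANNELS` and `frame_descriptor` are hypothetical.

```python
import numpy as np

# Assumption: channel counts of the five conv layers of the original AlexNet.
# They sum to 1376, matching the frame descriptor size stated above.
CONV_CHANNELS = (96, 256, 384, 384, 256)


def frame_descriptor(feature_maps):
    """Max-pool each (channels, height, width) array to one value per channel
    and concatenate the pooled vectors across layers."""
    pooled = [fmap.max(axis=(1, 2)) for fmap in feature_maps]
    return np.concatenate(pooled)


# Random stand-ins for real activations; spatial sizes are illustrative only.
rng = np.random.default_rng(0)
spatial_sizes = (55, 27, 13, 13, 13)
fake_maps = [rng.random((c, s, s)) for c, s in zip(CONV_CHANNELS, spatial_sizes)]

descriptor = frame_descriptor(fake_maps)
print(descriptor.shape)  # (1376,)
```

In a real pipeline the `fake_maps` list would be replaced by the activations captured from the network for one keyframe.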

  • Visual Codebook Generation:

A visual codebook is generated from the above feature vectors. Online mini-batch k-means is used to generate the codebook clusters. A random sample of 100K frames is used for visual codebook generation. In our experiments, K = 1000 gave the best results.
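As a rough sketch of the idea, the snippet below implements a simple mini-batch k-means in NumPy, where each centre moves toward its assigned samples with a per-centre learning rate of 1 / count. It is an assumption-laden illustration rather than the project's implementation: the function name `minibatch_kmeans`, the batch size, the iteration count and the toy data sizes are all hypothetical, and the toy run uses k = 10 instead of K = 1000.

```python
import numpy as np


def minibatch_kmeans(features, k, batch_size=64, n_iters=200, seed=0):
    """Mini-batch k-means with a per-centre learning rate of 1 / count."""
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct random samples.
    start = rng.choice(len(features), size=k, replace=False)
    centres = features[start].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iters):
        batch = features[rng.choice(len(features), size=batch_size)]
        # Squared distances via ||x||^2 - 2 x.c + ||c||^2, shape (batch, k).
        dists = (
            (batch ** 2).sum(axis=1)[:, None]
            - 2.0 * batch @ centres.T
            + (centres ** 2).sum(axis=1)[None, :]
        )
        nearest = dists.argmin(axis=1)
        # Move each assigned centre toward its sample.
        for x, c in zip(batch, nearest):
            counts[c] += 1
            centres[c] += (x - centres[c]) / counts[c]
    return centres


# Toy data standing in for the sampled keyframe descriptors.
rng = np.random.default_rng(1)
toy_features = rng.random((2000, 16))
codebook = minibatch_kmeans(toy_features, k=10)
print(codebook.shape)  # (10, 16)
```

In practice a library implementation such as scikit-learn's `MiniBatchKMeans` would be a natural choice; the text above does not state which one was used.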

  • Video level histogram:

For each keyframe in a video, the nearest cluster is identified to generate a keyframe-level histogram. A video-level histogram is then generated by summing over the individual keyframe histograms.
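A minimal sketch of this step, assuming Euclidean distance for the nearest-cluster lookup (the distance measure is not stated above). Counting the assigned words with `np.bincount` is equivalent to summing one-hot keyframe histograms. The function name `video_histogram` and the tiny 2-D codebook are hypothetical.

```python
import numpy as np


def video_histogram(frame_features, codebook):
    """Assign each keyframe to its nearest visual word and count the words."""
    # Squared distances via ||x||^2 - 2 x.c + ||c||^2, shape (frames, words).
    dists = (
        (frame_features ** 2).sum(axis=1)[:, None]
        - 2.0 * frame_features @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))


# Three visual words and four keyframes in a toy 2-D feature space.
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
frames = np.array([[0.1, 0.2], [9.5, 10.2], [0.3, -0.1], [0.2, 9.7]])

hist = video_histogram(frames, codebook)
print(hist)  # [2 1 1]
```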

  • Similarity:

Video similarity is inferred by calculating the cosine similarity between the tf-idf (term frequency / inverse document frequency) weighted histograms of the two videos. To reduce the number of comparisons, we construct an inverted index to identify the videos that share at least one visual word.
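The following sketch shows one way these pieces could fit together. It assumes a common tf-idf variant (tf as the normalised count, idf = log(N / document frequency)); the exact weighting used in the project is not specified above. The helper names `tfidf`, `build_inverted_index`, `cosine` and `rank_candidates` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def tfidf(histograms):
    """Return tf-idf weighted histograms and the idf vector."""
    n_videos = histograms.shape[0]
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1.0)
    df = (histograms > 0).sum(axis=0)
    idf = np.log(n_videos / np.maximum(df, 1))
    return tf * idf, idf


def build_inverted_index(histograms):
    """Map each visual word to the set of videos that contain it."""
    index = defaultdict(set)
    for video_id, hist in enumerate(histograms):
        for word in np.flatnonzero(hist):
            index[int(word)].add(video_id)
    return index


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_candidates(query_hist, corpus_hists):
    """Score only the videos that share a visual word with the query."""
    corpus_hists = np.asarray(corpus_hists, dtype=float)
    weighted, idf = tfidf(corpus_hists)
    q = np.asarray(query_hist, dtype=float)
    q_weighted = q / max(q.sum(), 1.0) * idf
    index = build_inverted_index(corpus_hists)
    candidates = set()
    for word in np.flatnonzero(q):
        candidates |= index.get(int(word), set())
    scores = {v: cosine(q_weighted, weighted[v]) for v in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


corpus = [[2, 1, 0, 0], [0, 0, 3, 1], [1, 1, 0, 1]]
results = rank_candidates([2, 1, 0, 0], corpus)
print(results)  # video 0 ranks first, then video 2; video 1 is never scored
```

Video 1 shares no visual word with the query, so the inverted index excludes it before any cosine similarity is computed.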

Results

Using the above approach for the 24 query videos, the mAP (mean average precision) score was 0.951. State-of-the-art techniques achieve an mAP of 0.98. However, note that this metric is less well defined than the metrics used for classification or segmentation tasks: deciding which videos are visually similar, and how to rank them, depends on human judgment.
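For reference, one common definition of mAP can be sketched as below. This is not necessarily the computation behind the 0.951 figure, which is not shown here; the function names and the toy rankings are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(rankings, ground_truth):
    """Average the per-query AP over all queries."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(aps) / len(aps)


# Toy example: two queries with their ranked results and true near-duplicates.
rankings = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
ground_truth = {"q1": {"a", "c"}, "q2": {"y"}}

score = mean_average_precision(rankings, ground_truth)
print(round(score, 4))  # 0.6667
```

Here q1 has an AP of (1/1 + 2/3) / 2 and q2 has an AP of 1/2, giving a mean of 2/3.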

Future Work

  1. Scalability: How can we extend this to handle a corpus of 1M+ videos and their keyframes? K-means clustering suffers from the curse of dimensionality.
  2. Robustness: Run a detailed analysis by type of near-duplicate (scale changes, small frame changes, etc.) to report robustness against each type.
  3. Metric learning: Use metric learning to learn the similarity metric between near-duplicate videos.
  4. Near-Duplicate Video Retrieval: Retrieve a ranked list of near-duplicate videos in order of similarity. This can be extended to retrieve visually similar videos.

References:

  1. Near-duplicate Video Detection using CNN intermediate features
  2. NDVD using metric learning
  3. TF-IDF tutorial

near-duplicate-video-detection's Issues

Is there any paper corresponding to the code?

Hello,
I wonder whether there is a paper corresponding to this code.
I would also like to know why you generate the codebook from 100K sample frames.
To be specific, why is the sample size 100K?
Thank you

Many thanks!

Does anyone know whether this code can be used on one's own dataset?

Method for calculating mAP

I was examining your code but I couldn't find where you calculate mean average precision. You state you are achieving 0.951 mAP score but in your notebook I couldn't locate it.

Could you share how you calculate your mAP?
