Content-Debiased FVD for Evaluating Video Generation Models

FVD is observed to favor the quality of individual frames over realistic motion. We verify this with quantitative measurements. We show that the bias can be attributed to features extracted from a supervised video classifier trained on a content-biased dataset, and that using features from large-scale unsupervised models can mitigate the bias. This repo contains a code toolkit for easily computing FVD with different pre-trained models. Please refer to our project page or paper for more details about the analysis.

On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang
UMD, CMU, Adobe
CVPR 2024

Quickstart

We provide a simple interface to compute FVD scores between two sets of videos that can be adapted to different scenarios. You can install the library through pip:

pip install cd-fvd

You may choose to download some example UCF-101 videos to test the code via:

bash cdfvd/download_example_videos.sh

The following code snippet demonstrates how to compute FVD scores between a folder of videos and precomputed statistics.

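from cdfvd import fvd
# Build an evaluator that extracts features with the VideoMAE model.
evaluator = fvd.cdfvd('videomae', ckpt_path=None)
# Load the precomputed reference statistics for UCF-101 (resolution 128, 16 frames).
evaluator.load_videos('ucf101', data_type='stats_pkl', resolution=128, sequence_length=16)
# Load the videos under test from a folder and compute their ("fake") statistics.
evaluator.compute_fake_stats(evaluator.load_videos('./example_videos/'))
# Compute the FVD score between the reference and fake statistics.
score = evaluator.compute_fvd_from_stats()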

Please refer to the documentation for more detailed instructions on the usage.
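For readers who want to see what a score computed "from statistics" means, here is a minimal NumPy sketch of the Fréchet distance between two Gaussians, each summarized by a feature mean and covariance. It is an illustration under stated assumptions, not the cd-fvd implementation: the function names `feature_stats` and `frechet_distance` are hypothetical, and the features would come from whichever pre-trained model is chosen.

```python
import numpy as np

def feature_stats(features):
    """Return (mean, covariance) of an (N, D) array of video features."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu_r, sigma_r, mu_f, sigma_f):
    """Squared Frechet distance between N(mu_r, sigma_r) and N(mu_f, sigma_f).

    d^2 = ||mu_r - mu_f||^2 + Tr(sigma_r + sigma_f - 2 (sigma_r sigma_f)^(1/2))
    """
    mu_r = np.asarray(mu_r, dtype=float)
    mu_f = np.asarray(mu_f, dtype=float)
    sigma_r = np.atleast_2d(np.asarray(sigma_r, dtype=float))
    sigma_f = np.atleast_2d(np.asarray(sigma_f, dtype=float))
    diff = mu_r - mu_f
    # For positive semi-definite covariances, the eigenvalues of the product
    # are real and non-negative, so Tr(sqrt(product)) is the sum of their
    # square roots. Clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_f) - 2.0 * trace_sqrt)
```

Identical statistics give a distance of zero, and with equal covariances the distance reduces to the squared distance between the two means.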

Note: By default, n_fake=2048. If n_fake is greater than the number of videos in the path/to/fakevideos/ folder, the same videos will be resampled (with repetition) until n_fake samples are drawn. If this is not the desired behavior, please use a custom value of n_fake, or set n_fake='full' to use all videos in path/to/fakevideos/ without repetition.
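The effect described in the note can be pictured with a small standard-library sketch. This is not the library's code: `sample_video_indices` is a hypothetical helper, and the branch for n_fake no larger than the folder size (a random subset without repetition) is an assumption made only for illustration.

```python
import random

def sample_video_indices(num_videos, n_fake=2048, seed=0):
    """Hypothetical helper: choose which videos feed the fake statistics."""
    if num_videos <= 0:
        raise ValueError("need at least one video")
    if n_fake == 'full':
        # Use every video exactly once, without repetition.
        return list(range(num_videos))
    rng = random.Random(seed)
    if n_fake > num_videos:
        # Fewer videos than requested: draw with replacement, so videos repeat.
        return [rng.randrange(num_videos) for _ in range(n_fake)]
    # Assumption: otherwise take a random subset without repetition.
    return rng.sample(range(num_videos), n_fake)
```

With 10 videos and the default n_fake, the helper returns 2048 indices drawn from only 10 distinct videos, which is the repetition the note warns about.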

Precomputed Datasets

We provide precomputed statistics for the following datasets.

| Dataset | Video Length | Resolution | Reference Split | # Reference Videos | Model | Skip Frame # | Seed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UCF101 | 16, 128 | 128, 256 | train+test | 2048, full | I3D, VideoMAE-v2-SSv2 | 1 | 0 |
| Sky | 16, 128 | 128, 256 | train | 2048, full | I3D, VideoMAE-v2-SSv2 | 1 | 0 |
| Taichi | 16, 128 | 128, 256 | train | 2048, full | I3D, VideoMAE-v2-SSv2 | 1 | 0 |
| Kinetics | 16 | 128, 256 | train | 2048, full | I3D, VideoMAE-v2-SSv2 | 1 | 0 |
| Kinetics | 128 | 128, 256 | train | 2048 | I3D, VideoMAE-v2-SSv2 | 1 | 0 |
| FFS | 16, 128 | 128, 256 | train | 2048, full | I3D, VideoMAE-v2-SSv2 | 1 | 0 |
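As a convenience, the combinations listed above can be transcribed into a small lookup that checks a configuration before requesting precomputed statistics. This is a sketch only: the keys are the dataset labels from the table rather than identifiers confirmed for the library, and `has_precomputed_stats` and its parameter names are hypothetical.

```python
# Rows transcribed from the table above. Every row covers both models
# (I3D, VideoMAE-v2-SSv2) with skip frame 1 and seed 0.
PRECOMPUTED = [
    # (dataset label, video lengths, resolutions, reference-video counts)
    ("UCF101", {16, 128}, {128, 256}, {2048, "full"}),
    ("Sky", {16, 128}, {128, 256}, {2048, "full"}),
    ("Taichi", {16, 128}, {128, 256}, {2048, "full"}),
    ("Kinetics", {16}, {128, 256}, {2048, "full"}),
    ("Kinetics", {128}, {128, 256}, {2048}),
    ("FFS", {16, 128}, {128, 256}, {2048, "full"}),
]

def has_precomputed_stats(dataset, length, resolution, n_real=2048):
    """Return True if the table lists statistics for this combination."""
    return any(
        dataset == name
        and length in lengths
        and resolution in resolutions
        and n_real in counts
        for name, lengths, resolutions, counts in PRECOMPUTED
    )
```

For example, Kinetics at 128 frames is listed only with 2048 reference videos, so a request for the full reference set at that length is reported as unavailable.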

Citation

@inproceedings{ge2024content,
      title={On the Content Bias in Fréchet Video Distance},
      author={Ge, Songwei and Mahapatra, Aniruddha and Parmar, Gaurav and Zhu, Jun-Yan and Huang, Jia-Bin},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year={2024}
}

Acknowledgement

We thank Angjoo Kanazawa, Aleksander Holynski, Devi Parikh, and Yogesh Balaji for their early feedback and discussion. We thank Or Patashnik, Richard Zhang, and Hadi Alzayer for their helpful comments and paper proofreading. We thank Ivan Skorokhodov for his help with reproducing the StyleGAN-v ablation experiments. Part of the evaluation code is built on StyleGAN-v.

Licenses

All material in this repository is made available under the MIT License.

metric_utils.py is adapted from the stylegan-v metric_utils.py, which was built on top of StyleGAN2-ADA and is restricted by the NVIDIA Source Code License.

The VideoMAE-v2 checkpoint is publicly available. Please consider filling out this questionnaire to help improve future work.
