text2image-benchmark

This project aims to unify the evaluation of generative text-to-image models and provide the ability to quickly and easily calculate most popular metrics.

Goals of this benchmark:

  • Unified metrics and datasets for all text-to-image models
  • Reproducible results
  • User-friendly interface for most popular metrics: FID and CLIP-score

Introduction

Generative text-to-image models have become a popular and widely used tool for users. There are many articles on the topic of image generation from text that present new, more advanced models. However, there is still no uniform way to measure the quality of such models. To address this issue, we provide an implementation of metrics and a dataset to compare the quality of generative models.

We propose using MS-COCO FID-30K together with OpenAI's CLIP score, a combination that has already become a standard for measuring the quality of text-to-image models. We provide the MS-COCO validation subset and precalculated FID statistics for it, along with a fixed list of 30,000 captions that must be used to generate images for MS-COCO FID-30K.

You can easily contribute your model to the benchmark and make its FID results reproducible! See the Contribution section for details.

Main features

  • Standardized FID calculation: fixed image preprocessing and InceptionV3 model.
  • FID-30k on the MS-COCO validation set: we provide the dataset on Hugging Face 🤗, precomputed FID stats, and a fixed set of 30,000 captions from MS-COCO that should be used to generate images
  • Implementations of different popular text-to-image models to make metrics reproducible
  • CLIP-score calculation
  • User-friendly metrics calculation (check out Getting started)

Installation

pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/boomb0om/text2image-benchmark

Getting started

Metrics: FID

Calculate FID for two sets of images:

from T2IBenchmark import calculate_fid

fid, _ = calculate_fid('assets/images/cats/', 'assets/images/dogs/')
print(fid)
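
For readers who want to see what sits behind the number, here is a minimal sketch of the Fréchet distance that FID is based on, assuming image features have already been extracted (for example by InceptionV3). The function name `frechet_distance` and the random toy features are illustrative assumptions, not part of T2IBenchmark, and the library's own implementation may differ in numerical details.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    Illustrative helper, not the T2IBenchmark implementation.
    """
    mu1 = np.asarray(mu1, dtype=np.float64)
    mu2 = np.asarray(mu2, dtype=np.float64)
    sigma1 = np.asarray(sigma1, dtype=np.float64)
    sigma2 = np.asarray(sigma2, dtype=np.float64)

    diff = mu1 - mu2
    # The trace of the matrix square root of sigma1 @ sigma2 equals the sum of
    # the square roots of its eigenvalues, which are real and non-negative for
    # covariance matrices (up to numerical noise, hence the clip).
    eigenvalues = np.linalg.eigvals(sigma1 @ sigma2)
    sqrt_trace = np.sqrt(np.clip(eigenvalues.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * sqrt_trace)


# Toy stand-ins for extracted features: rows are images, columns are feature dims.
rng = np.random.default_rng(0)
features_a = rng.normal(size=(1000, 16))
features_b = rng.normal(loc=0.3, size=(1000, 16))

stats_a = (features_a.mean(axis=0), np.cov(features_a, rowvar=False))
stats_b = (features_b.mean(axis=0), np.cov(features_b, rowvar=False))

print(frechet_distance(*stats_a, *stats_a))  # same set against itself: close to 0
print(frechet_distance(*stats_a, *stats_b))  # shifted set: clearly larger
```

Because the score depends only on the mean and covariance of each feature set, any change in preprocessing or in the feature extractor changes the result, which is why the benchmark fixes both.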

Calculate FID between model generations and MS-COCO validation subset:

from T2IBenchmark import calculate_fid
from T2IBenchmark.datasets import get_coco_fid_stats

fid, _ = calculate_fid(
    'path/to/your/generations/',
    get_coco_fid_stats()
)
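
The idea behind precomputed statistics can be sketched as follows: the reference side of FID is fully described by a feature mean and covariance, so it can be computed once and reused. The helper names and the `.npz` layout below are assumptions chosen for illustration; they are not the format returned by `get_coco_fid_stats`.

```python
import os
import tempfile

import numpy as np


def compute_stats(features):
    """Return the feature mean and covariance that FID needs for one image set."""
    features = np.asarray(features, dtype=np.float64)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def save_stats(path, mu, sigma):
    """Store statistics in an .npz archive (hypothetical layout)."""
    np.savez(path, mu=mu, sigma=sigma)


def load_stats(path):
    """Load statistics written by save_stats."""
    with np.load(path) as archive:
        return archive["mu"], archive["sigma"]


# Toy stand-in for reference-set features (rows are images).
reference_features = np.random.default_rng(0).normal(size=(200, 8))
mu, sigma = compute_stats(reference_features)

stats_path = os.path.join(tempfile.mkdtemp(), "reference_stats.npz")
save_stats(stats_path, mu, sigma)

# Later runs only need the cached file, not the reference images themselves.
cached_mu, cached_sigma = load_stats(stats_path)
print(cached_mu.shape, cached_sigma.shape)
```

Sharing one set of reference statistics also means every model is compared against exactly the same target distribution.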

Calculate MS-COCO FID-30k for a T2IModelWrapper. This example uses the Kandinsky 2.1 model. First install the model's requirements:

pip install -r T2IBenchmark/models/kandinsky21/requirements.txt

Then run the evaluation:

from T2IBenchmark import calculate_coco_fid
from T2IBenchmark.models.kandinsky21 import Kandinsky21Wrapper

fid, fid_data = calculate_coco_fid(
    Kandinsky21Wrapper,
    device='cuda:0',
    save_generations_dir='coco_generations/'
)

Metrics: CLIP-score

Example of calculating CLIP-score for a set of images and a fixed prompt:

from T2IBenchmark import calculate_clip_score
from glob import glob

cat_paths = glob('assets/images/cats/*.jpg')
captions_mapping = {path: "a cat" for path in cat_paths}
clip_score = calculate_clip_score(cat_paths, captions_mapping=captions_mapping)
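
As a rough illustration of what a CLIP-score measures, the sketch below computes the mean cosine similarity between paired image and text embeddings. It assumes the embeddings have already been produced by CLIP's image and text encoders. The function name, the scaling by 100, and the clamping at zero follow one common convention and are assumptions here; `calculate_clip_score` may use a different scaling.

```python
import numpy as np


def clip_score_from_embeddings(image_embeds, text_embeds):
    """Mean scaled cosine similarity between paired image/text embeddings.

    Illustrative helper, not the T2IBenchmark implementation. Row i of
    image_embeds is compared with row i of text_embeds.
    """
    image_embeds = np.asarray(image_embeds, dtype=np.float64)
    text_embeds = np.asarray(text_embeds, dtype=np.float64)

    # Normalise each embedding to unit length so the dot product is a cosine.
    image_unit = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
    text_unit = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)

    cosine = np.sum(image_unit * text_unit, axis=1)
    # Assumed convention: negative similarities count as zero, scale is 0..100.
    return float(100.0 * np.clip(cosine, 0.0, None).mean())


# Toy embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
image_embeds = rng.normal(size=(4, 32))
text_embeds = image_embeds + 0.5 * rng.normal(size=(4, 32))

print(clip_score_from_embeddings(image_embeds, text_embeds))
```

A higher value means the images sit closer to their captions in CLIP's embedding space, which complements FID: FID measures image realism, while CLIP-score measures agreement with the prompt.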

Project Structure

  • T2IBenchmark/
    • datasets/ - Datasets that can be used for evaluation
      • coco2014/ - MS-COCO 2014 validation subset
    • feature_extractors/ - Implementation of different neural nets used to extract features from images
    • metrics/ - Implementation of metrics
    • utils/ - Some utils
  • tests/ - Tests
  • docs/ - Documentation
  • examples/ - Benchmark usage examples
  • experiments/ - Experiments with metrics
  • assets/ - Assets

Examples

Examples of use are listed below in recommended order for study:

Documentation

  • FID.md - Explanation of the different parameters that affect FID calculation

Contribution

If you want to contribute your model to this benchmark and publish its metrics, follow these steps:

  1. Create a fork of this repository
  2. Create a wrapper for your model that inherits from the T2IModelWrapper class
  3. Generate images and calculate metrics using calculate_coco_fid. For more information see this example
  4. Create a pull request with your model
  5. Congrats!
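
The wrapper pattern from step 2 can be sketched as below. The base class here is a hypothetical stand-in written for this illustration: its name, constructor, and the methods `load_model` and `generate` are assumptions, so check the actual T2IModelWrapper class in the repository for the methods you need to implement. The toy "model" returns a grid of grey pixels instead of a real image.

```python
from abc import ABC, abstractmethod


class ModelWrapperSketch(ABC):
    """Hypothetical stand-in for T2IModelWrapper; the real interface may differ."""

    def __init__(self, device):
        self.device = device
        self.load_model(device)

    @abstractmethod
    def load_model(self, device):
        """Load model weights onto the given device."""

    @abstractmethod
    def generate(self, caption):
        """Return one generated image for the caption."""


class SolidColorWrapper(ModelWrapperSketch):
    """Toy wrapper that ignores the caption and returns a flat grey image."""

    def load_model(self, device):
        # A real wrapper would build the pipeline and move it to `device` here.
        self.size = (4, 4)

    def generate(self, caption):
        width, height = self.size
        # A real wrapper would run the model on `caption` and return its image.
        return [[(128, 128, 128)] * width for _ in range(height)]


def generate_for_captions(wrapper_cls, captions, device="cpu"):
    """Instantiate the wrapper class once, then generate one image per caption."""
    wrapper = wrapper_cls(device)
    return {caption: wrapper.generate(caption) for caption in captions}


images = generate_for_captions(SolidColorWrapper, ["a cat", "a dog"])
print(len(images))
```

Passing the wrapper class rather than an instance mirrors the Kandinsky 2.1 example in Getting started, where `Kandinsky21Wrapper` and a `device` are handed to `calculate_coco_fid`.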

TO-DO

  • Implementation of Inception Score (IS) and Kernel Inception Distance (KID)
  • FID-CLIPscore metric and plots
  • Implementation and FIDs for Kandinsky 2.X models with the help of Sber AI
  • Implementation and FIDs for popular models from diffusers: Stable Diffusion, IF

Contacts

Authors:

If you have any questions, please email [email protected].

Citing

If you use this repository in your research, consider citing it using the following BibTeX entry:

@misc{boomb0omT2IBenchmark,
  author={Pavlov, I. and Ivanov, A. and Stafievskiy, S.},
  title={{Text-to-Image Benchmark: A benchmark for generative models}},
  howpublished={\url{https://github.com/boomb0om/text2image-benchmark}},
  month={September},
  year={2023},
  note={Version 0.1.0},
}

Acknowledgments

Thanks to:

  • clean-fid - Explanation of how various parameters influence FID calculation.
  • pytorch-fid - Port of the official implementation of Frechet Inception Distance to PyTorch.
