
TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models

License: MIT | arXiv: 2304.08821

Abstract
Data augmentation has been established as an efficacious approach to supplement useful information 
for low-resource datasets. Traditional augmentation techniques such as noise injection and image 
transformations have been widely used. In addition, generative data augmentation (GDA) has been shown 
to produce more diverse and flexible data. While generative adversarial networks (GANs) have been 
frequently used for GDA, they lack diversity and controllability compared to text-to-image diffusion 
models. In this paper, we propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the 
capabilities of large-scale pre-trained Text-to-Text (T2T) and Text-to-Image (T2I) generative models 
for data augmentation. By conditioning the T2I model on detailed descriptions produced by T2T models, 
we are able to generate photo-realistic labeled images in a flexible and controllable manner. 
Experiments on in-domain classification, cross-domain classification, and image captioning tasks show 
consistent improvements over other data augmentation baselines. Analytical studies in varied settings, 
including few-shot, long-tail, and adversarial, further reinforce the effectiveness of TTIDA in 
enhancing performance and increasing robustness.

Pipeline

(Figure: overview of the TTIDA pipeline)

In the overview figure, arrows of different colors denote different pipeline steps. For each object category, e.g., bike, we feed the label text "bike" to a T2I model such as GLIDE to generate multiple photo-realistic images of that object (Step 3). We then combine the real images from the original dataset with the generated synthetic images (Step 4), and the augmented dataset is used directly for model training. The label text is usually a single word or a short phrase. To automatically obtain a finer-grained prompt for the T2I model, we can first feed the label text to a text-to-text (T2T) generative model finetuned on image captions (Step 1) to produce a longer object description (Step 2), e.g., "a white bike near the wall". Steps 1 and 2 are optional, since the T2I model can still generate high-quality images from the label text alone. However, the T2T model can produce precise or personalized object descriptions with richer context, which greatly increases the diversity of the synthetic images.
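The four steps above can be sketched as a small pipeline. This is an illustrative sketch only: the `t2t` and `t2i` arguments stand in for a caption-finetuned T2T model and a T2I model such as GLIDE, and the stub functions below are hypothetical placeholders that only show the data flow.

```python
# Hypothetical sketch of the TTIDA pipeline (Steps 1-4).
# t2t / t2i are stand-ins for real generative models.
from typing import Callable, List, Tuple

def ttida_augment(
    labels: List[str],
    real_data: List[Tuple[str, str]],   # (image, label) pairs from the original dataset
    t2t: Callable[[str], str],          # Steps 1-2: label text -> longer description
    t2i: Callable[[str], str],          # Step 3: prompt -> synthetic image
    n_per_label: int = 2,
    use_t2t: bool = True,               # Steps 1-2 are optional
) -> List[Tuple[str, str]]:
    synthetic = []
    for label in labels:
        # Steps 1-2 (optional): expand the label into a richer prompt.
        prompt = t2t(label) if use_t2t else label
        # Step 3: generate multiple photo-realistic images per category.
        for _ in range(n_per_label):
            synthetic.append((t2i(prompt), label))
    # Step 4: merge real and synthetic images into one training set.
    return real_data + synthetic

# Toy stubs, for illustration of the data flow only.
fake_t2t = lambda label: f"a photo of a {label} near the wall"
fake_t2i = lambda prompt: f"<image generated from '{prompt}'>"

augmented = ttida_augment(
    labels=["bike", "desk"],
    real_data=[("real_bike.jpg", "bike")],
    t2t=fake_t2t,
    t2i=fake_t2i,
)
print(len(augmented))  # 1 real + 2 labels * 2 synthetic each = 5
```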

Environment

conda create -n ttida python=3.9
conda activate ttida
pip install -r requirements.txt

Datasets

| Dataset (Domain) | #img total | #classes | #img per class (avg) |
| --- | --- | --- | --- |
| CIFAR-100 | 50000 | 100 | 500 |
| Office-31 (Amazon) | 2817 | 31 | 91 |
| Office-31 (DSLR) | 498 | 31 | 16 |
| Office-31 (Webcam) | 795 | 31 | 26 |
| Office-Home (Art) | 2427 | 65 | 37 |
| Office-Home (Clipart) | 4365 | 65 | 67 |
| Office-Home (Product) | 4439 | 65 | 68 |
| Office-Home (Real-World) | 4357 | 65 | 67 |
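For the Office datasets the per-class column is an average rather than an exact count (e.g., 2817 / 31 ≈ 91 for Office-31 Amazon). A quick sanity check of the table:

```python
# Verify that "#img per class" equals round(total / #classes) for each dataset.
datasets = {
    "CIFAR-100": (50000, 100, 500),
    "Office-31 (Amazon)": (2817, 31, 91),
    "Office-31 (DSLR)": (498, 31, 16),
    "Office-31 (Webcam)": (795, 31, 26),
    "Office-Home (Art)": (2427, 65, 37),
    "Office-Home (Clipart)": (4365, 65, 67),
    "Office-Home (Product)": (4439, 65, 68),
    "Office-Home (Real-World)": (4357, 65, 67),
}
for name, (total, classes, per_class) in datasets.items():
    assert round(total / classes) == per_class, name
```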

Experiments

Backbone Models for Classification/Generation

Generative Models for Data Augmentation

Run Tasks

  • In-domain Image classification (ResNet-101 on CIFAR-100 of different settings)
cd img_clf
bash run_train_cifar100.sh
bash run_train_cifar100_adv.sh
bash run_train_cifar100_gan.sh
bash run_train_cifar100_lt.sh
bash run_train_cifar100_trans.sh
  • Cross-domain Image classification (CDTrans on Office-31 and Office-Home)
cd cdtrans
bash run_train_office_31.sh
bash run_train_office_home.sh
  • Image Captioning (mPLUG on COCO 2015 Image Captioning Task)
cd mplug
bash run_train_coco.sh

License

Please refer to the LICENSE file for more details.

Citation

@article{yin2023ttida,
  title   = {TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models},
  author  = {Yin, Yuwei and Kaddour, Jean and Zhang, Xiang and Nie, Yixin and Liu, Zhenguang and Kong, Lingpeng and Liu, Qi},
  journal = {arXiv preprint arXiv:2304.08821},
  year    = {2023},
  url     = {https://arxiv.org/abs/2304.08821},
}
