Giter Club home page Giter Club logo

mat's Introduction

MAT: Mask-Aware Transformer for Large Hole Image Inpainting (CVPR 2022 Best Paper Finalist, Oral)

PWC PWC

Wenbo Li, Zhe Lin, Kun Zhou, Lu Qi, Yi Wang, Jiaya Jia


🚀 🚀 🚀 News

  • [2022.10.03] Model for FFHQ-512 is available. (Link)

  • [2022.09.10] We could provide all testing images of Places and CelebA inpainted by our MAT and other methods. Since there are too many images, please send an email to [email protected] and explain your needs.

  • [2022.06.21] We provide a SOTA Places-512 model (Places_512_FullData.pkl) trained with full Places data (8M images). It achieves significant improvements on all metrics.

    Model Data Small Mask Large Mask
    FID↓ P-IDS↑ U-IDS↑ FID↓ P-IDS↑ U-IDS↑
    MAT (Ours) 8M 0.78 31.72 43.71 1.96 23.42 38.34
    MAT (Ours) 1.8M 1.07 27.42 41.93 2.90 19.03 35.36
    CoModGAN 8M 1.10 26.95 41.88 2.92 19.64 35.78
    LaMa-Big 4.5M 0.99 22.79 40.58 2.97 13.09 32.39
  • [2022.06.19] We have uploaded the CelebA-HQ-256 model and masks. Because the original model was lost, we retrained the model so that the results may slightly differ from the reported ones.


Web Demo

Thank Replicate for providing a web demo for our MAT. But I didn't check if this demo is correct. You are recommended to use our models as following.


Visualization

We present a transformer-based model (MAT) for large hole inpainting with high fidelity and diversity.

large hole inpainting with pluralistic generation

Compared to other methods, the proposed MAT restores more photo-realistic images with fewer artifacts.

comparison with sotas

Usage

It is highly recommanded to adopt Conda/MiniConda to manage the environment to avoid some compilation errors.

  1. Clone the repository.
    git clone https://github.com/fenglinglwb/MAT.git 
  2. Install the dependencies.
    • Python 3.7
    • PyTorch 1.7.1
    • Cuda 11.0
    • Other packages
    pip install -r requirements.txt

Quick Test

  1. We provide models trained on CelebA-HQ, FFHQ and Places365-Standard at 512x512 resolution. Download models from One Drive and put them into the 'pretrained' directory. The released models are retrained, and hence the visualization results may slightly differ from the paper.

  2. Obtain inpainted results by running

    python generate_image.py --network model_path --dpath data_path --outdir out_path [--mpath mask_path]

    where the mask path is optional. If not assigned, random 512x512 masks will be generated. Note that 0 and 1 values in a mask refer to masked and remained pixels.

    For example, run

    python generate_image.py --network pretrained/CelebA-HQ.pkl --dpath test_sets/CelebA-HQ/images --mpath test_sets/CelebA-HQ/masks --outdir samples

    Note.

    • Our implementation only supports generating an image whose size is a multiple of 512. You need to pad or resize the image to make its size a multiple of 512. Please pad the mask with 0 values.
    • If you want to use the CelebA-HQ-256 model, please specify the parameter 'resolution' as 256 in generate_image.py.

Train

For example, if you want to train a model on Places, run a bash script with

python train.py \
    --outdir=output_path \
    --gpus=8 \
    --batch=32 \
    --metrics=fid36k5_full \
    --data=training_data_path \
    --data_val=val_data_path \
    --dataloader=datasets.dataset_512.ImageFolderMaskDataset \
    --mirror=True \
    --cond=False \
    --cfg=places512 \
    --aug=noaug \
    --generator=networks.mat.Generator \
    --discriminator=networks.mat.Discriminator \
    --loss=losses.loss.TwoStageLoss \
    --pr=0.1 \
    --pl=False \
    --truncation=0.5 \
    --style_mix=0.5 \
    --ema=10 \
    --lr=0.001

Description of arguments:

  • outdir: output path for saving logs and models
  • gpus: number of used gpus
  • batch: number of images in all gpus
  • metrics: find more metrics in 'metrics/metric_main.py'
  • data: training data
  • data_val: validation data
  • dataloader: you can define your own dataloader
  • mirror: use flip augmentation or not
  • cond: use class info, default: false
  • cfg: configuration, find more details in 'train.py'
  • aug: use augmentation of style-gan-ada or not, default: false
  • generator: you can define your own generator
  • discriminator: you can define your own discriminator
  • loss: you can define your own loss
  • pr: ratio of perceptual loss
  • pl: use path length regularization or not, default: false
  • truncation: truncation ratio proposed in stylegan
  • style_mix: style mixing ratio proposed in stylegan
  • ema: exponoential moving averate, ~K samples
  • lr: learning rate

Evaluation

We provide evaluation scrtips for FID/U-IDS/P-IDS/LPIPS/PSNR/SSIM/L1 metrics in the 'evaluation' directory. Only need to give paths of your results and GTs.

We also provide our masks for CelebA-HQ-val and Places-val here.

Citation

@inproceedings{li2022mat,
    title={MAT: Mask-Aware Transformer for Large Hole Image Inpainting},
    author={Li, Wenbo and Lin, Zhe and Zhou, Kun and Qi, Lu and Wang, Yi and Jia, Jiaya},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2022}
}

License and Acknowledgement

The code and models in this repo are for research purposes only. Our code is bulit upon StyleGAN2-ADA.

mat's People

Contributors

fenglinglwb avatar chongjiange avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.