
dual-task-pose-transformer-network's Introduction

Dual-task Pose Transformer Network

The source code for our paper "Exploring Dual-task Correlation for Pose Guided Person Image Generation", Pengze Zhang, Lingxiao Yang, Jianhuang Lai, and Xiaohua Xie, CVPR 2022. Video: [Chinese] [English]

Abstract

Pose Guided Person Image Generation (PGPIG) is the task of transforming a person image from the source pose to a given target pose. Most existing methods focus only on the ill-posed source-to-target task and fail to capture a reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., the source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG. The DPTN has a Siamese structure, containing a source-to-source self-reconstruction branch and a transformation branch for source-to-target generation. By sharing partial weights between them, the knowledge learned by the source-to-source task can effectively assist the source-to-target learning. Furthermore, we bridge the two branches with a proposed Pose Transformer Module (PTM) to adaptively explore the correlation between features from the dual tasks. Such correlation can establish a fine-grained mapping of all the pixels between the sources and the targets, and promote the source texture transmission to enhance the details of the generated target images. Extensive experiments show that our DPTN outperforms state-of-the-art methods in terms of both PSNR and LPIPS. In addition, our DPTN contains only 9.79 million parameters, significantly fewer than other approaches.
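For intuition, the core PTM operation can be viewed as cross-attention in which the transformation branch queries texture features from the self-reconstruction branch. Below is a minimal PyTorch sketch of that idea, not the paper's exact module; the channel count, head count, and layer layout are illustrative assumptions.

import torch
import torch.nn as nn

class PoseTransformerSketch(nn.Module):
    """Cross-attention sketch of the PTM idea: each target-branch location
    attends over all source-branch locations, approximating the fine-grained
    source-to-target texture mapping described above. Illustrative only."""
    def __init__(self, channels=256, num_heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads)
        self.norm = nn.LayerNorm(channels)

    def forward(self, f_target, f_source):
        # f_target, f_source: (B, C, H, W) feature maps from the two branches
        b, c, h, w = f_target.shape
        q = f_target.flatten(2).permute(2, 0, 1)   # (H*W, B, C) queries
        kv = f_source.flatten(2).permute(2, 0, 1)  # (H*W, B, C) keys/values
        out, _ = self.attn(q, kv, kv)              # source-to-target correlation
        out = self.norm(out + q)                   # residual + normalization
        return out.permute(1, 2, 0).reshape(b, c, h, w)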

Getting Started

1) Requirements

  • Python 3.7.9
  • PyTorch 1.7.1
  • torchvision 0.8.2
  • CUDA 11.1
  • NVIDIA A100 40GB PCIe

2) Data Preparation

Following PATN, the dataset split files and extracted keypoint files can be obtained as follows:

DeepFashion

  • Download the DeepFashion dataset in-shop clothes retrieval benchmark, and put the images under the ./dataset/fashion directory.

  • Download train/test pairs and train/test keypoint annotations from Google Drive, including fasion-resize-pairs-train.csv, fasion-resize-pairs-test.csv, fasion-resize-annotation-train.csv, fasion-resize-annotation-test.csv, train.lst, and test.lst, and put them under the ./dataset/fashion directory.

  • Split the raw images into the training set (./dataset/fashion/train) and the test set (./dataset/fashion/test):

python data/generate_fashion_datasets.py
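If you need to adapt this step to a custom layout, the split amounts to roughly the following. This is a simplified sketch, assuming train.lst and test.lst each list one image file name per line; the actual script also flattens DeepFashion's nested folder names into single file names.

import os
import shutil

root = './dataset/fashion'
for split in ('train', 'test'):
    os.makedirs(os.path.join(root, split), exist_ok=True)
    # each .lst file is assumed to hold one image file name per line
    with open(os.path.join(root, split + '.lst')) as f:
        for name in (line.strip() for line in f):
            if name:
                shutil.copy(os.path.join(root, name),
                            os.path.join(root, split, name))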

Market1501

  • Download the Market1501 dataset from here. Rename bounding_box_train and bounding_box_test to train and test, and put them under the ./dataset/market directory.

  • Download train/test keypoint annotations from Google Drive, including market-pairs-train.csv, market-pairs-test.csv, market-annotation-train.csv, and market-annotation-test.csv. Put these files under the ./dataset/market directory.

3) Train a model

DeepFashion

python train.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --batchSize 32 --gpu_id=0

Market1501

python train.py --name=DPTN_market --model=DPTN --dataset_mode=market --dataroot=./dataset/market --dis_layers=3 --lambda_g=5 --lambda_rec 2 --t_s_ratio=0.8 --save_latest_freq=10400 --batchSize 32 --gpu_id=0

4) Test the model

You can directly download our test results from Google Drive: DeepFashion, Market1501.

DeepFashion

python test.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --which_epoch latest --results_dir ./results/DPTN_fashion --batchSize 1 --gpu_id=0

Market1501

python test.py --name=DPTN_market --model=DPTN --dataset_mode=market --dataroot=./dataset/market --which_epoch latest --results_dir=./results/DPTN_market --batchSize 1 --gpu_id=0

5) Evaluation

We adopt SSIM, PSNR, FID, LPIPS, and a person re-identification (re-id) system for evaluation. Please clone the official PerceptualSimilarity repository for the LPIPS score, and put the PerceptualSimilarity folder into the metrics folder.
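For a quick standalone LPIPS check, the pip-installable lpips package computes the same metric; this is an alternative to the PerceptualSimilarity pipeline above, not the authors' exact evaluation code.

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, the usual default
# inputs must be RGB tensors scaled to [-1, 1]; random data here for illustration
img0 = torch.rand(1, 3, 256, 176) * 2 - 1
img1 = torch.rand(1, 3, 256, 176) * 2 - 1
print(loss_fn(img0, img1).item())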

  • For SSIM, PSNR, FID and LPIPS:

DeepFashion

python -m metrics.metrics --gt_path=./dataset/fashion/test --distorated_path=./results/DPTN_fashion --fid_real_path=./dataset/fashion/train --name=./fashion

Market1501

python -m metrics.metrics --gt_path=./dataset/market/test --distorated_path=./results/DPTN_market --fid_real_path=./dataset/market/train --name=./market --market

  • For person re-id system:

Clone the fast-reid code into this project (./fast-reid-master). Move the config and data loader of the DeepFashion dataset to ./fast-reid-master/configs/Fashion/bagtricks_R50.yml and ./fast-reid-master/fastreid/data/datasets/fashion.py, respectively. Download the pre-trained network and put it under the ./fast-reid-master/logs/Fashion/bagtricks_R50-ibn/ directory. Then launch:

python ./tools/train_net.py --config-file ./configs/Fashion/bagtricks_R50.yml --eval-only MODEL.WEIGHTS ./logs/Fashion/bagtricks_R50-ibn/model_final.pth MODEL.DEVICE "cuda:0"

6) Pre-trained Model

Our pre-trained models and logs can be downloaded from Google Drive: DeepFashion [log], Market1501 [log].

Citation

@InProceedings{Zhang_2022_CVPR,
    author    = {Zhang, Pengze and Yang, Lingxiao and Lai, Jian-Huang and Xie, Xiaohua},
    title     = {Exploring Dual-Task Correlation for Pose Guided Person Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {7713-7722}
}

Acknowledgement

We build our project based on pix2pix. Some dataset preprocessing methods are derived from PATN.


dual-task-pose-transformer-network's Issues

Test results on Deepfashion

Hi, thanks for your enlightening work. Do the released "epoch-190" images correspond to the results in the paper?
I calculated the metrics on those images and they seem different from the numbers in the paper. Could the metric values depend on the software versions and the GPU?

training step problem

  • Your work inspires me a lot!
  • It is mentioned in the paper that the source-to-source network is trained as an auxiliary task and then its weights are shared with the source-to-target network. I would like to know the training steps.
  • First ignore the Pose Transformer Module, train the source-to-source network, then share the parameters with the source-to-target network, and finally train the Pose Transformer Module.
  • Is that what happens? I want to know more details!
  • Thank you a lot!

About the training epochs?

How many epochs should I train to get the results you showed in the paper? It seems to need 140 epochs, which will cost one week. I use one Tesla V100-32G for training, and when using multiple GPUs, there is serious load imbalance.
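A general note on the load imbalance: PyTorch's nn.DataParallel gathers outputs and computes losses on GPU 0, which commonly causes exactly this imbalance; DistributedDataParallel keeps the work balanced across GPUs. A generic launch sketch follows, not this repository's launcher.

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# run with: torchrun --nproc_per_node=NUM_GPUS this_script.py
# (or python -m torch.distributed.launch on older PyTorch versions)
torch.distributed.init_process_group(backend='nccl')
rank = torch.distributed.get_rank()
torch.cuda.set_device(rank)
model = nn.Linear(10, 10).cuda(rank)   # stand-in for the DPTN model
model = DDP(model, device_ids=[rank])  # each process owns one GPU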

About image evaluation indicators

Dear @PangzeCheung
Your work is impressive and has provided me with great inspiration. However, while reading papers on similar topics, I have noticed that different papers cite different numerical values for image evaluation indicators, such as PSNR, even when referring to the same paper. I am curious to know how these values are derived and whether the calculation procedures used are the same. Thank you.
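One concrete source of divergence: PSNR depends on the assumed data range and on implementation details such as where averaging happens, so two papers can report different numbers for the same images. A minimal definition for reference:

import numpy as np

def psnr(a, b, data_range=255.0):
    # a, b: image arrays of the same shape; data_range is 255 for uint8
    # images or 1.0 for images scaled to [0, 1]
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)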

Question about the training in terms of the Epochs

Hey @PangzeCheung ,

Again me, hope everything goes well with you.

  1. I noticed that niter and niter_decay are both set to 100. That is to say, we need to train the entire model for 200 epochs in total on each dataset? (See the scheduler sketch at the end of this message.)
  2. I also find that the pre-trained checkpoint you provided for DeepFashion is at epoch 190, while for Market-1501 it is at iteration 811200. So I am a bit confused about how I should choose between a checkpoint stored by epoch and one stored by total iterations.
  3. I already successfully trained the Dual-Task PTN with one NVIDIA TITAN Xp GPU on the Market-1501 dataset; it has cost me around 4 days so far (still at epoch 131). But there are lots of checkpoints named in those two different ways. I also want to ask whether this is all right.

Looking forward to your reply, but not in a hurry~

Best Regards,
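For reference on point 1: niter and niter_decay follow the pix2pix convention (this project is built on pix2pix), where the learning rate stays constant for niter epochs and then decays linearly to zero over the next niter_decay epochs, i.e. 200 epochs in total. A sketch of that rule, assuming the standard pix2pix scheduler:

import torch

niter, niter_decay = 100, 100   # values from train_opt.txt

def lambda_rule(epoch):
    # 1.0 for the first `niter` epochs, then linear decay to 0
    return 1.0 - max(0, epoch + 1 - niter) / float(niter_decay + 1)

params = [torch.zeros(1, requires_grad=True)]  # stand-in parameters
optimizer = torch.optim.Adam(params, lr=2e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)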

Training and testing on custom dataset

Hello,
I am interested in learning about pose translation from one image to another, and I came across your repository. I would like to know:

In the README it's mentioned:

Download the DeepFashion dataset [in-shop clothes retrieval benchmark](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html), and put them under the ./dataset/fashion directory.

Download train/test pairs and train/test keypoint annotations from [Google Drive], including fasion-resize-pairs-train.csv, fasion-resize-pairs-test.csv, fasion-resize-annotation-train.csv, fasion-resize-annotation-test.csv, train.lst, test.lst, and put them under the ./dataset/fashion directory.

Split the raw images into the training set (./dataset/fashion/train) and test set (./dataset/fashion/test)

As such, in order to try the repository on a custom dataset, are these the steps that should be taken?

  1. Split into train/test, crop, and then execute:
python tool/generate_fashion_datasets.py
  2. Use OpenPose to obtain keypoints. Apart from OpenPose, can we use any other keypoint estimation library? Is it mandatory to use OpenPose, or can another pose estimation framework like MediaPipe be used? (See the annotation sketch after this message.)
  3. Then create pairs.csv using:
python2 tool/create_pairs_dataset.py

Apart from these steps, am I missing anything?
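For reference: any pose estimator should work as long as its joints are mapped into the 18-joint OpenPose/COCO order that the PATN-style annotation CSVs expect. Below is a hypothetical conversion helper, assuming the colon-separated name:keypoints_y:keypoints_x format with -1 for undetected joints.

import csv

def write_annotations(rows, path):
    # rows: iterable of (image_name, joints), where joints is a list of
    # 18 (x, y) tuples in OpenPose/COCO order, or None for missing joints
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=':')
        writer.writerow(['name', 'keypoints_y', 'keypoints_x'])
        for name, joints in rows:
            ys = [int(p[1]) if p is not None else -1 for p in joints]
            xs = [int(p[0]) if p is not None else -1 for p in joints]
            writer.writerow([name, str(ys), str(xs)])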

testing on Market

I'm testing the pretrained model on the Market dataset (by following the steps), but I am getting gray images.
Can you please advise?

Why is the loss of my training on the DeepFashion dataset rising

I use the DeepFashion dataset, then run:
python train.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --batchSize 32 --gpu_id=0
At first the loss was falling, and then it started rising again.
This is my train_opt.txt:
affine: True
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
data_type: 32
dataroot: ./dataset/fashion
dataset_mode: fashion
debug: False
device: cuda
dis_layers: 4
display_env: DPTNfashion
display_freq: 200
display_id: 0
display_port: 8096
display_single_pane_ncols: 0
display_winsize: 512
feat_num: 3
fineSize: 512
fp16: False
gan_mode: lsgan
gpu_ids: [0]
image_nc: 3
init_type: orthogonal
input_nc: 3
instance_feat: False
isTrain: True
iter_start: 0
label_feat: False
label_nc: 35
lambda_content: 0.25
lambda_feat: 10.0
lambda_g: 2.0
lambda_rec: 2.5
lambda_style: 250
layers_g: 3
loadSize: 256
load_features: False
load_pretrain:
load_size: 256
local_rank: 0
lr: 0.0002
lr_policy: lambda
max_dataset_size: inf
model: DPTN
nThreads: 2
n_clusters: 10
n_downsample_E: 4
n_layers_D: 3
name: DPTN_fashion
ndf: 64
nef: 16
nhead: 2
niter: 100
niter_decay: 100
no_flip: False
no_ganFeat_loss: False
no_html: False
no_instance: False
no_vgg_loss: False
norm: instance
num_CABs: 2
num_D: 1
num_TTBs: 2
num_blocks: 3
old_size: (256, 176)
output_nc: 3
phase: train
pool_size: 0
pose_nc: 18
print_freq: 200
ratio_g2d: 0.1
resize_or_crop: scale_width
save_epoch_freq: 1
save_input: False
save_latest_freq: 1000
serial_batches: False
structure_nc: 18
t_s_ratio: 0.5
tf_log: False
use_coord: False
use_dropout: False
use_spect_d: True
use_spect_g: False
verbose: False
which_epoch: latest

Do I need to continue training, or should I stop and change the parameters?
Thank You!

requirement for the file named 'base_options'

Hey @PangzeCheung ,
Thanks for your impressive work, it's really interesting, and I want to reproduce the results. However, when I tried to re-train the model myself, I found that a file named base_options.py seems to be missing. Could you kindly add the file?

Many Thanks

Reproduction of the results in the paper

First of all, thanks for making the code public.
I have prepared the DeepFashion dataset following the instructions in the README, downloaded the trained model, and tried the test code, but I did not get the results published in the paper. Is the README correct? Also, from the script for preparing the dataset, it seems that the model is trained on low-resolution images. Will the code and trained models for the high-resolution DeepFashion dataset be published?

Fashion result looks terrible

[attached result image: fashionWOMENBlouses_Shirtsid0000523301_1front_2_fashionWOMENBlouses_Shirtsid0000523301_4full_vis]

I downloaded the model and ran inference. By the way, the image pre-processing file did not work straight away; I had to make some changes to get the path names correct. Anyway, the image results I got look awful, not just a few, but all of them. Have I done anything wrong?

I unzipped epoch190.zip into checkpoints/DPTN_fashion and ran:
python test.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --which_epoch latest --results_dir ./results/DPTN_fashion --batchSize 1 --gpu_id=0

New image test

Hello, I am wondering how to test on an unseen image. I followed the procedure and found that I need to generate keypoints for my images. I found tool/compute_coordinates.py in the PATN project, but when I run this file on my image, there is a size-mismatch error raised at line 216 (output1, output2 = model.predict(imageToTest_padded)). It looks like the input sizes differ because of the multiplier, so there is always an error. I am wondering how to solve this problem?

Thank you!

evaluation dataset preparation

Hi, can you check if the dataset I prepared is correct?
I cannot understand GFLA's evaluation setting.

I did:
[gt_path] dataset: (750 x 1101) -> (176, 256), using PIL.Image.resize((176, 256))
[fid_real_path] dataset: (750 x 1101) -> (176, 256), using PIL.Image.resize((176, 256))

Every score is lower than in the paper.
Do I have to resize the images by using the cv2.resize method instead?
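As a general note, the resize library and interpolation method can shift SSIM/PSNR/FID noticeably, so the resize should match whatever the reference evaluation used; when in doubt, compare both. A small illustration, where 'example.jpg' is a placeholder path:

import cv2
from PIL import Image

# PIL: size is (width, height)
img_pil = Image.open('example.jpg').resize((176, 256), Image.BICUBIC)
# OpenCV: dsize is also (width, height); note OpenCV loads BGR channel order
img_cv = cv2.resize(cv2.imread('example.jpg'), (176, 256),
                    interpolation=cv2.INTER_CUBIC)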
