
dual-task-pose-transformer-network's Introduction

Dual-task Pose Transformer Network

The source code for our paper "Exploring Dual-task Correlation for Pose Guided Person Image Generation", Pengze Zhang, Lingxiao Yang, Jianhuang Lai, and Xiaohua Xie, CVPR 2022. Video: [Chinese] [English]

Abstract

Pose Guided Person Image Generation (PGPIG) is the task of transforming a person image from the source pose to a given target pose. Most existing methods focus only on the ill-posed source-to-target task and fail to capture a reasonable texture mapping. To address this problem, we propose a novel Dual-task Pose Transformer Network (DPTN), which introduces an auxiliary task (i.e., the source-to-source task) and exploits the dual-task correlation to promote the performance of PGPIG. The DPTN has a Siamese structure, containing a source-to-source self-reconstruction branch and a transformation branch for source-to-target generation. By sharing partial weights between them, the knowledge learned by the source-to-source task can effectively assist the source-to-target learning. Furthermore, we bridge the two branches with a proposed Pose Transformer Module (PTM) to adaptively explore the correlation between features from the dual tasks. Such correlation can establish a fine-grained mapping of all the pixels between the sources and the targets, and promote the source texture transmission to enhance the details of the generated target images. Extensive experiments show that our DPTN outperforms state-of-the-art methods in terms of both PSNR and LPIPS. In addition, our DPTN contains only 9.79 million parameters, significantly fewer than other approaches.
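For intuition, the core PTM operation can be viewed as cross-attention in which the transformation branch queries texture features from the self-reconstruction branch. Below is a minimal PyTorch sketch of that idea, not the paper's exact module; the channel count, head count, and layer layout are illustrative assumptions.

import torch
import torch.nn as nn

class PoseTransformerSketch(nn.Module):
    """Cross-attention sketch of the PTM idea: each target-branch location
    attends over all source-branch locations, approximating the fine-grained
    source-to-target texture mapping described above. Illustrative only."""
    def __init__(self, channels=256, num_heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads)
        self.norm = nn.LayerNorm(channels)

    def forward(self, f_target, f_source):
        # f_target, f_source: (B, C, H, W) feature maps from the two branches
        b, c, h, w = f_target.shape
        q = f_target.flatten(2).permute(2, 0, 1)   # (H*W, B, C) queries
        kv = f_source.flatten(2).permute(2, 0, 1)  # (H*W, B, C) keys/values
        out, _ = self.attn(q, kv, kv)              # source-to-target correlation
        out = self.norm(out + q)                   # residual + normalization
        return out.permute(1, 2, 0).reshape(b, c, h, w)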

Getting Started

1) Requirements

  • Python 3.7.9
  • PyTorch 1.7.1
  • torchvision 0.8.2
  • CUDA 11.1
  • NVIDIA A100 40GB PCIe

2) Data Preparation

Following PATN, the dataset split files and extracted keypoint files can be obtained as follows:

DeepFashion

  • Download the DeepFashion dataset in-shop clothes retrieval benchmark, and put the images under the ./dataset/fashion directory.

  • Download train/test pairs and train/test keypoint annotations from Google Drive, including fasion-resize-pairs-train.csv, fasion-resize-pairs-test.csv, fasion-resize-annotation-train.csv, fasion-resize-annotation-test.csv, train.lst, and test.lst, and put them under the ./dataset/fashion directory.

  • Split the raw images into the training set (./dataset/fashion/train) and the test set (./dataset/fashion/test):

python data/generate_fashion_datasets.py
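If you need to adapt this step to a custom layout, the split amounts to roughly the following. This is a simplified sketch, assuming train.lst and test.lst each list one image file name per line; the actual script also flattens DeepFashion's nested folder names into single file names.

import os
import shutil

root = './dataset/fashion'
for split in ('train', 'test'):
    os.makedirs(os.path.join(root, split), exist_ok=True)
    # each .lst file is assumed to hold one image file name per line
    with open(os.path.join(root, split + '.lst')) as f:
        for name in (line.strip() for line in f):
            if name:
                shutil.copy(os.path.join(root, name),
                            os.path.join(root, split, name))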

Market1501

  • Download the Market1501 dataset from here. Rename bounding_box_train and bounding_box_test to train and test, and put them under the ./dataset/market directory.

  • Download train/test keypoint annotations from Google Drive, including market-pairs-train.csv, market-pairs-test.csv, market-annotation-train.csv, and market-annotation-test.csv. Put these files under the ./dataset/market directory.

3) Train a model

DeepFashion

python train.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --batchSize 32 --gpu_id=0

Market1501

python train.py --name=DPTN_market --model=DPTN --dataset_mode=market --dataroot=./dataset/market --dis_layers=3 --lambda_g=5 --lambda_rec 2 --t_s_ratio=0.8 --save_latest_freq=10400 --batchSize 32 --gpu_id=0

4) Test the model

You can directly download our test results from Google Drive: DeepFashion, Market1501.

DeepFashion

python test.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --which_epoch latest --results_dir ./results/DPTN_fashion --batchSize 1 --gpu_id=0

Market1501

python test.py --name=DPTN_market --model=DPTN --dataset_mode=market --dataroot=./dataset/market --which_epoch latest --results_dir=./results/DPTN_market --batchSize 1 --gpu_id=0

5) Evaluation

We adopt SSIM, PSNR, FID, LPIPS, and a person re-identification (re-id) system for evaluation. Please clone the official PerceptualSimilarity repository for the LPIPS score, and put the PerceptualSimilarity folder into the metrics folder.
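For a quick standalone LPIPS check, the pip-installable lpips package computes the same metric; this is an alternative to the PerceptualSimilarity pipeline above, not the authors' exact evaluation code.

import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')  # AlexNet backbone, the usual default
# inputs must be RGB tensors scaled to [-1, 1]; random data here for illustration
img0 = torch.rand(1, 3, 256, 176) * 2 - 1
img1 = torch.rand(1, 3, 256, 176) * 2 - 1
print(loss_fn(img0, img1).item())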

  • For SSIM, PSNR, FID and LPIPS:

DeepFashion

python -m metrics.metrics --gt_path=./dataset/fashion/test --distorated_path=./results/DPTN_fashion --fid_real_path=./dataset/fashion/train --name=./fashion

Market1501

python -m metrics.metrics --gt_path=./dataset/market/test --distorated_path=./results/DPTN_market --fid_real_path=./dataset/market/train --name=./market --market

  • For person re-id system:

Clone the fast-reid code into this project (./fast-reid-master). Move the config and data loader of the DeepFashion dataset to ./fast-reid-master/configs/Fashion/bagtricks_R50.yml and ./fast-reid-master/fastreid/data/datasets/fashion.py, respectively. Download the pre-trained network and put it under the ./fast-reid-master/logs/Fashion/bagtricks_R50-ibn/ directory. Then launch:

python ./tools/train_net.py --config-file ./configs/Fashion/bagtricks_R50.yml --eval-only MODEL.WEIGHTS ./logs/Fashion/bagtricks_R50-ibn/model_final.pth MODEL.DEVICE "cuda:0"

6) Pre-trained Model

Our pre-trained models and logs can be downloaded from Google Drive: DeepFashion [log], Market1501 [log].

Citation

@InProceedings{Zhang_2022_CVPR,
    author    = {Zhang, Pengze and Yang, Lingxiao and Lai, Jian-Huang and Xie, Xiaohua},
    title     = {Exploring Dual-Task Correlation for Pose Guided Person Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {7713-7722}
}

Acknowledgement

We build our project based on pix2pix. Some dataset preprocessing methods are derived from PATN.


dual-task-pose-transformer-network's Issues

Test results on Deepfashion

Hi, thanks for your enlightening work. Do the released "epoch-190" images correspond to the results in the paper?
I calculated the metrics on those images and they seem different from the numbers in the paper. Could the metric values depend on the software versions and the GPU?

training step problem

  • Your work inspires me a lot!
  • It is mentioned in the paper that the source-to-source network is trained as an auxiliary task and then its weights are shared with the source-to-target network. I would like to know the training steps.
  • First ignore the Pose Transformer Module, train the source-to-source network, then share the parameters with the source-to-target network, and finally train the Pose Transformer Module.
  • Is that what happens? I want to know more details!
  • Thank you a lot!

About the training epochs?

How many epochs should I train to get the results you showed in the paper? It seems to need 140 epochs, which will cost one week. I use one Tesla V100-32G for training, and when using multiple GPUs, there is serious load imbalance.
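A general note on the load imbalance: PyTorch's nn.DataParallel gathers outputs and computes losses on GPU 0, which commonly causes exactly this imbalance; DistributedDataParallel keeps the work balanced across GPUs. A generic launch sketch follows, not this repository's launcher.

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# run with: torchrun --nproc_per_node=NUM_GPUS this_script.py
# (or python -m torch.distributed.launch on older PyTorch versions)
torch.distributed.init_process_group(backend='nccl')
rank = torch.distributed.get_rank()
torch.cuda.set_device(rank)
model = nn.Linear(10, 10).cuda(rank)   # stand-in for the DPTN model
model = DDP(model, device_ids=[rank])  # each process owns one GPU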

About image evaluation indicators

Dear @PangzeCheung
Your work is impressive and has provided me with great inspiration. However, while reading papers on similar topics, I have noticed that different papers cite different numerical values for image evaluation indicators, such as PSNR, even when referring to the same paper. I am curious to know how these values are derived and whether the calculation procedures used are the same. Thank you.
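One concrete source of divergence: PSNR depends on the assumed data range and on implementation details such as where averaging happens, so two papers can report different numbers for the same images. A minimal definition for reference:

import numpy as np

def psnr(a, b, data_range=255.0):
    # a, b: image arrays of the same shape; data_range is 255 for uint8
    # images or 1.0 for images scaled to [0, 1]
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)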

Question about the training in terms of the Epochs

Hey @PangzeCheung ,

Again me, hope everything goes well with you.

  1. I noticed that niter and niter_decay are both set to 100. That is to say, we need to train the entire model for 200 epochs in total on each dataset? (See the scheduler sketch at the end of this message.)
  2. I also find that the pre-trained checkpoint you provided for DeepFashion is at epoch 190, while for Market-1501 it is at iteration 811200. So I am a bit confused about how I should choose between a checkpoint stored by epoch and one stored by total iterations.
  3. I already successfully trained the Dual-Task PTN with one NVIDIA TITAN Xp GPU on the Market-1501 dataset; it has cost me around 4 days so far (still at epoch 131). But there are lots of checkpoints named in those two different ways. I also want to ask whether this is all right.

Looking forward to your reply, but not in a hurry~

Best Regards,
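For reference on point 1: niter and niter_decay follow the pix2pix convention (this project is built on pix2pix), where the learning rate stays constant for niter epochs and then decays linearly to zero over the next niter_decay epochs, i.e. 200 epochs in total. A sketch of that rule, assuming the standard pix2pix scheduler:

import torch

niter, niter_decay = 100, 100   # values from train_opt.txt

def lambda_rule(epoch):
    # 1.0 for the first `niter` epochs, then linear decay to 0
    return 1.0 - max(0, epoch + 1 - niter) / float(niter_decay + 1)

params = [torch.zeros(1, requires_grad=True)]  # stand-in parameters
optimizer = torch.optim.Adam(params, lr=2e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)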

Training and testing on custom dataset

Hello,
I am interested in learning about pose translation from one image to another, and I came across your repository. I would like to know:

In the README it's mentioned:

Download the DeepFashion dataset [in-shop clothes retrieval benchmark](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/InShopRetrieval.html), and put them under the ./dataset/fashion directory.

Download train/test pairs and train/test keypoint annotations from [Google Drive], including fasion-resize-pairs-train.csv, fasion-resize-pairs-test.csv, fasion-resize-annotation-train.csv, fasion-resize-annotation-test.csv, train.lst, test.lst, and put them under the ./dataset/fashion directory.

Split the raw images into the training set (./dataset/fashion/train) and test set (./dataset/fashion/test)

As such, in order to try the repository on a custom dataset, are these the steps that should be taken?

  1. Split into train/test, crop, and then execute:
python tool/generate_fashion_datasets.py
  2. Use OpenPose to obtain keypoints. Apart from OpenPose, can we use any other keypoint estimation library? Is it mandatory to use OpenPose, or can another pose estimation framework like MediaPipe be used? (See the annotation sketch after this message.)
  3. Then create pairs.csv using:
python2 tool/create_pairs_dataset.py

Apart from these steps, am I missing anything?
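For reference: any pose estimator should work as long as its joints are mapped into the 18-joint OpenPose/COCO order that the PATN-style annotation CSVs expect. Below is a hypothetical conversion helper, assuming the colon-separated name:keypoints_y:keypoints_x format with -1 for undetected joints.

import csv

def write_annotations(rows, path):
    # rows: iterable of (image_name, joints), where joints is a list of
    # 18 (x, y) tuples in OpenPose/COCO order, or None for missing joints
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=':')
        writer.writerow(['name', 'keypoints_y', 'keypoints_x'])
        for name, joints in rows:
            ys = [int(p[1]) if p is not None else -1 for p in joints]
            xs = [int(p[0]) if p is not None else -1 for p in joints]
            writer.writerow([name, str(ys), str(xs)])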

testing on Market

I'm testing the pretrained model on the Market dataset (by following the steps), but I am getting gray images.
Can you please advise?

Why is the loss of my training on the DeepFashion dataset rising

I use the DeepFashion dataset, then run:
python train.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --batchSize 32 --gpu_id=0
At first the loss was falling, and then it started rising again.
This is my train_opt.txt:
affine: True
batchSize: 8
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: False
data_type: 32
dataroot: ./dataset/fashion
dataset_mode: fashion
debug: False
device: cuda
dis_layers: 4
display_env: DPTNfashion
display_freq: 200
display_id: 0
display_port: 8096
display_single_pane_ncols: 0
display_winsize: 512
feat_num: 3
fineSize: 512
fp16: False
gan_mode: lsgan
gpu_ids: [0]
image_nc: 3
init_type: orthogonal
input_nc: 3
instance_feat: False
isTrain: True
iter_start: 0
label_feat: False
label_nc: 35
lambda_content: 0.25
lambda_feat: 10.0
lambda_g: 2.0
lambda_rec: 2.5
lambda_style: 250
layers_g: 3
loadSize: 256
load_features: False
load_pretrain:
load_size: 256
local_rank: 0
lr: 0.0002
lr_policy: lambda
max_dataset_size: inf
model: DPTN
nThreads: 2
n_clusters: 10
n_downsample_E: 4
n_layers_D: 3
name: DPTN_fashion
ndf: 64
nef: 16
nhead: 2
niter: 100
niter_decay: 100
no_flip: False
no_ganFeat_loss: False
no_html: False
no_instance: False
no_vgg_loss: False
norm: instance
num_CABs: 2
num_D: 1
num_TTBs: 2
num_blocks: 3
old_size: (256, 176)
output_nc: 3
phase: train
pool_size: 0
pose_nc: 18
print_freq: 200
ratio_g2d: 0.1
resize_or_crop: scale_width
save_epoch_freq: 1
save_input: False
save_latest_freq: 1000
serial_batches: False
structure_nc: 18
t_s_ratio: 0.5
tf_log: False
use_coord: False
use_dropout: False
use_spect_d: True
use_spect_g: False
verbose: False
which_epoch: latest

Do I need to continue training, or should I stop and change the parameters?
Thank You!

requirement for the file named 'base_options'

Hey @PangzeCheung ,
Thanks for your impressive work, it's really interesting, and I want to reproduce the results. However, when I tried to re-train the model myself, I found that a file named base_options.py seems to be missing. Could you kindly add the file?

Many Thanks

Reproduction of the results in the paper

First of all, thanks for making the code public.
I have prepared the DeepFashion dataset following the instructions in the README, downloaded the trained model, and tried the test code, but I did not get the results published in the paper. Is the README correct? Also, from the script for preparing the dataset, it seems that the model is trained on low-resolution images. Will the code and trained models for the high-resolution DeepFashion dataset be published?

Fashion result looks terrible

[attached result image: fashionWOMENBlouses_Shirtsid0000523301_1front_2_fashionWOMENBlouses_Shirtsid0000523301_4full_vis]

I downloaded the model and ran inference. By the way, the image pre-processing file did not work straight away; I had to make some changes to get the path names correct. Anyway, the image results I got look awful, not just a few, but all of them. Have I done anything wrong?

I unzipped epoch190.zip into checkpoints/DPTN_fashion and ran:
python test.py --name=DPTN_fashion --model=DPTN --dataset_mode=fashion --dataroot=./dataset/fashion --which_epoch latest --results_dir ./results/DPTN_fashion --batchSize 1 --gpu_id=0

New image test

Hello, I am wondering how to test on an unseen image. I followed the procedure and found that I need to generate keypoints for my images. I found tool/compute_coordinates.py in the PATN project, but when I run this file on my image, there is a size-mismatch error raised at line 216 (output1, output2 = model.predict(imageToTest_padded)). It looks like the input sizes differ because of the multiplier, so there is always an error. I am wondering how to solve this problem?

Thank you!

evaluation dataset preparation

Hi, can you check if the dataset I prepared is correct?
I cannot understand GFLA's evaluation setting.

I did:
[gt_path] dataset: (750 x 1101) -> (176, 256), using PIL.Image.resize((176, 256))
[fid_real_path] dataset: (750 x 1101) -> (176, 256), using PIL.Image.resize((176, 256))

Every score is lower than in the paper.
Do I have to resize the images by using the cv2.resize method instead?
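As a general note, the resize library and interpolation method can shift SSIM/PSNR/FID noticeably, so the resize should match whatever the reference evaluation used; when in doubt, compare both. A small illustration, where 'example.jpg' is a placeholder path:

import cv2
from PIL import Image

# PIL: size is (width, height)
img_pil = Image.open('example.jpg').resize((176, 256), Image.BICUBIC)
# OpenCV: dsize is also (width, height); note OpenCV loads BGR channel order
img_cv = cv2.resize(cv2.imread('example.jpg'), (176, 256),
                    interpolation=cv2.INTER_CUBIC)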
