Comments (13)

JialeTao avatar JialeTao commented on June 10, 2024

Hi @yanerzidefanbaba, cloning the project with git lfs may help. The Baidu Disk link for the checkpoints is here:
https://pan.baidu.com/s/1Zlr309OcsDuz5FaULQJYWQ passwd:qp6v

from mtia.

yanerzidefanbaba avatar yanerzidefanbaba commented on June 10, 2024

Thanks for your reply and your exciting work. However, the model I trained performs much worse than the one you provided. I think this may be caused by the loss weights: in the original vox.yaml, generator_gan, discriminator_gan, feature_matching, bg_fg_mask and fg_mask_concentration are all set to 0. Could you please provide the correct values for these parameters?


JialeTao avatar JialeTao commented on June 10, 2024

The provided config file is the correct setting. Could you share some training details from your experiments, such as the loss logs?


yanerzidefanbaba avatar yanerzidefanbaba commented on June 10, 2024

Thanks a ton. My perceptual loss is 115, equivariance_value loss is 0.2249 and equivariance_jacobian is 0.4228. Are these far from your results?


JialeTao avatar JialeTao commented on June 10, 2024

Yes, that result is abnormal. After the last training epoch on the VoxCeleb dataset, the perceptual loss is usually around 80 and the equivariance loss is around 0.1. Currently I'm not sure of the reason. What is your training environment, and did you make any custom changes to the code?


yanerzidefanbaba avatar yanerzidefanbaba commented on June 10, 2024

I have decided to train it again because I am not sure whether I changed the training environment. As far as I remember, I didn't change anything except that my data are in .mp4 format. Does that affect the results?


JialeTao avatar JialeTao commented on June 10, 2024

That's OK. No, it doesn't affect the results, only the training speed, because reading the .mp4 format is I/O-limited. When you change the way the data is read, make sure that the image/video is normalized to [0,1].
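
The normalization mentioned here can be sketched as follows. This is a minimal illustration, assuming the frames have already been decoded from the .mp4 file into a uint8 NumPy array of shape (T, H, W, 3); the helper name `to_unit_range` is hypothetical and is not part of the repository's code.

```python
import numpy as np


def to_unit_range(frames: np.ndarray) -> np.ndarray:
    """Convert decoded video frames to float32 values in [0, 1].

    Hypothetical helper, not taken from the repository. Assumes uint8 input
    holds 0..255 pixel values and float input is already in [0, 1].
    """
    if frames.dtype == np.uint8:
        return frames.astype(np.float32) / 255.0
    frames = frames.astype(np.float32)
    if frames.min() < 0.0 or frames.max() > 1.0:
        raise ValueError("float frames are expected to lie in [0, 1]")
    return frames


# Stand-in for two decoded 256x256 RGB frames.
raw = np.random.randint(0, 256, size=(2, 256, 256, 3), dtype=np.uint8)
video = to_unit_range(raw)
```

Raising an error on out-of-range float input, rather than rescaling silently, makes a forgotten division by 255 show up at load time instead of as an unexplained loss plateau.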


yanerzidefanbaba avatar yanerzidefanbaba commented on June 10, 2024

Thanks for the tips, I will try again.


benmcmahan avatar benmcmahan commented on June 10, 2024

Is it possible to put the checkpoints somewhere else? I'm having an impossible time downloading them through Baidu (or git).


JialeTao avatar JialeTao commented on June 10, 2024

@benmcmahan Try Google Drive: https://drive.google.com/file/d/1Fv1ts026-6BwOUTexV1KxwW54pQiK1rY/view?usp=sharing


yanerzidefanbaba avatar yanerzidefanbaba commented on June 10, 2024

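Hi, I still can't reproduce the results. The perceptual loss still starts to converge at around 110 and equivariance_value is at 0.22, at around epoch 70, with repeat num = 2. My config file is as follows: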
dataset_params:
  root_dir: /run/media/root/2/vox
  frame_shape: [256, 256, 3]
  id_sampling: True
  pairs_list:
  augmentation_params:
    flip_param:
      horizontal_flip: True
      time_flip: True
    jitter_param:
      brightness: 0.1
      contrast: 0.1
      saturation: 0.1
      hue: 0.1

model_params:
  use_bg_predictor: False
  common_params:
    num_kp: 10
    num_channels: 3
    estimate_jacobian: True
  generator_params:
    block_expansion: 64
    max_features: 512
    num_down_blocks: 2
    num_bottleneck_blocks: 6
    estimate_occlusion_map: True
    skips: True
    dense_motion_params:
      block_expansion: 64
      max_features: 1024
      num_blocks: 5
      scale_factor: 0.25
  discriminator_params:
    scales: [1]
    block_expansion: 32
    max_features: 512
    num_blocks: 4
    sn: True

train_params:
  num_epochs: 200
  num_repeats: 1
  epoch_milestones: [60, 90]
  lr_generator: 2.0e-4
  lr_discriminator: 2.0e-4
  lr_kp_detector: 2.0e-4
  lr_bg_predictor: 2.0e-4
  batch_size: 10
  scales: [1, 0.5, 0.25, 0.125]
  clip_generator_grad: False
  clip_kp_detector_grad: True
  clip: 1
  checkpoint_freq: 5
  transform_params:
    sigma_affine: 0.05
    sigma_tps: 0.005
    points_tps: 5
  loss_weights:
    generator_gan: 0
    discriminator_gan: 0
    feature_matching: [0, 0, 0, 0]
    perceptual: [10, 10, 10, 10, 10]
    equivariance_value: 10
    equivariance_jacobian: 10
    bg_fg_mask: 0
    fg_mask_concentration: 0

reconstruction_params:
  num_videos: 1000
  format: '.mp4'

animate_params:
  num_pairs: 50
  format: '.mp4'
  normalization_params:
    adapt_movement_scale: False
    use_relative_movement: True
    use_relative_jacobian: True

visualizer_params:
  kp_size: 5
  draw_border: True
  colormap: 'gist_rainbow'

MODEL:
  # default
  TAG_PER_JOINT: True
  HIDDEN_HEATMAP_DIM: -1
  MULTI_TRANSFORMER_DEPTH: [12, 12]
  MULTI_TRANSFORMER_HEADS: [16, 16]
  MULTI_DIM: [48, 48]
  NUM_BRANCHES: 1
  BASE_CHANNEL: 32

  # default
  ESTIMATE_JACOBIAN: True
  TEMPERATURE: 0.1
  DATA_PREPROCESS: False
  FIX_IMG2MOTION_ATTENTION: False

  INIT_WEIGHTS: False
  NAME: pose_tokenpose_b
  NUM_JOINTS: 10
  PRETRAINED: ''
  TARGET_TYPE: gaussian
  TRANSFORMER_DEPTH: 12
  TRANSFORMER_HEADS: 8
  TRANSFORMER_MLP_RATIO: 3
  POS_EMBEDDING_TYPE: 'sine-full'
  INIT: true
  DIM: 192 # 443
  PATCH_SIZE:
  - 4
  - 4
  IMAGE_SIZE:
  - 256
  - 256
  HEATMAP_SIZE:
  - 64
  - 64
  SIGMA: 2
  EXTRA:
    PRETRAINED_LAYERS:
    - 'conv1'
    - 'bn1'
    - 'conv2'
    - 'bn2'
    - 'layer1'
    - 'transition1'
    - 'stage2'
    - 'transition2'
    - 'stage3'
    FINAL_CONV_KERNEL: 1
    STAGE2:
      NUM_MODULES: 1
      NUM_BRANCHES: 2
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      NUM_CHANNELS:
      - 32
      - 64
      FUSE_METHOD: SUM
    STAGE3:
      NUM_MODULES: 4
      NUM_BRANCHES: 3
      BLOCK: BASIC
      NUM_BLOCKS:
      - 4
      - 4
      - 4
      NUM_CHANNELS:
      - 32
      - 64
      - 128
      FUSE_METHOD: SUM

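As for the code, I only replaced nn.DataParallelWithCallback in train.py because my PyTorch version is different, and adjusted the dataset code because my data is in .mp4 format; nothing else was changed. I don't know where the problem is. Are the epochs still not enough?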


JialeTao avatar JialeTao commented on June 10, 2024

@yanerzidefanbaba It seems the training is not sufficient. "num_repeats" is set to 150 by default; if it is set to 1, training for 60 epochs is even less than training for 1 epoch with the default config. Since the learning rate is decayed by a factor of 10 at epochs 60 and 90, the losses may still appear to be converging.
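
The arithmetic behind this reply can be sketched as below. It assumes that `num_repeats` means each epoch iterates over the dataset that many times, and that `epoch_milestones` acts as a step schedule multiplying the learning rate by 0.1 at each milestone reached; both function names are hypothetical and only illustrate the reasoning.

```python
from typing import Sequence


def dataset_passes(num_epochs: int, num_repeats: int) -> int:
    """Total passes over the dataset, assuming each epoch repeats it num_repeats times."""
    return num_epochs * num_repeats


def lr_at_epoch(base_lr: float, epoch: int, milestones: Sequence[int], gamma: float = 0.1) -> float:
    """Learning rate under a step schedule: multiply by gamma for each milestone reached."""
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


# Posted config: num_repeats = 1, trained up to the first milestone at epoch 60.
passes_custom = dataset_passes(num_epochs=60, num_repeats=1)
# A single epoch with the default num_repeats = 150 mentioned in the reply.
passes_default_one_epoch = dataset_passes(num_epochs=1, num_repeats=150)

# Learning rate around epoch 70, after the first decay at epoch 60.
lr_epoch_70 = lr_at_epoch(2.0e-4, epoch=70, milestones=[60, 90])
```

Under these assumptions, the posted run has seen the data 60 times when the first decay fires, while one default epoch already covers 150 passes, so the flattening loss curve reflects the reduced learning rate rather than a finished training run.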


benmcmahan avatar benmcmahan commented on June 10, 2024

@benmcmahan Try Google Drive: https://drive.google.com/file/d/1Fv1ts026-6BwOUTexV1KxwW54pQiK1rY/view?usp=sharing

Thanks, I've requested access.

