
Comments (7)

Wangt-CN avatar Wangt-CN commented on June 6, 2024

Hi @hanhan0521, may I ask which config file you use? With the current tsv data files, you do not need to use the anno_list for TikTok fine-tuning.

from disco.

hanhan0521 avatar hanhan0521 commented on June 6, 2024

Hello author, I have solved this problem; it was in the train_caption.tsv file.
I now have another question: how do I pre-train on my own dataset? I already have the preprocessed output from OpenPose.
How should I set the following paths? Thank you!

    self.total_num_videos = 340
    self.anno_path = 'GIT/{:05d}/labels/{:04d}.txt'
    self.image_path = '{:05d}/images/{:04d}.png'
    self.anno_pose_path = '{:05d}/openpose_json/{:04d}.png.json'
    self.ref_mask_path = '{:05d}/masks/{:04d}.png'

    self.image_path_web = '{}/{}'
    self.ref_image_path_web = '{}/{}'
    self.anno_pose_path_web = '{}/openpose_json/{}.json'
    self.ref_mask_path_web = '{}/groundsam/{}.mask.jpg'

    self.image_paths_list = []
    self.ref_image_paths_list = []
    self.ref_pose_paths_list = []
    self.anno_list = []
    self.anno_pose_list = []
    self.anno_init_pose_list = []
    self.mask_list = []

    ft_video_idx = getattr(args, 'ft_idx', '001_1.57_2.17_1x1') # default elon
    if split == 'train': # for training video
        # video_idx = ['001_1.57_2.17_1x1'] # elon mask 2
        # video_idx = ['007_7.36_7.44_1x1'] # 007
        # video_idx = ['001_1.57_2.17_9x16', '001_11.46_11.54_9x16', '001_5.37_5.44_9x16', '001_8.14_8.27_9x16']  # elon mask 1+2+3+4
        video_idx = [ft_video_idx] # 007
        dataset_prefix = self.args.web_data_root
    else: # for pose video
        video_idx = [335, 137]
        # ref_video_idx = '001_1.57_2.17_1x1' # 007
        # ref_video_idx = '007_7.36_7.44_1x1' # 007
        ref_video_idx = ft_video_idx # 007
        dataset_prefix = self.args.tiktok_data_root
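To illustrate how the templated paths above resolve, here is a small sketch. The video/frame indices are made up for the example and are not from the DisCo repo:

```python
# Hypothetical illustration of the path templates quoted above; the
# indices here are invented for the example, not taken from the dataset.
video_idx, frame_idx = 1, 12

anno_path_tmpl = 'GIT/{:05d}/labels/{:04d}.txt'
image_path_tmpl = '{:05d}/images/{:04d}.png'
anno_pose_path_tmpl = '{:05d}/openpose_json/{:04d}.png.json'

# Video folders are zero-padded to five digits, frame files to four.
anno_path = anno_path_tmpl.format(video_idx, frame_idx)
image_path = image_path_tmpl.format(video_idx, frame_idx)
pose_path = anno_pose_path_tmpl.format(video_idx, frame_idx)
print(anno_path)   # GIT/00001/labels/0012.txt
print(image_path)  # 00001/images/0012.png
```

So for a custom dataset, the practical point is to either match this zero-padded folder/file layout or edit the templates themselves.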


Wangt-CN avatar Wangt-CN commented on June 6, 2024

Hi @hanhan0521, for pre-training you actually do not need the pose. If you want to pretrain on your own data and do not want to use the tsv format, you may need to revise the pre-training dataloader code to use the raw image/mask data. You can refer to this file (https://github.com/Wangt-CN/DisCo/blob/main/dataset/tiktok_controlnet_t2i_imagevar_combine_mask.py). But note that this file is for fine-tuning, so it still includes pose handling that is not used in pre-training.
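A minimal sketch of what such a revised pre-training loader could do: pair raw images with their masks and skip pose entirely. This is not DisCo's actual dataloader; `pair_image_mask` is a hypothetical helper, and the `<stem>.mask.jpg` naming follows the web-data template quoted earlier in this thread:

```python
# Hypothetical helper for a pose-free pre-training loader: map each raw
# image to its mask by shared filename stem. Not DisCo's real code.
from pathlib import Path

def pair_image_mask(image_paths, mask_dir):
    """Return (image, mask) path pairs; masks are assumed to be named
    '<stem>.mask.jpg' inside mask_dir, like the groundsam outputs."""
    pairs = []
    for img in image_paths:
        stem = Path(img).stem
        pairs.append((img, str(Path(mask_dir) / f'{stem}.mask.jpg')))
    return pairs

pairs = pair_image_mask(['data/0001.png', 'data/0002.png'], 'data/groundsam')
print(pairs[0])  # ('data/0001.png', 'data/groundsam/0001.mask.jpg')
```

A real loader would then read and transform each pair in `__getitem__`, with no pose branch at all.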


quqixun avatar quqixun commented on June 6, 2024

I met the same error

FileNotFoundError: [Errno 2] No such file or directory: 'keli/dataset/TikTok_dataset/GIT/00137/labels/0262.txt'

when I tried to run the command (shown in the Human-Specific Fine-tuning section) with my own settings:

AZFUSE_USE_FUSE=0 NCCL_ASYNC_ERROR_HANDLING=0 CUDA_VISIBLE_DEVICES=0 \
python finetune_sdm_yaml.py                                          \
    --cf                        ./config/ref_attn_clip_combine_controlnet_imgspecific_ft/webtan_S256L16_xformers_upsquare.py \
    --pretrained_model          ./ft_checkpoint/moretiktok_nocfg/mp_rank_00_model_states.pt \
    --root_dir                  ./run_test                   \
    --ft_idx                    ./finetune_data/001          \
    --log_dir                   ./exp/human_specific_ft_001/ \
    --do_train                                               \
    --local_train_batch_size    32                           \
    --local_eval_batch_size     32                           \
    --epochs                    20                           \
    --deepspeed                                              \
    --eval_step                 500                          \
    --save_step                 500                          \
    --gradient_accumulate_steps 1                            \
    --learning_rate             1e-3                         \
    --fix_dist_seed                                          \
    --loss_target               "noise"                      \
    --unet_unfreeze_type        "crossattn"                  \
    --refer_sdvae                                            \
    --ref_null_caption          False                        \
    --combine_clip_local                                     \
    --combine_use_mask                                       \
    --conds                     "poses" "masks"              \
    --freeze_pose               True                         \
    --freeze_background         False                        \
    --ft_iters                  500                          \
    --ft_one_ref_image          False                        \
    --strong_aug_stage1         True                         \
    --strong_rand_stage2        True

The file structure of models and specific human dataset is:

Disco
├── ft_checkpoint
│   └── moretiktok_nocfg
│       └── mp_rank_00_model_states.pt
├── run_test
│   └── diffusers
│       └── sd-image-variations-diffusers
│           ├── feature_extractor
│           ├── image_encoder
│           ├── safety_checker
│           ├── scheduler
│           ├── unet
│           └── vae
├── finetune_data
│   └── 001                # human specific dataset
│       ├── grounded_sam   # ---|
│       ├── openpose_json  #    |--> preprocessed data, following the steps in
│       ├── openpose_vis   # ---|    https://github.com/Wangt-CN/DisCo/blob/main/PREPRO.md
│       ├── 0001.png
│       ├── 0002.png
│       └── ......
├── keli
│   └── dataset
│       └── TikTok_dataset
│           ├── 00001
│           │   ├── densepose
│           │   ├── images
│           │   └── masks
│           ├── 00002
│           └── .....
└── .....

How do I run human-specific fine-tuning with my own dataset?
How do I get the GIT directory that is referenced in all the ./dataset/tiktok_controlnet_t2i_imagevar_combine_specifcimg*.py files?


Wangt-CN avatar Wangt-CN commented on June 6, 2024

Hi @quqixun, thanks a lot for reporting the confusion.

  1. First, please check that the config file points to the intended dataset python file, e.g., https://github.com/Wangt-CN/DisCo/blob/main/dataset/tiktok_controlnet_t2i_imagevar_combine_specifcimg_web_upsquare.py.
  2. The data in the GIT folder only contains the frame names, and it is only used in validation (https://github.com/Wangt-CN/DisCo/blob/main/dataset/tiktok_controlnet_t2i_imagevar_combine_specifcimg_web_upsquare.py#L192). For our experiment, we just use TikTok data for validation. I am trying to upload the GIT folder to share on GitHub.
  3. Since the GIT files only contain frame names, a quick work-around is to delete the use of anno_path and the related variables, and then revise this part to provide customized frame names.
  4. Given the flexibility of the current human-specific fine-tuning pipeline, there may be many possible work-arounds. Sorry for the confusion; I will write a brief intro for easily adapting the pipeline to user data.
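A hedged sketch of the work-around in step 3: rather than reading frame names from per-frame GIT label files, enumerate them directly. `make_frame_names` is a hypothetical helper; the zero-padding widths follow the path templates quoted earlier in this thread, and the frame count is an assumption you would set for your own video:

```python
# Hypothetical replacement for reading GIT/<video>/labels/<frame>.txt:
# enumerate the frame image paths directly for one video.
def make_frame_names(video_idx, num_frames):
    """Return image paths like '00137/images/0001.png', assuming
    5-digit video folders, 4-digit frame files, and 1-based frames."""
    return ['{:05d}/images/{:04d}.png'.format(video_idx, i + 1)
            for i in range(num_frames)]

frames = make_frame_names(137, 3)
print(frames[0])  # 00137/images/0001.png
```

The resulting list could then be assigned to the variable that the validation code currently fills from the GIT label files.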


hanhan0521 avatar hanhan0521 commented on June 6, 2024

Hello author, I would like to use your code to drive the target character with a set of skeletal keypoints. But when I run the /dataset/tiktok_controlnet_t2i_imagevar_combine_mask.py file, the cond folder I get contains the poses of the dataset itself, and the gt folder still contains the original human images; the target is not being driven. Why is that?

[attached image: gt Dance_00001_0001.png / cond Dance_00001_0001.png]


Wangt-CN avatar Wangt-CN commented on June 6, 2024

@hanhan0521 Hi, actually I cannot understand your question. But from the code you shared, it seems that you set the anno_pose_json to the TikTok data; this file contains the skeleton annotations of the human. BTW, which stage are you trying?

