
speechdrivestemplates's Issues

About the data

Can a mesh in obj format be used to drive the model?

No audio files in the dataset

"To ease later research, we pack our processed data including 2d human pose sequences and corresponding audio clips."
Hello, I downloaded the dataset from the link you provided, but I found there are no audio files, only npz files.
Should I generate the audio files myself? I want to use Luo's data to train the model.
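
For anyone checking the same thing, a minimal way to inspect what a packed clip actually contains (the file path and key names below are only illustrative, not the repo's documented layout):

    import numpy as np

    # Path and key names are illustrative; adjust to the downloaded files.
    clip = np.load("processed_data/000001.npz", allow_pickle=True)
    print(list(clip.keys()))          # see which arrays are packed into the clip
    if "audio" in clip:
        print(clip["audio"].shape)    # audio may be stored as a waveform or as features
    else:
        print("no audio array packed; the audio likely has to be extracted separately")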

Code about baseline implementation

I saw that the comparison with the baselines in the paper shows very good results. Would it be possible to provide the code used to implement the baselines?

Source code

When will the source code be released? Thank you!

Workaround for the error raised at ffmpeg.concat in save_video_in_mp4 on Windows

First of all, thanks to the author for replying to me by email and providing some solutions. Now that the problem is solved, I am opening this issue for reference.

When reproducing the code to check the demo, I ran the following command: python main.py --config_file configs/voice2pose_sdt_bp.yaml --tag luo --demo_input audio1.wav --checkpoint voice2pose_sdt_bp-luo-ep100.pth DATASET.SPEAKER luo
The following error occurred:
FileNotFoundError: [WinError 2] The system cannot find the file specified

I tried many approaches but still could not get the ffmpeg.concat function to run correctly. In the end my solution was to give up on ffmpeg.concat and use another method:
(screenshot of the workaround attached in the original issue)
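
For reference, one possible alternative (my own sketch, not the author's solution from the screenshot; it assumes the silent pose video and the audio clip already exist on disk) is to skip ffmpeg-python and call the ffmpeg binary directly:

    import subprocess

    def mux_audio_video(video_path, audio_path, out_path):
        """Combine a silent video with an audio track by invoking ffmpeg directly."""
        cmd = [
            "ffmpeg", "-y",
            "-i", video_path,   # silent video rendered from the predicted poses
            "-i", audio_path,   # the driving audio clip
            "-c:v", "copy",     # keep the video stream untouched
            "-c:a", "aac",      # encode the audio for the mp4 container
            "-shortest",        # stop at the shorter of the two streams
            out_path,
        ]
        subprocess.run(cmd, check=True)

Note that a FileNotFoundError with [WinError 2] at this point on Windows usually just means the ffmpeg executable is not on PATH, so checking that ffmpeg -version works in the same shell is worth doing first.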

language dependent

Hi, how language-dependent do you think the model is? Or do you think it depends more on the sound of the audio? Thank you for the checkpoints, I managed to make it work :)

SYS.DISTRIBUTED

I'm trying to train on the xing processed_data from scratch using DDP, with:
SYS.DISTRIBUTED True
SYS.WORLD_SIZE 4 (4 GPUs)

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
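
In case it helps others who hit this, the first fix the error message suggests looks roughly like the following (a sketch, not the repo's actual training code; model and local_rank are placeholders):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    # model and local_rank are placeholders for the repo's actual objects.
    model = model.to(local_rank)
    model = DDP(
        model,
        device_ids=[local_rank],
        find_unused_parameters=True,  # let DDP tolerate parameters that do not contribute to the loss
    )

The cleaner fix is to make sure every output of forward participates in the loss, since find_unused_parameters adds some overhead to each iteration.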

How to generate gestures corresponding to specific semantics?

Hello, I read your great paper recently. I don't know much about the co-speech gesture generation task, so I have some questions and would like to ask for your advice:

  1. I found that neither your paper nor other similar SOTA papers seems to incorporate many semantic features. Does that mean this model cannot generate gestures tied to specific semantics? For example, when I say "here", can it generate a corresponding pointing gesture?
  2. At present, there are two main types of methods for the co-speech gesture generation task: rule-based and data-driven. If I want to generate gestures with specific semantics, should I combine the model with a rule-based method?

custom dataset

Are there procedures/steps/scripts for training the model on a custom dataset?

3D

Do you think it would be possible to use this model with 3D coordinates as input and output?

Checkpoint

Your model is fascinating and I would like to test it. Could you please provide the checkpoint file?
Hello! We are very interested in the method you proposed. We would like to see how it performs, but we could not find where to download the weights. Could you provide them? Many thanks!

Pose2Pose

I tried the pose2pose training, but the loss never seems to converge, and the pose reconstruction is not correct. Is anything wrong?

Keypoints format

Thanks for your awesome work!
When generating keypoints with OpenPose, the output format is a JSON file, while your code expects npy files.
2_1_gen_kpts.py is not complete yet. Are there instructions for reshaping the keypoints into the format required by your script?
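
Until the script is finished, a rough way to collect OpenPose's per-frame JSON output into a single array (the exact keypoint subset, ordering, and shape that 2_1_gen_kpts.py expects are not documented here, so treat this only as a starting point):

    import glob
    import json
    import numpy as np

    def openpose_json_to_array(json_dir):
        """Stack per-frame OpenPose JSON files into a (num_frames, num_keypoints, 3) array of x, y, confidence."""
        frames = []
        for path in sorted(glob.glob(f"{json_dir}/*_keypoints.json")):
            with open(path) as f:
                data = json.load(f)
            if not data["people"]:
                continue  # or append a zero frame, depending on how missing detections should be handled
            person = data["people"][0]  # assume a single speaker per frame
            kpts = np.array(person["pose_keypoints_2d"], dtype=np.float32).reshape(-1, 3)
            frames.append(kpts)
        return np.stack(frames)

    # np.save("clip_0001_kpts.npy", openpose_json_to_array("openpose_output/clip_0001"))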

Source video data

Awesome work.
Could you share the source video data for Luo and Xing? I found they are not in the Speech2Gesture dataset.
Thanks!

Dataset

How do you suggest creating the SPEECH2GESTURE dataset?
Do we need a csv file and a folder with the images?
Could you give some suggestions?

Dataset processing

I have a few questions regarding the dataset processing pipeline:

  • In the generate_clips script, why is the start index 80?
  • Why are there 13 records labeled "idle" for each clip in the train/test split file?
  • Are there any parameters I would need to adjust when creating my own dataset?

By the way, there is an error in 3_2_split_train_val_test.py: it names the validation samples "val" while the model searches for records labeled "dev".
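
As a stopgap until the script is fixed, the labels can be rewritten after the split runs (a sketch only; the csv path and the name of the column holding the split label are guesses, not the repo's actual names):

    import pandas as pd

    df = pd.read_csv("train_val_test_split.csv")           # path is illustrative
    split_col = "dataset"                                   # guess at the column holding train/val/test labels
    df[split_col] = df[split_col].replace({"val": "dev"})   # match the label the model actually looks for
    df.to_csv("train_val_test_split.csv", index=False)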

The generated video shows keypoints, not a real person. What is the reason?

python main.py --config_file configs/voice2pose_sdt_bp.yaml \
    --tag oliver \
    --demo_input demo_audio.wav \
    --checkpoint \
    DATASET.SPEAKER oliver
I generated the result with this command and it ran successfully, but the output video only shows the keypoints, not a real person. The audio is matched, but no real-person video is synthesized. Could you explain why?

RuntimeError: External code not provide

I ran the following command to test the VAE demo: python main.py --config_file configs/voice2pose_sdt_vae.yaml --tag luo --demo_input audio1.wav --checkpoint checkpoints/voice2pose_sdt_vae-luo-ep100.pth DATASET.SPEAKER luo
Then the following error occurred:
Traceback (most recent call last):
  File "main.py", line 73, in <module>
    main()
  File "main.py", line 69, in main
    run(args, cfg)
  File "main.py", line 45, in run
    pipeline.demo(cfg, exp_tag, args.checkpoint, args.demo_input)
  File "D:\SpeechDrivesTemplates\core\pipelines\trainer.py", line 462, in demo
    self.base_path = self.setup_experiment(False, exp_tag, checkpoint=checkpoint, demo_input=demo_input)
  File "D:\SpeechDrivesTemplates\core\pipelines\trainer.py", line 221, in setup_experiment
    self.setup_model(self.cfg, state_dict=checkpoint['model_state_dict'])
  File "D:\SpeechDrivesTemplates\core\pipelines\voice2pose.py", line 221, in setup_model
    self.model = Voice2PoseModel(cfg, state_dict, self.num_train_samples, self.get_rank()).cuda()
  File "D:\SpeechDrivesTemplates\core\pipelines\voice2pose.py", line 48, in __init__
    raise RuntimeError('External code not provide.')
RuntimeError: External code not provide.
