

TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

This is the official repository for our ECCV 2024 paper TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting.

Paper | Project | Video


Installation

Tested on Ubuntu 18.04, CUDA 11.3, PyTorch 1.12.1

git clone git@github.com:Fictionarry/TalkingGaussian.git --recursive

conda env create --file environment.yml
conda activate talking_gaussian
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install tensorflow-gpu==2.8.0
# tensorflow-gpu 2.8 breaks with protobuf >= 4 ("Descriptors cannot be
# created directly"; see the issues below), so pin protobuf to be safe:
pip install "protobuf<=3.20.3"

If you encounter installation problems with diff-gaussian-rasterization or gridencoder, please refer to gaussian-splatting and torch-ngp.

Preparation

  • Prepare the face-parsing model and the 3DMM model for head pose estimation.

    bash scripts/prepare.sh
  • Download the 3DMM model from Basel Face Model 2009:

    # 1. copy 01_MorphableModel.mat to data_utils/face_tracking/3DMM/
    # 2. run following
    cd data_utils/face_tracking
    python convert_BFM.py
  • Prepare the environment for EasyPortrait:

    # prepare mmcv
    conda activate talking_gaussian
    pip install -U openmim
    mim install mmcv-full==1.7.1
    
    # download model weight
    cd data_utils/easyportrait
    wget "https://n-ws-620xz-pd11.s3pd11.sbercloud.ru/b-ws-620xz-pd11-jux/easyportrait/experiments/models/fpn-fp-512.pth"

Usage

Important Notice

  • This code is provided for research purposes only. The author makes no warranties, express or implied, as to the accuracy, completeness, or fitness for a particular purpose of the code. Use this code at your own risk.

  • The author explicitly prohibits the use of this code for any malicious or illegal activities. By using this code, you agree to comply with all applicable laws and regulations, and you agree not to use it to harm others or to perform any actions that would be considered unethical or illegal.

  • The author will not be responsible for any damages, losses, or issues that arise from the use of this code.

  • Users are encouraged to use this code responsibly and ethically.

Video Dataset

Here we provide two video clips used in our experiments, which were captured from YouTube. Please respect the original content creators' rights and comply with YouTube's copyright policies when using them.

Other videos used in our experiments can be found in GeneFace and AD-NeRF.

Pre-processing Training Video

  • Put the training video at data/<ID>/<ID>.mp4.

    The video must be 25 FPS, with all frames containing the talking person. The resolution should be about 512x512, and the duration about 1-5 minutes (see the sketch after this list for a quick way to verify these properties).

  • Run the script to process the video.

    python data_utils/process.py data/<ID>/<ID>.mp4
  • Obtain Action Units

    Run FeatureExtraction in OpenFace, rename and move the output CSV file to data/<ID>/au.csv.

  • Generate tooth masks

    export PYTHONPATH=./data_utils/easyportrait 
    python ./data_utils/easyportrait/create_teeth_mask.py ./data/<ID>
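
Not part of the original pipeline, but a quick way to verify the video requirements above (a minimal sketch, assuming OpenCV is installed in the environment):

    import cv2

    cap = cv2.VideoCapture("data/<ID>/<ID>.mp4")
    assert cap.isOpened(), "could not open the video file"
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()

    print(f"fps={fps:.2f}, resolution={width}x{height}, duration={frames / fps:.1f}s")
    assert abs(fps - 25) < 0.1, "re-encode the video to 25 FPS before processing"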

Audio Pre-process

In our paper, we use DeepSpeech features for evaluation.

  • DeepSpeech

    python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # saved to data/<name>.npy
  • HuBERT

    Similar to ER-NeRF, HuBERT is also supported, and is recommended when the audio is not in English.

    Specify --audio_extractor hubert when training and testing.

    python data_utils/hubert.py --wav data/<name>.wav # saved to data/<name>_hu.npy
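
To sanity-check an extracted feature file (a sketch; the exact array shape depends on the extractor, so treat the printed shape as informative rather than normative):

    import numpy as np

    feat = np.load("data/<name>.npy")  # or data/<name>_hu.npy for HuBERT
    print(feat.dtype, feat.shape)      # one feature chunk per audio window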
    

Train

# If resources are sufficient, the stages can be partially parallelized to speed up training. See the script, and the sketch below.
bash scripts/train_xx.sh data/<ID> output/<project_name> <GPU_ID>
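
The script runs three stages (train_mouth.py, train_face.py, then train_fuse.py, with the flags visible in the issue logs further down this page). A sketch of the partial parallelism in Python, under the assumption that the two branch stages can overlap while fusion must run last; paths and flags are illustrative, so defer to scripts/train_xx.sh:

    import os
    import subprocess

    dataset, workspace = "data/<ID>", "output/<project_name>"
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}  # pick the GPU as in the script

    # Branch stages, launched concurrently (needs enough GPU memory for both).
    mouth = subprocess.Popen(["python", "train_mouth.py", "-s", dataset, "-m", workspace], env=env)
    face = subprocess.Popen(["python", "train_face.py", "-s", dataset, "-m", workspace,
                             "--init_num", "2000", "--densify_grad_threshold", "0.0005"], env=env)
    mouth.wait()
    face.wait()

    # Fusion depends on both branches, so it runs last.
    subprocess.run(["python", "train_fuse.py", "-s", dataset, "-m", workspace,
                    "--opacity_lr", "0.001"], env=env, check=True)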

Test

# saved to output/<project_name>/test/ours_None/renders
python synthesize_fuse.py -S data/<ID> -M output/<project_name> --eval  

Inference with target audio

python synthesize_fuse.py -S data/<ID> -M output/<project_name> --use_train --audio <preprocessed_audio_feature>.npy

Citation

If you find this repository helpful to your project, please consider citing:

@article{li2024talkinggaussian,
    title={TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting}, 
    author={Jiahe Li and Jiawei Zhang and Xiao Bai and Jin Zheng and Xin Ning and Jun Zhou and Lin Gu},
    journal={arXiv preprint arXiv:2404.15264},
    year={2024}
}

Acknowledgement

This code is developed on gaussian-splatting with simple-knn and a modified diff-gaussian-rasterization. Partial code comes from RAD-NeRF, DFRF, GeneFace, and AD-NeRF. The teeth mask is from EasyPortrait. Thanks to these great projects!


Issues

Can the lip-sync accuracy be improved further?

Thank you for open-sourcing this work.
I ran a test using DeepSpeech for the audio pre-processing; the language is English. In the generated video, the lip sync feels somewhat inconsistent. There were some errors during the process, but the generation pipeline itself seemed fine. Could this be caused by some problem I have not resolved, and can the result be improved?

whatdidiknow.mp4

$bash scripts/train_xx.sh data/1 output/test/ 0
Traceback (most recent call last):
File "/home/user/TalkingGaussian/train_mouth.py", line 28, in <module>
from torch.utils.tensorboard import SummaryWriter
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py", line 12, in <module>
from .writer import FileWriter, SummaryWriter # noqa: F401
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 10, in <module>
from tensorboard.compat.proto import event_pb2
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/tensorboard/compat/proto/event_pb2.py", line 17, in <module>
from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/tensorboard/compat/proto/summary_pb2.py", line 17, in <module>
from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 36, in <module>
_descriptor.FieldDescriptor(
File "/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 621, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
(train_face.py and train_fuse.py then fail with the same protobuf traceback and workaround notes.)
Looking for config file in output/test/cfg_args
Config file found: output/test/cfg_args
Rendering output/test/
Found transforms_train.json file, assuming Blender data set! [22/08 14:07:18]
Reading Test Transforms [22/08 14:07:18]
794it [00:00, 9100.44it/s]
794it [00:19, 40.06it/s]
Generating random point cloud (10000)... [22/08 14:07:39]
Loading Training Cameras [22/08 14:07:39]
Loading Test Cameras [22/08 14:07:40]
Number of points at initialisation : 10000 [22/08 14:07:40]
Rendering progress: 100%|########################| 794/794 [00:08<00:00, 88.81it/s]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /home/user/miniconda3/envs/talking_gaussian/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
100
200
300
400
500
600
700
LMD (fan) = 2.369922
PSNR = 35.660026
LPIPS (alex) = 0.018883

$ python synthesize_fuse.py -S data/1 -M output/test/ --use_train --audio data/1/whatdidiknow.npy
Looking for config file in output/test/cfg_args
Config file found: output/test/cfg_args
Rendering output/test/
Found transforms_train.json file, assuming Blender data set! [22/08 14:11:14]
Reading Training Transforms [22/08 14:11:14]
7938it [00:00, 8977.61it/s]
165it [00:04, 38.08it/s]
Reading Test Transforms [22/08 14:11:20]
794it [00:00, 8989.86it/s]
165it [00:04, 38.33it/s]
Generating random point cloud (10000)... [22/08 14:11:24]
Loading Training Cameras [22/08 14:11:25]
Loading Test Cameras [22/08 14:11:25]
Number of points at initialisation : 10000 [22/08 14:11:25]
Rendering progress: 100%|########################| 165/165 [00:02<00:00, 67.60it/s]
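
A note on the protobuf errors above: tensorboard's generated protobuf code is incompatible with protobuf >= 4, and the error message itself lists the fixes, e.g. pip install "protobuf<=3.20.3", or forcing the pure-Python implementation. A minimal sketch of the latter (the variable must be set before anything imports tensorboard, and pure-Python parsing is slower):

    import os

    # Must be set before torch.utils.tensorboard / tensorboard is imported.
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

    from torch.utils.tensorboard import SummaryWriter  # now imports cleanly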

Poor results when inferring with new audio

Hello, I trained and tested on data I collected myself, replacing the weights of the Chinese feature-extraction network (the network is still HuBERT, just with newly trained weights) and increasing the iterations to 100k. The results on the test set are good, but new audio performs much worse. Looking at the generated PLY file, the point cloud fits poorly around the mouth. I would like to ask:
1. Could this be a result of Gaussian splitting? The point cloud looks fine everywhere except the mouth region.
2. Compared with the videos you provide, does the head need enough movement to obtain better pose estimation and results?

Do you have any methods or experience for handling this kind of poor Gaussian reconstruction? The head point cloud and the Gaussian rendering of the mouth region are shown in the attached screenshots.

FileNotFoundError and ZeroDivisionError during training

During the audio pre-processing, I used DeepSpeech. I found only one file ending with '.wav' in the data folder (I was using the Macron video). It is called aud.wav, and I preprocessed it. During the first training attempt, the terminal displayed "aud_ds.npy not found". So, I renamed aud.wav and aud.npy to aud_ds. Then it displayed errors as stated in the title. The output is like this:

(talking_gaussian) min@min-US-Desktop-Aegis-RS:~/Documents/TalkingGaussian$ bash scripts/train_xx.sh data/macron output/marcron 0
Optimizing output/marcron
Output folder: output/marcron [06/08 23:43:47]
Found transforms_train.json file, assuming Blender data set! [06/08 23:43:47]
Reading Training Transforms [06/08 23:43:47]
7938it [00:01, 4091.06it/s]
4417it [01:09, 1.65s/it]scripts/train_xx.sh: line 8: 70450 Killed python train_mouth.py -s $dataset -m $workspace --audio_extractor $audio_extractor
Optimizing output/marcron
Output folder: output/marcron [06/08 23:45:04]
Found transforms_train.json file, assuming Blender data set! [06/08 23:45:05]
Reading Training Transforms [06/08 23:45:05]
7938it [00:01, 4123.37it/s]
5159it [01:35, 2.41s/it]scripts/train_xx.sh: line 9: 70902 Killed python train_face.py -s $dataset -m $workspace --init_num 2000 --densify_grad_threshold 0.0005 --audio_extractor $audio_extractor
Optimizing output/marcron
Output folder: output/marcron [06/08 23:47:08]
Found transforms_train.json file, assuming Blender data set! [06/08 23:47:09]
Reading Training Transforms [06/08 23:47:09]
7938it [00:01, 4124.96it/s]
5198it [01:32, 1.05it/s]scripts/train_xx.sh: line 10: 71034 Killed python train_fuse.py -s $dataset -m $workspace --opacity_lr 0.001 --audio_extractor $audio_extractor
Looking for config file in output/marcron/cfg_args
Config file found: output/marcron/cfg_args
Rendering output/marcron
Found transforms_train.json file, assuming Blender data set! [06/08 23:48:45]
Reading Test Transforms [06/08 23:48:45]
794it [00:00, 3958.49it/s]
794it [00:09, 83.91it/s]
Generating random point cloud (10000)... [06/08 23:48:55]
Loading Training Cameras [06/08 23:48:55]
Loading Test Cameras [06/08 23:48:56]
Number of points at initialisation : 10000 [06/08 23:48:57]
Traceback (most recent call last):
File "synthesize_fuse.py", line 125, in
render_sets(model.extract(args), args.iteration, pipeline.extract(args), args.use_train, args.fast, args.dilate)
File "synthesize_fuse.py", line 93, in render_sets
(model_params, motion_params, model_mouth_params, motion_mouth_params) = torch.load(os.path.join(dataset.model_path, "chkpnt_fuse_latest.pth"))
File "/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torch/serialization.py", line 699, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torch/serialization.py", line 230, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torch/serialization.py", line 211, in init
super(_open_file, self).init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'output/marcron/chkpnt_fuse_latest.pth'
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
/home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1. You can also use weights=AlexNet_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /home/min/anaconda3/envs/talking_gaussian/lib/python3.7/site-packages/lpips/weights/v0.1/alex.pth
Traceback (most recent call last):
File "metrics.py", line 215, in
print(lmd_meter.report())
File "metrics.py", line 102, in report
return f'LMD ({self.backend}) = {self.measure():.6f}'
File "metrics.py", line 96, in measure
return self.V / self.N
ZeroDivisionError: division by zero
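
For context, the three "Killed" lines above indicate the training processes were terminated by the OS (commonly out of memory), so chkpnt_fuse_latest.pth was never written and the metrics meter stayed empty, which leads to the 0/0. A minimal sketch of a guarded meter, assuming the accumulator structure visible in the traceback (metrics.py summing V over N samples):

    class Meter:
        """Accumulates a value V over N samples; reports nan when empty."""
        def __init__(self):
            self.V, self.N = 0.0, 0

        def update(self, v):
            self.V += v
            self.N += 1

        def measure(self):
            # Avoid ZeroDivisionError when no frames were evaluated.
            return self.V / self.N if self.N else float("nan")

    print(Meter().measure())  # nan instead of a crash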

How to calculate AUE-(L/U)

Hello, I saw the AUE and Sync metrics in the paper. Is there any ready-made code that can calculate these metrics? I want to integrate the logic into metrics.py.

NaN or Inf found in input tensor

What causes this error in train_mouth?
tb_writer.add_images(config['name'] + "view{}_mouth/depth".format(viewpoint.image_name), (render_pkg["depth"] / render_pkg["depth"].max())[None], global_step=iteration)

/root/anaconda3/envs/TalkingGaussian/lib/python3.8/site-packages/tensorboardX/summary.py:286: RuntimeWarning: invalid value encountered in cast
tensor = (tensor * 255.0).astype(np.uint8)
NaN or Inf found in input tensor

After debugging, I found:

render_pkg["depth"]
tensor([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]], device='cuda:0')
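
The dump shows an all-zero depth map, so the logged line divides by render_pkg["depth"].max() == 0 and produces NaN (0/0). A minimal sketch of the failure and a guard (the tensor here is a stand-in for render_pkg["depth"]):

    import torch

    depth = torch.zeros(1, 4, 4)                    # stand-in: all-zero depth map
    bad = depth / depth.max()                       # 0/0 -> NaN everywhere
    safe = depth / depth.max().clamp(min=1e-8)      # stays finite
    print(torch.isnan(bad).any(), torch.isnan(safe).any())  # tensor(True) tensor(False)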

Cannot finish pre-processing video

Hey!

During pre-processing, after getting the focals, I get the error

nvrtc: error: invalid value for --gpu-architecture (-arch)

and the script finishes with the error

FileNotFoundError: [Errno 2] No such file or directory: 'data/bud/track_params.pt'

What can be the cause?
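
One common cause (an assumption, not a confirmed diagnosis): the nvrtc "--gpu-architecture" error appears when the installed PyTorch/CUDA build does not know the GPU's compute architecture (e.g., the tested CUDA 11.3 build running on a newer GPU), and the subsequent FileNotFoundError is just the tracker's output file never being written. A quick compatibility check:

    import torch

    print(torch.__version__, torch.version.cuda)   # build versions
    print(torch.cuda.get_device_name(0))
    print(torch.cuda.get_device_capability(0))     # e.g. (8, 9) needs a newer
                                                   # CUDA build than 11.3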

Training with bash scripts/train_xx.sh fails. What is going on?

Loading model from: /root/xxx/lib/python3.7/site-packages/lpips/weights/v0.1/alex.pth [06/08 07:17:43]
Training progress: 0%| | 0/50000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_face.py", line 394, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train_face.py", line 147, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/xxx/TalkingGaussian/gaussian_renderer/__init__.py", line 94, in render
cov3D_precomp = cov3D_precomp)
ValueError: not enough values to unpack (expected 4, got 3)

Exception ignored in: <function tqdm.__del__ at 0x7fd43548f710>
Traceback (most recent call last):
File "/xxx/lib/python3.7/site-packages/tqdm/std.py", line 1145, in __del__
File "/xxx/lib/python3.7/site-packages/tqdm/std.py", line 1299, in close
File "/xxx/lib/python3.7/site-packages/tqdm/std.py", line 1492, in display
File "/xxx/lib/python3.7/site-packages/tqdm/std.py", line 1148, in __str__
File "/xxx/lib/python3.7/site-packages/tqdm/std.py", line 1450, in format_dict
File "/xxx/lib/python3.7/site-packages/tqdm/utils.py", line 267, in _screen_shape_linux
TypeError: 'NoneType' object is not callable

Training Results Not Matching Demo Quality – Possible Overclaim?

Hello,
Unfortunately, my results are nowhere close to the demo clip shown at https://www.youtube.com/watch?v=c5VG7HkDs8I.

  • The lips barely move, the output is blurry, and the results are worse than the reference papers.
  • The movements of the synthesized 3D talking head appear less accurate.
  • Despite using the same configurations mentioned in the paper, the model doesn't seem to achieve the promised quality.

Could you provide clarification on:

  1. The exact hyperparameters (e.g., seeds) and the dataset you used to train the model in the demo?
  2. Any additional specific pre-processing steps or adjustments that might help improve the quality?
  3. Whether the demo clip was further enhanced or fine-tuned in ways not covered in the training script?

Here's my result (DeepSpeech): https://drive.google.com/file/d/1MC9O9c5Rtk5_GyTKUL0ak6wHt6qVZbb-/view. Is there anything I did wrong? I'm sure I followed all the instructions.

Data pre-processing error: python3 data_utils/process.py /app/TalkingGaussian/data/May/May.mp4 --task 8

Traceback (most recent call last):
File "/app/TalkingGaussian/data_utils/face_tracking/face_tracker.py", line 11, in <module>
from render_3dmm import Render_3DMM
File "/app/TalkingGaussian/data_utils/face_tracking/render_3dmm.py", line 6, in <module>
from pytorch3d.renderer import (
File "/usr/local/lib/python3.10/dist-packages/pytorch3d/renderer/__init__.py", line 9, in <module>
from .blending import (
File "/usr/local/lib/python3.10/dist-packages/pytorch3d/renderer/blending.py", line 12, in <module>
from pytorch3d import _C
ImportError: /usr/local/lib/python3.10/dist-packages/pytorch3d/_C.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c107SymBool10guard_boolEPKcl

Inaccurate teeth-region detection

The teeth region is detected inaccurately (multiple connected components), which produces flickering, odd-looking pixels in the generated results. Could this part be changed to detect the upper lip + lower lip + teeth regions instead?

not enough values to unpack (expected 4, got 3)

Hello, I encountered this error when trying to execute bash scripts/train_xx.sh. Could you please advise on how to resolve it? Thank you!

Traceback (most recent call last):
File "train_mouth.py", line 328, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train_mouth.py", line 139, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background)
File "/app/dev/TalkingGaussian/gaussian_renderer/__init__.py", line 94, in render
cov3D_precomp = cov3D_precomp)
ValueError: not enough values to unpack (expected 4, got 3)
Exception ignored in: <function tqdm.__del__ at 0x7f3931f337a0>
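
A likely cause, inferred from the acknowledgements (which note this repo uses a modified diff-gaussian-rasterization): the repo's render() unpacks four return values, while the stock gaussian-splatting rasterizer returns only three, so a stock build being installed would produce exactly this error. A toy reproduction of the mismatch:

    def stock_rasterizer():
        # Stand-in for a rasterizer build that returns only 3 values.
        return "image", "radii", "alpha"

    # The repo-side unpack expects 4 values, so this line raises
    # ValueError: not enough values to unpack (expected 4, got 3)
    image, radii, depth, alpha = stock_rasterizer()

If that is the case, reinstalling the rasterizer shipped with this repository (rather than the stock one) should resolve it.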

Chinese lip shapes do not match the audio

Hi, I previously trained and ran inference with ER-NeRF, where the Chinese lip-sync accuracy was acceptable. Now, trying TalkingGaussian, the Chinese lip shapes do not match the audio. Below are my training commands; the training iterations are the defaults, and the source material is a 5-minute green-screen video at 25 FPS:

Pre-processing

python data_utils/process.py /root/share/talkingGaussian/train/data/ao_head/ao_head.mp4

Teeth masks

export PYTHONPATH=./data_utils/easyportrait
python ./data_utils/easyportrait/create_teeth_mask.py /root/share/talkingGaussian/train/data/ao_head/

HuBERT audio processing

python data_utils/hubert.py --wav /root/audio/cosyVoice_fish_faster.wav

Train

bash scripts/train_xx.sh /root/share/talkingGaussian/train/data/ao_head/ /root/share/talkingGaussian/train/trial/ao_head/ 2 --audio_extractor hubert

Inference

python synthesize_fuse.py -S /root/share/talkingGaussian/train/data/ao_head/ -M /root/share/talkingGaussian/train/trial/ao_head/ --use_train --audio /root/audio/cosyVoice_fish_faster_hu.npy --dilate --audio_extractor hubert

A clip of the trained result is below:

ao_face.mp4

Remaining problems:

  1. Even with the --dilate flag, a gap between the teeth and the mouth still appears.
  2. Could the iteration count be too low, so that the Chinese lip shapes cannot align?
  3. I modified train_face.py as below, but the mouth region is still noisy:
    loss += 0.01 * lpips_criterion(image_t.clone()[:, xmin:xmax, ymin:ymax] * 2 - 1, gt_image_t.clone()[:, xmin:xmax, ymin:ymax] * 2 - 1).mean()

train_face.py fails late in training: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small

Hi, thanks for open-sourcing this. When training on custom data, train_face fails late in training as follows: talkingGaussian/lib/python3.10/site-packages/torch/nn/functional.py", line 782, in _max_pool2d
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small.
Could you help me look into the cause? Thanks a lot.

Code Release Estimate

Hey guys,

Great work! I wanted to kindly ask for an update on the training code and, if possible, the weights. I would love to recreate your work :)
Looking forward!

Best

hubert features code

hubert_hidden = make_even_first_dim(hubert_hidden).reshape(-1, 2, 1024)

Hi, could you explain why you reshape the hidden features here to (-1, 2, 1024)? I mean, why not just save the features as (n, 1024)?
Thank you so much.
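
A sketch of what that line does, under the assumption (from HuBERT-large's 20 ms feature stride) that the extractor emits 50 feature frames per second while the video runs at 25 FPS, so two consecutive audio features are paired per video frame. make_even_first_dim is reproduced here as a hypothetical helper matching its use above:

    import numpy as np

    def make_even_first_dim(x):
        # Drop the last feature frame if the count is odd, so frames pair up.
        return x[:-1] if x.shape[0] % 2 == 1 else x

    hubert_hidden = np.random.randn(101, 1024)   # stand-in: ~2 s of features
    paired = make_even_first_dim(hubert_hidden).reshape(-1, 2, 1024)
    print(paired.shape)                          # (50, 2, 1024): 2 per video frame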

What's the PyMCubes version?

I used conda to install the environment, but the latest version of PyMCubes requires numpy~=2.0, which is impossible on Python 3.7.13.
I installed PyMCubes==0.1.4 instead; is that version right?

Many problems in the generated video after modifying the tracking

1. Replaced face_tracking with SyncTalk's face_tracking.
2. Added extract_flow and extract_blendshape from SyncTalk's pre-processing code to process.py; changed save_transforms to load the bundle_adjustment.pt file instead of the previous track_params.pt; and removed the scaling of trans (i.e., no trans / 10.0).

After replacing the tracking with SyncTalk's tracking, I ran into many problems, as shown in the video below:

out.mp4

@Fictionarry can you tell what causes this kind of problem? The teeth region looks very strange.

Code Open Source Plan Inquiry

Hello,

I'm interested to know whether there are plans to open-source the code on GitHub. Could you please provide an estimated timeline for the open-source release?

Thank you

inference time

Hello!
When I run your inference pipeline, I find the data processing takes too long, which makes the total inference time very long, even longer than ER-NeRF. What could be the reason? I would greatly appreciate your help, thank you.

Error when training reaches train_face

[ITER 2000] Evaluating test: L1 0.16910051672082196 PSNR 11.699522570559852 [21/08 12:39:47]

[ITER 2000] Evaluating train: L1 0.16705973148345948 PSNR 11.797615051269531 [21/08 12:39:50]
Training progress: 8%|##########2 | 4000/50000 [00:41<10:26, 73.40it/s, Loss=nan, Mouth=9.1-20.0]
/data/anaconda3/envs/talkinggaussian/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py:549: RuntimeWarning: invalid value encountered in cast
tensor = (tensor * scale_factor).clip(0, 255).astype(np.uint8)

[ITER 4000] Evaluating test: L1 0.16910051672082196 PSNR 11.699522570559852 [21/08 12:40:12]

[ITER 4000] Evaluating train: L1 0.16705973148345948 PSNR 11.797615051269531 [21/08 12:40:15]
Traceback (most recent call last):
File "/data/TalkingGaussian/train_face.py", line 402, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "/data/TalkingGaussian/train_face.py", line 241, in training
training_report(tb_writer, iteration, Ll1, loss, l1_loss, iter_start.elapsed_time(iter_end), testing_iterations, scene, motion_net, render if iteration < warm_step else render_motion, (pipe, background))
File "/data/TalkingGaussian/train_face.py", line 373, in training_report
tb_writer.add_histogram("scene/opacity_histogram", scene.gaussians.get_opacity, iteration)
File "/data/anaconda3/envs/talkinggaussian/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py", line 517, in add_histogram
histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
File "/data/anaconda3/envs/talkinggaussian/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 459, in histogram
hist = make_histogram(values.astype(float), bins, max_bins)
File "/data/anaconda3/envs/talkinggaussian/lib/python3.10/site-packages/torch/utils/tensorboard/summary.py", line 504, in make_histogram
raise ValueError("The histogram is empty, please file a bug report.")
ValueError: The histogram is empty, please file a bug report.

train_mouth.py training error

Training progress: 4%|####2 | 1999/50000 [00:10<04:10, 191.30it/s, Loss=0.00217, AU25=1.2-1.3]
[ITER 2000] Evaluating test: L1 0.02868325263261795 PSNR 15.645081138610841 [15/07 10:02:28]

[ITER 2000] Evaluating train: L1 0.029025918990373614 PSNR 15.591155815124512 [15/07 10:02:29]
Training progress: 6%|######2 | 2999/50000 [00:17<04:06, 190.47it/s, Loss=0.00104, AU25=1.2-1.3]]Training progress: 8%|########4 | 3999/50000 [00:30<09:08, 83.87it/s, Loss=0.00201, AU25=1.1-1.3]
[ITER 4000] Evaluating test: L1 0.02868325263261795 PSNR 15.645081138610841 [15/07 10:02:48]

[ITER 4000] Evaluating train: L1 0.029025918990373614 PSNR 15.591155815124512 [15/07 10:02:48]
Training progress: 12%|############7 | 6000/50000 [00:57<08:52, 82.57it/s, Loss=0.00158, AU25=1.1-1.3]
[ITER 6000] Evaluating test: L1 0.02868325263261795 PSNR 15.645081138610841 [15/07 10:03:15]

[ITER 6000] Evaluating train: L1 0.029025918990373614 PSNR 15.591155815124512 [15/07 10:03:16]
Training progress: 16%|################9 | 7999/50000 [01:23<08:30, 82.27it/s, Loss=0.00150, AU25=1.0-1.3]
[ITER 8000] Evaluating test: L1 0.02868325263261795 PSNR 15.645081138610841 [15/07 10:03:41]

[ITER 8000] Evaluating train: L1 0.029025918990373614 PSNR 15.591155815124512 [15/07 10:03:41]
Training progress: 18%|###################2 | 9099/50000 [01:38<08:14, 82.74it/s, Loss=0.00154, AU25=1.0-1.3]Traceback (most recent call last):
File "train_mouth.py", line 335, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train_mouth.py", line 148, in training
render_pkg = render_motion_mouth(viewpoint_cam, gaussians, motion_net, pipe, background)
File "/home/appuser/yulj21/talking_face/talkingGaussian/TalkingGaussian-track-stable-1/gaussian_renderer/init.py", line 238, in render_motion_mouth
motion_preds = motion_net(pc.get_xyz, audio_feat)
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/yulj21/talking_face/talkingGaussian/TalkingGaussian-track-stable-1/scene/motion_net.py", line 323, in forward
enc_a = self.encode_audio(a)
File "/home/appuser/yulj21/talking_face/talkingGaussian/TalkingGaussian-track-stable-1/scene/motion_net.py", line 296, in encode_audio
enc_a = self.audio_net(a) # [1/8, 64]
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/yulj21/talking_face/talkingGaussian/TalkingGaussian-track-stable-1/scene/motion_net.py", line 63, in forward
x = self.encoder_conv(x).squeeze(-1)
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 307, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 304, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception ignored in: <function tqdm.__del__ at 0x148c5af108c0>
Traceback (most recent call last):
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/tqdm/std.py", line 1065, in __del__
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/tqdm/std.py", line 1248, in close
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/tqdm/std.py", line 564, in _decr_instances
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/site-packages/tqdm/_monitor.py", line 51, in __exit__
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/threading.py", line 522, in set
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/threading.py", line 365, in notify_all
File "/home/appuser/yulj21/talking_face/talkingGaussian/anaconda37/lib/python3.7/threading.py", line 348, in notify
TypeError: 'NoneType' object is not callable

In my pre-processing, I switched the camera pose estimation to the method used in SyncTalk; that is, I changed how transforms_train.json and transforms_val.json are generated, keeping the rest of the data pre-processing unchanged. But the above error occurs when training the mouth-region model.
@Fictionarry
