
GraphCMR's Issues

Training on Human3.6M produces upside-down results

I have trained the model on Human3.6M and UP-3D. The model produces reasonable results on UP-3D, but upside-down results on Human3.6M, where the head keypoint is always wrong.

[screenshots: upside-down predictions on Human3.6M]

I have checked my h36m_train.npz using the following code (h36m_train.npz is loaded by a class H36MDataset(), similar to FullDataset()).
In the code below, I use batch = dataset[2] to directly get the image and annotation batch, which already includes the flip, rotation, and noise augmentation.
Note that I replaced self.to_lsp = list(range(14)) with self.to_h36m = list(range(13)) + [18], but this member does not seem to be used in the training code, so it may not be the cause (I'm not sure).


import json
from collections import namedtuple

import torch
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

# SMPL, Renderer, visualize_reconstruction and H36MDataset are imported from this repo.

class Visualize(object):
    def __init__(self, options):
        self.options = options
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # SMPL model and renderer for visualization
        self.smpl = SMPL().to(self.device)
        self.renderer = Renderer(faces=self.smpl.faces.cpu().numpy())
        # LSP indices from the full list of keypoints
        self.to_lsp = list(range(14))
        # H36M indices used instead of to_lsp for this check
        self.to_h36m = list(range(13)) + [18]

    def vis(self, input_batch):
        input_batch = {k: v.to(self.device) if isinstance(v, torch.Tensor) else v for k,v in input_batch.items()}
        rend_imgs = []

        img = input_batch['img_orig'].cpu().numpy().transpose(1,2,0) #(H, W, C)
        gt_keypoints_2d = input_batch['keypoints'].cpu().numpy()
        gt_keypoints_2d_ = gt_keypoints_2d[self.to_h36m]
        gt_pose = torch.unsqueeze(input_batch['pose'], 0)
        gt_betas = torch.unsqueeze(input_batch['betas'], 0)
        gt_vertices = self.smpl(gt_pose, gt_betas)

        vertices = gt_vertices[0].cpu().numpy()

        pred_keypoints_2d_ = gt_keypoints_2d_[:, :2]
        pred_camera = torch.unsqueeze(torch.Tensor([0,0,0]), 0)
        cam = pred_camera[0].cpu().numpy()

        rend_img = visualize_reconstruction(
            img, self.options.img_res, gt_keypoints_2d_, vertices,
            pred_keypoints_2d_, cam, self.renderer)
        rend_imgs.append(torch.from_numpy(rend_img))

        rend_imgs = make_grid(rend_imgs, nrow=1)
        plt.imshow(rend_imgs)
        plt.savefig('others/h36m_vis.png')


with open('/hpn/logs/paper_step1/config.json') as f:
    json_args = json.load(f)
    json_args = namedtuple('json_args', json_args.keys())(**json_args)
options = json_args
dataset = H36MDataset(options)

batch = dataset[2] #already includes flip, rotation or adding noise

visualizer = Visualize(options)
visualizer.vis(batch)

It seems that the ground-truth 2D pose is correct, and the data preprocessing (flip and rotation) is also correct.

[screenshots: ground-truth 2D keypoints overlaid on the augmented images]

So I suspect the problem lies in the training code. Could you help me locate where it might come from? Thank you very much!

I can provide any other details on request.

Regarding fully connected baseline

Hi

I was unable to find the details of the fully connected mesh regression baseline in the paper. Will you make its code available so that the baseline can be examined further?

About test performance

Hello,
Thank you for releasing your great work. It helps a lot.

But I am having some trouble retraining the model.
I added the Human3.6M dataset, set the hyperparameters as reported in the paper, and trained the model from scratch. However, I can only get a Reconstruction Error of about 100 mm.

I wonder whether there are any special tricks used in the training procedure?

Thanks!

Weak-perspective camera coordinates

Hello, great job.
I have a question about the estimated weak-perspective camera. I assumed the camera output would be an (x, y, z) position, and when I looked at the estimated values I noticed that the first coefficient was large (x?), whereas I expected the last coordinate (z) to be the large one.
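If it helps, the usual weak-perspective convention in HMR-style code orders the camera as [s, tx, ty]: a scale plus an in-plane translation rather than an (x, y, z) position, which would explain why the first coefficient is the large one. A minimal sketch of that convention (the exact ordering used in this repo is my assumption, worth checking against the projection code):

import torch

def weak_perspective_project(points_3d, camera):
    """Project 3D points with a weak-perspective camera.

    points_3d: (N, 3) points in the body model frame.
    camera:    (3,) tensor assumed to be ordered [s, tx, ty] -- a scale and
               an in-plane translation, not a camera position in space.
    """
    s, tx, ty = camera[0], camera[1], camera[2]
    # Depth never appears explicitly; it is folded into the scale s.
    return s * (points_3d[:, :2] + torch.stack([tx, ty]))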

checking the condition number for svd

Hi, regarding the following code in GraphCMR/models/smpl_param_regressor.py: I am wondering whether we should check the condition number of A[i] before calling svd. If the condition number of A[i] is close to 1 (i.e. the singular values are nearly equal), backpropagation through svd may give NaN values.

def batch_svd(A):
    """Wrapper around torch.svd that works when the input is a batch of matrices."""
    U_list = []
    S_list = []
    V_list = []
    for i in range(A.shape[0]):
        U, S, V = torch.svd(A[i])
        U_list.append(U)
        S_list.append(S)
        V_list.append(V)
    U = torch.stack(U_list, dim=0)
    S = torch.stack(S_list, dim=0)
    V = torch.stack(V_list, dim=0)
    return U, S, V
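For what it's worth, a minimal sketch of such a check (the threshold and the warning behaviour are my own arbitrary choices, not something from this repo):

import torch

def batch_svd_checked(A, gap_eps=1e-6):
    """Like batch_svd, but flags matrices whose singular values are nearly
    equal (condition number close to 1). The SVD backward pass contains
    1 / (s_i^2 - s_j^2) terms, so near-equal singular values can turn into
    NaN gradients."""
    U_list, S_list, V_list = [], [], []
    for i in range(A.shape[0]):
        U, S, V = torch.svd(A[i])
        if (S[:-1] - S[1:]).abs().min() < gap_eps:
            print('warning: near-degenerate singular values in matrix %d: %s' % (i, S.tolist()))
        U_list.append(U)
        S_list.append(S)
        V_list.append(V)
    return torch.stack(U_list, dim=0), torch.stack(S_list, dim=0), torch.stack(V_list, dim=0)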

Loading Resnet50 pretrained?

Thank you for your great works.
I wonder what the benefit of loading the pretrained ResNet-50 is. Why not train it from scratch?
What if I load the pretrained ResNet-50, freeze its parameters, and only train the GCNN part? What do you think the result would be?
Regards,
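Not an answer, but a minimal sketch of the frozen-backbone variant described above, in generic PyTorch (the torchvision ResNet-50 here just stands in for the repo's encoder):

from torchvision.models import resnet50

# A pretrained ResNet-50 whose weights stay fixed, so an optimizer would
# only ever update the graph-CNN part of the network.
backbone = resnet50(pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()  # also freezes the BatchNorm running statistics

# The optimizer then gets only the trainable parameters, e.g.
# torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=3e-4)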

About the training dataset of Human36M

Hi, thanks for the great work.
Regarding "Due to license limitations, we cannot provide the SMPL parameters for Human3.6M (recovered using MoSh).": where can we download these data, and are they provided by the official Human3.6M dataset?

hyper-params differ from those in the paper?

The README says that for the training hyper-parameters, the default values are the ones used to train the models in the paper. However, I found some differences between the code defaults and the settings in the paper.

--batch_size
The default is 2, whereas the paper uses 16.

--lr
The default is 2.4e-4, whereas the paper uses 3e-4.

--num_epochs & --dataset
The defaults are 50 epochs and the 'itw' dataset option, which excludes Human3.6M. However, the paper describes a two-stage training strategy.

--rot_factor & --noise_factor & --scale_factor
These options are used in the default settings, but they are not mentioned in the paper.

I am a little confused about which set of hyper-parameters to choose. Could you help me better reproduce the results in the paper?

Pretrained model

Thanks for your great work! I would like to download the pretrained models, but their download site returns a 404. Could you please tell me how to download them? Thanks! @nkolot

Could you provide MPII/annot files?

The MPII link provided in datasets/preprocess/README.txt downloads only the images. Could you also provide the annotation files you used? Thanks!

Training parameters?

Hi @nkolot, I have run your code and found that it works without the Human3.6M data. However, once I add Human3.6M to the training set, GraphCMR does not seem to converge.
My settings are batch_size=32, shuffle_train=True, and everything else exactly as in config.py. By the way, my Human3.6M data includes the MoShed annotations.
Could you tell me your training configuration?

About loss_keypoints_3d

Hi, I used the dataset option 'itw' to train a new model with the default parameters. The 'loss_keypoints_3d' and 'loss_keypoints_3d_smpl' losses are always 0, and the results are worse than the pretrained model.

Do you have any other recommendation for providing the 3D pose information, since Human3.6M never replies to my email? I checked gt_keypoints_3d = input_batch['pose_3d'] and it shows:

tensor([[[0., 0., 0., 0.],
[0., 0., 0., 0.], ...

Can you please give the name order of the 3D joints used in training and explain what these four 0. values stand for? Thanks a lot.
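I'm not certain this is exactly what trainer.py does, but in similar pipelines the fourth value per joint is a 0/1 confidence flag, and the 3D keypoint loss is masked by it, so datasets without 3D ground truth contribute a constant zero loss. A hedged sketch of that kind of loss:

import torch

def keypoint_3d_loss(pred_joints, gt_joints_with_conf):
    """Confidence-masked 3D keypoint loss (sketch).

    pred_joints:         (B, J, 3) predicted joints.
    gt_joints_with_conf: (B, J, 4), assumed layout [x, y, z, conf]; all-zero
                         rows (conf = 0) simply contribute nothing.
    """
    gt = gt_joints_with_conf[..., :3]
    conf = gt_joints_with_conf[..., 3:]
    return (conf * (pred_joints - gt).abs()).mean()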

Different J_regressor matrix for training and evaluation

I noticed that the regressor matrices used to get 3D joints from the predicted vertices are different for training and for evaluation. In training, smpl.get_joints() multiplies the predicted vertices by the J_regressor from SMPL_FILE together with JOINT_REGRESSOR_TRAIN_EXTRA, while run_evaluation() uses JOINT_REGRESSOR_H36M. What is the concern here? Thanks!
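For context, as far as I understand these regressors are all (J x 6890) matrices applied to the mesh vertices, so training and evaluation differ only in which matrix gets plugged in; a minimal sketch (loading and sparsity details omitted):

import numpy as np

def regress_joints(vertices, J_regressor):
    """Apply a vertex-to-joint regressor.

    vertices:    (6890, 3) SMPL vertices.
    J_regressor: (J, 6890) matrix; each row holds one joint's blending
                 weights over the vertices.
    Returns (J, 3) joints.
    """
    return J_regressor @ vertices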

Training on data of segmentation/DensePose prediction

Good job! I have a question about the inputs.

You present experiments with a variety of inputs, including RGB images, part segmentation, and DensePose. I see that your code handles RGB images with a ResNet, but there seems to be no code for the other two. Do you have any plan to release the code for the other two representations?

I understand that the paper does not focus on the effect of the input representation, which is presumably why that code is not released. It would still be great if you could help me with the following question.

Do you treat the segmentation/DensePose prediction as an RGB image and feed it into the ResNet, or do you use another encoder, such as a DensePose network, to get the features for the GCN?

Thanks!

About the SMPLParamRegressor

I wonder why SMPLParamRegressor does not produce the final SMPL parameters in an iterative way, as described in the paper End-to-end Recovery of Human Shape and Pose (HMR)?

What I mean is: at test time, using a 2D skeleton detector, we could first align the input body to a common orientation. Then, once we get the output SMPL parameters (theta0, beta0) and the camera, we could fix beta0 and only update theta0 (taking theta0 as the initial value) according to the loss between the detected skeleton and the projected SMPL skeleton. I think the results on test images should be much better.

Am I right? Are there any shortcomings other than efficiency?

Looking forward to any discussion with you! ;p

How to get 3d joints from demo.py and visualize it

I am interested in obtaining the joints from the inferred SMPL mesh and visualizing them, similar to what is described in the README of this project: https://github.com/gulvarol/smplpytorch.

I changed

pred_vertices, pred_vertices_smpl, pred_camera, _, _ = model(norm_img.cuda())

to

pred_vertices, pred_vertices_smpl, pred_camera, smpl_pose, smpl_shape = model(...)

to get smpl_pose (of shape torch.Size([1, 24, 3, 3])). Then I flattened it with smpl_pose.cpu().data.numpy()[:, :, :, -1].flatten('C').reshape(1, -1) and used the resulting (1, 72) pose parameters as the pose_params input of the smplpytorch demo.

The resulting visualization doesn't look correct to me. Is this the right approach? Perhaps there is an easier way to do what I am doing.
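For what it's worth, taking the last column of each 3x3 matrix does not give valid axis-angle vectors; smplpytorch expects 24 axis-angle rotations (72 values), so the rotation matrices need a proper conversion first. A minimal sketch using OpenCV's Rodrigues (cv2 is an extra dependency I'm assuming here, not part of this repo):

import cv2
import numpy as np

def rotmats_to_axis_angle(smpl_pose):
    """Convert (1, 24, 3, 3) rotation matrices (torch tensor) to a (1, 72)
    axis-angle vector suitable for smplpytorch's pose_params."""
    rotmats = smpl_pose.detach().cpu().numpy().reshape(24, 3, 3)
    aa = np.concatenate([cv2.Rodrigues(R)[0].reshape(3) for R in rotmats])
    return aa.reshape(1, 72)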

Preprocess of ground truth keypoints_3d on Human3.6M

I tried to generate pseudo keypoints_3d from the ground-truth SMPL parameters of the Human3.6M dataset using smpl.get_joints(), hoping they would be consistent with the ground-truth keypoints_3d. However, the MPJPE between them can be more than 0.2 m without subtracting the pelvis (root), and about 0.03 m after subtracting the pelvis.

[figure: distance between the joints of gt_keypoints_3d (red) and the pseudo keypoints_3d (blue)]

Since pred_pelvis and gt_pelvis are subtracted from predicted_keypoints_3d and gt_keypoints_3d respectively before the loss is computed, I guess it doesn't matter if I don't preprocess gt_keypoints_3d and the ground-truth SMPL parameters to make them consistent.

Did you preprocess them? If you did, how? Thanks!
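For reference, a minimal sketch of the pelvis-aligned comparison described above (the pelvis index depends on the joint layout and is an assumption here):

import numpy as np

def mpjpe_pelvis_aligned(pred_joints, gt_joints, pelvis_idx=0):
    """Mean per-joint position error after subtracting the pelvis from both
    sets of joints, so a constant offset between two joint definitions does
    not count as error.

    pred_joints, gt_joints: (J, 3) arrays in metres.
    """
    pred = pred_joints - pred_joints[pelvis_idx]
    gt = gt_joints - gt_joints[pelvis_idx]
    return np.linalg.norm(pred - gt, axis=-1).mean()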

How are the SMPL vertices assigned to the skeleton joints?

I noticed that you assign different SMPL vertices for different datasets, and there are some mismatches between these assignments.
For example, when I use the ground-truth MoShed parameters of Human3.6M to generate the SMPL model and use JOINT_REGRESSOR_TRAIN_EXTRA to regress the 3D skeleton joints, the MPJPE is quite high, about 32 mm.
So I want to re-assign these vertices to better evaluate your great work, and if I do, I am willing to contribute the new joint regressor to this repo. But I am a little unfamiliar with how this assignment process is done. Could you kindly tell me which tool I need to assign these vertices on the SMPL mesh?

about the joints used

Hi, it seems that you use 24 joints in total. Can you tell me the order/names of these joints?
Thanks,

opendr?

When I use Python 3 on Ubuntu, I cannot install opendr, but when I switch to Python 2, I cannot install the other dependencies. What should I do?

What is the purpose of this processing of the rotation matrices?

Sorry, I'm new to the 3D reconstruction community. I can't figure out what the following code in class SMPLParamRegressor() is meant to do.

    def forward(self, x):
        """Forward pass.
        Input:
            x: size = (B, 1723*6)
        Returns:
            SMPL pose parameters as rotation matrices: size = (B,24,3,3)
            SMPL shape parameters: size = (B,10)
        """
        batch_size = x.shape[0]
        x = x.view(batch_size, -1)
        x = self.layers(x)
        rotmat = x[:, :24*3*3].view(-1, 24, 3, 3).contiguous()
        betas = x[:, 24*3*3:].contiguous()
        rotmat = rotmat.view(-1, 3, 3).contiguous()
        orig_device = rotmat.device
        if self.use_cpu_svd:
            rotmat = rotmat.cpu()
        U, S, V = batch_svd(rotmat)

        rotmat = torch.matmul(U, V.transpose(1,2))
        det = torch.zeros(rotmat.shape[0], 1, 1).to(rotmat.device)
        with torch.no_grad():
            for i in range(rotmat.shape[0]):
                det[i] = torch.det(rotmat[i])
        rotmat = rotmat * det
        rotmat = rotmat.view(batch_size, 24, 3, 3)
        rotmat = rotmat.to(orig_device)
        return rotmat, betas

Could you give a short explanation of the purpose of the svd and det steps? What changes are made to the rotation matrices by this processing?
Also, is it common practice to do this for the SMPL model? It seems that HMR by Kanazawa et al. does not process the rotation matrices like this.
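In case it helps other readers, my reading (an interpretation, not the authors' statement) is that this block maps each unconstrained 3x3 output of the MLP onto a valid rotation matrix: replacing the singular values by 1 gives the closest orthogonal matrix, and the determinant factor flips the sign when that matrix is a reflection (det = -1), so the result lies in SO(3). A standalone sketch of the same idea:

import torch

def project_to_rotations(M):
    """Map a batch of arbitrary (N, 3, 3) matrices to proper rotation
    matrices (orthogonal, det = +1)."""
    U, S, Vh = torch.linalg.svd(M)   # M = U diag(S) Vh
    R = U @ Vh                       # closest orthogonal matrix
    det = torch.linalg.det(R).view(-1, 1, 1)
    # Multiplying by det flips the sign when R is a reflection, mirroring
    # the `rotmat * det` step in the code above. HMR instead regresses
    # axis-angle rotations directly, so it needs no such projection.
    return R * det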

Compute 'A', 'D', 'U' matrices

Hello! I want to try using another human template (SMPL-X) in your work, so I need to precompute the 'A', 'D', 'U' matrices as described. I tested with the SMPL template first, but I can't get the same result as you. Please point out my mistake, thank you!
I changed part of the 'coma' code as follows:

from __future__ import print_function
import mesh_sampling
from psbody.mesh import Mesh, MeshViewer, MeshViewers
import numpy as np
import json
import os
import copy
import argparse
import pickle
import time

print("Loading data .. ")
reference_mesh_file = 'data/smpl_neutral_vertices.obj'
reference_mesh = Mesh(filename=reference_mesh_file)

ds_factors = [4, 1]  # sampling factor of the mesh at each stage
print("Generating Transform Matrices ..")

# Generates adjacency matrices A, downsampling matrices D, and upsampling
# matrices U, one set per entry in ds_factors (here: downsample by 4, then by 1)

M,A,D,U = mesh_sampling.generate_transform_matrices(reference_mesh, ds_factors)

print(type(A))

np.savez('mesh_downsampling_test.npz', A = A, D = D, U = U)
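One sanity check I'd suggest (just a debugging aid; I'm assuming both files store the matrices under the same A/D/U keys) is to load the shipped data/mesh_downsampling.npz next to the regenerated file and compare shapes, since a wrong ds_factors shows up immediately as a size mismatch:

import numpy as np

ref = np.load('data/mesh_downsampling.npz', allow_pickle=True, encoding='latin1')
new = np.load('mesh_downsampling_test.npz', allow_pickle=True, encoding='latin1')

for key in ('A', 'D', 'U'):
    print(key,
          [m.shape for m in ref[key]],
          [m.shape for m in new[key]])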

run demo.py

I ran demo.py with the command from the README. Why can't I see any results? im1010_gcnn_side.png and im1010_smpl_side.png look like this:

[screenshot of the output images]

demo img resolution

Hi, I'm wondering whether I can change the output image size. I know the training uses 224x224 images, but in the demo, after rendering, I resize the image to (720, 720) and it still shows a 224x224 image with the pink 3D model on it. The input image is captured from a webcam whose resolution is 1280x720.

Validation set accuracies (without human36 data)

Hi there,

I just wondered if you could paste validation set accuracies when training from scratch without the human3.6m data...

I ran an experiment over the weekend (50 epochs, batch_size 16, learning rate 3e-4) and get (for example) on up-3d:

Shape Error (NonParam): 309.1791097267133
Shape Error (Param): 311.3052140344197

This is considerably worse than the downloaded model file 'data/models/model_checkpoint_h36m_up3d_extra2d.pt', which I assume was also trained on Human3.6M:

Shape Error (NonParam): 96.80226868738498
Shape Error (Param): 98.1649226233573

Do you think performance without Human3.6M data should degrade this much?

CPU occupancy is very high

When I run either the training or the evaluation code, the CPU occupancy is very high, and it is hard to run multiple tasks on a multi-GPU machine because of this CPU bottleneck. Did you encounter this problem in your training? Could you give me some hints about which part of your implementation may be CPU-intensive? Thank you very much!
For example, here are some screenshots of the CPU occupancy. My CPU is an Intel i9-9900K.
For simplicity, I ran the evaluation code, i.e. eval.py.

[screenshot: a single task]

[screenshot: multiple tasks on a two-GPU machine]

Train the graph adjacency matrix?

Hi,
In your code, you load the precomputed graph adjacency matrix and the upsampling/downsampling matrices. Is there a way to train them?
Can you also tell me how you computed them?

The problem of camera parameter “sc”

Hello, thank you for your excellent work!
During training I can get normal tx, ty values and a normal shape loss, but it is hard for me to get a normal camera scale parameter "sc"; as a result the keypoint loss is hard to reduce and the predicted mesh vertices diverge.
I hope to get your advice!

[plot of the sc parameter during training]

Praise from a newbie

The code style and architecture are amazing, and I am enjoying them a lot.
Thank you for sharing the neat code base.

Environment requirement

Hi, I am new to human body modeling and machine learning, and I have a question about the environment. I tried to set up your project with a Python 2 environment in Anaconda on Windows, but I cannot find a torch 1.0.0 build for Windows; torch seems to be available only on macOS and Linux for that configuration. Does this project have to run on Linux? Thank you.

Provide code (not data) for Human3.6m + SMPL training

Hi there, great job!

Although I'm aware that the corresponding SMPL parameters for Human3.6M can no longer be distributed because of the license, it would be super useful if the code for training on those parameters were still available.

I am in the fortunate position of having saved them before they were taken down, so I want to be able to train directly on them. More broadly, other users may wish to run their own MoSh procedure to generate these parameters in order to reproduce the results.

I'm using the code you wrote for Unite the People as a template, but if you could provide the Human3.6M code, that would be awesome.

Thanks,
Ben

question when I read the code

Hi, thanks for your contribution. I noticed that when you preprocess the data you save a (24, 3) 'part' array, but in base_dataset.py you read a 'partname' from the npz file. What do 'partname' and 'part' mean? Are they the same thing?

Running 'demo.py' can't get good results

Hello, I use the SMPL-X template for training and only train the GraphCNN module. During training I get good outputs, but when I load the checkpoint and run the demo, I can't get good results most of the time. Do you have any suggestions?
In addition, are the parameters of the ResNet-50 network updated during training? Even if the pretrained model is used, will that affect the demo output?
Thank you!

img_orig is the same as img

In dataset/base_dataset.py, lines 182-184:

    item['img'] = self.normalize_img(img)
    # Store image before normalization to use it in visualization
    item['img_orig'] = img

normalize_img() happens in place, so it is better to change this to:

    item['img_orig'] = torch.clone(img)
    item['img'] = self.normalize_img(img)

As pinned in requirements.txt, torch==1.0.0. Only since version 1.0.1 does torchvision.transforms.Normalize() have an extra inplace parameter whose default value is False.
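Alternatively (assuming a torchvision version where Normalize exposes the flag), the transform itself can be forced to copy; the mean/std values below are the usual ImageNet statistics and stand in for whatever the dataset actually uses:

from torchvision.transforms import Normalize

IMG_NORM_MEAN = [0.485, 0.456, 0.406]  # placeholder values
IMG_NORM_STD = [0.229, 0.224, 0.225]

normalize_img = Normalize(mean=IMG_NORM_MEAN, std=IMG_NORM_STD, inplace=False)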

Multi-gpu training

Thank you for the great work!

Could you please let me know how to train GraphCMR on multiple GPUs? I have tried to modify the codebase by wrapping self.graph_cnn and self.smpl_param_regressor in torch.nn.DataParallel in train/trainer.py, but it does not seem to work. I guess other modifications are needed, but I am not sure how to do that.

Any suggestions or tips will be greatly appreciated.
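For reference, a minimal sketch of the kind of wrapping described above (generic PyTorch; whether the rest of the training loop is compatible is exactly the open question):

import torch
import torch.nn as nn

def wrap_data_parallel(module):
    """Wrap a sub-network in DataParallel when several GPUs are visible.

    Caveat: the original module then lives under .module, so checkpoint
    saving/loading and any direct attribute or method access on the
    sub-network need to go through .module afterwards."""
    if torch.cuda.device_count() > 1:
        return nn.DataParallel(module)
    return module

# e.g. in train/trainer.py (names as in the question):
#   self.graph_cnn = wrap_data_parallel(self.graph_cnn)
#   self.smpl_param_regressor = wrap_data_parallel(self.smpl_param_regressor)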

Generate 3D mesh for Human36M

Very nice work. The other thread is closed, but I'd like to understand how we can generate the 3D SMPL meshes for the Human3.6M dataset.

  • Does the H36M dataset contain enough info to generate 3D meshes?

  • Is there code available for this process, or did you implement something yourselves?

You mention on your GitHub page that you MoShed the data to get the SMPL parameters, but we don't see any code on the MoSh project page that performs this.

We are training a modified network from scratch, and need more ground truth 3D meshes. Any info would be much appreciated!

questions about "mesh_downsampling.npz"

I want to know how to compute mesh_downsampling.npz ("Extra file with precomputed downsampling for the SMPL body mesh").
Can you provide the code for it? Thank you!

wrong mesh volume

Hi ,

I am trying to use your amazing work to estimate a person's volume from the fitted SMPL mesh. I was able to transform the mesh into my camera coordinate system. I compute the volume of each body region (head, hands, arms, legs, etc.) from the world coordinates of the transformed mesh and sum them up. A render onto the image matches the person's silhouette.

However, I found that different people with quite different meshes all seem to end up with similar volumes, within about +-0.05 m^3 of each other.

It is crucial for my work to have a reasonably good estimate of a person's volume, but it seems that using the SMPL model this way won't achieve that.

Could you give me some ideas? Is my conversion from the SMPL mesh to my camera coordinate system correct? I feel it could be a scaling issue. I hope you can help me out.
BR

There is a dangerous inplace operation in SMPL()

I read your code carefully and found a dangerous in-place operation, which may lead to a RuntimeError.
Lines 87-88 in models/smpl.py:

for i in range(1, 24):
    G[:,i,:,:] = torch.matmul(G[:,self.parent[i-1],:,:], G_[:, i, :, :])

G is modified in place after the matmul with G_.

To check its correctness, I wrote the following test code:

device = 'cuda'
pred_theta = torch.ones([1,24,3,3], requires_grad=True).to(device)
pred_beta = torch.ones([1,10], requires_grad=True).to(device)
smpl = SMPL().to(device)

pred_vertices = smpl(pred_theta, pred_beta)

torch.sum(pred_vertices).backward()

And PyTorch reports: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

When I fix the error like this:

G_new = G.clone()
for i in range(1, 24):
    G_new[:,i,:,:] = torch.matmul(G[:,self.parent[i-1],:,:], G_[:, i, :, :])

Then the test code runs without any problem.

Although this is indeed a dangerous in-place operation, when I ran the training code PyTorch didn't report any error! I guess the gradient flow is cut off somewhere, so it doesn't need to backpropagate through SMPL(), but that explanation seems to contradict the optimization process.

What do you think? Is this in-place operation a real problem? If not, how do you manage to avoid any problems despite it?
