
dsntnn's Introduction

⚠️ I have helped integrate DSNT into Kornia (from v0.1.4). New users are advised to use that implementation instead of this one. Existing users should note that the "normalised" coordinate system differs between the two implementations (see #15).

PyTorch DSNT

This repository contains the official implementation of the differentiable spatial to numerical (DSNT) layer and related operations.

$ pip install dsntnn

Usage

Please refer to the basic usage guide.
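For orientation, the core pipeline looks like the following minimal sketch, which mirrors the structure of the basic usage guide (the fcn argument here is a placeholder for any fully convolutional backbone; see the guide for the complete, authoritative example):

import torch
from torch import nn
import dsntnn

class CoordRegressionNetwork(nn.Module):
    def __init__(self, fcn, fcn_out_channels, n_locations):
        super().__init__()
        self.fcn = fcn  # any fully convolutional backbone (placeholder)
        # 1x1 conv produces one unnormalized heatmap per target location
        self.hm_conv = nn.Conv2d(fcn_out_channels, n_locations, kernel_size=1, bias=False)

    def forward(self, images):
        unnormalized_heatmaps = self.hm_conv(self.fcn(images))
        # Normalize each heatmap so it sums to 1 over its spatial extent
        heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
        # DSNT turns each heatmap into normalized (x, y) coords in (-1, 1)
        coords = dsntnn.dsnt(heatmaps)
        return coords, heatmaps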

Scripts

Running examples

$ python3 setup.py examples

HTML reports will be saved in the examples/ directory. Please note that the dsntnn package must be installed with pip install for the examples to run correctly.

Building documentation

$ mkdocs build

Running tests

Note: The dsntnn package must be installed before running tests.

$ pytest                                 # Run tests.
$ pytest --cov=dsntnn --cov-report=html  # Run tests and generate a code coverage report.

Other implementations

  • Tensorflow: ashwhall/dsnt
    • Be aware that this particular implementation represents coordinates in the (0, 1) range, as opposed to the (-1, 1) range used here and in the paper.

If you write your own implementation of DSNT, please let me know so that I can add it to the list. I would also greatly appreciate it if you could add the following notice to your implementation's README:

Code in this project implements ideas presented in the research paper "Numerical Coordinate Regression with Convolutional Neural Networks" by Nibali et al. If you use it in your own research project, please be sure to cite the original paper appropriately.

License and citation

(C) 2017 Aiden Nibali

This project is open source under the terms of the Apache License 2.0.

If you use any part of this work in a research project, please cite the following paper:

@article{nibali2018numerical,
  title={Numerical Coordinate Regression with Convolutional Neural Networks},
  author={Nibali, Aiden and He, Zhen and Morgan, Stuart and Prendergast, Luke},
  journal={arXiv preprint arXiv:1801.07372},
  year={2018}
}

dsntnn's People

Contributors

anibali

dsntnn's Issues

output coords are negative floats

The network is defined by:

class Net(nn.Module):
    
    def __init__(self, layers):
        super(Net, self).__init__()
        if layers == 18:
            model = models.resnet18(pretrained=True)
        elif layers == 34:
            model = models.resnet34(pretrained=True)
        else:
            raise ValueError('unsupported number of layers: {}'.format(layers))
        # change the first layer to receive a five-channel image
        model.conv1 = nn.Conv2d(5, 64, kernel_size=7, stride=2, padding=3, bias=True)
        # change the last layer to output 32 coordinates
        # model.fc=nn.Linear(512,32)
        # remove the final two layers (avgpool, fc)
        model = nn.Sequential(*(list(model.children())[:-2]))
        for param in model.parameters():
            param.requires_grad = True
        self.resnet = model
        
    def forward(self, x):
       
        pose_out = self.resnet(x)
        return pose_out

class CoordRegressionNetwork(nn.Module):
    def __init__(self, n_locations, layers):
        super(CoordRegressionNetwork, self).__init__()
        self.resnet = Net(layers)
        self.hm_conv = nn.Conv2d(512, n_locations, kernel_size=1, bias=False)

    def forward(self, images):
        # 1. Run the images through our Resnet
        resnet_out = self.resnet(images)
        # 2. Use a 1x1 conv to get one unnormalized heatmap per location
        unnormalized_heatmaps = self.hm_conv(resnet_out)
        # 3. Normalize the heatmaps
        heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
        # 4. Calculate the coordinates
        coords = dsntnn.dsnt(heatmaps)

        return coords, heatmaps

And the training code is as follows:

for i, data in enumerate(tqdm(train_dataloader)):
    # training
    images, poses = data['image'], data['pose']
    images, poses = images.to(device), poses.to(device)
    coords, heatmaps = net(images)

    # Per-location euclidean losses
    euc_losses = dsntnn.euclidean_losses(coords, poses)
    # Per-location regularization losses
    reg_losses = dsntnn.js_reg_losses(heatmaps, poses, sigma_t=1.0)
    # Combine losses into an overall loss
    loss = dsntnn.average_loss(euc_losses + reg_losses)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    train_loss_epoch.append(loss.item())

I converted the keypoint ground truth into floats in the (-1, 1) range, but all of the predicted coords are negative:

tensor([[[-0.1286, -0.0830],
         [-0.1169, -0.0810],
         [-0.1205, -0.1476],
         ...,
         [-0.1767, -0.3881],
         [-0.1970, -0.2403],
         [-0.3226, -0.3909]],

        [[-0.0694, -0.0165],
         [-0.0744, -0.0288],
         [-0.1027, -0.0873],
         ...,
         [-0.0766, -0.3926],
         [-0.1146, -0.2482],
         [-0.0907, -0.1812]],

        [[-0.4647, -0.3639],
         [-0.4430, -0.3409],
         [-0.2485, -0.2339],
         ...,
         [-0.2906, -0.4541],
         [-0.3648, -0.3034],
         [-0.4190, -0.3880]],

and the heatmap seems strange:

[image: heatmap visualization]

The visualization results:
[image: keypoint visualization results]

Question when I use dsnt in my net

I have trained a network that obtains facial keypoints by supervising the generation of heatmaps. The network uses the max operation to extract 68 keypoint coordinates from the 68-channel heatmap output by the FCN. Now I want to combine this network with another network and train them together, but the max operation is not differentiable, so I want to replace it with DSNT.
So I use batch_location_dsnt = dsntnn.dsnt(heatmap) (the heatmap is produced by the FCN; it's a 68 × 1 × 16 × 16 tensor), but the batch_location_dsnt I obtain is:

tensor([[[-0.5989, -0.2222]],
        [[-0.6683, -0.0225]],
        [[-0.7003,  0.1874]],
        [[-0.7120,  0.5027]],
        [[-0.6451,  0.7451]],
        [[-0.5105,  1.0081]],
        [[-0.4522,  1.1898]],
        [[-0.2934,  1.2817]],
        [[-0.0759,  0.9567]],
        [[ 0.1304,  1.0607]],
        [[ 0.3462,  1.4314]],
        [[ 0.7308,  1.3509]],
        [[ 0.8871,  1.0625]],
        [[ 1.1645,  0.7980]],
        [[ 1.4735,  0.5973]],
        [[ 1.3658,  0.1797]],
        [[ 1.2114, -0.1012]],
        [[-0.7434, -0.7085]],
        [[-0.6286, -0.7392]],
        [[-0.4630, -0.7343]],
        [[-0.2988, -0.6485]],
        [[-0.1515, -0.5185]],
        [[ 0.0185, -0.5908]],
        [[ 0.3039, -0.6446]],
        [[ 0.5553, -0.6704]],
        [[ 0.8032, -0.6359]],
        [[ 0.9848, -0.4610]],
        [[-0.1231, -0.3595]],
        [[-0.2189, -0.2581]],
        [[-0.2404, -0.0784]],
        [[-0.3306,  0.1073]],
        [[-0.4281,  0.2564]],
        [[-0.3071,  0.3424]],
        [[-0.2748,  0.3945]],
        [[-0.1277,  0.3686]],
        [[ 0.0404,  0.3399]],
        [[-0.5630, -0.4150]],
        [[-0.4809, -0.4761]],
        [[-0.3541, -0.4953]],
        [[-0.2261, -0.3877]],
        [[-0.4000, -0.3473]],
        [[-0.5188, -0.3881]],
        [[ 0.2428, -0.3442]],
        [[ 0.4070, -0.3346]],
        [[ 0.5273, -0.3868]],
        [[ 0.7190, -0.2441]],
        [[ 0.5536, -0.2888]],
        [[ 0.4207, -0.2777]],
        [[-0.3997,  0.7421]],
        [[-0.3004,  0.5801]],
        [[-0.3018,  0.5292]],
        [[-0.1713,  0.4833]],
        [[-0.0893,  0.4787]],
        [[ 0.0906,  0.6432]],
        [[ 0.3095,  0.7009]],
        [[ 0.1567,  0.8734]],
        [[-0.0456,  1.1209]],
        [[-0.1621,  1.0680]],
        [[-0.2678,  1.0100]],
        [[-0.3905,  0.8635]],
        [[-0.3840,  0.7459]],
        [[-0.2615,  0.6243]],
        [[-0.1569,  0.5345]],
        [[-0.1064,  0.6030]],
        [[ 0.2071,  0.6364]],
        [[-0.0748,  0.8947]],
        [[-0.1838,  0.7509]],
        [[-0.2617,  0.8739]]], device='cuda:0', grad_fn=<CatBackward>)

Obviously, [-0.5989, -0.2222] doesn't look like coordinates. Why does dsnt not output the maximum x and y coordinates like the max operation does? How can I get the correct coordinates of the key points?
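A note for readers hitting the same confusion: dsnt outputs coordinates normalized to the (-1, 1) range rather than pixel indices, so values like [-0.5989, -0.2222] are valid positions slightly left of and above the heatmap centre. A minimal sketch of the conversion back to pixel indices, assuming the pixel-centre convention (2i + 1)/size - 1 used by this library (worth double-checking against the basic usage guide):

import torch

def normalized_to_pixel(coords, heatmap_size):
    # coords: (..., 2) in (x, y) order, in the (-1, 1) coordinate system.
    # heatmap_size: (width, height). Inverts norm = (2 * px + 1) / size - 1.
    size = torch.as_tensor(heatmap_size, dtype=coords.dtype, device=coords.device)
    return ((coords + 1) * size - 1) / 2

# e.g. for the 16x16 heatmaps above:
# pixel_coords = normalized_to_pixel(batch_location_dsnt, (16, 16))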

Suggestion: Define X,Y grid so that they include -1 and 1

I have read the paper and was wondering if there is a fix for the problem stated on page 8:

Analysis of misclassified examples revealed that DSNT was less accurate for predicting edge case joints that lie very close to the image boundary, which is expected due to how the layer works

The reason seems to be that the X and Y grid is defined to lie in the range (-1,1) by the formulas on page 4. Is there a specific reason for this or would the DSNT also work when the grids are in the range [-1,1]?

A formula to define such a grid would be

-1 + (2*(i-1)) / (w-1)

For a heatmap that has the width 5, the grid would have these values in the columns:

i=1 => -1
i=2 => -1 + 2/4 = -0.5
i=3 => -1 + 4/4 = 0
i=4 => -1 + 6/4 = 0.5
i=5 => -1 + 8/4 = 1

So the grid would look like
-1 | -0.5 | 0 | 0.5 | 1

instead of
-0.8 | -0.4 | 0 | 0.4 | 0.8

So my question is whether there is a reason to use the second grid instead of the first one. From what I can see, this should also work. If there is interest in this change, I could try to implement it.

The advantage would be that the system would be able to regress coordinates on the border, and not just very close to the border (depending on the heatmap dimensions). The two grids are compared directly in the sketch below.
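For illustration, this sketch assumes the paper's 1-indexed formula x_i = (2i - (w + 1))/w for the existing grid:

import torch

w = 5

# Existing grid from the paper: end points fall strictly inside (-1, 1).
paper_grid = (2 * torch.arange(1, w + 1, dtype=torch.float) - (w + 1)) / w
print(paper_grid)     # tensor([-0.8000, -0.4000,  0.0000,  0.4000,  0.8000])

# Proposed grid: -1 + 2 * (i - 1) / (w - 1), which includes the end points.
# Equivalent to torch.linspace(-1, 1, w).
proposed_grid = torch.linspace(-1, 1, w)
print(proposed_grid)  # tensor([-1.0000, -0.5000,  0.0000,  0.5000,  1.0000])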

Problems when add dsntnn into HourglassNet.

Hi @anibali!
Thanks to your concise code, it's very convenient to add your dsntnn module to an Hourglass network. But when I try to do this and train the Hourglass network on the MPII dataset, it doesn't seem to converge well.

The way I add the dsntnn module to the Hourglass network:

class HourglassDsntNet(nn.Module):
  def __init__(self, nStack, nModules, nFeats, nRegModules):
    super(HourglassDsntNet, self).__init__()
    self.nStack = nStack
    self.nModules = nModules
    self.nFeats = nFeats
    self.nRegModules = nRegModules
    self.conv1_ = nn.Conv2d(3, 64, bias=True, kernel_size=7, stride=2, padding=3)
    self.bn1 = nn.BatchNorm2d(64)
    self.relu = nn.ReLU(inplace=True)
    self.r1 = Residual(64, 128)
    self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
    self.r4 = Residual(128, 128)
    self.r5 = Residual(128, self.nFeats)

    _hourglass, _Residual, _lin_, _tmpOut, _ll_, _tmpOut_ = [], [], [], [], [], []
    for i in range(self.nStack):
      _hourglass.append(Hourglass(4, self.nModules, self.nFeats))
      for j in range(self.nModules):
        _Residual.append(Residual(self.nFeats, self.nFeats))
      lin = nn.Sequential(nn.Conv2d(self.nFeats, self.nFeats, bias=True, kernel_size=1, stride=1),
                          nn.BatchNorm2d(self.nFeats), self.relu)
      _lin_.append(lin)
      _tmpOut.append(nn.Conv2d(self.nFeats, ref.nJoints, bias=True, kernel_size=1, stride=1))
      _ll_.append(nn.Conv2d(self.nFeats, self.nFeats, bias=True, kernel_size=1, stride=1))
      _tmpOut_.append(nn.Conv2d(ref.nJoints, self.nFeats, bias=True, kernel_size=1, stride=1))

    self.hourglass = nn.ModuleList(_hourglass)
    self.Residual = nn.ModuleList(_Residual)
    self.lin_ = nn.ModuleList(_lin_)
    self.tmpOut = nn.ModuleList(_tmpOut)
    self.ll_ = nn.ModuleList(_ll_)
    self.tmpOut_ = nn.ModuleList(_tmpOut_)

  def forward(self, x):
    x = self.conv1_(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.r1(x)
    x = self.maxpool(x)
    x = self.r4(x)
    x = self.r5(x)

    outMap = []
    outReg = []

    for i in range(self.nStack):
      hg = self.hourglass[i](x)
      ll = hg
      for j in range(self.nModules):
        ll = self.Residual[i * self.nModules + j](ll)
      ll = self.lin_[i](ll)
      tmpOutMap = self.tmpOut[i](ll)
      heatmaps = dsntnn.flat_softmax(tmpOutMap)
      outMap.append(tmpOutMap)
      tmpOutReg = dsntnn.dsnt(heatmaps)
      outReg.append(tmpOutReg)

      ll_ = self.ll_[i](ll)
      tmpOut_ = self.tmpOut_[i](tmpOutMap)
      x = x + ll_ + tmpOut_

    return outMap, outReg

The way I do the training procedure:

    for i, (input, target2D, target3D, meta) in enumerate(dataLoader):
        input_var = torch.autograd.Variable(input).float().cuda()
        target2D_var = torch.autograd.Variable(target2D).float().cuda()
        target3D_var = torch.autograd.Variable(target3D).float().cuda()

        out_map, out_reg = model(input_var)
        # filter the joint without annotation
        filter = target3D_var[:, :, 2].unsqueeze(dim=2)
        out_reg[0] = out_reg[0] * filter
        out_reg[1] = out_reg[1] * filter
        
        loss_map = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
        loss_reg = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
        loss = torch.autograd.Variable(torch.FloatTensor([0])).float().cuda()
        for k in range(opt.nStack):
            # Per-location euclidean losses
            euc_losses = dsntnn.euclidean_losses(out_reg[k], target3D_var[:, :, :2])
            # Per-location regularization losses
            reg_losses = dsntnn.js_reg_losses(out_map[k], target3D_var[:, :, :2], sigma_t=1.0)
            # Combine losses into an overall loss
            loss += dsntnn.average_loss(euc_losses + reg_losses)
            # Accumulate scalar summaries for logging (averaging keeps the
            # shapes compatible with the scalar accumulators above)
            loss_map += dsntnn.average_loss(euc_losses)
            loss_reg += dsntnn.average_loss(reg_losses)

        if split == 'train':
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

I have only tried training the network for five epochs so far, but the results show that it does not seem to converge at all.
[image: training results]
All the other experiment settings work fine with the pure Hourglass Network.
So are there any tricks I should add to my code, or have I just added your module in an incorrect way?

It seems that you did some experiments with Hourglass in your paper; could you offer any help?

Use a generated 2D Gaussian heatmap as the regularization.

I took a look at the paper, and I'm curious about the regularization part (KL, JS, etc.). I am wondering if it is possible to use a 2D Gaussian heatmap generated from the ground truth as the regularization. What do you think of that?
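For context, the divergence regularizers in the paper are already computed against a Gaussian rendered at the ground-truth location, so this is close to what js_reg_losses does internally. A standalone sketch of rendering such a target heatmap (render_gaussian is a hypothetical helper written for illustration, not part of the dsntnn API):

import torch

def render_gaussian(coords, height, width, sigma):
    # coords: (..., 2) normalized (x, y) in (-1, 1). Returns heatmaps of
    # shape (..., height, width), each summing to ~1, using the paper's
    # pixel-centre grid convention.
    xs = (2 * torch.arange(width, dtype=coords.dtype, device=coords.device) + 1) / width - 1
    ys = (2 * torch.arange(height, dtype=coords.dtype, device=coords.device) + 1) / height - 1
    dx = xs.view(1, width) - coords[..., 0, None, None]
    dy = ys.view(height, 1) - coords[..., 1, None, None]
    hm = torch.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2))
    return hm / hm.sum(dim=(-2, -1), keepdim=True)

# An MSE regularizer against the rendered Gaussian, as proposed above:
# reg = ((heatmaps - render_gaussian(target_coords, h, w, 0.1)) ** 2).mean()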

Question re. occluded or missing points in training data

When training a model that outputs a heatmap, training data with missing or occluded points are easily handled by setting the target output to be a heatmap of all zeros. With the DSNT layer, how can this be handled? An all-zero heatmap corresponds to target coordinates of [0, 0] here, but so does a point very well localized in the center of the image.
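One common approach (a sketch, not an official recommendation from this project) is to keep the per-location losses and mask out the entries for missing points before averaging, so an unannotated joint contributes nothing to the gradient:

import torch
import dsntnn

batch, n_locations = 4, 16
coords = torch.rand(batch, n_locations, 2) * 2 - 1
targets = torch.rand(batch, n_locations, 2) * 2 - 1
# 1 for annotated points, 0 for missing/occluded ones (hypothetical mask)
visible = torch.randint(0, 2, (batch, n_locations)).float()

euc_losses = dsntnn.euclidean_losses(coords, targets)  # (batch, n_locations)

# Zero out losses for missing points, then average over visible points only.
loss = (euc_losses * visible).sum() / visible.sum().clamp(min=1)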

Values outside (-1,1)

Hi,

I have a question about the assumption that the DSNT layer always outputs values in (-1,1) as stated here:

https://github.com/anibali/dsntnn/blob/master/examples/basic_usage.md

Importantly, the target coordinates are normalized so that they are in the range (-1, 1). The DSNT layer always outputs coordinates in this range.

Especially in the first epoch, I sometimes get values that are slightly outside of this range, e.g. -1.0224.

I first thought it is because I forgot to normalize the heatmaps, but I did not:

heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
coords = dsntnn.dsnt(heatmaps)

Is this maybe because of numerical instability? I read the paper and got the idea; if I understand it correctly, the heatmap is interpreted as a probability distribution, and then the x and y coordinates are computed by taking the inner product of the heatmap with the coordinate grid shown in the paper.

Best,
Simon
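As a debugging sketch for this situation: a properly normalized heatmap makes the output a convex combination of the grid values, which cannot leave the grid's range, so checking the spatial sums is a quick way to rule out a missed flat_softmax somewhere in the model (the tolerance below is an arbitrary choice):

import torch
import dsntnn

unnormalized = torch.randn(2, 4, 16, 16)
heatmaps = dsntnn.flat_softmax(unnormalized)

# Each heatmap should sum to 1 over its spatial extent after flat_softmax.
spatial_sums = heatmaps.sum(dim=(-2, -1))
assert torch.allclose(spatial_sums, torch.ones_like(spatial_sums), atol=1e-4)

# The soft-argmax output is then strictly inside (-1, 1).
coords = dsntnn.dsnt(heatmaps)
assert coords.abs().max() < 1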

Get confidence of prediction per regressed coordinate

Hi,

first of all: I really like the DSNT layer. It works perfectly and the idea is really cool :)

However, what would be really useful for my application is a way to get the confidence of my model that the coordinate regressed by DSNT is correct. When looking at the heatmaps, the confidence should be very high when the heatmap values are close to 1 at the predicted position and close to 0 everywhere else. The confidence should be low when there is a large patch in the heatmap with low values and, in the middle of this patch, one point with a slightly higher value.

So I guess what I am looking for is a way to derive the standard deviation of a Gaussian which has its center at the position that is predicted by DSNT. And then I would have to transform the deviation to a confidence value between 0 and 1.

In the end I want to have n coordinates that have 3 values: x,y, confidence

Is there an easy way to do this with the functions already provided by DSNT?

Best,
Simon
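One possible approach (a sketch treating the heatmap as a probability distribution; not an official dsntnn feature): compute the spatial variance of each heatmap around its DSNT coordinate, then map low variance to high confidence.

import torch

def heatmap_variance(heatmaps, coords):
    # heatmaps: (batch, n, height, width), normalized to sum to 1.
    # coords:   (batch, n, 2) in (x, y) order, as returned by dsnt.
    # Returns (batch, n, 2) per-axis variances in normalized units.
    batch, n, h, w = heatmaps.shape
    xs = (2 * torch.arange(w, dtype=heatmaps.dtype, device=heatmaps.device) + 1) / w - 1
    ys = (2 * torch.arange(h, dtype=heatmaps.dtype, device=heatmaps.device) + 1) / h - 1
    dx2 = (xs.view(1, 1, 1, w) - coords[..., 0, None, None]) ** 2
    dy2 = (ys.view(1, 1, h, 1) - coords[..., 1, None, None]) ** 2
    var_x = (heatmaps * dx2).sum(dim=(-2, -1))
    var_y = (heatmaps * dy2).sum(dim=(-2, -1))
    return torch.stack([var_x, var_y], dim=-1)

# One ad-hoc way to squash variance into a (0, 1) confidence score
# (scale is a tunable constant):
# confidence = torch.exp(-heatmap_variance(heatmaps, coords).mean(-1) / scale)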

Working with 3D

I'm working with 3D volumes; the base model is ResNet3D50 (all convolutions are Conv3d), and I want to predict x, y, z coordinates. Can you please help with that? Thanks.
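The soft-argmax idea extends directly to volumes; a self-contained sketch of a 3D version is below (dsntnn's own dsnt may handle extra spatial dimensions as well, which is worth checking, but this sketch does not rely on it):

import torch
import torch.nn.functional as F

def soft_argmax_3d(unnormalized):
    # Input: (batch, channels, depth, height, width) unnormalized heatmaps.
    # Output: (batch, channels, 3) normalized (x, y, z) coords in (-1, 1).
    b, c, d, h, w = unnormalized.shape
    heatmaps = F.softmax(unnormalized.view(b, c, -1), dim=-1).view(b, c, d, h, w)

    def axis_grid(n):
        # Pixel-centre grid, matching the 2D convention.
        return (2 * torch.arange(n, dtype=heatmaps.dtype, device=heatmaps.device) + 1) / n - 1

    x = (heatmaps.sum(dim=(2, 3)) * axis_grid(w)).sum(-1)  # marginalize, then expect x
    y = (heatmaps.sum(dim=(2, 4)) * axis_grid(h)).sum(-1)
    z = (heatmaps.sum(dim=(3, 4)) * axis_grid(d)).sum(-1)
    return torch.stack([x, y, z], dim=-1)

coords = soft_argmax_3d(torch.randn(2, 4, 8, 16, 16))  # shape (2, 4, 3)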

A question about the input and target

Do the input image and target heatmap need to be normalized to the range (-1, 1) before they can be used? I ask because I saw you write this:

The input and target need to be put into PyTorch tensors. Importantly, the target coordinates are normalized so that they are in the range (-1, 1). The DSNT layer always outputs coordinates in this range.
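Only the target coordinates need this normalization (the input image is normalized however your backbone expects). Converting 0-indexed pixel targets follows the pixel-centre convention from the basic usage guide; a small sketch with made-up numbers:

import torch

pixel_coords = torch.tensor([[[3.0, 5.0]]])  # (batch, n_locations, 2), (x, y)
size = torch.tensor([16.0, 16.0])            # heatmap (width, height)

# Pixel index i maps to (2i + 1) / size - 1 in normalized coordinates.
target_coords = (pixel_coords * 2 + 1) / size - 1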

Is Frobenius computed correctly?

I looked at your code in the _coord_expectation function. I don't fully understand why you sum the heatmaps (Z) before taking the inner product with own_coords (X or Y), and then sum again. I thought the summation should happen after the inner product of the heatmaps and own_coords. This really confuses me, because in your paper Figure 3 describes it differently.
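For other readers: the two orders agree because the coordinate grid is constant along the axis being summed out, i.e. ⟨Z, X⟩_F = Σ_j (Σ_i Z_ij) x_j, so marginalizing the heatmap first and then taking a 1D inner product gives the same value as the full Frobenius inner product drawn in Figure 3. A quick numerical check:

import torch

h, w = 6, 8
Z = torch.softmax(torch.randn(h * w), dim=0).view(h, w)    # normalized heatmap
xs = (2 * torch.arange(w, dtype=torch.float) + 1) / w - 1  # x grid values
X = xs.expand(h, w)  # the full X grid repeats xs down every row

frobenius = (Z * X).sum()             # <Z, X>_F, as drawn in Figure 3
marginal = (Z.sum(dim=0) * xs).sum()  # sum rows first, then inner product

assert torch.allclose(frobenius, marginal)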

Pip install fails

Installing dsntnn with pip throws this error:

Could not find a version that satisfies the requirement dsntnn (from versions: )
No matching distribution found for dsntnn

My pip3 version: 10.0.1
Python version: 3.5.2
OS: Mac OS X 10.13.5 High Sierra

dsnt in testing phase

Thanks for the paper and code!

I am doing pose estimation and face a problem where the heatmap for predicting the left wrist also fires a small response on the right wrist; that is, the heatmap has two peaks: a strong peak on the left wrist and a weak peak on the right wrist.

The two-peak problem makes the DSNT-predicted result incorrect; do you have any suggestions? Thanks!

Converting to onnx

I would like to convert the model to ONNX format, but the operators flip and linspace are not supported. Is there a workaround for this?
Thanks
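One possible workaround (a sketch, not an officially supported export path): precompute the pixel-centre coordinate grids as constant buffers and express the expectation as reductions and multiplies, so the traced graph contains neither linspace nor flip:

import torch
from torch import nn

class ExportableDSNT(nn.Module):
    # DSNT re-expressed with constant buffers for ONNX export. Expects
    # normalized heatmaps of shape (batch, channels, height, width).
    def __init__(self, height, width):
        super().__init__()
        xs = (2 * torch.arange(width, dtype=torch.float) + 1) / width - 1
        ys = (2 * torch.arange(height, dtype=torch.float) + 1) / height - 1
        self.register_buffer('xs', xs)
        self.register_buffer('ys', ys)

    def forward(self, heatmaps):
        # Marginalize each heatmap along one axis, then take the expectation
        # along the other; no linspace or flip appears in the traced graph.
        x = (heatmaps.sum(dim=2) * self.xs).sum(dim=2)
        y = (heatmaps.sum(dim=3) * self.ys).sum(dim=2)
        return torch.stack([x, y], dim=-1)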

RuntimeError: expected flip dims axis >= 0, but got min flip dims=-1

My torch version is 0.4.1, and an error occurs when I run the basic usage guide.

I have tried to pip install another dsntnn version as described in #6. Looking forward to your reply.

  File ".\model.py", line 71, in <module>
    coords, heatmaps = model(t)
  File "C:\Python35\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File ".\model.py", line 46, in forward
    coords = dsntnn.dsnt(heatmaps)
  File "C:\Python35\lib\site-packages\dsntnn\__init__.py", line 79, in dsnt
    return soft_argmax(heatmaps)
  File "C:\Python35\lib\site-packages\dsntnn\__init__.py", line 67, in soft_argmax
    return linear_expectation(heatmaps, values).flip(-1)
RuntimeError: expected flip dims axis >= 0, but got min flip dims=-1
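For anyone stuck on an old PyTorch (a sketch; upgrading PyTorch is the simpler fix): flip in 0.4.x rejects negative dims, so passing the equivalent non-negative axis index works around the error.

import torch

t = torch.arange(6.).view(2, 3)
# Equivalent to t.flip(-1), but with a non-negative axis index that
# torch 0.4.x accepts.
flipped = t.flip(t.dim() - 1)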

Trace warnings when trying to jit.trace a model

First of all, hats off for your effort on building and maintaining this. Keep up the good work.

My issue is that when I try to jit.trace a model that uses this layer, I get a warning similar to this one:

dsntnn.py:47: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
return torch.linspace(first, last, length, device=device)

This also happens when trying to export a model that uses dsntnn to ONNX: exporting with a command like the one below produces the same trace warning, and the exported model then fails to load.

torch.onnx.export(model, x, "deployment/ckpts/{0}.onnx".format(model_name), export_params=False, operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)

How to reproduce:

import torch

model = CoordRegressionNetwork(n_locations=2)
x = torch.randn(5, 3, 200, 200, requires_grad=True)
traced_script_module = torch.jit.trace(model, x)
