
nemo's Introduction

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]

Release Notes

The official PyTorch implementation of NeMo, published at ICLR 2021. NeMo achieves robust 3D pose estimation by performing render-and-compare on the level of neural network features.

[Example figure] The figure shows a dynamic example of the pose optimization process of NeMo. Top-left: the input image. Top-right: a mesh superimposed on the input image in the predicted 3D pose. Bottom-left: the occluder locations as predicted by NeMo, where yellow is background, green is the non-occluded area, and red is the occluded area of the object. Bottom-right: the loss landscape as a function of each camera parameter. The colored vertical lines indicate the current prediction, and the ground-truth parameter lies at the center of each x-axis.
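To give a flavor of the idea, below is a minimal, self-contained sketch of feature-level render-and-compare: camera parameters are optimized by gradient descent so that a rendered set of mesh features matches the features extracted from the image. The renderer here is a mocked stand-in so the example runs end-to-end; the actual implementation rasterizes the neural cuboid mesh with PyTorch3D and lives in the code directory.

import torch
import torch.nn.functional as F

def render_features(pose, vertex_features):
    # Stand-in for the real feature renderer: produces a feature vector as a
    # differentiable function of the pose so the optimization loop runs.
    # NeMo instead rasterizes the neural mesh with PyTorch3D.
    azimuth, elevation, theta = pose
    weights = torch.stack([azimuth.sin(), elevation.cos(), theta.sin()])
    return (weights[:, None] * vertex_features).sum(dim=0)

torch.manual_seed(0)
target = torch.randn(128)                  # features extracted from the image
vertex_features = torch.randn(3, 128)      # learned neural mesh features
pose = torch.zeros(3, requires_grad=True)  # azimuth, elevation, theta

optimizer = torch.optim.Adam([pose], lr=0.05)
for step in range(200):
    optimizer.zero_grad()
    rendered = render_features(pose, vertex_features)
    loss = 1 - F.cosine_similarity(rendered, target, dim=0)  # compare step
    loss.backward()                                          # gradients w.r.t. the pose
    optimizer.step()
print(pose.detach())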

Installation

The code is tested with Python 3.7, PyTorch 1.5, and PyTorch3D 0.2.0.

Clone the project and install requirements

git clone https://github.com/Angtian/NeMo.git
cd NeMo
pip install -r requirements.txt

Running NeMo

We provide scripts to train NeMo and to perform inference with NeMo on the Pascal3D+ and Occluded Pascal3D+ datasets. For more details about OccludedPascal3D+, please refer to this GitHub repo: OccludedPASCAL3D.

Step 1: Prepare Datasets
Set ENABLE_OCCLUDED to "true" if you want to evaluate NeMo under partial occlusion. After downloading the data, you can change the paths to the datasets in PrepareData.sh; otherwise, the script will download the datasets automatically.
Note: by default, the overwrite option when generating 3D annotations is set to false, which means existing annotation files are skipped. To regenerate annotation files (e.g., to fix corrupted data), set the overwrite option to true (a sketch of this guard appears after the commands below).
Then run the following commands:

chmod +x PrepareData.sh
./PrepareData.sh
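As mentioned in the note above, annotation generation skips files that already exist unless overwrite is enabled. A minimal sketch of this guard pattern (a hypothetical helper for illustration, not the repo's actual function):

import os
import numpy as np

def save_annotation(path, generate_fn, overwrite=False):
    # Skip existing annotation files unless overwrite is requested; pass
    # overwrite=True to regenerate (e.g., to replace corrupted files).
    if os.path.exists(path) and not overwrite:
        return False
    np.savez(path, **generate_fn())
    return True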

Step 2: Training NeMo
Modify the settings in TrainNeMo.sh.
GPUS: set the GPUs available for training depending on your machine. The standard setting uses 7 GPUs (6 for the backbone, 1 for the feature bank). If you have only 4 GPUs available, we suggest turning off "--sperate_bank" during the training stage.
MESH_DIMENSIONS: "single" or "multi".
TOTAL_EPOCHS: The default setting is 800 epochs, which takes 3 to 4 days to train on an 8-GPU machine. However, 400 training epochs already yield good accuracy. The final performance on raw Pascal3D+ over training epochs (SingleCuboid):

Training Epochs   200    400    600    800
Acc Pi/6          82.4   84.4   84.8   85.5
Acc Pi/18         57.1   59.2   59.6   60.2

Then, run these commands:

chmod +x TrainNeMo.sh
./TrainNeMo.sh

Step 2 (Alternative): Download Pretrained Model
Here we provide the pretrained NeMo models and backbones for the "SingleCuboid" and "MultiCuboid" settings. Run the following commands to download the pretrained model (SingleCuboid):

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1X1NCx22TFGJs108TqDgaPqrrKlExZGP-' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1X1NCx22TFGJs108TqDgaPqrrKlExZGP-" -O NeMo_Single_799.zip
unzip NeMo_Single_799.zip

Download the pretrained model (MultiCuboid):

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1e6bA6hpFEqZC59qsdl9otkkSHR1SYszz' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1e6bA6hpFEqZC59qsdl9otkkSHR1SYszz" -O NeMo_Multi_799.zip
unzip NeMo_Multi_799.zip
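If the cookie-based wget commands above stop working (Google Drive occasionally changes its confirmation flow), the gdown package is a simpler alternative. The file IDs below are the same as in the wget commands; this uses gdown's standard download call, though the exact API may vary across gdown versions:

import gdown

# Same Google Drive file IDs as in the wget commands above.
gdown.download('https://drive.google.com/uc?id=1X1NCx22TFGJs108TqDgaPqrrKlExZGP-',
               'NeMo_Single_799.zip', quiet=False)
gdown.download('https://drive.google.com/uc?id=1e6bA6hpFEqZC59qsdl9otkkSHR1SYszz',
               'NeMo_Multi_799.zip', quiet=False)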

The expected performance for the pretrained model (Unoccluded):

SingleCuboid   plane  bike   boat   bottle  bus    car    chair  table  mbike  sofa   train  tv     Mean
Pi/6           81.1   79.8   69.0   87.6    88.2   98.5   86.2   73.9   83.7   95.4   81.3   82.0   85.5
Pi/18          47.8   27.9   35.9   49.0    84.4   94.9   47.3   51.4   31.0   59.6   68.1   42.4   60.2
MedErr         10.6   16.8   15.4   10.3    3.2    3.2    10.5   9.7    14.6   8.5    5.4    12.0   8.9

MultiCuboid    plane  bike   boat   bottle  bus    car    chair  table  mbike  sofa   train  tv     Mean
Pi/6           78.1   81.2   64.9   88.2    94.4   98.6   87.4   81.5   84.2   96.0   90.6   85.9   87.0
Pi/18          38.8   35.0   33.1   50.2    91.4   95.8   48.5   58.2   36.2   65.4   77.9   54.5   62.8
MedErr         13.0   14.1   16.4   10.0    2.6    2.9    10.4   8.3    13.9   7.3    4.7    9.0    8.5

Step 3: Inference with NeMo
The inference stage includes feature extraction and pose optimization. The pose optimization iteratively performs render-and-compare on the neural features with respect to the camera pose. This takes some time to run on the full dataset (3-4 hours per occlusion level on an 8-GPU machine).
To run the inference, first change the settings in InferenceNeMo.sh:
MESH_DIMENSIONS: Set to the same value as in the training stage.
GPUS: Our implementation can use either 4 or 8 GPUs for the pose optimization; workloads are automatically distributed over the available GPUs and the optimization runs in parallel.
LOAD_FILE_NAME: Change this setting if you did not train for 800 epochs; e.g., if you trained NeMo for 400 epochs, use "saved_model_%s_399.pth" (see the snippet after this list).
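A small illustration of the checkpoint naming. The assumptions here are that the saved epoch index is zero-based and that the "%s" placeholder is filled in by the scripts (check the scripts to confirm):

# Training for N epochs saves a checkpoint named with the last (0-indexed) epoch,
# so 400 epochs -> "..._399.pth". The "%s" is left for the scripts to fill in.
total_epochs = 400
load_file_name = "saved_model_%%s_%d.pth" % (total_epochs - 1)
print(load_file_name)  # -> saved_model_%s_399.pth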

Then, run these commands to conduct NeMo inference on unoccluded Pascal3D+:

chmod +x InferenceNeMo.sh
./InferenceNeMo.sh

To conduct inference on the occluded Pascal3D+ (note: you must have enabled creation of the OccludedPascal3D+ dataset during data preparation, i.e., ENABLE_OCCLUDED set to "true"):

./InferenceNeMo.sh FGL1_BGL1
./InferenceNeMo.sh FGL2_BGL2
./InferenceNeMo.sh FGL3_BGL3

Inference On Unlabeled datasets

Note: our approach can be adapted to 6D pose estimation to a limited extent. This relaxes the requirement for a bounding box: the input image only needs to contain a roughly centered object at a roughly fixed scale, and this requirement is not strict.

We further provide a script to conduct NeMo inference on unlabeled datasets to generate pose predictions. To run the code (code/PredUnlabeledDataset.py):
First, download the pretrained weights and the CAD model.
Second, manually adapt the crop function so that the object in the cropped image is centered and has a similar size to objects in the PASCAL3D+ dataset (the default settings are for Comprehensive Cars).
Finally, run the code. Predictions will be stored in the same directory ('final_pred.npz'). The format is (distance_pred, theta_pred, elevation_pred, azimuth_pred, translation_vertical, translation_horizontal); a loading sketch follows below.
We also provide a script to visualize the predictions (tools/VisualizeUnlabeledDataset.py).
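A minimal sketch for inspecting the saved predictions. The key names inside final_pred.npz are not documented here, so list them first rather than assuming a layout:

import numpy as np

preds = np.load('final_pred.npz', allow_pickle=True)
print(preds.files)  # list the stored array names before indexing into them
# Each prediction follows the format described above:
# (distance, theta, elevation, azimuth, translation_vertical, translation_horizontal)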

Visualizations of predictions on Comprehensive Cars (we use the SingleCuboid model pretrained on the PASCAL3D+ car category; we observe that around 60-70% of the final predictions are reasonably accurate). [Comprehensive Car figure]

Citation

Please cite the following paper if you find the code useful for your research or projects.

@inproceedings{wang2020NeMo,
title = {NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation},
author = {Angtian, Wang and Kortylewski, Adam and Yuille, Alan},
booktitle = {Proceedings International Conference on Learning Representations (ICLR)},
year = {2021},
}

nemo's People

Contributors

adamkortylewski, angtian


nemo's Issues

Split of L0-Occluded Pascal3D+

Hi,

Thanks for the great work and sharing the code!

I've got one question about the non-occluded split of Pascal3D+. In your paper, StarMap gets 89% Acc30, while the authors of StarMap reported only 82% in their paper; I suppose this difference results from a different dataset split for evaluation. Could you tell me how you obtained the non-occluded test images on Pascal3D+, and whether there is any difference between the images at occlusion levels L0, L1, L2, and L3?

Training on KITTI3D

Hi Angtian,

As the title suggests, I was just wondering whether you think it's possible, and whether you have tried, to train NeMo on KITTI3D, considering that the samples contain ground truth only for azimuth and not for elevation or theta.

Ground-truth camera pose is not used during training Neural Mesh Models

Hello,

In the paper, while training the Neural Mesh Models, "the correspondence between the feature vector f_i in the feature map F and the vector θ_r on the neural mesh model is given by the 2D projection of the mesh with camera parameters m". I take this to mean that during training, the neural mesh model is rendered using the ground-truth camera pose m and then compared with the features of the RGB image.

However, I don't see this being the case in the code. Can you give some comments about the actual implementation?

Thanks in advance.
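For context, a simplified conceptual sketch of the correspondence described in the quoted passage: project the mesh vertices with the (ground-truth) camera and gather image features at the projected pixel locations. This is an illustration only, not the repo's implementation:

import torch

def gather_vertex_features(feature_map, vertices_2d):
    # feature_map: (C, H, W) image features; vertices_2d: (N, 2) integer
    # pixel coordinates (x, y) of the projected mesh vertices under camera m.
    # Returns the (N, C) feature vectors f_i corresponding to each vertex.
    x, y = vertices_2d[:, 0], vertices_2d[:, 1]
    return feature_map[:, y, x].T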

"Include method supposed to be point or bbox" error running PrepareData.sh with BboxTools 1.0.2

if_visible = np.logical_and(if_visible, box_ori.include(points_2d))

BBox2D.include in BboxTools 1.0.2 seems to accept only a single point or a Bbox2D object (code pasted below). points_2d is a numpy array of points, which causes an error at the line above, so generate3Dpascal3D.py fails to generate any annotations.

def include(self, other):
        """
        Check if other is inside this box. Notice include means strictly include, other could not place at the boundary
        of this bbox.
        :param other: (Bbox2D or tuple of int) bbox or point
        :return: (bool) True or False
        """
        if type(other) == Bbox2D:
            out = True
            for i in range(2):
                if self.bbox[i][0] > other.bbox[i][0]:
                    out = False
                if self.bbox[i][1] < other.bbox[i][1]:
                    out = False
            return out

        if type(other) == tuple and len(other) == 2:
            if other[0] < self.bbox[0][0] or other[0] >= self.bbox[0][1]:
                return False
            if other[1] < self.bbox[1][0] or other[1] >= self.bbox[1][1]:
                return False
            return True
        raise Exception('Include method suppose to be point or bbox, but got %s' % str(other))
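A possible workaround, assuming points_2d has shape (N, 2) in the same axis order as box.bbox (a pair of (min, max) ranges, as in the code above): vectorize the point-in-box test instead of calling include on the whole array:

import numpy as np

def points_in_box(box, points_2d):
    # Vectorized replacement for BBox2D.include over an (N, 2) point array,
    # matching the single-point semantics above (upper boundary excluded).
    (r0, r1), (c0, c1) = box.bbox
    return ((points_2d[:, 0] >= r0) & (points_2d[:, 0] < r1) &
            (points_2d[:, 1] >= c0) & (points_2d[:, 1] < c1))

# if_visible = np.logical_and(if_visible, points_in_box(box_ori, points_2d))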
