
ace's People

Contributors

ebrach, mickmcginnis, tcavallari


ace's Issues

About 7_scenes rgb calibration

Hi, your work is great.
I notice that you obtain the camera poses for 7Scenes in setup_7scenes.py via

cam_pose = np.matmul(cam_pose, np.linalg.inv(d_to_rgb))

Why not

cam_pose = np.matmul(d_to_rgb, cam_pose)

I also found these lines:

    # transform from depth sensor to RGB sensor
    eye_coords = np.matmul(d_to_rgb, eye_coords)

Can you help me understand this? Thank you!
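
For reference, a minimal sketch of how the two compositions differ, assuming cam_pose is the depth camera's camera-to-world pose and d_to_rgb maps 3D points from the depth frame to the RGB frame (all values below are made up):

    import numpy as np

    # Hypothetical values, for illustration only.
    depth_to_world = np.eye(4)          # assumed: GT pose of the depth camera (camera-to-world)
    d_to_rgb = np.eye(4)                # assumed: extrinsics mapping depth-frame points to the RGB frame
    d_to_rgb[:3, 3] = [0.02, 0.0, 0.0]  # e.g. a small baseline between the two sensors

    # A point in the RGB camera frame reaches the world frame by going RGB -> depth
    # (inverse extrinsics) and then depth -> world (GT pose):
    #     p_world = depth_to_world @ inv(d_to_rgb) @ p_rgb
    # so the RGB camera-to-world pose is the right-multiplied composition:
    rgb_to_world = depth_to_world @ np.linalg.inv(d_to_rgb)

    # Left-multiplying instead (d_to_rgb @ depth_to_world) would apply the depth-to-RGB
    # transform to world-frame points, mixing coordinate frames.
    p_rgb = np.array([0.0, 0.0, 1.0, 1.0])
    print(rgb_to_world @ p_rgb)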

Cannot download encoder weights

Hi,

I tried to pull the code and got this error. Is it possible to upload the encoder weights somewhere else? Thanks.

Downloading ace_encoder_pretrained.pt (22 MB)
Error downloading object: ace_encoder_pretrained.pt (c69ca93): Smudge error: Error downloading ace_encoder_pretrained.pt (c69ca934d82056c9f2e9be9582d85841bd53355ac04a007c11618b1760c00a8c): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

Wayspot Datasets Visualization

Hi, I tried to reproduce results with your Wayspot dataset. For the statue scene visualization, I still can't understand why it renders like this. Should I rotate your dataset first, or is something else needed?

[attached screenshot of the statue scene visualization]

Hope to hear from you soon! Thanks!

Confusion about ACE coordinate frame

Hi! I'm coming back because I tried ACE with my new dataset and I'm quite confused by the results. I'm aware that ACE expects GT poses as 4x4 camera-to-world matrices. But here is the problem.

  1. Suppose my robot system is like the picture below. When I transform from the lidar frame to the camera frame, I assume the resulting 6DOF camera pose fulfils the camera-to-world requirement.
    [attached diagram of the robot coordinate frames]

  2. Assuming that 6DOF camera pose (in world coordinates) is correct, I use it as the GT for my image dataset. I passed it to ACE, but the result is strange: the camera motion along the x axis is mirrored (please see the picture below).
    [attached screenshot of the mirrored trajectory]

  3. If I multiply the x value of my GT by -1, ACE follows my trajectory in world coordinates well.

Could you elaborate on this? Did the problem already happen in the first step? Thank you so much!
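
For reference, a minimal sanity-check sketch of the composition in step 1, with hypothetical names; a mirrored trajectory often points to a reflection (determinant -1) having crept into the rotation:

    import numpy as np

    # Hypothetical inputs, for illustration only.
    lidar_to_world = np.eye(4)  # assumed: 6DOF lidar pose, expressed as lidar-to-world
    cam_to_lidar = np.eye(4)    # assumed: extrinsic calibration mapping camera-frame points to the lidar frame

    # Camera-to-world pose in the 4x4 form ACE expects; the axis convention still has to
    # match whatever the dataset setup scripts assume.
    cam_to_world = lidar_to_world @ cam_to_lidar

    # A proper rotation has determinant +1; a value of -1 means an axis got flipped somewhere.
    print("det(R) =", np.linalg.det(cam_to_world[:3, :3]))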

fisheye camera case

Thank you for this great relocalization framework!
I found that the demo results with the dataset you provided are awesome on my machine.

I have a simple question while trying to apply this framework to my project.
Could it be applied to the fisheye camera case if we have correct extrinsic camera poses?
I guess the reprojection error needs to be changed to an appropriate form that accounts for the lens distortion.
Could I get some advice?

Thank you in advance.
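
One possible workaround (not something confirmed by the authors here): undistort the fisheye frames to a pinhole model with OpenCV's fisheye module and train on the undistorted images, so the standard reprojection error applies unchanged. A minimal sketch with made-up calibration values:

    import cv2
    import numpy as np

    # Hypothetical calibration values, for illustration only.
    K = np.array([[400.0, 0.0, 320.0],
                  [0.0, 400.0, 240.0],
                  [0.0, 0.0, 1.0]])         # assumed fisheye intrinsics
    D = np.array([0.1, -0.05, 0.0, 0.0])    # assumed fisheye distortion coefficients (k1..k4)
    w, h = 640, 480

    # Pinhole intrinsics of the undistorted image; reusing K keeps roughly the same focal length.
    new_K = K.copy()
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(K, D, np.eye(3), new_K, (w, h), cv2.CV_16SC2)

    img = cv2.imread("frame_000000.png")    # hypothetical input frame
    undistorted = cv2.remap(img, map1, map2, interpolation=cv2.INTER_LINEAR)
    cv2.imwrite("frame_000000_pinhole.png", undistorted)
    # The result can then be treated as a pinhole camera with intrinsics new_K, so the
    # standard reprojection error applies without modelling the lens distortion.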

About _convert_cv_to_gl(pose)

Hi, I wonder why a pose is converted from OpenCV convention to OpenGL using this matrix:

    @staticmethod
    def _convert_cv_to_gl(pose):
        """
        Convert a pose from OpenCV to OpenGL convention (and vice versa).

        @param pose: 4x4 camera pose.
        @return: 4x4 camera pose.
        """
        gl_to_cv = np.array([[1, -1, -1, 1], [-1, 1, 1, -1], [-1, 1, 1, -1], [1, 1, 1, 1]])
        return gl_to_cv * pose

In other places, you just apply:

        scene_coordinates[:, 1] = -scene_coordinates[:, 1]
        scene_coordinates[:, 2] = -scene_coordinates[:, 2]

Thanks!
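
For reference, my reading is that element-wise multiplication by that sign mask is the same as flipping the y and z axes on both sides of the pose, i.e. S @ pose @ S with S = diag(1, -1, -1, 1): the right-hand factor flips the camera axes (OpenCV vs OpenGL camera convention) and the left-hand factor flips the world axes, which would be why scene coordinates elsewhere only need their y and z components negated. A small numerical check of that equivalence (not code from the repository):

    import numpy as np

    S = np.diag([1.0, -1.0, -1.0, 1.0])  # flips the y and z axes
    gl_to_cv = np.array([[1, -1, -1, 1], [-1, 1, 1, -1], [-1, 1, 1, -1], [1, 1, 1, 1]])

    # Build an arbitrary valid 4x4 camera pose (orthonormal rotation, last row 0 0 0 1).
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    pose = np.eye(4)
    pose[:3, :3] = Q
    pose[:3, 3] = rng.normal(size=3)

    # Element-wise multiplication by the mask equals flipping y/z on both sides.
    assert np.allclose(gl_to_cv * pose, S @ pose @ S)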

Consistency in pose format for train/validation

It seems to me that the training data wants poses as 4x4 rotation/translation matrices, while the evaluation wants the pose rotation in quaternion format. For the sake of consistency, would it make sense for these two formats to be aligned?
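
For what it's worth, converting between the two formats is only a few lines; a minimal sketch assuming a 4x4 camera-to-world pose (note that quaternion component ordering differs between libraries, so the ordering expected by the evaluation code should be checked):

    import numpy as np
    from scipy.spatial.transform import Rotation

    pose = np.eye(4)  # hypothetical 4x4 camera-to-world pose

    # 4x4 matrix -> quaternion + translation (scipy uses x, y, z, w ordering).
    q_xyzw = Rotation.from_matrix(pose[:3, :3]).as_quat()
    t = pose[:3, 3]

    # ... and back again.
    pose_back = np.eye(4)
    pose_back[:3, :3] = Rotation.from_quat(q_xyzw).as_matrix()
    pose_back[:3, 3] = t
    assert np.allclose(pose, pose_back)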

support fx and fy

Hi,
Thank you for the amazing paper and for sharing the code.
From this, I saw the code uses a single focal length f.
Currently, my COLMAP result returns fx & fy.
I would like to ask whether it is OK to run your code with only fx.
Or could you support separate fx and fy?
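
For illustration, this is roughly what the intrinsics construction would look like with separate fx and fy (hypothetical values, not the repository's current code); if fx and fy are close, running with only fx is presumably a small approximation:

    import torch

    # Hypothetical values, e.g. as reported by COLMAP.
    fx, fy = 612.0, 610.5
    cx, cy = 320.0, 240.0

    intrinsics = torch.eye(3)
    intrinsics[0, 0] = fx  # instead of a single shared focal length
    intrinsics[1, 1] = fy
    intrinsics[0, 2] = cx
    intrinsics[1, 2] = cy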

Where does ACE freeze the backbone when training the head?

Hi,

I can't seem to find where ACE freezes the backbone during training. Since the same pre-trained encoder is used at test time, it would probably be better to freeze the backbone during training so that it doesn't change, right?

Or is it intentional not to freeze the backbone?
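
For reference, freezing a backbone in PyTorch is usually just a matter of disabling gradients and switching to eval mode; whether ACE needs this explicitly depends on whether the encoder parameters are ever handed to the optimizer at all. A generic sketch (not the repository's code):

    import torch
    from torch import nn

    encoder = nn.Conv2d(3, 64, 3)  # stand-in for the pretrained encoder

    # Disable gradients and switch to eval mode so the backbone cannot change.
    for param in encoder.parameters():
        param.requires_grad_(False)
    encoder.eval()

    # Only the head's parameters would then be passed to the optimizer, e.g.:
    # optimizer = torch.optim.AdamW(head.parameters(), lr=...)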

Question regarding results on the Indoor6 dataset

Hi, thank you for great work.
I tried to use ACE on the Indoor6 dataset provided here: https://github.com/microsoft/SceneLandmarkLocalization. However, the results are not good: the translation error reaches 1 meter and the rotation error can be up to 100 degrees. Because the Indoor6 dataset was collected at different times and days, it contains high illumination variation. Could ACE work in such cases, or is there some configuration I missed?
[attached example frames: 01-frame000072, 04-frame000022, 15-frame000143]

render_visualization error

Hi, I am getting this error after setting --render_visualization=True. Any idea what might cause it?
INFO:OpenGL.platform.ctypesloader:Failed to load library ( 'libEGL.so.0' ): libEGL.so.0: cannot open shared object file: No such file or directory
WARNING:ace_visualizer:Rendering failed, trying again!

Failed to Load Encoder

Hi, I tried to reproduce the 7Scenes chess scene but I got this error:

INFO:ace_trainer:Loaded training scan from: datasets/7scenes_chess -- 4000 images, mean: 0.45 -0.74 0.30
Traceback (most recent call last):
  File "train_ace.py", line 126, in <module>
    trainer = TrainerACE(options)
  File "/home/nanda/dev/ace/ace_trainer.py", line 92, in __init__
    encoder_state_dict = torch.load(self.options.encoder_path, map_location="cpu")
  File "/home/nanda/.conda/envs/ace/lib/python3.8/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/nanda/.conda/envs/ace/lib/python3.8/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

Seems like I can't load the pre-trained encoder that you already shared. Do you have any idea about this? Thanks!
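
My guess at the likely cause: if Git LFS did not fetch the real weights, ace_encoder_pretrained.pt is a small text pointer file starting with "version https://git-lfs...", and torch.load then fails with exactly this "invalid load key, 'v'" error. A quick check:

    import os

    path = "ace_encoder_pretrained.pt"
    print("size on disk:", os.path.getsize(path), "bytes")  # the real checkpoint is ~22 MB
    with open(path, "rb") as f:
        print("first bytes:", f.read(40))
    # If this prints a git-lfs pointer ("version https://git-lfs..."), re-fetch the file,
    # e.g. with `git lfs pull`, or download it manually.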

Support on macOS

Is there any support for building dsacstar on macOS?
How can I specify opencv_inc_dir when using my own Python environment? Is it the C++ OpenCV or the Python OpenCV? Thanks.

Question about sensitivity to ground truth pose accuracy

Hi,

We have been playing around with this approach, trying to train models from data supplied by ARKit on iPhones. Unfortunately, we have not been able to achieve good results when localizing 20 seconds' worth of frames against 20 seconds' worth of mapping frames (around 30 fps). The data is recorded by physically aligning the phone from the same place. The median rotation error is usually above 30 degrees and the translation error is around 50-100 cm, indoors in a small room.

I'm wondering if you have any speculation on what the problem might be. It's clear that the poses provided by ARKit are not fully accurate, as they come from a black-box SLAM system. However, they are generally consistent over time. Is it really possible that a few percentage points of error in the ground truth can blow up the model error by this much?

Thank you so much

Encoder Training

Are there plans to publish the encoder training code? And when will it be available?

Running inference on mobile

I was wondering whether there are any technical challenges that would make running inference on mobile impossible. The models themselves seem small, but I'm not familiar enough to know if there are other challenges or considerations.

Update 1

Currently I am stuck on the following error when trying to convert the model to TorchScript. It seems like it would be easy enough to fix, but there would probably be more things that don't work afterwards as well.

Module 'Head' has no attribute 'res_blocks' (This attribute exists on the Python module, but we failed to convert Python type: 'list' to a TorchScript type. Could not infer type of list element: Cannot infer concrete type of torch.nn.Module. Its type was inferred; try adding a type annotation for the attribute.):
  File "/home/powerhorse/Desktop/daniel_tmp/benchmark/anchor/third_party/ace/ace_network.py", line 128
        res = self.head_skip(res) + x
    
        for res_block in self.res_blocks:
                         ~~~~~~~~~~~~~~~ <--- HERE
            x = F.relu(res_block[0](res))
            x = F.relu(res_block[1](x))
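
For reference, the usual TorchScript-friendly pattern is to register the blocks as nn.ModuleList (or as submodules with their own forward) instead of a plain Python list, so the scripter can type them. A structurally simplified sketch of that pattern, not a tested patch of ace_network.py:

    import torch
    from torch import nn
    import torch.nn.functional as F

    class ResBlock(nn.Module):
        """One block with two convolutions, registered as a proper submodule."""
        def __init__(self, channels: int):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 1)
            self.conv2 = nn.Conv2d(channels, channels, 1)

        def forward(self, res):
            x = F.relu(self.conv1(res))
            return F.relu(self.conv2(x))

    class Head(nn.Module):
        """Simplified head: nn.ModuleList is scriptable, a plain Python list is not."""
        def __init__(self, channels: int = 512, num_blocks: int = 2):
            super().__init__()
            self.res_blocks = nn.ModuleList(ResBlock(channels) for _ in range(num_blocks))

        def forward(self, res):
            x = res
            for block in self.res_blocks:
                x = block(x)
            return x

    scripted = torch.jit.script(Head())  # scripts without the "no attribute 'res_blocks'" error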

Pretrain Feature Extractor Backbone Network from ScanNet

Thank you very much for the code on GitHub. In Section 3.3 (Backbone Training) of the paper, you said: "Instead of training the backbone with one regression head for a single scene, we train it with N regression heads for N scenes, in parallel." I'd like to know exactly how many scenes this N represents for parallel training. In addition, would you mind sharing how you selected the 100 scenes from ScanNet? Very much looking forward to your reply.

generate reconstructions

Hi

Thanks for your work. I'm just wondering if it's possible to generate the point cloud of the scene from the trained weights, as shown in the demo video?

Thanks

Does ACE not support large datasets?

After reproducing ACE with 7Scenes, Cambridge, and my own dataset, I found the translation and rotation errors to be quite good. But the problem is, when I visualize the result (I set --render_visualization to True) for my own dataset, which is quite large compared to the Cambridge dataset, the result is quite strange.

My own dataset (a big building):
[attached screenshot]

The test result:
[attached screenshot]

The mapping result of ACE:
[attached screenshot]

Is it safe to say that ACE doesn't support large datasets? The mapping for the 7Scenes and Cambridge datasets is pretty good and I can see the details. I'm just curious. Hope to hear from you soon!

Inference code

Hello,
I would like to run the model to estimate camera poses only, after I have trained it on my custom dataset.
That means I don't have GT camera poses for the query images.
If I want to do that, I have to modify CamLocDataset to disable reading the pose_dir at https://github.com/nianticlabs/ace/blob/main/dataset.py#L123 and modify it to return the pose as None here: https://github.com/nianticlabs/ace/blob/main/dataset.py#L496
And also turn off evaluation here: https://github.com/nianticlabs/ace/blob/main/test_ace.py#L236
Would you mind checking this for me?
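
One possible workaround that avoids modifying the code (untested, and assuming poses are stored as plain-text 4x4 matrices, as the dataset setup scripts produce): write placeholder identity poses for the query images so the loader runs unchanged, and simply ignore the reported pose errors. Paths and file naming below are hypothetical and should mirror how your rgb/pose files are actually paired:

    import os
    import numpy as np

    rgb_dir = "datasets/my_scene/test/rgb"     # hypothetical paths
    pose_dir = "datasets/my_scene/test/poses"
    os.makedirs(pose_dir, exist_ok=True)

    # Write an identity 4x4 pose per query image (placeholder naming scheme).
    for name in sorted(os.listdir(rgb_dir)):
        stem = os.path.splitext(name)[0]
        np.savetxt(os.path.join(pose_dir, stem + ".pose.txt"), np.eye(4))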

Custom Dataset.

Thank you for releasing the code. Can you please advise how to generate the output map file on a custom dataset? I tried running the code on my custom dataset and it shows the following error. Once again, thank you.

serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'

Need some clarification on what pose file mean

Hi,
I am trying to generate some training data directly to test ACE out. One question is about the pose files:
are the poses under train/poses camera_to_world transforms or world_to_camera transforms?
This assumes the definition that an A_to_B transform transforms a point from coordinate frame A to coordinate frame B, e.g.

point_B = A_to_B_transform * point_A

Thanks!
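
For illustration, this is what the naming convention in the question implies; the repository documentation is the authoritative answer for which variant ACE expects:

    import numpy as np

    # With the A_to_B definition above:
    #     p_world = cam_to_world @ p_cam     (the pose of the camera in the world)
    #     p_cam   = world_to_cam @ p_world   (the extrinsics, i.e. the inverse)
    cam_to_world = np.eye(4)                  # hypothetical pose
    world_to_cam = np.linalg.inv(cam_to_world)

    p_cam = np.array([0.0, 0.0, 2.0, 1.0])    # a point 2 m in front of the camera
    p_world = cam_to_world @ p_cam
    assert np.allclose(world_to_cam @ p_world, p_cam)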

Support principal point intrinsic parameters in training

    # Create the intrinsics matrix.
    intrinsics = torch.eye(3)
    intrinsics[0, 0] = focal_length
    intrinsics[1, 1] = focal_length
    # Hardcode the principal point to the centre of the image.
    intrinsics[0, 2] = image.shape[2] / 2
    intrinsics[1, 2] = image.shape[1] / 2

(See ace/dataset.py, line 489 at 2507cdb: "# Hardcode the principal point to the centre of the image.")

Assuming I'm understanding this correctly, I would be happy to make a PR for this. We are looking to train from ARKit iOS data, where the optical center can vary from frame to frame as a result of some distortion correction that Apple seems to be doing.
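
For illustration, a hypothetical sketch of what per-frame principal points might look like in that intrinsics construction, if cx/cy were read from calibration data instead of being hardcoded (not the repository's current code):

    import torch

    # Hypothetical per-frame values, e.g. exported alongside ARKit frames.
    focal_length = 1400.0
    cx, cy = 958.3, 542.7

    intrinsics = torch.eye(3)
    intrinsics[0, 0] = focal_length
    intrinsics[1, 1] = focal_length
    intrinsics[0, 2] = cx  # instead of image.shape[2] / 2
    intrinsics[1, 2] = cy  # instead of image.shape[1] / 2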

Reprojection Error

Could you please explain what the reprojection error shown on the right side of the video during mapping represents?

Performance vs hloc

1. I found the accuracy at low translation-error thresholds is lower than hloc on my datasets. The encoder network downsamples by a factor of 8, which means every feature represents an 8x8 pixel grid and the target pixel position is the centre of that grid. Is this position not accurate enough?
2. Did you use a U-Net as the encoder network to improve performance? And how can I train the encoder network from scratch? Very nice work and thanks a lot.
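
For reference, a sketch of the sub-sampling arithmetic that question 1 refers to, with assumed values; the exact offset used by the code may differ:

    import numpy as np

    # With an 8x downsampling factor, feature cell (i, j) covers an 8x8 pixel patch
    # and its centre sits 4 pixels into the cell.
    subsampling = 8
    h_feat, w_feat = 60, 80                       # e.g. for a 480 x 640 input image

    ys, xs = np.meshgrid(np.arange(h_feat), np.arange(w_feat), indexing="ij")
    px = xs * subsampling + subsampling / 2       # pixel x of each cell centre
    py = ys * subsampling + subsampling / 2       # pixel y of each cell centre
    print(px[0, :3], py[:3, 0])                   # [ 4. 12. 20.] [ 4. 12. 20.]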
