EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization

Cristhian Forigua, Maria Escobar, Jordi Pont-Tuset, Kevis-Kokitsi Maninis, Pablo Arbeláez
Center for Research and Formation in Artificial Intelligence (CINFONIA), Universidad de los Andes, Bogotá 111711, Colombia.

[arXiv]

We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization. Our method leverages sparse camera pose reconstructions in a two-fold manner, for each video and each scan independently, to estimate the camera pose of egocentric frames in 3D renders with high recall and precision. We extensively evaluate our method on the Visual Queries (VQ) 3D object localization benchmark of Ego4D. EgoCOL can estimate 62% and 59% more camera poses than the Ego4D baseline on the val and test sets, respectively, of the Ego4D Visual Queries 3D Localization challenge at CVPR 2023.


Installation instructions

  1. Please follow the installation instructions from the Ego4D Episodic Memory repository.
  2. You need to install COLMAP to compute the reconstructions. Please follow these instructions to install it.
  3. Finally, you need to install the Open3D library. Follow these instructions to install it.

Data

Please follow the instructions from the Ego4D Episodic Memory repository to download the VQ3D data here.

Run EgoCOL

First, you need to compute the initial PnP camera poses using the camera pose estimation workflow proposed by Ego4D. Follow these instructions to compute them.
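
In that workflow, each egocentric frame is localized by solving a Perspective-n-Point (PnP) problem from 2D-3D correspondences between the frame and the scan. For intuition only, here is a minimal sketch of that PnP step with OpenCV; the placeholder correspondences and intrinsics are our assumptions, not values from the Ego4D pipeline:

import cv2
import numpy as np

# Placeholder 2D-3D correspondences: keypoints in the egocentric frame
# matched to 3D points in the scan (hypothetical values for illustration).
points_2d = np.random.rand(50, 2).astype(np.float64)
points_3d = np.random.rand(50, 3).astype(np.float64)

# Hypothetical pinhole intrinsics matrix K.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])

# Solve PnP with RANSAC to stay robust to wrong matches.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    points_3d, points_2d, K, None,
    iterationsCount=1000, reprojectionError=8.0)

if ok:
    R, _ = cv2.Rodrigues(rvec)    # world-to-camera rotation
    center = -R.T @ tvec.ravel()  # camera center in scan coordinates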

Once you have computed the initial camera poses, you can use COLMAP to create the sparse reconstructions using the video and clip configurations:

$ cd colmap
$ python run_registrations.py --input_poses_dir {PATH_CLIPS_CAMERA_POSES} \
 --clips_dir {PATH_CLIPS_FRAMES} --output_dir {OUTPUT_PATH_COLMAP}

Similarly, you must run the registration for the scan configuration:

$ python run_registrations_by_scans.py --input_poses_dir {PATH_CLIPS_CAMERA_POSES} \
--clips_dir {PATH_CLIPS_FRAMES} --output_dir {OUTPUT_PATH_COLMAP_SCAN} --camera_intrinsics_filename {PATH_TO_INTRINSICS} --query_filename {PATH_TO_QUERY_ANNOT_FILE}

You get the folders {PATH_CLIPS_CAMERA_POSES}, {PATH_CLIPS_FRAMES}, {PATH_TO_INTRINSICS}, and {PATH_TO_QUERY_ANNOT_FILE} by running the camera pose estimation workflow proposed by Ego4D. You can use the default value of each argument in the .py files to help you locate the right paths.

Then, you can compute the Procrustes transformation between the PnP and sparse COLMAP points by running the commands below. Make sure to change the paths for the "--annotations_dir", "--input_dir_colmap", and "--clips_dir" flags before you run the code.

$ python extract_dict_from_colmap.py
$ python extract_dict_from_colmap_by_scans.py
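
These scripts collect the camera poses registered in the COLMAP sparse models into dictionaries. As a rough sketch of what reading such a model looks like, here is one way to do it with the pycolmap bindings (an assumption for illustration; the repository scripts may parse the model files differently, and the path below is hypothetical):

import pycolmap

# Load one sparse model produced by run_registrations.py
# (hypothetical path; adjust to your {OUTPUT_PATH_COLMAP}).
rec = pycolmap.Reconstruction("colmap_output/clip_0000/sparse/0")

poses = {}
for image_id, image in rec.images.items():
    # In recent pycolmap versions, cam_from_world holds the
    # world-to-camera rotation and translation of a registered frame.
    poses[image.name] = image.cam_from_world.matrix()

print(f"{len(poses)} registered frames")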

Then run the following lines:

$ python transform_ext.py --constrain --filter
$ python transform_ext_by_scan.py --constrain --filter

Make sure to change the paths for the flags. Also change the hard-coded paths on lines 341 and 370 of transform_ext.py and on lines 286, 207, and 369 of transform_ext_by_scan.py. The --filter and --constrain flags apply 3D constraints.
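
For intuition, the alignment these scripts estimate is a similarity (Procrustes/Umeyama) transform between the camera centers recovered by PnP and those in the COLMAP reconstruction. Below is a minimal sketch of that estimation, plus an illustrative bounding-box filter; the toy data and the exact form of the 3D constraint are our assumptions, not the repository's code:

import numpy as np

def umeyama(src, dst):
    """Similarity transform (scale s, rotation R, translation t) with
    dst ~ s * R @ src + t, via Umeyama's closed-form least squares."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # guard against a reflection
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / src_c.var(0).sum()
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Toy camera centers from COLMAP (sparse frame) and PnP (scan frame).
colmap_centers = np.random.rand(30, 3)
pnp_centers = np.random.rand(30, 3)

s, R, t = umeyama(colmap_centers, pnp_centers)
aligned = (s * (R @ colmap_centers.T)).T + t

# Illustrative 3D constraint: keep poses whose centers fall inside the
# scan's axis-aligned bounding box (our assumption about the kind of
# check --constrain/--filter perform; see the scripts for the details).
bbox_min, bbox_max = np.zeros(3), np.ones(3)
keep = np.all((aligned >= bbox_min) & (aligned <= bbox_max), axis=1)
aligned = aligned[keep]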

Evaluate

Center scan

To evaluate our method, you can use the center of the scan as follows:

  1. Compute the ground-truth vectors in the query-frame coordinate system for queries with an estimated pose:
$ python scripts/prepare_ground_truth_for_queries.py --input_dir {PATH_CLIPS_CAMERA_POSES} --vq3d_queries {VQ3D_QUERIES_ANNOT_JSON_FILE} --output_filename {OUTPUT_JSON_FILE} --vq2d_queries {VQ2D_QUERIES_ANNOT_JSON_FILE} --check_colmap
  2. Compute the 3D vector predictions:
$ python3 scripts/run.py --input_dir {PATH_CLIPS_CAMERA_POSES} --output_filename {OUTPUT_RUN_JSON_FILE} --vq2d_results {VQ2D_RESULTS_JSON_FILE} --vq2d_annot {VQ2D_ANNOT_JSON_FILE} --vq2d_queries {VQ2D_QUERIES_ANNOT_JSON_FILE} --vq3d_queries {OUTPUT_JSON_FILE} --check_colmap --constrain --baseline_center
  3. Run the evaluation:
$ python scripts/eval.py --vq3d_results {OUTPUT_RUN_JSON_FILE}
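
scripts/eval.py computes the official VQ3D metrics. For intuition only, here is a toy sketch of two basic error measures between a predicted and a ground-truth 3D displacement vector; this is our simplified illustration, not the official evaluation code:

import numpy as np

def vector_errors(pred, gt):
    """L2 and angular error between a predicted and a ground-truth 3D
    displacement vector (both in the query-frame coordinate system)."""
    l2 = np.linalg.norm(pred - gt)
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt) + 1e-8)
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return l2, angle

# Toy example with made-up vectors.
pred = np.array([0.9, 0.1, 1.8])
gt = np.array([1.0, 0.0, 2.0])
print(vector_errors(pred, gt))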

License and Acknowledgement

This project borrows heavily from the Ego4D Episodic Memory repository; we thank the authors for their contributions to the community.

Contact

If you have any questions, please email [email protected]
