Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

Official implementation of Semantic-Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention.

The paper has been accepted by CVPR 2023 🔥.

Introduction

We propose Semantic Ray (S-Ray), a generalizable semantic field that learns from multiple scenes and generalizes to unseen ones. Unlike Semantic-NeRF, which relies on positional encoding and is therefore limited to a single scene, we design a Cross-Reprojection Attention module to fully exploit semantic information from multiple reprojections of each ray. To collect dense connections among reprojected rays efficiently, we decompose the problem into consecutive intra-view radial and cross-view sparse attentions, which extracts informative features at a small computational cost. Experiments on both synthetic and real scene data demonstrate the strong generalization ability of S-Ray, and extensive ablation studies further show the effectiveness of the proposed Cross-Reprojection Attention module. We believe this generalizable semantic field will encourage further exploration of NeRF-based high-level vision problems.
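To give a rough feel for why the decomposition is cheaper, the sketch below counts query-key pairs for a single ray. It is only an illustration under stated assumptions: that "intra-view radial" attention runs among the samples along the reprojected ray inside each view, and that "cross-view sparse" attention runs among the views at each sample position. The function names and the numbers are hypothetical and are not taken from the paper or the code base.

```python
def dense_pairs(num_views, num_samples):
    """Query-key pairs if every reprojected sample attends to every other one."""
    tokens = num_views * num_samples
    return tokens * tokens


def decomposed_pairs(num_views, num_samples):
    """Query-key pairs for two consecutive axis-wise attentions (assumed layout)."""
    # Intra-view radial: inside each view, samples along the ray attend to each other.
    radial = num_views * num_samples * num_samples
    # Cross-view sparse: at each sample position, the views attend to each other.
    cross_view = num_samples * num_views * num_views
    return radial + cross_view


if __name__ == "__main__":
    views, samples = 8, 64  # illustrative values only
    print(dense_pairs(views, samples))       # 262144
    print(decomposed_pairs(views, samples))  # 36864
```

With these illustrative sizes the decomposed form touches about 7 times fewer pairs than dense attention over all reprojected samples.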

Installation

The code has been tested with Python 3.10, PyTorch 2.0 and CUDA 11.7. We recommend using Anaconda to create a new environment and install the dependencies:

conda create -n semray python=3.10
conda activate semray
conda install pytorch=2.0 torchvision=0.15 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
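After installation, a quick check like the following can confirm that the interpreter and PyTorch are visible from the new environment. This is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, and treating Python 3.10 or newer as acceptable is an assumption based on the tested version listed above.

```python
import importlib
import sys


def check_environment(min_python=(3, 10)):
    """Return a small report about the interpreter and, if installed, PyTorch."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        # Assumption: anything at or above the tested Python version is fine.
        "python_ok": sys.version_info[:2] >= min_python,
        "torch": None,
        "cuda_available": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # PyTorch is missing: leave the torch fields as None
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `torch` is reported as `None`, the conda install step probably did not run inside the `semray` environment.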

Datasets

ScanNet

Download the ScanNet dataset from here and extract the color images, depth images, labels, poses and intrinsics of each scene. Organize the data in the following structure:

├── data
│   ├── scannet
│   │   ├── scene0000_00
│   │   │   ├── color
│   │   │   │   ├── 0.jpg
│   │   │   │   ├── ...
│   │   │   ├── depth
│   │   │   │   ├── 0.png
│   │   │   │   ├── ...
│   │   │   ├── label-filt
│   │   │   │   ├── 0.png
│   │   │   │   ├── ...
│   │   │   ├── pose
│   │   │   │   ├── 0.txt
│   │   │   │   ├── ...
│   │   │   ├── intrinsic
│   │   │   │   ├── extrinsic_color.txt
│   │   │   │   ├── intrinsic_color.txt
│   │   │   │   ├── ...
│   │   │   ├── ...
│   │   ├── ...
│   │   ├── scannetv2-labels.combined.tsv
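Before training, it can help to confirm that the extracted data matches the tree above. The following is a minimal sketch that only checks for the folders and files named in that tree; `check_scene` and `check_dataset` are hypothetical helpers, not functions from this repository, and the `scene*` folder pattern is an assumption based on the example scene name.

```python
from pathlib import Path

# Per-frame folders and their file extensions, taken from the tree above.
FRAME_DIRS = {"color": ".jpg", "depth": ".png", "label-filt": ".png", "pose": ".txt"}
INTRINSIC_FILES = ("extrinsic_color.txt", "intrinsic_color.txt")


def check_scene(scene_dir):
    """Return a list of human-readable problems found in one scene folder."""
    scene_dir = Path(scene_dir)
    problems = []
    for name, suffix in FRAME_DIRS.items():
        folder = scene_dir / name
        if not folder.is_dir():
            problems.append(f"missing folder: {name}")
        elif not any(f.suffix == suffix for f in folder.iterdir()):
            problems.append(f"no {suffix} files in {name}")
    for filename in INTRINSIC_FILES:
        if not (scene_dir / "intrinsic" / filename).is_file():
            problems.append(f"missing file: intrinsic/{filename}")
    return problems


def check_dataset(root="data/scannet"):
    """Map each problematic scene name to its problems; an empty dict means OK."""
    root = Path(root)
    report = {}
    if not (root / "scannetv2-labels.combined.tsv").is_file():
        report["<root>"] = ["missing file: scannetv2-labels.combined.tsv"]
    # Assumption: scene folders are named like scene0000_00.
    for scene in sorted(p for p in root.glob("scene*") if p.is_dir()):
        problems = check_scene(scene)
        if problems:
            report[scene.name] = problems
    return report


if __name__ == "__main__":
    for scene, problems in check_dataset().items():
        print(scene, problems)
```

The check is deliberately shallow: it does not open any image or pose file, so it catches layout mistakes but not corrupted data.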

Replica

  • TODO

Training

(Optional) For better and faster reconstruction results, you can leverage the pretrained model of NeuRay, which can be downloaded from here. Put the pretrained model in data/model. If you want to train from scratch, you can skip this step and comment out the load_pretrain option in the config file.
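If you prefer to disable the pretrained weights with a script rather than by hand, a text-level edit such as the one below is one option. It is a sketch under assumptions: that `load_pretrain` appears as a single-line `key: value` entry in the YAML config, and that commenting out that line is enough. The helper name `comment_out_key` and the sample values are hypothetical.

```python
import re


def comment_out_key(config_text, key="load_pretrain"):
    """Prefix every line that sets `key` with '# ', leaving other lines untouched."""
    pattern = re.compile(rf"^(\s*){re.escape(key)}\s*:")
    lines = []
    for line in config_text.splitlines():
        match = pattern.match(line)
        if match:
            indent = match.group(1)
            line = f"{indent}# {line[len(indent):]}"
        lines.append(line)
    # Preserve a trailing newline if the input had one.
    trailing = "\n" if config_text.endswith("\n") else ""
    return "\n".join(lines) + trailing


if __name__ == "__main__":
    # Hypothetical config fragment, for illustration only.
    example = "name: example_run\nload_pretrain: example_model\n"
    print(comment_out_key(example))
```

Working on the raw text rather than parsing and re-dumping the YAML keeps any existing comments and key order in the config file intact.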

To train Semantic-Ray with ScanNet, run:

CUDA_VISIBLE_DEVICES=0 python run_training.py --cfg configs/cra/train_cra_scannet.yaml

Evaluation

To evaluate the trained model, run:

CUDA_VISIBLE_DEVICES=0 python run_evaluation.py --cfg configs/cra/test_cra_scannet.yaml --model-path data/model/train_cra_scannet/model_best.pth

Fine-tuning

To fine-tune the trained model on a specific scene, create a new config file following the format of configs/cra/ft_cra_scannet_scene0376.yaml. Then run:

CUDA_VISIBLE_DEVICES=0 python run_training.py --cfg configs/cra/ft_cra_scannet_scene0376.yaml

Acknowledgement

This repo benefits from NeuRay, IBRNet, Semantic-NeRF, and NeRF-pytorch. We thank the authors for their wonderful work.

Citation

If you find this work useful in your own research, please consider citing the following:

@inproceedings{liu2023semantic,
  author = {Liu, Fangfu and Zhang, Chubin and Zheng, Yu and Duan, Yueqi},
  title = {Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2023}
}

Contact

If you have any questions about this project, please feel free to contact [email protected] or [email protected].
