
torch-ngp's Introduction

torch-ngp

This repository contains a PyTorch implementation of instant-ngp (covering both the NeRF and SDF tasks), together with several related methods (TensoRF, CCNeRF, and D-NeRF).

News: A cleaner, improved version focused on static NeRF reconstruction of realistic scenes has been split out into nerf_template, as this repository has become hard to maintain.

Instant-ngp interactive training/rendering on lego:

nerf.mp4

Also the first interactive deformable-nerf implementation:

dnerf.mp4

Other related projects

Install

git clone --recursive https://github.com/ashawkey/torch-ngp.git
cd torch-ngp

Install with pip

pip install -r requirements.txt

# (optional) install the tcnn backbone
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

Install with conda

conda env create -f environment.yml
conda activate torch-ngp

Build extension (optional)

By default, we use PyTorch's cpp_extension load to build the extensions at runtime. However, this can sometimes be inconvenient, so we also provide a setup.py for each extension to build it ahead of time:

# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
cd raymarching
python setup.py build_ext --inplace # build the extension only, without installing (it can then only be used from the parent directory)
pip install . # install to the python path (you still need the raymarching/ folder, since this only installs the built extension)
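For reference, the runtime build path relies on PyTorch's JIT extension loader. Below is a minimal sketch of that mechanism; the module name and source paths are illustrative placeholders, not the repo's exact files.

# Minimal sketch of runtime (JIT) compilation with PyTorch's extension loader.
# The module name and source paths below are illustrative, not the repo's exact files.
from torch.utils.cpp_extension import load

_backend = load(
    name="raymarching_example",                   # hypothetical module name
    sources=["raymarching/src/raymarching.cu",    # hypothetical source paths
             "raymarching/src/bindings.cpp"],
    extra_cuda_cflags=["-O3"],
    verbose=True,                                 # print the ninja build log
)

# once compiled, the kernels are exposed as attributes, e.g. _backend.some_kernel(...)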

Tested environments

  • Ubuntu 20 with torch 1.10 & CUDA 11.3 on a TITAN RTX.
  • Ubuntu 16 with torch 1.8 & CUDA 10.1 on a V100.
  • Windows 10 with torch 1.11 & CUDA 11.3 on a RTX 3070.

Currently, --ff only supports GPUs with CUDA architecture >= 70. On GPUs with a lower architecture, --tcnn can still be used, but it will be slower than on more recent GPUs.
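If you are unsure which group your GPU falls into, you can query its compute capability with PyTorch; the snippet below is a small sanity check, not part of the repo.

# Query the GPU's CUDA compute capability before choosing a backbone.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
if major >= 7:
    print("arch >= 70: --ff (FFMLP) should work")
else:
    print("arch < 70: use --tcnn or the default fp16 path instead")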

Usage

We use the same data format as instant-ngp, e.g., armadillo and fox. Please download and put them under ./data.

We also support self-captured datasets and converting other formats (e.g., LLFF, Tanks&Temples, Mip-NeRF 360) to the nerf-compatible format; see the code block below for details.
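For reference, a nerf-compatible dataset is essentially a transforms.json holding the camera intrinsics plus a 4x4 camera-to-world matrix per frame. The sketch below reads such a file; the field names follow the common Blender/instant-ngp convention, and individual converters may add or rename keys.

# Illustrative reader for a nerf-style transforms.json (field names follow the
# common Blender/instant-ngp convention; converters may add extra keys).
import json
import numpy as np

with open("data/fox/transforms.json") as f:
    meta = json.load(f)

camera_angle_x = meta["camera_angle_x"]        # horizontal field of view, in radians
for frame in meta["frames"]:
    image_path = frame["file_path"]            # e.g. "images/0001.jpg"
    c2w = np.array(frame["transform_matrix"])  # 4x4 camera-to-world matrix
    cam_position = c2w[:3, 3]                  # camera center in world coordinates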

Supported datasets

The first run will take some time to compile the CUDA extensions.

### Instant-ngp NeRF
# train with different backbones (with slower pytorch ray marching)
# for the colmap dataset, the default dataset setting `--bound 2 --scale 0.33` is used.
python main_nerf.py data/fox --workspace trial_nerf # fp32 mode
python main_nerf.py data/fox --workspace trial_nerf --fp16 # fp16 mode (pytorch amp)
python main_nerf.py data/fox --workspace trial_nerf --fp16 --ff # fp16 mode + FFMLP (this repo's implementation)
python main_nerf.py data/fox --workspace trial_nerf --fp16 --tcnn # fp16 mode + official tinycudann's encoder & MLP

# use CUDA to accelerate ray marching (much faster!)
python main_nerf.py data/fox --workspace trial_nerf --fp16 --cuda_ray # fp16 mode + cuda raymarching

# preload data into GPU, accelerate training but use more GPU memory.
python main_nerf.py data/fox --workspace trial_nerf --fp16 --preload

# one for all: -O means --fp16 --cuda_ray --preload, which usually gives the best balance of speed and quality.
python main_nerf.py data/fox --workspace trial_nerf -O

# test mode
python main_nerf.py data/fox --workspace trial_nerf -O --test

# construct an error_map for each image and sample rays based on the training error (slows down training, but gives better quality for the same number of training steps)
python main_nerf.py data/fox --workspace trial_nerf -O --error_map

# use a background model (e.g., a sphere with radius = 32); this can suppress noise for real-world 360 datasets
python main_nerf.py data/firekeeper --workspace trial_nerf -O --bg_radius 32

# start a GUI for NeRF training & visualization
# always use with `--fp16 --cuda_ray` for an acceptable framerate!
python main_nerf.py data/fox --workspace trial_nerf -O --gui

# test mode for GUI
python main_nerf.py data/fox --workspace trial_nerf -O --gui --test

# for the blender dataset, you should add `--bound 1.0 --scale 0.8 --dt_gamma 0`
# --bound means the scene is assumed to be inside the box [-bound, bound]
# --scale adjusts the camera location to make sure it falls inside the above bounding box.
# --dt_gamma controls the adaptive ray marching speed; setting it to 0 turns it off.
python main_nerf.py data/nerf_synthetic/lego --workspace trial_nerf -O --bound 1.0 --scale 0.8 --dt_gamma 0
python main_nerf.py data/nerf_synthetic/lego --workspace trial_nerf -O --bound 1.0 --scale 0.8 --dt_gamma 0 --gui

# for the LLFF dataset, you should first convert it to nerf-compatible format:
python scripts/llff2nerf.py data/nerf_llff_data/fern # by default it uses full-resolution images and writes `transforms.json` to the folder
python scripts/llff2nerf.py data/nerf_llff_data/fern --images images_4 --downscale 4 # if you prefer to use the low-resolution images
# then you can train as a colmap dataset (you'll need to tune the scale & bound if necessary):
python main_nerf.py data/nerf_llff_data/fern --workspace trial_nerf -O
python main_nerf.py data/nerf_llff_data/fern --workspace trial_nerf -O --gui

# for the Tanks&Temples dataset, you should first convert it to nerf-compatible format:
python scripts/tanks2nerf.py data/TanksAndTemple/Family # writes `transforms_{split}.json` for [train, val, test]
# then you can train as a blender dataset (you'll need to tune the scale & bound if necessary)
python main_nerf.py data/TanksAndTemple/Family --workspace trial_nerf_family -O --bound 1.0 --scale 0.33 --dt_gamma 0
python main_nerf.py data/TanksAndTemple/Family --workspace trial_nerf_family -O --bound 1.0 --scale 0.33 --dt_gamma 0 --gui

# for custom dataset, you should:
# 1. take a video / many photos from different views 
# 2. put the video under a path like ./data/custom/video.mp4 or the images under ./data/custom/images/*.jpg.
# 3. call the preprocessing script (install ffmpeg and colmap first! refer to the script for more options):
python scripts/colmap2nerf.py --video ./data/custom/video.mp4 --run_colmap # if use video
python scripts/colmap2nerf.py --images ./data/custom/images/ --run_colmap # if use images
python scripts/colmap2nerf.py --video ./data/custom/video.mp4 --run_colmap --dynamic # if the scene is dynamic (for D-NeRF settings), add the time for each frame.
# 4. it should create transforms.json, and you can train with: (you'll need to try different scale, bound, and dt_gamma values so the object sits inside the bounding box and renders smoothly; see the sanity-check sketch after this block.)
python main_nerf.py data/custom --workspace trial_nerf_custom -O --gui --scale 2.0 --bound 1.0 --dt_gamma 0.02
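If you are unsure which scale/bound to use for a custom capture, a quick standalone check like the one below (an illustrative helper, not a repo utility) prints how far the scaled camera positions reach, so you can verify they fall inside [-bound, bound].

# Illustrative sanity check: after applying --scale, do the camera positions
# fall inside the assumed bounding box [-bound, bound]?
import json
import numpy as np

scale, bound = 0.33, 2.0   # example values, tune per dataset

with open("data/custom/transforms.json") as f:
    frames = json.load(f)["frames"]

positions = np.array([np.array(fr["transform_matrix"])[:3, 3] for fr in frames])
positions *= scale
print("max |xyz| after scaling:", float(np.abs(positions).max()))
print("all cameras inside the box:", bool(np.all(np.abs(positions) <= bound)))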

### Instant-ngp SDF
python main_sdf.py data/armadillo.obj --workspace trial_sdf
python main_sdf.py data/armadillo.obj --workspace trial_sdf --fp16
python main_sdf.py data/armadillo.obj --workspace trial_sdf --fp16 --ff
python main_sdf.py data/armadillo.obj --workspace trial_sdf --fp16 --tcnn

python main_sdf.py data/armadillo.obj --workspace trial_sdf --fp16 --test

### TensoRF
# almost the same as Instant-ngp NeRF, just replace the main script.
python main_tensoRF.py data/fox --workspace trial_tensoRF -O
python main_tensoRF.py data/nerf_synthetic/lego --workspace trial_tensoRF -O --bound 1.0 --scale 0.8 --dt_gamma 0 

### CCNeRF
# training on single objects, turn on --error_map for better quality.
python main_CCNeRF.py data/nerf_synthetic/chair --workspace trial_cc_chair -O --bound 1.0 --scale 0.67 --dt_gamma 0 --error_map
python main_CCNeRF.py data/nerf_synthetic/ficus --workspace trial_cc_ficus -O --bound 1.0 --scale 0.67 --dt_gamma 0 --error_map
python main_CCNeRF.py data/nerf_synthetic/hotdog --workspace trial_cc_hotdog -O --bound 1.0 --scale 0.67 --dt_gamma 0 --error_map
# compose, use a larger bound and more samples per ray for better quality.
python main_CCNeRF.py data/nerf_synthetic/hotdog --workspace trial_cc_hotdog -O --bound 2.0 --scale 0.67 --dt_gamma 0 --max_steps 2048 --test --compose
# compose + gui, only about 1 FPS without dynamic resolution... just for quick verification of composition results.
python main_CCNeRF.py data/nerf_synthetic/hotdog --workspace trial_cc_hotdog -O --bound 2.0 --scale 0.67 --dt_gamma 0 --test --compose --gui

### D-NeRF
# almost the same as Instant-ngp NeRF, just replace the main script.
# use deformation to model dynamic scene
python main_dnerf.py data/dnerf/jumpingjacks --workspace trial_dnerf_jumpingjacks -O --bound 1.0 --scale 0.8 --dt_gamma 0
python main_dnerf.py data/dnerf/jumpingjacks --workspace trial_dnerf_jumpingjacks -O --bound 1.0 --scale 0.8 --dt_gamma 0 --gui
# use temporal basis to model dynamic scene
python main_dnerf.py data/dnerf/jumpingjacks --workspace trial_dnerf_basis_jumpingjacks -O --bound 1.0 --scale 0.8 --dt_gamma 0 --basis
python main_dnerf.py data/dnerf/jumpingjacks --workspace trial_dnerf_basis_jumpingjacks -O --bound 1.0 --scale 0.8 --dt_gamma 0 --basis --gui
# for the hypernerf dataset, first convert it into nerf-compatible format:
python scripts/hyper2nerf.py data/split-cookie --downscale 2 # will generate transforms*.json
python main_dnerf.py data/split-cookie/ --workspace trial_dnerf_cookies -O --bound 1 --scale 0.3 --dt_gamma 0
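Conceptually, the deformation variant warps each sample point at time t back into a canonical space before querying a static radiance field, while --basis instead models time variation with a set of temporal basis functions. The toy sketch below illustrates only the deformation idea and is not the repo's actual network definition.

# Toy sketch of the deformation idea behind D-NeRF: sample points at time t are
# warped into a canonical space, then fed to a static NeRF backbone.
import torch
import torch.nn as nn

class ToyDeformField(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),   # input: (x, y, z, t)
            nn.Linear(hidden, 3),              # output: offset (dx, dy, dz)
        )

    def forward(self, xyz, t):
        # xyz: (N, 3) sample positions, t: (N, 1) time values in [0, 1]
        offset = self.net(torch.cat([xyz, t], dim=-1))
        return xyz + offset                    # canonical-space coordinates

# canonical_xyz = ToyDeformField()(xyz, t); then query the static field:
# sigma, color = static_nerf(canonical_xyz, view_dirs)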

Check the scripts directory for more examples.

Performance Reference

Tested with the default settings on the Lego dataset. Here the speed refers to the iterations per second on a V100.

Model                        | Split                | PSNR  | Train Speed | Test Speed
instant-ngp (paper)          | trainval?            | 36.39 | -           | -
instant-ngp (-O)             | train (30K steps)    | 34.15 | 97          | 7.8
instant-ngp (-O --error_map) | train (30K steps)    | 34.88 | 50          | 7.8
instant-ngp (-O)             | trainval (40K steps) | 35.22 | 97          | 7.8
instant-ngp (-O --error_map) | trainval (40K steps) | 36.00 | 50          | 7.8
TensoRF (paper)              | train (30K steps)    | 36.46 | -           | -
TensoRF (-O)                 | train (30K steps)    | 35.05 | 51          | 2.8
TensoRF (-O --error_map)     | train (30K steps)    | 35.84 | 14          | 2.8

Tips

Q: How to choose the network backbone?

A: The -O flag, which uses PyTorch's native mixed precision, is suitable for most cases. I don't see a significant improvement from --tcnn or --ff, and they require extra building steps. Also, some new features may only be available in the default -O mode.

Q: CUDA Out Of Memory for my dataset.

A: You could turn off --preload, which loads all images into GPU memory for acceleration (if you use -O, replace it with --fp16 --cuda_ray). Another solution is to manually set downscale in NeRFDataset to lower the image resolution.

Q: How to adjust bound and scale?

A: You could start with a large bound (e.g., 16) or a small scale (e.g., 0.3) to make sure the object falls inside the bounding box. The GUI mode can then be used to interactively shrink the bound to find a suitable value. Uncommenting this line will visualize the camera poses, and some good examples can be found in this issue.

Q: Noisy novel views for realistic datasets.

A: You could try setting bg_radius to a large value, e.g., 32. This trains an extra environment map to model the background in realistic photos. A larger bound will also help. An example of bg_radius on the firekeeper dataset: bg_model
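For intuition, a background model like this shades rays that leave the foreground by the point where they hit a large sphere of radius bg_radius surrounding the scene; the ray-sphere intersection behind that is just the quadratic below. This is a conceptual sketch, not the repo's actual kernel.

# Conceptual sketch: where a ray (origin o, unit direction d) exits a sphere of
# radius r centered at the origin -- the geometry behind a spherical background model.
import numpy as np

def far_sphere_intersection(o, d, r):
    o, d = np.asarray(o, dtype=float), np.asarray(d, dtype=float)
    b = np.dot(o, d)                 # half of the linear coefficient of the quadratic
    c = np.dot(o, o) - r * r
    t = -b + np.sqrt(b * b - c)      # far root; real whenever the origin is inside
    return o + t * d                 # hit point on the background sphere

# e.g. far_sphere_intersection(o=[0.0, 0.0, 0.5], d=[0.0, 0.0, 1.0], r=32.0)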

Difference from the original implementation

  • Instead of assuming the scene is bounded in the unit box [0, 1] and centered at (0.5, 0.5, 0.5), this repo assumes the scene is bounded in the box [-bound, bound] and centered at (0, 0, 0). Therefore, the functionality of aabb_scale is replaced by bound here (see the coordinate-mapping sketch after this list).
  • For the hashgrid encoder, this repo only implements the linear interpolation mode.
  • For TensoRF, we don't implement regularizations other than L1, and use trunc_exp as the density activation instead of softplus. The alpha mask pruning is replaced by the density grid sampler from instant-ngp, which shares the same logic for acceleration.
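A minimal sketch of the coordinate change implied by the first bullet above, i.e., how a point in instant-ngp's unit cube [0, 1]^3 (centered at (0.5, 0.5, 0.5)) corresponds to this repo's box [-bound, bound]^3 (centered at the origin). The function names are ours, for illustration only.

# Linear change of coordinates between the two scene conventions described above.
import numpy as np

def ngp_to_torchngp(x_unit, bound):
    """Map a point from instant-ngp's unit cube [0, 1]^3 into [-bound, bound]^3."""
    return (np.asarray(x_unit) - 0.5) * 2.0 * bound

def torchngp_to_ngp(x, bound):
    """Inverse mapping, back into the unit cube."""
    return np.asarray(x) / (2.0 * bound) + 0.5

# The cube center maps to the origin:
# ngp_to_torchngp([0.5, 0.5, 0.5], bound=2.0) -> array([0., 0., 0.])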

Citation

If you find this work useful, a citation will be appreciated via:

@misc{torch-ngp,
    Author = {Jiaxiang Tang},
    Year = {2022},
    Note = {https://github.com/ashawkey/torch-ngp},
    Title = {Torch-ngp: a PyTorch implementation of instant-ngp}
}

@article{tang2022compressible,
    title = {Compressible-composable NeRF via Rank-residual Decomposition},
    author = {Tang, Jiaxiang and Chen, Xiaokang and Wang, Jingbo and Zeng, Gang},
    journal = {arXiv preprint arXiv:2205.14870},
    year = {2022}
}

Acknowledgement

  • Credits to Thomas Müller for the amazing tiny-cuda-nn and instant-ngp:

    @misc{tiny-cuda-nn,
        Author = {Thomas M\"uller},
        Year = {2021},
        Note = {https://github.com/nvlabs/tiny-cuda-nn},
        Title = {Tiny {CUDA} Neural Network Framework}
    }
    
    @article{mueller2022instant,
        title = {Instant Neural Graphics Primitives with a Multiresolution Hash Encoding},
        author = {Thomas M\"uller and Alex Evans and Christoph Schied and Alexander Keller},
        journal = {arXiv:2201.05989},
        year = {2022},
        month = jan
    }
    
  • The framework of NeRF is adapted from nerf_pl:

    @misc{queianchen_nerf,
        author = {Quei-An, Chen},
        title = {Nerf_pl: a pytorch-lightning implementation of NeRF},
        url = {https://github.com/kwea123/nerf_pl/},
        year = {2020},
    }
    
  • The official TensoRF implementation:

    @article{TensoRF,
      title={TensoRF: Tensorial Radiance Fields},
      author={Chen, Anpei and Xu, Zexiang and Geiger, Andreas and Yu, Jingyi and Su, Hao},
      journal={arXiv preprint arXiv:2203.09517},
      year={2022}
    }
    
  • The NeRF GUI is developed with DearPyGui.

torch-ngp's People

Contributors

ashawkey, domaradzkimaciej, w-m


torch-ngp's Issues

the GUI can't work

Hello, when using the GUI the display is very laggy and I can't see the rendering result. I have two GPUs in my computer; could that be the reason? Do I need to disable one of them?

some black fog on the test results

hi, thank you very much for your work.

I have one question: during training I got a PSNR above 34, but this is what happened when I tested a new pose.

test.mp4

In addition, I ran the original NeRF with the same data and got better results as follows

face_test_spiral_200000_rgb.mp4

I tried tweaking the parameters "bound" and "scale" but the results didn't improve.

What do you think is the reason behind this result?

Many thanks.

Bad training performance with hash encoding

Hi @ashawkey,

I cannot reproduce the training GIFs you provided in README.md. Instead, I got the following results after training for 200 epochs. Did I mess something up?

Command:

python train_nerf.py data/nerf/fox --workspace trial_nerf # fp16 and ff not enabled

Log info:

==> Finished Epoch 199.
==> Start Training Epoch 200, lr=0.000359 ...
==> Finished Epoch 200.
++> Evaluate at epoch 200 ...
==> Saving validation image to trial_nerf/validation/ngp_0200_0001.png
++> Evaluate epoch 200 Finished.
[INFO] New best result: 0.001995325554162264 --> 0.0019571746233850718
==> Start Test, save results to trial_nerf/results
(attached result images: NeRF-hash vs. NeRF-freq)

Thanks a lot!

Result is not sharp on lego. Any tips to improve quality?

I appreciate your great work!

However, I trained the Lego scene with both torch-ngp and instant-ngp for 20,000 steps (200 epochs). instant-ngp produces a sharper result (green rectangle) while torch-ngp is missing part of the object (red rectangle). Any tips to improve the quality?

the training command is

OMP_NUM_THREADS=8 CUDA_VISIBLE_DEVICES=0 python train_nerf.py data/nerf_synthetic/lego --workspace trial_nerf_lego_tcnn --fp16 --tcnn --cuda_ray --bound 1 --scale 0.8 --mode blender

image

Best regards

CUDA Issues running NeRF example

I'm running into some issues with the NeRF example.

Screenshot from 2022-03-17 09-41-49

My setup is:
OS: Ubuntu 20.04
GPU: RTX 3070 Ti
CUDA: 11.6 (driver 510)

Pretty sure CUDA is installed correctly, and I've added these lines to my .zshrc:
export PATH="/usr/local/cuda-11.6/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.6/lib64:$LD_LIBRARY_PATH"

hashencoder.cu error with some specific GPU models

Thank you for your work!

I had this issue when running your code on a server with different GPU models: the script fails with some specific GPU models (like a Tesla P100) but works fine with others (like a Tesla V100).

hashencoder.cu(25): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (__half *, c10::Half)

ninja: build stopped: subcommand failed.

cuda version: 11.3.1
cudnn version: 8.2.0.53

real scene test, get poor result

hi, thank you very much for your work.

I have one question: when I test on my real scene data, the evaluation result looks like this,

image

but when rendering a novel view, the result is the following,

image

image

What do you think is the reason behind this result?

Many thanks.

Gigapixel Image Approximation Experiment in instant-ngp

Recently I tried to apply this code to a video-fitting task ((x, y, t) -> (r, g, b)), which is similar to the SDF task, but the results were not ideal. So I tried the gigapixel image approximation experiment from the paper and found that it cannot be reproduced under the same experimental settings. Has anyone else encountered this problem?

Question about the scene is bounded in box

Thanks for your great work. I noticed the following words you said in the readme:
"Instead of assuming the scene is bounded in the unit box [0, 1] and centered at (0.5, 0.5, 0.5), this repo assumes the scene is bounded in box [-bound, bound], and centered at (0, 0, 0). Therefore, the functionality of aabb_scale is replaced by bound here."

Is there a particular reason for this design choice?

Insufficient hash resolution harming performance

I noticed that the desired_resolution of hash encoding is fixed in torch-ngp:

self.encoder, self.in_dim = get_encoder(encoding)

num_levels=16, level_dim=2, base_resolution=16, per_level_scale=1.3819, log2_hashmap_size=19, desired_resolution=2048,

While in instant-ngp, it is a parameter controlled by aabb_scale. When aabb_scale == 1, the desired_resolution = 2048, and when aabb_scale increases to 16, the desired_resolution increases to 32768. (The per_level_scale changes, too.)

That turned out to cause a huge gap in performance (and memory) in my trial, which is on a much bigger scene than lego.
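For context, in the instant-ngp paper the per-level growth factor is derived from the base and finest grid resolutions, so a larger desired_resolution implies a larger per_level_scale as well. A quick computation of that relationship (the standard paper formula, not repo code):

# Growth factor between hash-grid levels, following the instant-ngp paper:
# b = exp((ln N_max - ln N_min) / (L - 1)), so the last level reaches resolution N_max.
import math

def per_level_scale(base_resolution, desired_resolution, num_levels):
    return math.exp(
        (math.log(desired_resolution) - math.log(base_resolution)) / (num_levels - 1)
    )

print(per_level_scale(16, 2048, 16))    # ~1.382, matching the hard-coded 1.3819
print(per_level_scale(16, 32768, 16))   # ~1.663, what a much larger scene would need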

NeRF inference profiling

This issue records the current profiling of NeRF, which shows where the speed bottlenecks are.
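Tables like the ones below are typically produced with PyTorch's autograd profiler; a minimal sketch of generating one is shown here, where render_frames() is a placeholder for whatever inference path is being measured.

# Minimal sketch of producing a profile table like the ones below.
# `render_frames()` is a placeholder for the measured inference code.
import torch

def render_frames():
    pass  # placeholder: run inference on the test frames here

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    render_frames()

print(prof.key_averages().table(sort_by="self_cuda_time_total"))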

Inference with --fp16 --ff on 3 1920x1080 frames, 128+128 points per ray:

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls       
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                           _hash_encode         3.53%     391.299ms        10.87%        1.206s     396.505us        2.607s        24.09%        3.510s       1.154ms          3042        
void kernel_grid<c10::Half, 3u, 2u>(c10::Half const*...         0.00%       0.000us         0.00%       0.000us       0.000us        2.607s        24.09%        2.607s     857.029us          3042        
                                         _ffmlp_forward         5.02%     557.577ms         8.39%     930.914ms     153.010us        2.054s        18.98%        2.074s     340.958us          6084        
void kernel_mlp_fused<64, 1, 8, __half, true>(Activa...         0.00%       0.000us         0.00%       0.000us       0.000us        2.054s        18.98%        2.054s     337.648us          6084        
                                             aten::_cat         1.98%     219.730ms         4.66%     517.135ms      30.892us        1.317s        12.17%        1.317s      78.680us         16740        
                                             aten::sort         0.30%      33.207ms         2.29%     254.487ms     167.316us        1.226s        11.33%        1.281s     841.893us          1521        
void at::native::bitonicSortKVInPlace<float, long, 2...         0.00%       0.000us         0.00%       0.000us       0.000us        1.226s        11.33%        1.226s     806.281us          1521        
                                            aten::copy_         3.08%     341.543ms        31.30%        3.474s      95.054us        1.150s        10.62%        1.155s      31.611us         36546        
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us        1.035s         9.57%        1.035s     340.388us          3042        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     541.719ms         5.01%     541.719ms      35.602us         15216        
                                            _sh_encoder         2.18%     241.916ms         3.85%     426.764ms     140.291us     534.972ms         4.94%     595.884ms     195.886us          3042        
void kernel_sh<c10::Half>(c10::Half const*, c10::Hal...         0.00%       0.000us         0.00%       0.000us       0.000us     534.972ms         4.94%     534.972ms     175.862us          3042        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     442.365ms         4.09%     442.365ms     145.419us          3042        
                                              aten::mul         2.60%     288.552ms         4.19%     465.465ms      20.391us     292.856ms         2.71%     292.856ms      12.829us         22827        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     247.842ms         2.29%     247.842ms      13.572us         18261        
                                              aten::sub         3.23%     358.866ms         5.20%     577.077ms      21.069us     217.357ms         2.01%     217.357ms       7.936us         27390        
                                           aten::gather         0.97%     107.481ms         1.54%     171.240ms      28.146us     212.941ms         1.97%     212.941ms      35.000us          6084        
                                              aten::add         1.94%     214.941ms         3.08%     341.548ms      20.410us     202.851ms         1.87%     202.851ms      12.122us         16734        
                                              aten::div         1.78%     197.500ms         2.82%     312.510ms      20.526us     180.809ms         1.67%     180.809ms      11.876us         15225        
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     133.226ms         1.23%     133.226ms      14.599us          9126        
                                          aten::sigmoid         0.44%      48.847ms         0.73%      80.799ms      26.561us     124.591ms         1.15%     124.591ms      40.957us          3042        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     124.591ms         1.15%     124.591ms      40.957us          3042        
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us     123.191ms         1.14%     123.191ms      80.993us          1521        
                                        aten::clamp_min         0.57%      63.140ms         2.08%     230.413ms      30.274us     115.899ms         1.07%     230.170ms      30.242us          7611        
void at::native::_scatter_gather_elementwise_kernel<...         0.00%       0.000us         0.00%       0.000us       0.000us     115.718ms         1.07%     115.718ms      38.040us          3042        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     114.181ms         1.06%     114.181ms      37.535us          3042        
void at::native::_scatter_gather_elementwise_kernel<...         0.00%       0.000us         0.00%       0.000us       0.000us      97.223ms         0.90%      97.223ms      31.960us          3042        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      95.482ms         0.88%      95.482ms      12.550us          7608        
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      92.027ms         0.85%      92.027ms      12.096us          7608        
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      89.500ms         0.83%      89.500ms      14.711us          6084        
                                              aten::sum         1.05%     116.752ms         1.50%     166.470ms      27.362us      87.986ms         0.81%      87.986ms      14.462us          6084        
                                            aten::clamp         0.61%      67.853ms         1.10%     122.594ms      20.150us      87.157ms         0.81%      88.785ms      14.593us          6084        
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      87.157ms         0.81%      87.157ms      19.101us          4563        
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      80.040ms         0.74%      80.040ms      52.623us          1521        
                                              aten::min         0.98%     108.652ms         2.15%     238.483ms      39.198us      66.320ms         0.61%      95.284ms      15.661us          6084        
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      65.883ms         0.61%      65.883ms      10.829us          6084        
_ZN2at6native32tensor_kernel_scan_innermost_dimIfLi1...         0.00%       0.000us         0.00%       0.000us       0.000us      64.053ms         0.59%      64.053ms      21.056us          3042        
                                          aten::cumprod         0.43%      47.625ms         0.65%      71.634ms      23.548us      63.409ms         0.59%      63.409ms      20.845us          3042        
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      58.054ms         0.54%      58.054ms      19.084us          3042  
                                              aten::max         0.94%     104.159ms         2.01%     222.583ms      36.585us      57.724ms         0.53%      86.936ms      14.289us          6084  
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us      51.809ms         0.48%      51.809ms       8.507us          6090  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      49.877ms         0.46%      49.877ms      16.396us          3042  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      46.988ms         0.43%      46.988ms      10.298us          4563  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      45.896ms         0.42%      45.896ms       7.536us          6090  
                                            aten::fill_         1.03%     114.067ms         2.19%     242.845ms      15.960us      43.753ms         0.40%      43.753ms       2.875us         15216  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      41.419ms         0.38%      41.419ms      13.616us          3042  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      40.997ms         0.38%      40.997ms      13.477us          3042  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      39.479ms         0.36%      39.479ms       6.476us          6096  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      39.258ms         0.36%      39.258ms      25.811us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      37.032ms         0.34%      37.032ms       4.869us          7605  
                                     aten::searchsorted         0.16%      17.916ms         0.39%      43.311ms      28.475us      33.258ms         0.31%      33.258ms      21.866us          1521  
void at::native::(anonymous namespace)::searchsorted...         0.00%       0.000us         0.00%       0.000us       0.000us      33.258ms         0.31%      33.258ms      21.866us          1521  
                         Memcpy DtoD (Device -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      31.294ms         0.29%      31.294ms       5.144us          6084  
                                          aten::maximum         0.16%      17.381ms         0.25%      27.532ms      18.101us      29.250ms         0.27%      29.250ms      19.231us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      29.250ms         0.27%      29.250ms      19.231us          1521  
                                          aten::minimum         0.15%      16.095ms         0.24%      26.151ms      17.193us      28.983ms         0.27%      28.983ms      19.055us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      28.983ms         0.27%      28.983ms      19.055us          1521  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      24.661ms         0.23%      24.661ms      16.214us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      21.328ms         0.20%      21.328ms       7.011us          3042  
                                              aten::neg         0.31%      34.620ms         0.51%      56.827ms      18.681us      19.157ms         0.18%      19.157ms       6.298us          3042  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      19.157ms         0.18%      19.157ms       6.298us          3042  
_ZN2at6native32tensor_kernel_scan_innermost_dimIfLi1...         0.00%       0.000us         0.00%       0.000us       0.000us      18.386ms         0.17%      18.386ms      12.088us          1521  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      18.318ms         0.17%      18.318ms      12.043us          1521  
                                              aten::exp         0.35%      38.918ms         0.53%      58.556ms      19.249us      18.278ms         0.17%      18.278ms       6.009us          3042  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      18.278ms         0.17%      18.278ms       6.009us          3042  
                                           aten::cumsum         0.21%      23.407ms         0.31%      34.911ms      22.953us      17.978ms         0.17%      17.978ms      11.820us          1521  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      14.847ms         0.14%      14.847ms       9.761us          1521  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      14.280ms         0.13%      14.280ms       9.389us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      13.793ms         0.13%      13.793ms       9.068us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      13.094ms         0.12%      13.094ms       1.434us          9132  
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      11.129ms         0.10%      11.129ms       1.824us          6102  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      10.466ms         0.10%      10.466ms       6.867us          1524  
                                         aten::_s_where         0.44%      48.315ms         0.88%      98.069ms      21.492us       9.361ms         0.09%       9.361ms       2.052us          4563  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       9.361ms         0.09%       9.361ms       2.052us          4563  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       9.338ms         0.09%       9.338ms       3.070us          3042  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us       8.848ms         0.08%       8.848ms       5.817us          1521  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       8.264ms         0.08%       8.264ms       5.433us          1521  
                                             aten::norm         0.26%      28.412ms         0.40%      44.594ms      29.261us       7.870ms         0.07%       7.870ms       5.164us          1524  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       7.846ms         0.07%       7.846ms       5.158us          1521  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us       7.667ms         0.07%       7.667ms       5.041us          1521  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.533ms         0.06%       6.533ms       2.148us          3042  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       6.372ms         0.06%       6.372ms       1.396us          4566  
                                               aten::lt         0.40%      44.178ms         0.59%      65.952ms      21.680us       6.184ms         0.06%       6.184ms       2.033us          3042  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       4.586ms         0.04%       4.586ms       3.015us          1521  
                                           aten::arange         0.92%     102.334ms         1.46%     162.401ms      53.281us       3.053ms         0.03%       6.106ms       2.003us          3048  
void (anonymous namespace)::elementwise_kernel_with_...         0.00%       0.000us         0.00%       0.000us       0.000us       3.053ms         0.03%       3.053ms       2.007us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.718ms         0.02%       1.718ms       1.127us          1524  
                                               aten::gt         0.21%      23.296ms         0.31%      34.756ms      22.851us       1.647ms         0.02%       1.647ms       1.083us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.647ms         0.02%       1.647ms       1.083us          1521  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.598ms         0.01%       1.598ms       1.051us          1521  
                                              aten::bmm         0.02%       2.302ms         0.07%       7.562ms       1.260ms       1.026ms         0.01%       2.360ms     393.333us             6  
                        volta_fp16_sgemm_fp16_128x32_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.026ms         0.01%       1.026ms     171.000us             6  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     289.000us         0.00%     289.000us      96.333us             3  
                              aten::_local_scalar_dense         0.29%      32.665ms         1.26%     139.968ms      22.979us     231.000us         0.00%     231.000us       0.038us          6091  
void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     203.000us         0.00%     203.000us      67.667us             3  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.000us         0.00%       6.000us       2.000us             3  
                                        Memset (Device)         0.00%       0.000us         0.00%       0.000us       0.000us       3.000us         0.00%       3.000us       0.000us          6084  
                                            aten::empty         3.03%     336.134ms         3.03%     336.578ms       4.422us       0.000us         0.00%       0.000us       0.000us         76110  
                                          aten::random_         0.00%      31.000us         0.00%      31.000us      31.000us       0.000us         0.00%       0.000us       0.000us             1  
                                             aten::item         0.13%      14.935ms         1.40%     154.881ms      25.428us       0.000us         0.00%     231.000us       0.038us          6091  
                                            aten::zeros         0.00%      51.000us         0.00%      82.000us      20.500us       0.000us         0.00%       0.000us       0.000us             4  
                                            aten::zero_         0.22%      24.701ms         1.15%     127.125ms      20.881us       0.000us         0.00%      21.785ms       3.578us          6088  
enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         0.01%     841.000us         0.01%       1.235ms     308.750us       0.000us         0.00%       0.000us       0.000us             4  
                                               aten::to         0.63%      69.980ms        32.04%        3.556s     111.123us       0.000us         0.00%     604.426ms      18.888us         32001  
                                            aten::stack         0.09%      10.056ms         0.64%      70.649ms      46.176us       0.000us         0.00%     125.256ms      81.867us          1530  
                                        aten::unsqueeze         0.70%      77.737ms         0.94%     104.781ms       4.046us       0.000us         0.00%       0.000us       0.000us         25896  
                                       aten::as_strided         1.31%     145.867ms         1.33%     147.451ms       0.969us       0.000us         0.00%       0.000us       0.000us        152229  
                                              aten::cat         0.61%      68.121ms         5.27%     585.256ms      34.962us       0.000us         0.00%        1.317s      78.680us         16740  
                                          aten::resize_         1.42%     157.545ms         1.42%     157.717ms       5.182us       0.000us         0.00%       0.000us       0.000us         30435  
                                          aten::detach_         0.00%      15.000us         0.00%      22.000us       2.444us       0.000us         0.00%       0.000us       0.000us             9  
                                                detach_         0.00%       7.000us         0.00%      11.000us       1.222us       0.000us         0.00%       0.000us       0.000us             9  
                                         aten::_to_copy         0.99%     109.595ms        31.46%        3.492s     163.761us       0.000us         0.00%     604.426ms      28.345us         21324  
                                    aten::empty_strided         3.28%     364.021ms         3.30%     366.594ms       9.263us       0.000us         0.00%       0.000us       0.000us         39576  
                                        cudaMemcpyAsync         2.73%     303.436ms         2.73%     303.436ms      16.603us       0.000us         0.00%       0.000us       0.000us         18276  
                                            aten::slice         1.85%     205.048ms         2.44%     270.957ms       3.492us       0.000us         0.00%       0.000us       0.000us         77583  
                                           aten::select         0.36%      39.487ms         0.48%      52.748ms       4.321us       0.000us         0.00%       0.000us       0.000us         12207  
                                         aten::linspace         0.23%      25.233ms         0.42%      46.684ms       7.658us       0.000us         0.00%       0.000us       0.000us          6096  
                                         aten::meshgrid         0.00%      74.000us         0.00%     149.000us      49.667us       0.000us         0.00%       0.000us       0.000us             3  
                                             aten::view         0.31%      34.551ms         0.31%      34.551ms       2.523us       0.000us         0.00%       0.000us       0.000us         13695  
                                           aten::expand         0.32%      35.520ms         0.44%      49.138ms       4.604us       0.000us         0.00%       0.000us       0.000us         10674  
                                                aten::t         0.00%      60.000us         0.00%      79.000us      13.167us       0.000us         0.00%       0.000us       0.000us             6  
                                        aten::transpose         0.00%      46.000us         0.00%      70.000us       5.833us       0.000us         0.00%       0.000us       0.000us            12  
                                        aten::expand_as         0.09%       9.560ms         0.25%      27.680ms       6.050us       0.000us         0.00%       0.000us       0.000us          4575  
                                       aten::contiguous         0.03%       3.433ms         0.57%      62.986ms      41.248us       0.000us         0.00%       7.658ms       5.015us          1527  
                                            aten::clone         0.31%      34.313ms         2.37%     263.087ms      34.567us       0.000us         0.00%     489.353ms      64.295us          7611  
                                       aten::empty_like         0.67%      74.421ms         2.41%     267.441ms      10.338us       0.000us         0.00%       0.000us       0.000us         25869  
                                  cudaStreamSynchronize        24.06%        2.670s        24.06%        2.670s     219.156us       0.000us         0.00%       0.000us       0.000us         12183  
                                          aten::reshape         0.84%      92.801ms         3.51%     389.672ms      10.245us       0.000us         0.00%     481.695ms      12.664us         38037  
                                   aten::_reshape_alias         0.62%      68.539ms         0.65%      72.009ms       2.254us       0.000us         0.00%       0.000us       0.000us         31953  
                                        aten::ones_like         0.30%      32.979ms         2.49%     276.039ms      30.228us       0.000us         0.00%      21.968ms       2.406us          9132  
                                       cudaLaunchKernel        16.00%        1.776s        16.00%        1.776s       8.336us       0.000us         0.00%       0.000us       0.000us        213009  
                                               cudaFree         0.00%       8.000us         0.00%       8.000us       4.000us       0.000us         0.00%       0.000us       0.000us             2  
                                 cudaDeviceGetAttribute         0.00%       2.000us         0.00%       2.000us       0.143us       0.000us         0.00%       0.000us       0.000us            14  
                                   cudaGetSymbolAddress         0.00%       1.000us         0.00%       1.000us       1.000us       0.000us         0.00%       0.000us       0.000us             1  
                                             cudaMalloc         0.02%       2.525ms         0.02%       2.525ms     229.545us       0.000us         0.00%       0.000us       0.000us            11  
                               cudaEventCreateWithFlags         0.00%       7.000us         0.00%       7.000us       0.389us       0.000us         0.00%       0.000us       0.000us            18  
                                   cudaFuncSetAttribute         0.07%       7.749ms         0.07%       7.749ms       1.251us       0.000us         0.00%       0.000us       0.000us          6192  
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla...         0.00%      22.000us         0.00%      22.000us       1.833us       0.000us         0.00%       0.000us       0.000us            12  
                                   aten::frobenius_norm         0.17%      19.175ms         0.96%     106.797ms      70.215us       0.000us         0.00%       9.278ms       6.100us          1521  
                                             aten::rsub         0.60%      66.377ms         2.68%     297.521ms      27.944us       0.000us         0.00%      43.565ms       4.092us         10647  
                                            aten::where         0.11%      11.916ms         0.99%     109.985ms      24.104us       0.000us         0.00%       9.361ms       2.052us          4563  
                                     aten::_unsafe_view         0.13%      14.005ms         0.23%      25.146ms       4.133us       0.000us         0.00%       0.000us       0.000us          6084  
                                        cudaMemsetAsync         0.37%      40.511ms         0.37%      40.511ms       6.659us       0.000us         0.00%       0.000us       0.000us          6084  
                                          aten::permute         0.14%      15.828ms         0.20%      22.640ms       7.442us       0.000us         0.00%       0.000us       0.000us          3042  
                                  cudaStreamIsCapturing         0.00%      13.000us         0.00%      13.000us       1.625us       0.000us         0.00%       0.000us       0.000us             8  
                                             aten::relu         0.48%      53.008ms         1.52%     168.367ms      55.347us       0.000us         0.00%     114.181ms      37.535us          3042  
                                       aten::zeros_like         0.20%      22.493ms         1.99%     220.955ms      36.317us       0.000us         0.00%      21.785ms       3.581us          6084  
                                           aten::detach         0.03%       3.163ms         0.06%       6.806ms       4.457us       0.000us         0.00%       0.000us       0.000us          1527  
                                                 detach         0.03%       3.643ms         0.04%       4.445ms       2.911us       0.000us         0.00%       0.000us       0.000us          1527  
                                  cudaDeviceSynchronize         0.00%      48.000us         0.00%      48.000us      48.000us       0.000us         0.00%       0.000us       0.000us             1  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 11.099s
Self CUDA time total: 10.822s


Inference with --fp16 --ff --cuda_raymarching on 3 1920x1080 frames:


-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------        
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                           _hash_encode         2.08%     199.448ms         5.88%     564.231ms     369.020us        2.179s        23.75%        2.824s       1.847ms          1529  
void kernel_grid<c10::Half, 3u, 2u>(c10::Half const*...         0.00%       0.000us         0.00%       0.000us       0.000us        2.179s        23.75%        2.179s       1.425ms          1529  
                                         _ffmlp_forward         4.10%     393.471ms         6.16%     591.005ms     193.772us        1.983s        21.61%        1.993s     653.448us          3050  
void kernel_mlp_fused<64, 1, 8, __half, true>(Activa...         0.00%       0.000us         0.00%       0.000us       0.000us        1.979s        21.57%        1.979s     650.671us          3042  
                                       _generate_points         2.07%     198.297ms        71.59%        6.872s       4.518ms        1.638s        17.86%        1.688s       1.110ms          1521  
void kernel_generate_points<c10::Half>(c10::Half con...         0.00%       0.000us         0.00%       0.000us       0.000us        1.638s        17.86%        1.638s       1.077ms          1521  
                                             aten::_cat         0.55%      52.390ms         1.16%     111.195ms      36.255us        1.222s        13.31%        1.222s     398.330us          3067  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us        1.128s        12.29%        1.128s     741.391us          1521  
                                            aten::copy_         2.10%     201.777ms         6.39%     613.478ms      30.858us     879.039ms         9.58%     884.456ms      44.488us         19881  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     552.081ms         6.02%     552.081ms     120.779us          4571  
void kernel_sh<c10::Half>(c10::Half const*, c10::Hal...         0.00%       0.000us         0.00%       0.000us       0.000us     516.578ms         5.63%     516.578ms     339.631us          1521  
                                            _sh_encoder         1.17%     112.374ms         1.54%     148.212ms      97.444us     516.234ms         5.63%     516.234ms     339.404us          1521  
                                       _accumulate_rays         1.37%     131.976ms         2.77%     265.826ms     174.771us     330.450ms         3.60%     333.671ms     219.376us          1521  
void kernel_accumulate_rays_forward<c10::Half>(c10::...         0.00%       0.000us         0.00%       0.000us       0.000us     330.450ms         3.60%     330.450ms     217.258us          1521  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     250.761ms         2.73%     250.761ms      32.844us          7635  
                                          aten::sigmoid         0.25%      23.853ms         0.42%      40.290ms      26.489us     113.450ms         1.24%     113.450ms      74.589us          1521  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     113.450ms         1.24%     113.450ms      74.589us          1521  
                                        aten::clamp_min         0.28%      27.235ms         1.10%     105.430ms      34.409us     102.666ms         1.12%     205.332ms      67.014us          3064  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     102.576ms         1.12%     102.576ms      67.087us          1529  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us      91.992ms         1.00%      91.992ms      60.481us          1521  
                                              aten::div         0.48%      46.482ms         0.77%      73.898ms      24.110us      89.251ms         0.97%      89.251ms      29.119us          3065  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      54.861ms         0.60%      54.861ms      36.069us          1521  
                                              aten::add         0.27%      25.442ms         0.42%      40.336ms      26.329us      53.245ms         0.58%      53.245ms      34.755us          1532  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      53.051ms         0.58%      53.051ms      34.879us          1521  
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us      39.039ms         0.43%      39.039ms       8.509us          4588  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      33.580ms         0.37%      33.580ms      22.078us          1521  
                                              aten::max         0.39%      37.096ms         1.18%     113.050ms      73.889us      27.348ms         0.30%      79.880ms      52.209us          1530  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      27.216ms         0.30%      27.216ms      17.893us          1521  
                                              aten::min         0.44%      42.345ms         1.37%     131.581ms      86.001us      26.664ms         0.29%      84.516ms      55.239us          1530  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      26.534ms         0.29%      26.534ms      17.445us          1521  
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us      15.368ms         0.17%      15.368ms       9.883us          1555  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      12.741ms         0.14%      12.741ms       4.188us          3042  
                                            aten::fill_         0.53%      51.353ms         1.22%     116.670ms      19.151us      12.460ms         0.14%      12.460ms       2.045us          6092  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       9.223ms         0.10%       9.223ms       3.030us          3044  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       7.497ms         0.08%       7.497ms       2.464us          3042  
void kernel_mlp_fused<64, 1, 8, __half, false>(Activ...         0.00%       0.000us         0.00%       0.000us       0.000us       3.178ms         0.03%       3.178ms     397.250us             8  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       3.091ms         0.03%       3.091ms       2.032us          1521  
void at::native::(anonymous namespace)::CatArrayBatc...         0.00%       0.000us         0.00%       0.000us       0.000us       2.030ms         0.02%       2.030ms     676.667us             3  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.872ms         0.02%       1.872ms       1.224us          1529  
                                              aten::bmm         0.04%       3.937ms         0.10%       9.900ms       1.650ms       1.018ms         0.01%       2.343ms     390.500us             6  
                        volta_fp16_sgemm_fp16_128x32_tn         0.00%       0.000us         0.00%       0.000us       0.000us       1.018ms         0.01%       1.018ms     169.667us             6  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     698.000us         0.01%     698.000us      58.167us            12  
                                              aten::sub         0.00%     292.000us         0.00%     457.000us      38.083us     615.000us         0.01%     615.000us      51.250us            12  
                                              aten::mul         0.00%     204.000us         0.00%     346.000us      28.833us     378.000us         0.00%     378.000us      31.500us            12  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     287.000us         0.00%     287.000us      31.889us             9  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us     283.000us         0.00%     283.000us      94.333us             3  
                              aten::_local_scalar_dense         0.30%      29.017ms        67.66%        6.495s       1.415ms     276.000us         0.00%     276.000us       0.060us          4589  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     276.000us         0.00%     276.000us      46.000us             6  
                                              aten::norm         0.00%     136.000us         0.00%     194.000us      64.667us     201.000us         0.00%     201.000us      67.000us
void at::native::reduce_kernel<128, 4, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     201.000us         0.00%     201.000us      67.000us             3        
                          aten::max_pool3d_with_indices         0.00%      32.000us         0.00%      68.000us      68.000us     149.000us         0.00%     149.000us     149.000us             1        
void at::native::(anonymous namespace)::max_pool3d_w...         0.00%       0.000us         0.00%       0.000us       0.000us     149.000us         0.00%     149.000us     149.000us             1  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     139.000us         0.00%     139.000us      46.333us             3  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     124.000us         0.00%     124.000us      13.778us             9  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us     123.000us         0.00%     123.000us      13.667us             9  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     105.000us         0.00%     105.000us      13.125us             8  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us     100.000us         0.00%     100.000us      12.500us             8  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      96.000us         0.00%      96.000us      32.000us             3  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us      90.000us         0.00%      90.000us      30.000us             3  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      66.000us         0.00%      66.000us      11.000us             6  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us      36.000us         0.00%      36.000us      36.000us             1  
                                             aten::mean         0.00%      52.000us         0.00%      79.000us      79.000us      25.000us         0.00%      25.000us      25.000us             1  
void at::native::reduce_kernel<512, 1, at::native::R...         0.00%       0.000us         0.00%       0.000us       0.000us      25.000us         0.00%      25.000us      25.000us             1  
                                        Memset (Device)         0.00%       0.000us         0.00%       0.000us       0.000us      15.000us         0.00%      15.000us       0.005us          3047  
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       7.000us         0.00%       7.000us       2.333us             3  
void at::native::unrolled_elementwise_kernel<at::nat...         0.00%       0.000us         0.00%       0.000us       0.000us       6.000us         0.00%       6.000us       2.000us             3  
                                         aten::linspace         0.00%     173.000us         0.00%     282.000us      15.667us       0.000us         0.00%       0.000us       0.000us            18  
                                            aten::empty         2.00%     192.373ms         2.02%     194.152ms       5.084us       0.000us         0.00%       0.000us       0.000us         38190  
                                            aten::split         0.00%      45.000us         0.00%     113.000us      37.667us       0.000us         0.00%       0.000us       0.000us             3  
                                           aten::narrow         0.00%      92.000us         0.00%     243.000us       6.750us       0.000us         0.00%       0.000us       0.000us            36  
                                            aten::slice         0.74%      70.841ms         0.98%      93.966ms       4.104us       0.000us         0.00%       0.000us       0.000us         22895  
                                       aten::as_strided         0.37%      35.140ms         0.37%      35.144ms       1.146us       0.000us         0.00%       0.000us       0.000us         30680  
                                       aten::zeros_like         0.06%       5.980ms         0.64%      61.913ms      40.679us       0.000us         0.00%       5.716ms       3.756us          1522  
                                       aten::empty_like         0.18%      17.617ms         0.56%      53.704ms       8.784us       0.000us         0.00%       0.000us       0.000us          6114  
                                    aten::empty_strided         1.13%     108.323ms         1.13%     108.323ms      10.107us       0.000us         0.00%       0.000us       0.000us         10718  
                                            aten::zero_         0.20%      18.962ms         1.14%     109.460ms      23.920us       0.000us         0.00%      10.604ms       2.317us          4576  
                                       cudaLaunchKernel         4.97%     476.865ms         4.97%     476.865ms      10.077us       0.000us         0.00%       0.000us       0.000us         47323  
                                         aten::meshgrid         0.00%     185.000us         0.00%     403.000us      36.636us       0.000us         0.00%       0.000us       0.000us            11  
                                             aten::view         0.14%      13.340ms         0.14%      13.340ms       2.884us       0.000us         0.00%       0.000us       0.000us          4626  
                                           aten::expand         0.00%     209.000us         0.00%     270.000us       5.294us       0.000us         0.00%       0.000us       0.000us            51  
                                          aten::reshape         0.45%      42.905ms         1.49%     142.878ms       8.504us       0.000us         0.00%     441.697ms      26.288us         16802  
                                            aten::clone         0.20%      18.941ms         1.62%     155.338ms      33.872us       0.000us         0.00%     552.081ms     120.384us          4586  
                                     aten::_unsafe_view         0.04%       4.005ms         0.07%       7.094ms       4.612us       0.000us         0.00%       0.000us       0.000us          1538  
                                              aten::cat         0.16%      14.993ms         1.31%     126.188ms      41.144us       0.000us         0.00%        1.222s     398.330us          3067  
                                          aten::resize_         0.28%      27.066ms         0.28%      27.066ms       5.879us       0.000us         0.00%       0.000us       0.000us          4604  
                                            aten::zeros         0.14%      13.016ms         1.06%     101.817ms      33.339us       0.000us         0.00%       4.888ms       1.601us          3054  
                                               aten::to         0.78%      74.779ms         7.25%     696.110ms      30.367us       0.000us         0.00%     317.633ms      13.857us         22923  
                                         aten::_to_copy         0.67%      64.749ms         6.50%     623.481ms      50.946us       0.000us         0.00%     317.633ms      25.955us         12238  
                                        cudaMemcpyAsync        68.03%        6.530s        68.03%        6.530s       1.063ms       0.000us         0.00%       0.000us       0.000us          6143  
                                  cudaStreamSynchronize         1.54%     148.007ms         1.54%     148.007ms      24.129us       0.000us         0.00%       0.000us       0.000us          6134  
                                   aten::_reshape_alias         0.38%      36.356ms         0.39%      37.819ms       2.478us       0.000us         0.00%       0.000us       0.000us         15264  
                                        cudaMemsetAsync         0.22%      20.762ms         0.22%      20.762ms       6.814us       0.000us         0.00%       0.000us       0.000us          3047  
                                             aten::item         0.13%      12.496ms        67.79%        6.507s       1.418ms       0.000us         0.00%     276.000us       0.060us          4589  
                                  cudaStreamIsCapturing         0.00%       5.000us         0.00%       5.000us       1.667us       0.000us         0.00%       0.000us       0.000us             3  
                                             cudaMalloc         0.03%       2.440ms         0.03%       2.440ms     406.667us       0.000us         0.00%       0.000us       0.000us             6  
                                          aten::permute         0.09%       8.435ms         0.12%      11.996ms       7.846us       0.000us         0.00%       0.000us       0.000us          1529  
                                   cudaFuncSetAttribute         0.04%       3.610ms         0.04%       3.610ms       1.143us       0.000us         0.00%       0.000us       0.000us          3158  
                                           aten::select         0.34%      33.081ms         0.39%      37.883ms      12.264us       0.000us         0.00%       0.000us       0.000us          3089  
                                             aten::relu         0.08%       7.800ms         0.69%      66.312ms      43.370us       0.000us         0.00%     102.576ms      67.087us          1529  
                                           aten::detach         0.00%      34.000us         0.00%     128.000us       9.143us       0.000us         0.00%       0.000us       0.000us            14  
                                                 detach         0.00%      94.000us         0.00%      97.000us       6.929us       0.000us         0.00%       0.000us       0.000us            14  
                                  aten::constant_pad_nd         0.00%      11.000us         0.00%      71.000us      71.000us       0.000us         0.00%      51.000us      51.000us             1  
                                        aten::unsqueeze         0.00%     131.000us         0.00%     196.000us       4.780us       0.000us         0.00%       0.000us       0.000us            41  
                                       aten::max_pool3d         0.00%      38.000us         0.00%     106.000us     106.000us       0.000us         0.00%     149.000us     149.000us             1  
                                          aten::squeeze         0.00%       8.000us         0.00%       9.000us       4.500us       0.000us         0.00%       0.000us       0.000us             2  
                                          aten::random_         0.00%      28.000us         0.00%      28.000us      28.000us       0.000us         0.00%       0.000us       0.000us             1  
enumerate(DataLoader)#_SingleProcessDataLoaderIter._...         0.01%     829.000us         0.01%       1.073ms     268.250us       0.000us         0.00%       0.000us       0.000us             4  
                                            aten::stack         0.00%      87.000us         0.01%     516.000us      57.333us       0.000us         0.00%       2.030ms     225.556us             9  
                                          aten::detach_         0.00%      16.000us         0.00%      22.000us       2.444us       0.000us         0.00%       0.000us       0.000us             9  
                                                detach_         0.00%       6.000us         0.00%      12.000us       1.333us       0.000us         0.00%       0.000us       0.000us             9  
                                                aten::t         0.00%      58.000us         0.00%      76.000us      12.667us       0.000us         0.00%       0.000us       0.000us             6  
                                        aten::transpose         0.00%      55.000us         0.00%      75.000us       6.250us       0.000us         0.00%       0.000us       0.000us            12  
                                        aten::expand_as         0.00%      28.000us         0.00%      89.000us       7.417us       0.000us         0.00%       0.000us       0.000us            12  
                                       aten::contiguous         0.07%       7.113ms         1.10%     105.922ms      34.751us       0.000us         0.00%     110.384ms      36.215us          3048  
                                           aten::arange         0.02%       2.239ms         0.05%       4.421ms     736.833us       0.000us         0.00%       0.000us       0.000us             6  
                                        aten::ones_like         0.00%      42.000us         0.00%     461.000us      76.833us       0.000us         0.00%      79.000us      13.167us             6  
                                               cudaFree         0.00%      12.000us         0.00%      12.000us       6.000us       0.000us         0.00%       0.000us       0.000us             2  
                                 cudaDeviceGetAttribute         0.00%       2.000us         0.00%       2.000us       0.143us       0.000us         0.00%       0.000us       0.000us            14  
                                   cudaGetSymbolAddress         0.00%       4.000us         0.00%       4.000us       4.000us       0.000us         0.00%       0.000us       0.000us             1  
                               cudaEventCreateWithFlags         0.00%      12.000us         0.00%      12.000us       0.667us       0.000us         0.00%       0.000us       0.000us            18  
cudaOccupancyMaxActiveBlocksPerMultiprocessorWithFla...         0.00%      14.000us         0.00%      14.000us       1.167us       0.000us         0.00%       0.000us       0.000us            12  
                                             aten::ones         0.07%       7.094ms         0.44%      41.995ms      27.610us       0.000us         0.00%       1.762ms       1.158us          1521  
                                  cudaDeviceSynchronize         0.00%      19.000us         0.00%      19.000us      19.000us       0.000us         0.00%       0.000us       0.000us             1  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 9.599s
Self CUDA time total: 9.175s


Support batch-based hash grid?

Hi, thanks a lot for this great work.

I wonder whether it would be possible to support a batch-based hash grid.

Currently, it seems that the hash grid (i.e., the hash embeddings) has shape #offsets x level_dim. In many tasks, hash embeddings of shape B x #offsets x level_dim might be helpful.

A naive implementation might be to for-loop over the batch dimension of the hash grid in https://github.com/ashawkey/torch-ngp/blob/main/hashencoder/hashgrid.py#L137, though this approach would likely be less efficient than a dedicated CUDA kernel implementation (a sketch of the idea is shown below).
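A minimal sketch of that naive for-loop idea, assuming one independent hash grid per batch element and a hypothetical make_encoder factory (e.g. lambda: HashEncoder(...)); the call signature encoder(points, bound=...) follows how this repo's HashEncoder is used, everything else here is illustrative:

import torch
import torch.nn as nn

class BatchedHashEncoding(nn.Module):
    def __init__(self, B, make_encoder):
        super().__init__()
        # one independent hash grid (embedding table) per batch element
        self.encoders = nn.ModuleList([make_encoder() for _ in range(B)])

    def forward(self, x, bound=1):
        # x: [B, N, 3] -> [B, N, L*C]; encode one batch element at a time
        outs = [enc(x[b], bound=bound) for b, enc in enumerate(self.encoders)]
        return torch.stack(outs, dim=0)

A dedicated CUDA kernel that also hashes the batch index would avoid this Python-level loop.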

Thanks for your reply.

About the zero initialization of the hash grid

Hi,
I found that the zero initialization of the hash grid described in Sec. 4 (IMPLEMENTATION) works (I need this particular initialization in my experimental setting),
but when I tested your code, it did not work.
Can you explain this? Thank you.

Do we need separate networks for coarse and fine sampling

In NeRF's original paper, two separate networks are optimized for the coarse and fine samples in hierarchical sampling. In this repo, I found that only a single model is used to compute density/radiance for both coarse and fine samples.

sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3), bound=bound)

new_sigmas, new_rgbs = self(new_pts.reshape(B, -1, 3), new_dirs.reshape(B, -1, 3), bound=bound)

Do we need separate networks for coarse and fine samples, or does the hash encoding allow us to model both with a single network?

need help: compiles successfully but cannot run the code

I can successfully compile the code, but when I try to train the model using

python main_nerf.py data/fox --workspace trial_nerf --fp16 --ff --cuda_ray --gui

I got the following error:

Traceback (most recent call last):
File "main_nerf.py", line 47, in
from nerf.network_ff import NeRFNetwork
File "D:\torch-ngp\nerf\network_ff.py", line 5, in
from encoding import get_encoder
File "D:\torch-ngp\encoding.py", line 5, in
from hashencoder import HashEncoder
File "D:\torch-ngp\hashencoder_init_.py", line 1, in
from .hashgrid import HashEncoder
File "D:\torch-ngp\hashencoder\hashgrid.py", line 9, in
from .backend import _backend
File "D:\torch-ngp\hashencoder\backend.py", line 6, in
_backend = load(name='_hash_encoder',
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 986, in load
return _jit_compile(
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1193, in _jit_compile
_write_ninja_file_and_build_library(
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1285, in _write_ninja_file_and_build_library
_write_ninja_file_to_build_library(
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1679, in _write_ninja_file_to_build_library
_write_ninja_file(
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1790, in _write_ninja_file
cl_paths = subprocess.check_output(['where',
File "C:\Users\Miniconda3\envs\ngp\lib\subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Users\Miniconda3\envs\ngp\lib\subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

I am using a computer with an RTX 3090. I tried an environment with CUDA 11.0 and torch 1.7.2, and also tested CUDA 11.3 with torch 1.9. Both failed.

Can you please tell me what the problem is? Thank you!

Distributed data parallel training

I'm trying to train torch-ngp on multiple GPUs.

I modified the dataloader by passing a DistributedSampler, passed the local_rank and world_size to the Trainer, and then ran the script with torch.distributed.run.
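For reference, a minimal sketch of that setup under generic PyTorch assumptions (the Trainer wiring is omitted and the function below is hypothetical; only the DistributedSampler/DDP usage is standard):

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def setup_ddp(model, dataset, batch_size, local_rank):
    # meant to be launched via: python -m torch.distributed.run --nproc_per_node=N train_nerf_ddp.py
    dist.init_process_group(backend='nccl')
    torch.cuda.set_device(local_rank)
    model = DDP(model.to(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)  # shards the dataset across ranks
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return model, loader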

The training ran normally for several epochs (e.g., 6), then crashed with the following error:

Traceback (most recent call last):                                                                                                                                     
  File "train_nerf_ddp.py", line 207, in <module>                                                                                                                      
    trainer.train(train_loader, valid_loader, opt.num_epochs)                                                                                                          
  File "/data/torch-ngp/nerf_ddp/utils.py", line 407, in train                                                                                                         
    self.train_one_epoch(train_loader)                                                                                                                                 
  File "/data/torch-ngp/nerf_ddp/utils.py", line 621, in train_one_epoch                                                                                               
    preds, truths, loss = self.train_step(data)                                                                                                                        
  File "/data/torch-ngp/nerf_ddp/utils.py", line 303, in train_step                                                                                                    
    outputs = self.model.render(rays_o, rays_d, z_far=self.depth_scale, staged=False, bg_color=bg_color, perturb=True, **self.conf)                                    
  File "/data/torch-ngp/nerf_ddp/renderer.py", line 404, in render                                                                                                     
    depth, image, depth_var = _run(rays_o, rays_d, num_steps, upsample_steps, bg_color, perturb, z_far=z_far)                                                          
  File "/data/torch-ngp/nerf_ddp/renderer.py", line 164, in run                                                                                                        
    sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3))                                                                                                 
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl                                                    
    return forward_call(*input, **kwargs)                                                                                                                              
  File "/data/torch-ngp/nerf_ddp/network.py", line 73, in forward                                                                                                      
    x = self.encoder(x, bound=self.bound)                                                                                                                              
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl                                                    
    return forward_call(*input, **kwargs)                                                                                                                              
  File "/data/torch-ngp/hashencoder/hashgrid.py", line 137, in forward                                                                                                 
    outputs = hash_encode(inputs, self.embeddings, self.offsets, self.per_level_scale, self.base_resolution, inputs.requires_grad)                                     
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/cuda/amp/autocast_mode.py", line 94, in decorate_fwd                                               
    return fwd(*args, **kwargs)                                                                                                                                        
  File "/data/torch-ngp/hashencoder/hashgrid.py", line 41, in forward                                                                                                  
    outputs = outputs.permute(1, 0, 2).reshape(B, L * C)      
RuntimeError: CUDA error: an illegal memory access was encountered
loss=0.0212 (0.0189), psnr=14.04 (15.27):    4% 1/23 [00:00<00:19,  1.12it/s]terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1640811806235/work/c10/cuda/CUDACachingAllocator.cpp:1211 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f20ca556d62 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1c613 (0x7f210fa74613 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1a2 (0x7f210fa75022 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0xa4 (0x7f20ca540314 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x299129 (0x7f2163a7c129 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0xada181 (0x7f21642bd181 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f21642bd482 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x10d0fc (0x55a5309840fc in /data/miniconda3/envs/ngp/bin/python)
frame #8: <unknown function> + 0x10fbcc (0x55a530986bcc in /data/miniconda3/envs/ngp/bin/python)
frame #9: <unknown function> + 0x217ca3 (0x55a530a8eca3 in /data/miniconda3/envs/ngp/bin/python)
frame #10: <unknown function> + 0x10fd05 (0x55a530986d05 in /data/miniconda3/envs/ngp/bin/python)
frame #11: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
frame #12: <unknown function> + 0x10d0fc (0x55a5309840fc in /data/miniconda3/envs/ngp/bin/python)
frame #13: <unknown function> + 0x10fbcc (0x55a530986bcc in /data/miniconda3/envs/ngp/bin/python)
frame #14: <unknown function> + 0x217ca3 (0x55a530a8eca3 in /data/miniconda3/envs/ngp/bin/python)
frame #15: <unknown function> + 0x10fd35 (0x55a530986d35 in /data/miniconda3/envs/ngp/bin/python)
frame #16: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
frame #17: <unknown function> + 0x10fd35 (0x55a530986d35 in /data/miniconda3/envs/ngp/bin/python)
frame #18: <unknown function> + 0x1aa047 (0x55a530a21047 in /data/miniconda3/envs/ngp/bin/python)
frame #19: _PyModule_ClearDict + 0x473 (0x55a530a1d723 in /data/miniconda3/envs/ngp/bin/python)
frame #20: PyImport_Cleanup + 0x408 (0x55a530a62f88 in /data/miniconda3/envs/ngp/bin/python)
frame #21: Py_FinalizeEx + 0x79 (0x55a530ac94f9 in /data/miniconda3/envs/ngp/bin/python)
frame #22: Py_RunMain + 0x1bc (0x55a530acc87c in /data/miniconda3/envs/ngp/bin/python)
frame #23: Py_BytesMain + 0x39 (0x55a530accc69 in /data/miniconda3/envs/ngp/bin/python)
frame #24: __libc_start_main + 0xe7 (0x7f219c264c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: <unknown function> + 0x1f7427 (0x55a530a6e427 in /data/miniconda3/envs/ngp/bin/python)

I also tried --tcnn, and it crashed too:

Traceback (most recent call last):                                                                                                                                     
  File "train_nerf_ddp.py", line 207, in <module>                                                                                                                      
    trainer.train(train_loader, valid_loader, opt.num_epochs)                                                                                                          
  File "/data/torch-ngp/nerf_ddp/utils.py", line 407, in train                                                                                                         
    self.train_one_epoch(train_loader)
  File "/data/torch-ngp/nerf_ddp/utils.py", line 621, in train_one_epoch
    preds, truths, loss = self.train_step(data)
  File "/data/torch-ngp/nerf_ddp/utils.py", line 303, in train_step
    outputs = self.model.render(rays_o, rays_d, z_far=self.depth_scale, staged=False, bg_color=bg_color, perturb=True, **self.conf)
  File "/data/torch-ngp/nerf_ddp/renderer.py", line 404, in render
    depth, image, depth_var = _run(rays_o, rays_d, num_steps, upsample_steps, bg_color, perturb, z_far=z_far)
  File "/data/torch-ngp/nerf_ddp/renderer.py", line 164, in run
    sigmas, rgbs = self(pts.reshape(B, -1, 3), dirs.reshape(B, -1, 3))
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/torch-ngp/nerf_ddp/network_tcnn.py", line 93, in forward
    x = self.encoder(x)
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 82, in forward
    output = _module_function.apply(                                                                                                                                   
  File "/data/miniconda3/envs/ngp/lib/python3.8/site-packages/tinycudann/modules.py", line 31, in forward                                                              
    native_ctx, output = native_tcnn_module.fwd(input, params)   
   RuntimeError: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:574 cudaDeviceSynchronize() failed with error an illegal memory access was encountered
loss=0.0471 (0.0579), psnr=8.83 (7.39):    4% 1/23 [00:01<00:23,  1.05s/it]Could not free memory: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:128 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: /tmp/pip-req-build-3voywypo/include/tiny-cuda-nn/gpu_memory.h:128 cudaFree(rawptr) failed with error an illegal memory access was encountered   
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered      
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered        
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:706 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
/tmp/pip-req-build-3voywypo/src/fully_fused_mlp.cu:707 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
loss=0.0592 (0.0557), psnr=7.68 (7.02):   17% 4/23 [00:01<00:05,  3.23it/s]terminate called after throwing an instance of 'c10::Error'
  what():  NCCL error in: /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:181, unhandled cuda error, NCCL version 21.0.3
Process Group destroyed on rank 2
Exception raised from ncclCommAbort at /opt/conda/conda-bld/pytorch_1640811806235/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:181 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f2124ff7d62 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f2124ff468b in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x107c48e (0x7f2176de148e in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x113 (0x7f2176dc9d93 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #4: c10d::ProcessGroupNCCL::~ProcessGroupNCCL() + 0x9 (0x7f2176dc9fb9 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #5: <unknown function> + 0xe67b76 (0x7f21bf0ebb76 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0xe4d885 (0x7f21bf0d1885 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x2a1b00 (0x7f21be525b00 in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x2a2d6e (0x7f21be526d6e in /data/miniconda3/envs/ngp/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x10d098 (0x56344cddb098 in /data/miniconda3/envs/ngp/bin/python)
frame #10: <unknown function> + 0x10fbcc (0x56344cdddbcc in /data/miniconda3/envs/ngp/bin/python)
frame #11: PyDict_Clear + 0x14b (0x56344cddef6b in /data/miniconda3/envs/ngp/bin/python)
frame #12: <unknown function> + 0x110ff9 (0x56344cddeff9 in /data/miniconda3/envs/ngp/bin/python)
frame #13: <unknown function> + 0x130246 (0x56344cdfe246 in /data/miniconda3/envs/ngp/bin/python)
frame #14: _PyGC_CollectNoFail + 0x2a (0x56344cf08a2a in /data/miniconda3/envs/ngp/bin/python)
frame #15: PyImport_Cleanup + 0x2ce (0x56344ceb9e4e in /data/miniconda3/envs/ngp/bin/python)
frame #16: Py_FinalizeEx + 0x79 (0x56344cf204f9 in /data/miniconda3/envs/ngp/bin/python)
frame #17: Py_RunMain + 0x1bc (0x56344cf2387c in /data/miniconda3/envs/ngp/bin/python)
frame #18: Py_BytesMain + 0x39 (0x56344cf23c69 in /data/miniconda3/envs/ngp/bin/python)
frame #19: __libc_start_main + 0xe7 (0x7f21f6d05c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: <unknown function> + 0x1f7427 (0x56344cec5427 in /data/miniconda3/envs/ngp/bin/python)

Did I do something wrong? Or does the code simply not support multi-GPU training?

Why is "--fp16" mandatory to the "--tcnn" and "--ff" mode?

Hi, thank you for your great implementation and progressive updates.

I wonder why "--fp16" is mandatory to the "--tcnn" and "--ff" mode?
And I found the "dtype" of parameters (eg. Hash Feature Values) in model is 'float32' when I was debuging, is it correct?
Does "--fp16" mean that the calculation in CUDA backend is operated in ‘float16’?

Compile errors

During JIT compilation I get:

ptxas fatal : Unresolved extern function '_Z3powdi'
shencoder.cu
ninja: build stopped: subcommand failed.

Setup:
Windows 11, 21H2
pytorch: 1.10.2+cu102
nvcc: 10.2.89
GPU: 2080Ti
VS2019

Thank you.

kernel_composite_weights_backward

Hi, thank you for an excellent implementation of instant-ngp in pytorch!

I notice that in the kernel_composite_weights_backward function in raymarching/raymarching.cu, the gradient of sigma is calculated as:
grad_sigmas[0] = grad_weights[0] * deltas[0].

We know that weight_i = T_i * alpha_i = T_i * (1 - exp(-delta_i * sigma_i)), thus grad_sigmas_i = \sum_{j >= i} grad_weight_j * d(weight_j)/d(sigma_i) = (\sum_{j > i} -delta_i * grad_weight_j) + T_i * delta_i * exp(-delta_i * sigma_i)

So is it correct to set grad_sigmas[0] = grad_weights[0] * deltas[0]?
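For reference, a sketch of the chain rule under the standard volume-rendering definitions the question refers to, with g_j denoting the incoming gradient; this is only the math, not a claim about what the kernel actually computes:

w_i = T_i \alpha_i, \qquad \alpha_i = 1 - e^{-\delta_i \sigma_i}, \qquad T_i = \exp\Big(-\sum_{k<i} \delta_k \sigma_k\Big)

\frac{\partial L}{\partial \sigma_i} = g_i \, T_i \, \delta_i \, e^{-\delta_i \sigma_i} \;-\; \delta_i \sum_{j>i} g_j \, w_j, \qquad g_j \equiv \frac{\partial L}{\partial w_j}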

Compilation issue - RuntimeError: Error building extension '_hash_encoder'

Thanks for the nice work! I ran into the following issue when running python train_nerf.py data/fox --workspace trial_nerf. Do you have any thoughts? Many thanks for your help!

Traceback (most recent call last):
  File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1723, in _run_ninja_build
    env=env)
  File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_nerf.py", line 3, in <module>
    from nerf.network import NeRFNetwork
  File "/home/wangjk/programs/torch-ngp/nerf/network.py", line 9, in <module>
    from encoding import get_encoder
  File "/home/wangjk/programs/torch-ngp/encoding.py", line 6, in <module>
    from hashencoder import HashEncoder
  File "/home/wangjk/programs/torch-ngp/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/home/wangjk/programs/torch-ngp/hashencoder/hashgrid.py", line 8, in <module>
    from .backend import _backend
  File "/home/wangjk/programs/torch-ngp/hashencoder/backend.py", line 12, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder': [1/2] /home/wangjk/anaconda3/envs/largesteps/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/largesteps/include -isystem /home/wangjk/anaconda3/envs/largesteps/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
FAILED: hashencoder.cuda.o 
/home/wangjk/anaconda3/envs/largesteps/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/largesteps/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/largesteps/include -isystem /home/wangjk/anaconda3/envs/largesteps/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
/home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu(26): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__half *, c10::Half)

1 error detected in the compilation of "/home/wangjk/programs/torch-ngp/hashencoder/src/hashencoder.cu".
ninja: build stopped: subcommand failed.

More info:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
>>> import torch
>>> torch.version.cuda
'11.3'
>>> torch.__version__
'1.10.0'

Question about render quality and multi-scale


Thanks for your great work.
I noticed that torch-ngp does not implement multi-scale, and I guess this may be the reason for the image blur. So I would like to ask whether you have any plans to implement multi-scale?
Thanks.

how to modify the learning rate in the code

Hi, thank you very much for your work.

I want to change the learning rate, but I could not find where to modify it. Could you please point it out?

Kind regards

han

results on real scenes

Hi, thanks for your great work. I have a small question about the results on real scenes. Did you test the network on some real scenes, for example the DTU or Tanks and Temples datasets?
Looking forward to your reply, thanks a lot!

ff and cuda_ray flag

Nice job! But what do the flags 'ff' and 'cuda_ray' mean?
Does ff mean your own implementation of tcnn?
What about cuda_ray? Sampling rays with CUDA ops?

Not a prime number in the hashencoder

constexpr uint32_t primes[7] = { 1, 19349663, 83492791, 25165843, 6291469, 12582917, 3145739 };

One of these numbers is not prime. This was already fixed in the tiny-cuda-nn repository. And if I remember correctly, the original paper about this hash encoding technique says that the primes used should be large numbers (do not ask me to define "large" :D). A quick check is sketched below.
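For reference, a quick primality check over the array quoted above (plain trial division, nothing repo-specific):

def is_prime(n: int) -> bool:
    # simple trial division; fast enough for numbers of this size
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

primes = [1, 19349663, 83492791, 25165843, 6291469, 12582917, 3145739]
print([(p, is_prime(p)) for p in primes])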

BTW: great job you've done here!

How much progress in the past few weeks?

Hi, I am interested in the INGP paper, but I have been struggling with the official code, which is C++ based.
Therefore I was glad that a PyTorch implementation like yours exists,
but a while ago I ended up sticking with the official one, because yours seemed to have performance issues.

So I wonder how much you have narrowed the gap between your work and the official one, in terms of PSNR and training/inference time.

Your work contributes a lot, thanks!

Cuda raymarching

Hi! Thanks a lot for the great work.

I am trying to adapt this NeRF implementation to the NeRF-in-the-Wild idea, where a specific embedding is learnt for every training image.
In order to do this, during the rendering process I need to associate each xyzs point fed to the MLP with its original image id.

With the non-CUDA rendering this is very easy because there is a constant number of samples per ray, but with the CUDA ray marching I do not know how to keep track of this.

In particular, looking at this line:

xyzs, dirs, deltas, rays = raymarching.march_rays_train(rays_o, rays_d, self.bound, self.density_grid, self.mean_density, self.iter_density, counter, self.mean_count, perturb, 128, False)

Is there a way to know, for each of the xyzs points, which ray it belongs to?

Thanks!
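A minimal sketch of one way to recover this, assuming the rays tensor returned by march_rays_train stores (ray_index, point_offset, num_points) per ray; that layout is an assumption here, so check it against your version of the kernel. Once each point has a ray index, the ray-to-image mapping from the dataloader gives the image id:

import torch

def point_ray_ids(rays: torch.Tensor, num_points: int) -> torch.Tensor:
    # rays: [R, 3] int tensor, assumed row layout (ray_index, point_offset, n_points)
    # returns: [num_points] tensor mapping each sampled xyz to the index of its source ray
    ids = torch.full((num_points,), -1, dtype=torch.long, device=rays.device)
    for ray_index, offset, n in rays.tolist():  # illustrative loop, not optimized
        ids[offset:offset + n] = ray_index
    return ids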

significantly downgraded performance when enabling `--ff` and `--cuda_ray`

Thanks for creating this great work! However, I found that performance is significantly downgraded if we enable --ff or --cuda_ray. Here are some test results.

CUDA_VISIBLE_DEVICES=0 python main_nerf.py data/fox --workspace trial_nerf
==> 16 ~ 17 it/s

CUDA_VISIBLE_DEVICES=0 python main_nerf.py data/fox --workspace trial_nerf --cuda_ray
==> 12 ~ 13 it/s

CUDA_VISIBLE_DEVICES=0 python main_nerf.py data/fox --workspace trial_nerf --fp16
==> 26 ~ 27 it/s

CUDA_VISIBLE_DEVICES=0 python main_nerf.py data/fox --workspace trial_nerf --fp16  --cuda_ray
==> 21 ~ 22 it/s

CUDA_VISIBLE_DEVICES=0 python main_nerf.py data/fox --workspace trial_nerf --fp16  --cuda_ray --ff
==> 9 ~ 10 it/s

My environment is torch=1.8.0 and cuda=11.2 with a RTX-3090 card.

I originally expected that performance would be further improved by enabling these options, but it actually is not.
Could you give me some suggestions about this strange phenomenon? Thank you!

Strange performance when using --tcnn mode

Hi, have you reproduced the PSNR performance in "--tcnn" mode since the latest commit?

In my experiment of "Lego", the training&test PSNR in "--tcnn" mode is quite strange, only 23+ dB , while the PSNR in "--ff --fp16" mode is about 28+dB.

My command line for training in "--tcnn" mode is python main_nerf.py data/nerf_synthetic/lego --workspace exp/lego_tcnn --tcnn --mode blender --bound 1 --scale 0.8
Did I miss something in the command line?

I have some questions that I hope will be answered

Here is my environment configuration
GPU number is 2
GPU = NVIDIA GeForce RTX 2080 Ti
cuda is available ? True
torch.version.cuda = 10.2
torch.version = 1.11.0+cu102

But I am getting a strange exception when running:

File "main_nerf.py", line 55, in <module>
from nerf.network import NeRFNetwork
File "/home/jiangpeng/Work/torch-ngp/nerf/network.py", line 5, in <module>
from encoding import get_encoder
File "/home/jiangpeng/Work/torch-ngp/encoding.py", line 6, in <module>
from shencoder import SHEncoder
File "/home/jiangpeng/Work/torch-ngp/shencoder/__init__.py", line 1, in <module>
from .sphere_harmonics import SHEncoder
File "/home/jiangpeng/Work/torch-ngp/shencoder/sphere_harmonics.py", line 9, in <module>
from .backend import _backend
File "/home/jiangpeng/Work/torch-ngp/shencoder/backend.py", line 9, in <module>
sources=[os.path.join(_src_path, 'src', f) for f in [
File "/home/jiangpeng/anaconda3/envs/pytorch1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1156, in load
keep_intermediates=keep_intermediates)
File "/home/jiangpeng/anaconda3/envs/pytorch1/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1371, in _jit_compile
baton.wait()
File "/home/jiangpeng/anaconda3/envs/pytorch1/lib/python3.7/site-packages/torch/utils/file_baton.py", line 42, in wait
time.sleep(self.wait_seconds)

Why does the program go to sleep?

Converting rendered depth to 3D point cloud in world coordinates

Hi @ashawkey, thanks for the great project!

How would you plot a 3D point cloud in world coordinates using the rendered depth? I saw you have a function plot_pointcloud(), but that looks like it's plotting in 'torch-ngp coordinates'.

I believe I need to string together nerf_matrix_to_ngp, rays_o, rays_d and depth (from render()), but I'm confused about how the different coordinate systems interact. I'd really appreciate some guidance on plotting 3D point clouds in world coordinates!
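A minimal sketch of the idea, assuming depth from render() is the distance along a unit-length ray direction and that nerf_matrix_to_ngp only applied a uniform scale and offset to the poses; if your version also permutes axes, that permutation has to be inverted as well. The scale/offset values below are hypothetical placeholders:

import torch

def depth_to_world_points(rays_o, rays_d, depth, scale=0.33, offset=(0.0, 0.0, 0.0)):
    # rays_o, rays_d: [N, 3] ray origins/directions in ngp coordinates; depth: [N]
    pts_ngp = rays_o + rays_d * depth.unsqueeze(-1)  # back-project depth along each ray
    # undo the (assumed) scale/offset that nerf_matrix_to_ngp applied to the camera poses
    off = torch.as_tensor(offset, dtype=pts_ngp.dtype, device=pts_ngp.device)
    return (pts_ngp - off) / scale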

Worse performance when using --tcnn

I ran the fox dataset with and without the tcnn option for 200 epochs. It seems that using tcnn gives worse performance.
The commands are

python train_nerf.py data/fox/transforms.json --workspace trial_nerf --fp16
python train_nerf.py data/fox/transforms.json --workspace trial_nerf --fp16 --tcnn


Did I mess up something? Thank you very much!

How can I apply the smoothstep function in this repo

@ashawkey Thanks for your great work!
I see that the readme says this repo only implements the linear interpolation mode, and a low-cost smoothstep function is provided in Appendix A of the paper.
If I want to apply it in this repo, where do I need to add it? Is it in hashencoder.cu?
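For reference, a small sketch of a smoothstep weighting of the kind referred to above, written in Python only to show the math; in this repo the change would presumably go into the CUDA interpolation code (the per-level weight computation in hashencoder.cu), not into Python:

def smoothstep(t):
    # replaces the linear interpolation weight t with S1(t) = t^2 * (3 - 2t)
    return t * t * (3.0 - 2.0 * t)

def smoothstep_derivative(t):
    # dS1/dt = 6 t (1 - t), needed when backpropagating through the encoding
    return 6.0 * t * (1.0 - t)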

Adding metrics

Hi!

I'm trying to add some metrics to the code. Could you please give me a clue about how to do that?

For me, it always fails with an error saying there is no attribute "clear", etc., in my function object.

Sincerely

Larger num_levels is not supported

Hi @ashawkey , Thanks for the awesome work!

It seems I cannot use a larger num_levels (e.g., num_levels = 32); there are some inf/-inf values after the hash encoding. Do you have any ideas (this might be related to the fast-hash CUDA operation)?

Also, do you have any plans to implement the aabb_scale occupancy grid (to model larger unbounded scenes, e.g., fox) as in the original instant-ngp implementation?

Ray Generation different?

Hello, torch-ngp is really great work. I really appreciate it!
It seems that in the original instant-ngp, the rays in a batch come from different images rather than from a single image. Also, the number of rays in a batch is not always 4096; it increases during training. The original instant-ngp seems to use an image CDF and an error map to guide the ray sampling (a sketch of the idea is shown below). I think adding this part may improve both the quality and speed of torch-ngp. Any plan for this? Thank you very much!
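A minimal sketch of error-weighted pixel sampling for a single image, just to illustrate the mechanism described above (error_map and the ray count are assumed inputs; this is not the repo's --error_map implementation):

import torch

def sample_pixels_by_error(error_map: torch.Tensor, num_rays: int) -> torch.Tensor:
    # error_map: [H*W] non-negative per-pixel training error for one image
    # returns: [num_rays] pixel indices drawn with probability proportional to the error
    prob = error_map / error_map.sum()
    return torch.multinomial(prob, num_rays, replacement=True)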

Nerf-synthetic and LLFF training

Hello,

Thanks for the great code! I have tried your code with the fox scene and it works perfectly. However, I get this error for the chair scene of NeRF. It seems that in the transforms_train file I need to declare a bunch of additional fields that I found in the transforms.json of the fox scene. How did you use your code to train on the NeRF and LLFF datasets?

python gui_nerf.py data/nerf_synthetic/chair/transforms_train.json --workspace trial_nerf --fp16 --ff --cuda_ray --train
Traceback (most recent call last):
  File "gui_nerf.py", line 361, in <module>
    train_dataset = NeRFDataset(opt.path, 'train', radius=opt.radius)
  File "/home/phong/data/Work/Paper_journal/torch-ngp/nerf/provider.py", line 47, in __init__
    self.H = int(transform['h']) // downscale
KeyError: 'h'


CUDA error when using hash encoder

Hi, I added your hash encoder as part of my network code (I only use the torch-ngp/hashencoder folder). Compilation does not report any errors and the hash encoder also produces correct outputs, but the nn.Linear that follows the hash encoder in my code hits a CUDA error that does not show up without the hash encoder:

 File "/home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378083779/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f4037e2b2f2 in /home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f4037e2867b in /home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7f4038084219 in /home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f4037e133a4 in /home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e0dda (0x7f408ed87dda in /home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e0e71 (0x7f408ed87e71 in /home/zz/miniconda3/envs/nerv/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1aa06a (0x555fc794b06a in /home/zz/miniconda3/envs/nerv/bin/python)
frame #7: <unknown function> + 0x10d0fc (0x555fc78ae0fc in /home/zz/miniconda3/envs/nerv/bin/python)
frame #8: <unknown function> + 0x10fbcc (0x555fc78b0bcc in /home/zz/miniconda3/envs/nerv/bin/python)
frame #9: <unknown function> + 0x217ca3 (0x555fc79b8ca3 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #10: <unknown function> + 0x10fd05 (0x555fc78b0d05 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #11: <unknown function> + 0x1aa047 (0x555fc794b047 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #12: <unknown function> + 0x10d0fc (0x555fc78ae0fc in /home/zz/miniconda3/envs/nerv/bin/python)
frame #13: <unknown function> + 0x10fbcc (0x555fc78b0bcc in /home/zz/miniconda3/envs/nerv/bin/python)
frame #14: <unknown function> + 0x217ca3 (0x555fc79b8ca3 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #15: <unknown function> + 0x10fd35 (0x555fc78b0d35 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #16: <unknown function> + 0x1aa047 (0x555fc794b047 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #17: <unknown function> + 0x110882 (0x555fc78b1882 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #18: <unknown function> + 0x1102a9 (0x555fc78b12a9 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #19: <unknown function> + 0x110293 (0x555fc78b1293 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #20: <unknown function> + 0x177d87 (0x555fc7918d87 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #21: PyDict_SetItemString + 0x4c (0x555fc791bccc in /home/zz/miniconda3/envs/nerv/bin/python)
frame #22: PyImport_Cleanup + 0xac (0x555fc798cc2c in /home/zz/miniconda3/envs/nerv/bin/python)
frame #23: Py_FinalizeEx + 0x79 (0x555fc79f34f9 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #24: Py_RunMain + 0x1bc (0x555fc79f687c in /home/zz/miniconda3/envs/nerv/bin/python)
frame #25: Py_BytesMain + 0x39 (0x555fc79f6c69 in /home/zz/miniconda3/envs/nerv/bin/python)
frame #26: __libc_start_main + 0xe7 (0x7f40c0bccbf7 in /lib/x86_64-linux-gnu/libc.so.6)
frame #27: <unknown function> + 0x1f7427 (0x555fc7998427 in /home/zz/miniconda3/envs/nerv/bin/python)

Maybe you could help me with this? Thanks a lot.
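One way to localize this kind of failure (a minimal debugging sketch, not an official answer from the repository): an illegal memory access inside a CUDA kernel is usually only reported by the next CUDA call, here the nn.Linear, so forcing a synchronization right after the encoder and sanity-checking its input tends to surface the real culprit. The constructor defaults and the assumed [0, 1] coordinate range below should be checked against your copy of hashencoder/hashgrid.py.

# Minimal sketch to localize the illegal memory access (assumptions noted inline).
# Launch with: CUDA_LAUNCH_BLOCKING=1 python debug_hash.py
import torch
from hashencoder import HashEncoder  # the folder copied from torch-ngp

encoder = HashEncoder().cuda()                     # default args; match your hashgrid.py
linear = torch.nn.Linear(encoder.output_dim, 64).cuda()

x = torch.rand(4096, 3, device='cuda')             # assumed input range [0, 1]; out-of-range
assert x.is_contiguous()                           # coordinates can index past the hash table

feats = encoder(x)
torch.cuda.synchronize()                           # if the encoder kernel faulted, the error
                                                   # surfaces here instead of inside nn.Linear
out = linear(feats)
print(out.shape)

If the error already appears at the synchronize call, the problem is in the encoder's input (range, dtype, contiguity) or in the copied extension build rather than in the linear layer.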

problem when enabling "cuda_ray" for rendering

I ran into the following problem when enabling "--cuda_ray" for rendering:

C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here
C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here
.....
2 errors detected in the compilation of "D:/torch-ngp/raymarching/src/raymarching.cu".
raymarching.cu
ninja: build stopped: subcommand failed.

The problem seems to come from pybind11. Can someone help me out? Thanks in advance.
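One way to narrow this down (a hedged sketch, not a confirmed fix): rebuild just the raymarching extension through torch's JIT loader with verbose output, so the exact nvcc/cl.exe invocation that trips over pybind11's cast.h is printed and can be compared against a newer CUDA/Visual Studio toolchain. The source file list and flags below are assumptions; mirror whatever raymarching/backend.py actually passes to load().

# Hedged sketch: rebuild only the raymarching extension with verbose output.
# Run from the torch-ngp root; the source list is an assumption -- copy it
# from raymarching/backend.py if it differs.
import os
from torch.utils.cpp_extension import load

src_dir = os.path.join('raymarching', 'src')
_backend = load(
    name='_raymarching_debug',
    sources=[os.path.join(src_dir, f) for f in ('raymarching.cu', 'bindings.cpp')],
    extra_cuda_cflags=['-O3', '-std=c++14'],  # assumed; keep in sync with backend.py
    verbose=True,                             # prints every ninja/nvcc command line
)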

problem when running the code

When I run

python main_nerf.py data/fox --workspace trial_nerf --fp16 --ff --cuda_ray --gui

I got the following error:

Traceback (most recent call last):
File "C:\Users\zhisong_liu\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1740, in _run_ninja_build
subprocess.run(
File "C:\Users\zhisong_liu\Miniconda3\envs\ngp\lib\subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

I tried to work around the '['ninja', '-v']' issue by changing it to '['ninja', '--version']', but then I got another error:

Traceback (most recent call last):
File "main_nerf.py", line 53, in
from nerf.network import NeRFNetwork
File "D:\torch-ngp\nerf\network.py", line 5, in
from encoding import get_encoder
File "D:\torch-ngp\encoding.py", line 5, in
from hashencoder import HashEncoder
File "D:\torch-ngp\hashencoder_init_.py", line 1, in
from .hashgrid import HashEncoder
File "D:\torch-ngp\hashencoder\hashgrid.py", line 9, in
from .backend import _backend
File "D:\torch-ngp\hashencoder\backend.py", line 6, in
_backend = load(name='_hash_encoder',
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1144, in load
return _jit_compile(
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1382, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "C:\Users\Miniconda3\envs\ngp\lib\site-packages\torch\utils\cpp_extension.py", line 1775, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
ImportError: DLL load failed while importing _hash_encoder: The specified module could not be found.

What should I do?

Originally posted by @Holmes-Alan in #26 (comment)
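"DLL load failed while importing _hash_encoder" on Windows often means the built extension cannot find one of its dependent DLLs (commonly the CUDA runtime), because Python 3.8+ no longer resolves extension-module dependencies via PATH. A hedged workaround sketch, assuming a standard CUDA install location (adjust CUDA_PATH to your machine); deleting the cached build directory that torch.utils.cpp_extension reports when verbose output is enabled and rebuilding is also worth trying.

# Hedged workaround sketch for "DLL load failed while importing _hash_encoder"
# on Windows + Python 3.8+: register the CUDA bin directory before the
# extension is imported so its dependent DLLs can be found.
import os

# CUDA_PATH / the fallback directory are assumptions -- point them at your install.
cuda_bin = os.path.join(
    os.environ.get('CUDA_PATH', r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3'),
    'bin')
if hasattr(os, 'add_dll_directory') and os.path.isdir(cuda_bin):
    os.add_dll_directory(cuda_bin)

from hashencoder import HashEncoder  # import only after registering the DLL directory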
