CUDA error: device-side assert triggered (neuralangelo, closed)

nvlabs commented on August 16, 2024

CUDA error: device-side assert triggered

from neuralangelo.

Comments (11)

SenVy-WH commented on August 16, 2024 1

@chenhsuanlin
I've checked the shapes again.
appear_embed is Embedding(2,8), but sample_idx is tensor([2,3]), so the indices are out of range.
So I changed num_images from 2 to 4 in toy_example.yaml, and then it ran successfully.

But I don't know why the number is incorrect.
I followed the instructions in Data Preparation step-by-step and generated the toy_example.yaml file correctly without running into any issues along the way.
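
For reference, the mismatch described above can be reproduced on the CPU, where PyTorch raises a readable IndexError instead of a device-side assert. This is a minimal sketch that assumes only the shapes quoted in this comment (Embedding(2,8) and indices [2,3]); the helper name lookup_error is made up for illustration and is not part of neuralangelo.

```python
import torch
import torch.nn as nn

# Same shapes as reported above: 2 embedding rows, but indices 2 and 3.
appear_embed = nn.Embedding(2, 8)
sample_idx = torch.tensor([2, 3])


def lookup_error(embed, idx):
    """Return the message of the IndexError raised by the lookup, or None if it succeeds."""
    try:
        embed(idx)
    except IndexError as err:
        return str(err)
    return None


# On the CPU the out-of-range lookup fails with a Python exception.
message = lookup_error(appear_embed, sample_idx)
print(message)
```

With indices 0 and 1 the same helper returns None, which matches the observation that the run succeeds once the table is large enough for every index.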

chenhsuanlin commented on August 16, 2024

Hi @SenVy-WH, could you run with the environment variable CUDA_LAUNCH_BLOCKING=1 as suggested in the log, and help pin down which line in the PyTorch code it fails at? Thanks!
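
A minimal sketch of one way to set that variable from Python, in case exporting it in the shell is inconvenient. The variable has to be in the environment before CUDA is initialized, so the sketch builds the environment for a child process. The helper name blocking_env and the commented-out launch command are placeholders, not the project's actual entry point.

```python
import os


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA kernel launches enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


env = blocking_env()
# Hypothetical launch; substitute the training command you normally run:
# subprocess.run(["python", "train.py", "--config", "path/to/config.yaml"], env=env, check=True)
print(env["CUDA_LAUNCH_BLOCKING"])
```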

SenVy-WH commented on August 16, 2024

Sorry, I closed this issue by mistake. Reopening it now.
I've tried CUDA_LAUNCH_BLOCKING=1, but nothing changed.

But I got this when I tried debugging it step by step.

Evaluating:  50%|██████████████████████████████████████████████████████                                                      | 1/2 [00:23<00:23, 23.53s/it]
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680542704550/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [0,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

It happened in model.py on line 220:

app = self.appear_embed(sample_idx)[:, None, None]  # [B,1,1,C]

Could the data preparation cause this problem or have I configured the image parameters incorrectly?
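
The assertion `srcIndex < srcSelectDimSize` in the log means that an index passed to the lookup is not smaller than the number of rows in the table. Below is a sketch of a check that could run before such a lookup to get a readable Python error instead. check_indices is a hypothetical helper, not something that exists in model.py.

```python
import torch


def check_indices(idx: torch.Tensor, num_embeddings: int) -> None:
    """Raise ValueError if any index falls outside [0, num_embeddings)."""
    if idx.numel() == 0:
        return
    lo, hi = int(idx.min()), int(idx.max())
    if lo < 0 or hi >= num_embeddings:
        raise ValueError(
            f"indices span [{lo}, {hi}] but the table only has {num_embeddings} rows"
        )


embed = torch.nn.Embedding(4, 8)
sample_idx = torch.tensor([2, 3])
check_indices(sample_idx, embed.num_embeddings)  # passes: 4 rows cover indices 2 and 3
app = embed(sample_idx)[:, None, None]  # shape [2, 1, 1, 8]
```

Calling check_indices(sample_idx, 2) raises ValueError, which is the situation reported in this thread.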

chenhsuanlin commented on August 16, 2024

Thanks for the feedback! Though it's weird, this is likely because num_images in our example was set too small (2). We are working on replacing the toy example with a workable example -- will push the update soon!

chenhsuanlin commented on August 16, 2024

@SenVy-WH we have just updated the toy example video. Could you also try pulling and following the new instructions? If you still run into further issues, please let me know. Thanks!

GondorFu commented on August 16, 2024

COLMAP can still only register two images, which leads to the same issue.

SenVy-WH commented on August 16, 2024

@chenhsuanlin Thanks for being so responsive! I've tried the new video but still get the same issue as @GondorFu said.

btw, I found the output always has

Train dataset length: 2
Val dataset length: 4

So I changed subset from 4 to 2 in base.yaml, and it ran successfully.
Maybe this number should be kept in sync with num_images: 2 in lego.yaml?

chenhsuanlin commented on August 16, 2024

Yes, indeed if there are <4 training images, data.val.subset should be adjusted as well (though these cases should be very rare!). Marking this as a bug to fix.
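
One way to picture that adjustment is to cap the validation subset at the number of available images. The sketch below only illustrates the idea with a made-up helper name (clamp_val_subset); it is an assumption about the intent, not the project's actual fix.

```python
def clamp_val_subset(subset: int, num_images: int) -> int:
    """Return a validation subset size that never exceeds the number of images."""
    if num_images < 1:
        raise ValueError("need at least one image")
    return max(1, min(subset, num_images))


print(clamp_val_subset(4, 2))    # 2
print(clamp_val_subset(4, 100))  # 4
```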

It seems weird that COLMAP only finds two images on the new toy example though; ideally it should be able to extract 100 frames. I will look into this.

SenVy-WH commented on August 16, 2024

Hi @chenhsuanlin, I tried the COLMAP GUI (rather than the scripts), changed a few feature extraction settings while keeping the rest at their defaults, and here is what I found:

1. With camera_model=RADIAL and "Shared for all images" not selected, COLMAP only finds two images, same as the scripts.
2. With camera_model=RADIAL and "Shared for all images" selected, COLMAP SOMETIMES finds 100 images.
3. With camera_model=SIMPLE_RADIAL and "Shared for all images" selected, COLMAP ALWAYS finds 100 images.

In case 3, I got

Train dataset length: 100
Val dataset length: 1

So I think these two options may be the key to this issue.
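
For the command-line route, the two GUI options appear to correspond to the feature_extractor options --ImageReader.camera_model and --ImageReader.single_camera. The sketch below only builds the command and does not run it. The paths are placeholders, and mapping the "Shared for all images" checkbox to single_camera is an assumption here.

```python
import shlex


def feature_extractor_cmd(database_path, image_path,
                          camera_model="SIMPLE_RADIAL", single_camera=True):
    """Build (but do not run) a COLMAP feature_extractor command line."""
    return [
        "colmap", "feature_extractor",
        "--database_path", database_path,
        "--image_path", image_path,
        "--ImageReader.camera_model", camera_model,
        "--ImageReader.single_camera", "1" if single_camera else "0",
    ]


# Placeholder paths; point these at your own database file and image folder.
cmd = feature_extractor_cmd("database.db", "images")
print(shlex.join(cmd))
```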

chenhsuanlin commented on August 16, 2024

Thanks @SenVy-WH for the feedback! It seems that a more complex camera model can hurt registration. We will look into this.

chenhsuanlin commented on August 16, 2024

The num_images problem has been fixed (4b2e18e) and the COLMAP script now defaults to the SIMPLE_RADIAL camera model (a4e5690). Thanks @SenVy-WH for the feedback -- please feel free to reopen if there are further issues.