RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient about dcnv2 HOT 11 CLOSED

charlesshang commented on September 13, 2024 2

RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient

from dcnv2.

Comments (11)

CharlesShang commented on September 13, 2024 2

Hi,
You might need to look at #known-issues.
If you want to run gradcheck, it is better to do it with the double type. I have done this before (check out previous commits).
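To illustrate why double precision matters here, below is a minimal pure-Python sketch (not the DCNv2 test itself) of the kind of finite-difference comparison a gradient check performs. It assumes a toy function, sin, and emulates float32 by rounding through `struct`; the helper names `to_float32` and `central_difference` are hypothetical and only serve this illustration. The step size 1e-6 is the default `eps` of `torch.autograd.gradcheck`.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def central_difference(f, x, eps, rounder=lambda v: v):
    """Estimate f'(x) by central differences.

    `rounder` is applied to every intermediate value so that the whole
    computation can be emulated at a lower precision (hypothetical helper).
    """
    x_plus = rounder(x + eps)
    x_minus = rounder(x - eps)
    f_plus = rounder(f(x_plus))
    f_minus = rounder(f(x_minus))
    return rounder(f_plus - f_minus) / (2 * eps)


x = 1.0
eps = 1e-6             # default step size of torch.autograd.gradcheck
exact = math.cos(x)    # analytical derivative of sin at x

# Same check, once in float64 and once with float32 rounding emulated.
err64 = abs(central_difference(math.sin, x, eps) - exact)
err32 = abs(central_difference(math.sin, x, eps, to_float32) - exact)

print("float64 error:", err64)
print("float32 error:", err32)
```

In this sketch the float32 estimate is off by far more than typical gradcheck tolerances, while the float64 estimate is accurate to many digits, which is consistent with the UserWarning about non-double inputs quoted later in this thread.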

commented on September 13, 2024

Sometimes I get this error message:

xxxxx@DeskPC:~/Desktop/DCNv2$ sudo python3 test.py 
torch.Size([2, 64, 128, 128])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
0.971507, 1.943014
0.971507, 1.943014
Zero offset passed
/usr/local/lib/python3.5/dist-packages/torch/autograd/gradcheck.py:170: UserWarning: At least one of the inputs that requires gradient is not of double precision floating point. This check will likely fail if all the inputs are not of double precision floating point. 
  'At least one of the inputs that requires gradient '
check_gradient_dpooling: True
Traceback (most recent call last):
  File "test.py", line 265, in <module>
    check_gradient_dconv()
  File "test.py", line 97, in check_gradient_dconv
    eps=1e-3, atol=1e-4, rtol=1e-2))
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/gradcheck.py", line 205, in gradcheck
    'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/gradcheck.py", line 185, in fail_test
    raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 1,
numerical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000, -0.0011,  ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]])
analytical:tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000, -0.0011,  ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000]])

DKandrew commented on September 13, 2024

Same issue here. It seems the repository does not pass its own test?
Is it because of this warning?

UserWarning: At least one of the inputs that requires gradient is not of double precision floating point. This check will likely fail if all the inputs are not of double precision floating point.

Simpatech-app commented on September 13, 2024

Did anyone find the reason? I am getting exactly the same error.

fengzifrank commented on September 13, 2024

Me too. How do I deal with this error?

shadyatscu commented on September 13, 2024

The same error occurred.

SCoulY commented on September 13, 2024

The same error occurred.

cwjhx commented on September 13, 2024

The same error occurred.

iraadit commented on September 13, 2024

Got the same error

hangg7 commented on September 13, 2024

For those who have also run into this issue (it is not specific to the DCNv2 implementation): this behavior is expected because of the non-determinism of atomicAdd. You probably want to check out this commit and experiment with nondet_tol when running gradcheck.
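The effect described above can be sketched in plain Python, as an analogy rather than actual CUDA code. Floating-point addition is not associative, so when atomic adds land in a different order between two backward runs, the accumulated gradient can differ in its last bits. The helpers `accumulate` and `matches` are hypothetical; the keyword `nondet_tol` only mirrors the name of the gradcheck argument mentioned in the comment and is used here as a plain absolute tolerance.

```python
def accumulate(values):
    """Sum values left to right, like atomic adds arriving in this order."""
    total = 0.0
    for v in values:
        total += v
    return total


def matches(a, b, nondet_tol=0.0):
    """Compare two backward results, allowing a small absolute difference."""
    return abs(a - b) <= nondet_tol


# The same three gradient contributions, accumulated in two different orders.
grads = [0.1, 0.2, 0.3]
run_a = accumulate(grads)            # order: 0.1, 0.2, 0.3
run_b = accumulate(reversed(grads))  # order: 0.3, 0.2, 0.1

exact_match = matches(run_a, run_b)                       # bitwise comparison
tolerant_match = matches(run_a, run_b, nondet_tol=1e-12)  # small tolerance

print(run_a, run_b)
print("exact:", exact_match, "tolerant:", tolerant_match)
```

With a zero tolerance the two runs are reported as different even though both are correct up to rounding, which is the situation the "Backward is not reentrant" message describes. In PyTorch versions whose `torch.autograd.gradcheck` accepts `nondet_tol`, passing a small positive value plays the role of the tolerance above.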

qiangruoyu commented on September 13, 2024

Hi, I encountered this problem. Could anyone help me out here? Thanks :-)

xxzx@DeskPC:~/Desktop/DCNv2$ python3 test.py 
torch.Size([2, 64, 128, 128])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
0.971507, 1.943014
0.971507, 1.943014
Zero offset passed
/usr/local/lib/python3.5/dist-packages/torch/autograd/gradcheck.py:170: UserWarning: At least one of the inputs that requires gradient is not of double precision floating point. This check will likely fail if all the inputs are not of double precision floating point. 
  'At least one of the inputs that requires gradient '
check_gradient_dpooling: True
Traceback (most recent call last):
  File "test.py", line 265, in <module>
    check_gradient_dconv()
  File "test.py", line 97, in check_gradient_dconv
    eps=1e-3, atol=1e-4, rtol=1e-2))
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/gradcheck.py", line 208, in gradcheck
    return fail_test('Backward is not reentrant, i.e., running backward with same '
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/gradcheck.py", line 185, in fail_test
    raise RuntimeError(msg)
RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient

Can you solve it?
How?
I hit the same error.
