
Comments (4)

huangenyan commented on July 21, 2024

More on this: the problem only happens on the GPU; if I use the CPU (by setting gpu_count = 0 in the config), the problem goes away.
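
Roughly how a gpu_count-style setting can map onto the device choice in LibTorch (SelectDevice and the check below are only a sketch for illustration, not the actual mask_rcnn_pytorch code):

#include <torch/torch.h>

// Sketch only: map the gpu_count config value onto a torch::Device.
// gpu_count is the config field mentioned above; everything else is assumed.
torch::Device SelectDevice(int gpu_count) {
  if (gpu_count > 0 && torch::cuda::is_available())
    return torch::Device(torch::kCUDA);
  return torch::Device(torch::kCPU);  // gpu_count == 0 -> stay on the CPU
}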

Kolkir commented on July 21, 2024

@huangenyan Hello, could you please provide more details: which system you use, the image you used for evaluation, the type of your GPU, whether a dump was generated, and what type of build (with or without optimizations) you used ...
I don't have this problem in my environment.

huangenyan commented on July 21, 2024

I just forgot to mention that what I use is mask_rcnn_pytorch.
I spent some time on this, and here is some information you may find helpful:

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ nvidia-smi
Thu Aug 29 15:51:46 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
| 41%   41C    P2    56W / 260W |   6585MiB / 11016MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1223      G   /usr/lib/xorg/Xorg                           188MiB |
|    0      1464      G   /usr/bin/gnome-shell                         113MiB |
|    0     11545      G   ...quest-channel-token=5961631727014844578   235MiB |
|    0     15818      C   /usr/bin/valgrind.bin                       5890MiB |
|    0     31699      G   ...uest-channel-token=15778414260646414614   153MiB |
+-----------------------------------------------------------------------------+

The GPU is an RTX 2080 Ti.

$ valgrind -v ./mask-rcnn_demo
...
==15818== 1 errors in context 1 of 1:
==15818== Invalid read of size 4
==15818==    at 0x44BC09FE: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x44BC596A: ??? (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x44BDABE1: cudaDeviceSynchronize (in /usr/local/cuda-10.0/lib64/libcudart.so.10.0.130)
==15818==    by 0x14E26393: cudnnDestroy (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x109A4CF0: std::unordered_map<int, at::native::(anonymous namespace)::Handle, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, at::native::(anonymous namespace)::Handle> > >::~unordered_map() (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x447F5614: __cxa_finalize (cxa_finalize.c:83)
==15818==    by 0x107B2FB2: ??? (in /usr/local/lib/libcaffe2_gpu.so)
==15818==    by 0x4010B72: _dl_fini (dl-fini.c:138)
==15818==    by 0x447F5040: __run_exit_handlers (exit.c:108)
==15818==    by 0x447F5139: exit (exit.c:139)
==15818==    by 0x447D3B9D: (below main) (libc-start.c:344)
==15818==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==15818== 
--15818-- 
--15818-- used_suppression:  98231 zlib-1.2.x trickyness (1b): See http://www.zlib.net/zlib_faq.html#faq36 /usr/lib/valgrind/default.supp:516
==15818== 
==15818== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 98231 from 1)
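
The backtrace suggests the invalid read happens while a static map of cudnn handles inside libcaffe2_gpu is being destroyed during __cxa_finalize, i.e. after main() has already returned, when the CUDA runtime may already be tearing down. Schematically, the pattern looks like this (Handle and g_handles are illustrative names, not the actual libcaffe2 code):

#include <cudnn.h>
#include <unordered_map>

// Illustrative only: a file-scope map whose destructor runs during
// __cxa_finalize. cudnnDestroy() ends up calling cudaDeviceSynchronize()
// (see the trace above), which can touch already-released runtime state
// if CUDA shutdown has started by then.
struct Handle {
  cudnnHandle_t handle{nullptr};
  ~Handle() { if (handle) cudnnDestroy(handle); }
};
static std::unordered_map<int, Handle> g_handles;  // destroyed at process exit

int main() {
  cudnnCreate(&g_handles[0].handle);  // create a handle so the destructor has work to do
  return 0;  // ~unordered_map() -> ~Handle() -> cudnnDestroy() runs after this line
}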

I also tested which statement causes the segmentation fault by adding exit(0) at different locations in the program, and found that the problem occurs in fpn.cpp:

std::tuple<torch::Tensor,
           torch::Tensor,
           torch::Tensor,
           torch::Tensor,
           torch::Tensor>
FPNImpl::forward(at::Tensor x) {
// no segmentation fault if adding exit(0) here
  x = c1_->forward(x);
// segmentation fault if adding exit(0) here
  x = c2_->forward(x);
  auto c2_out = x;
  x = c3_->forward(x);
  auto c3_out = x;
  x = c4_->forward(x);
  auto c4_out = x;
  x = c5_->forward(x);
  auto p5_out = p5_conv1_->forward(x);
  auto p4_out =
      p4_conv1_->forward(c4_out) + upsample(p5_out, /*scale_factor*/ 2);
  auto p3_out =
      p3_conv1_->forward(c3_out) + upsample(p4_out, /*scale_factor*/ 2);
  auto p2_out =
      p2_conv1_->forward(c2_out) + upsample(p3_out, /*scale_factor*/ 2);

  p5_out = p5_conv2_->forward(p5_out);
  p4_out = p4_conv2_->forward(p4_out);
  p3_out = p3_conv2_->forward(p3_out);
  p2_out = p2_conv2_->forward(p2_out);

  // P6 is used for the 5th anchor scale in RPN. Generated by subsampling from
  // P5 with stride of 2.
  auto p6_out = p6_->forward(p5_out);

  return {p2_out, p3_out, p4_out, p5_out, p6_out};
}

I'm still working on this and will hopefully provide more information.

huangenyan commented on July 21, 2024

I created a minimal source file that reproduces the error:

#include <torch/torch.h>

#include <iostream>
#include <memory>


int main(int argc, char** argv) {

  auto input = torch::ones({1, 3, 1024, 1024});
  input = input.to(torch::DeviceType::CUDA);

  auto c2 = torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 64, 7).stride(2).padding(3));
  c2->to(torch::DeviceType::CUDA);
  c2->forward(input);
  return 0;
}

The example has nothing to do with your code, so I think it is a bug in the PyTorch C++ frontend, and I'll report an issue there.
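
For comparison, here is the same repro kept on the CPU (only a sketch: following the gpu_count = 0 observation above, I would expect this path not to hit the crash at exit):

#include <torch/torch.h>

int main() {
  // Same tensor and layer as the repro above, but never moved to CUDA.
  auto input = torch::ones({1, 3, 1024, 1024});
  auto conv = torch::nn::Conv2d(torch::nn::Conv2dOptions(3, 64, 7).stride(2).padding(3));
  conv->forward(input);
  return 0;
}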

Thanks!
