pytorch / extension-cpp
C++ extensions in PyTorch
Hello and thanks for this great repository!
Can you give me a hint on how to implement default arguments? I tried the following so far but it does not work:
at::Tensor torch_func(at::Tensor tensor1, at::Tensor tensor2={}) {
...
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("torch_func", &torch_func, py::arg("tensor1"), py::arg("tensor2")={});
}
torch_func(tensor1, None)
I would be very grateful for help.
Hello, I implemented a custom C++ file and compiled it successfully, but when I try to import it via
import torch
import grid_sampler_cuda
, I encounter the following error:
ImportError: /home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/grid_sampler_cuda-0.0.0-py3.7-linux-x86_64.egg/grid_sampler_cuda.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _Z23my_grid_sampler_2d_cudaRKN2at6TensorES2_b
My torch._C._GLIBCXX_USE_CXX11_ABI is False, and my compiler's output is:
gcc -pthread -B /home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include/TH -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/include/python3.7m -c grid_sampler_cuda.cpp -o build/temp.linux-x86_64-3.7/grid_sampler_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=grid_sampler_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
/usr/local/cuda-10.0/bin/nvcc -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include/TH -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.0/include -I/home/shuqin/anaconda3/envs/pytorch_1.4_py3.7/include/python3.7m -c grid_sampler_cuda_kernel.cu -o build/temp.linux-x86_64-3.7/grid_sampler_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=grid_sampler_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_75,code=sm_75 -std=c++11
I browsed some related issues, and many of them suggest adding -D_GLIBCXX_USE_CXX11_ABI=0 to the compiler flags; however, this flag is already present in my build. My PyTorch version is 1.4.0, installed via conda install pytorch torchvision cudatoolkit=10.0 -c pytorch. After installing PyTorch, I may have upgraded my gcc to 7.5.0 and installed horovod, which also installs gxx_linux-64 via conda install gxx_linux-64.
Could somebody please help me with this?
I failed to compile the CUDA code with python setup.py install, and I'm rather surprised that this issue has not been brought up before. Here's the error message:
/usr/local/cuda/bin/nvcc -I/home/maxjiang/software/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/home/maxjiang/software/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/home/maxjiang/software/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/home/maxjiang/software/anaconda3/include/python3.6m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.6/lltm_cuda_kernel.o -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options '-fPIC' -std=c++11
lltm_cuda_kernel.cu(54): error: calling a __host__ function("std::fmax<double, float> ") from a __global__ function("_NV_ANON_NAMESPACE::lltm_cuda_forward_kernel<float> ") is not allowed
lltm_cuda_kernel.cu(54): error: identifier "std::fmax<double, float> " is undefined in device code
2 errors detected in the compilation of "/tmp/tmpxft_00003be3_00000000-6_lltm_cuda_kernel.cpp1.ii".
Most of this is probably irrelevant except for gcc version:
Here's my hacky fix that worked, by simply wrapping scalar_t around the doubles. Not sure this is the most elegant solution.
lltm_cuda_kernel.cu
lines 26-29:
template <typename scalar_t>
__device__ __forceinline__ scalar_t elu(scalar_t z, scalar_t alpha = 1.0) {
return fmax(scalar_t(0.0), z) + fmin(scalar_t(0.0), alpha * (exp(z) - scalar_t(1.0)));
}
OS: 16.04
PyTorch version: 0.4.1
How you installed PyTorch (conda, pip, source): conda
Python version: 3.6
CUDA/cuDNN version: 9.0
GPU models and configuration: Tesla K80
I defined my own custom op:
// sigmoid_cuda_kernal.cu
namespace {
template <typename scalar_t>
__device__ __forceinline__ scalar_t sigmoid(scalar_t z) {
return 1.0 / (1.0 + exp(-z));
}
template <typename scalar_t>
__device__ __forceinline__ scalar_t d_sigmoid(scalar_t z) {
return (1.0 - z) * z;
}
template <typename scalar_t>
__global__ void sigmoid_cuda_forward_kernel(
const scalar_t* __restrict__ input,
scalar_t* __restrict__ output) {
const int index = blockIdx.x * blockDim.x + blockIdx.y;
output[index] = sigmoid(input[index]);
}
template <typename scalar_t>
__global__ void sigmoid_cuda_backward_kernel(
const scalar_t* __restrict__ grad_output,
const scalar_t* __restrict__ output,
scalar_t* __restrict__ new_grad_output) {
const int index = blockIdx.x * blockDim.x + blockIdx.y;
new_grad_output[index] = d_sigmoid(output[index] * grad_output[index]);
}
} // namespace
at::Tensor sigmoid_cuda_forward(
at::Tensor input) {
auto output = at::zeros_like(input);
const dim3 blocks(input.size(0), input.size(1));
const int threads = 1;
AT_DISPATCH_FLOATING_TYPES(input.type(), "sigmoid_forward_cuda", ([&] {
sigmoid_cuda_forward_kernel<scalar_t><<<blocks, threads>>>(
input.data<scalar_t>(),
output.data<scalar_t>());
}));
return output;
}
at::Tensor sigmoid_cuda_backward(
at::Tensor grad_output,
at::Tensor output) {
auto new_grad_output = at::zeros_like(grad_output);
const dim3 blocks(grad_output.size(0), grad_output.size(1));
const int threads = 1;
AT_DISPATCH_FLOATING_TYPES(grad_output.type(), "sigmoid_backward_cuda", ([&] {
sigmoid_cuda_backward_kernel<scalar_t><<<blocks, threads>>>(
grad_output.data<scalar_t>(),
output.data<scalar_t>(),
new_grad_output.data<scalar_t>());
}));
return new_grad_output;
}
And the C++ wrapper is as follows:
// sigmoid_cuda.cpp
at::Tensor sigmoid_cuda_forward(
const at::Tensor& input);
at::Tensor sigmoid_cuda_backward(
const at::Tensor& grad_output,
const at::Tensor& output);
#define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
at::Tensor sigmoid_forward(
const at::Tensor& input) {
CHECK_INPUT(input);
return sigmoid_cuda_forward(input);
}
at::Tensor sigmoid_backward(
const at::Tensor& grad_output,
const at::Tensor& output) {
CHECK_INPUT(grad_output);
CHECK_INPUT(output);
return sigmoid_cuda_backward(
grad_output,
output);
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("forward", &sigmoid_forward, "sigmoid forward (CUDA)");
m.def("backward", &sigmoid_backward, "sigmoid backward (CUDA)");
}
The compilation process is successful.
running install
running bdist_egg
running egg_info
writing sigmoid_cuda_linear_cpp.egg-info/PKG-INFO
writing dependency_links to sigmoid_cuda_linear_cpp.egg-info/dependency_links.txt
writing top-level names to sigmoid_cuda_linear_cpp.egg-info/top_level.txt
reading manifest file 'sigmoid_cuda_linear_cpp.egg-info/SOURCES.txt'
writing manifest file 'sigmoid_cuda_linear_cpp.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-3.6/linear_cpp.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-3.6/sigmoid_cuda.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for sigmoid_cuda.cpython-36m-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/sigmoid_cuda.py to sigmoid_cuda.cpython-36.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying sigmoid_cuda_linear_cpp.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying sigmoid_cuda_linear_cpp.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying sigmoid_cuda_linear_cpp.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying sigmoid_cuda_linear_cpp.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.sigmoid_cuda.cpython-36: module references __file__
creating 'dist/sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg
removing '/home/zhangzhi/anaconda3/lib/python3.6/site-packages/sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg' (and everything under it)
creating /home/zhangzhi/anaconda3/lib/python3.6/site-packages/sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg
Extracting sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg to /home/zhangzhi/anaconda3/lib/python3.6/site-packages
sigmoid-cuda-linear-cpp 0.0.0 is already the active version in easy-install.pth
Installed /home/zhangzhi/anaconda3/lib/python3.6/site-packages/sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg
Processing dependencies for sigmoid-cuda-linear-cpp==0.0.0
Finished processing dependencies for sigmoid-cuda-linear-cpp==0.0.0
But when I import it, the import fails:
ImportError: /home/.../anaconda3/lib/python3.6/site-packages/sigmoid_cuda_linear_cpp-0.0.0-py3.6-linux-x86_64.egg/sigmoid_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _Z20sigmoid_cuda_forwardRKN2at6TensorE
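One way to read an undefined symbol like this is to pull the function name and the leading parameter qualifiers out of the mangled string. The sketch below is a deliberately tiny, hypothetical helper (not a full demangler; a tool such as c++filt does the real job) and assumes a free function mangled under the Itanium C++ ABI, where the name follows `_Z` as a length-prefixed string and `R` and `K` encode a reference and `const`. For the symbol above, it suggests the wrapper is looking for sigmoid_cuda_forward taking `const at::Tensor&`, which is how sigmoid_cuda.cpp declares it, whereas the definition in the .cu file takes `at::Tensor` by value. Those two signatures mangle differently, so the linker would not match them.

```python
import re


def split_mangled(symbol):
    """Split an Itanium-mangled free-function symbol into (name, encoded_params).

    Hypothetical helper for illustration only: it handles just the
    `_Z<len><name><params>` shape and is not a general demangler.
    """
    match = re.match(r"_Z(\d+)", symbol)
    if match is None:
        raise ValueError("not a simple Itanium-mangled function symbol")
    length = int(match.group(1))
    start = match.end()
    name = symbol[start:start + length]
    if len(name) != length:
        raise ValueError("symbol is shorter than its length prefix")
    return name, symbol[start + length:]


def first_param_is_const_ref(encoded_params):
    # 'R' encodes an lvalue reference and 'K' a const qualifier.
    return encoded_params.startswith("RK")


name, params = split_mangled("_Z20sigmoid_cuda_forwardRKN2at6TensorE")
print(name, params, first_param_is_const_ref(params))
```

If the diagnosis holds, making the declaration in the .cpp file and the definition in the .cu file use the same parameter types should let the symbol resolve.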
I was trying to build a CUDAExtension whose name contains dots. The reason is that I wanted to install it as a submodule of the module I am building. For example, I have a top-level module foo and I want my CUDA extension to be foo.bar.
There is also a related discussion here.
This is currently not possible using CUDAExtension and I can't think of a workaround right now. The obvious reason here is that the macros don't allow for '.' characters. I am attaching the output of the build process. A fast way to replicate this is to change the module name from lltm_cuda to foo.cuda in your CUDA lltm example.
In file included from /usr/local/lib/python3.5/dist-packages/torch/lib/include/pybind11/pytypes.h:12:0,
from /usr/local/lib/python3.5/dist-packages/torch/lib/include/pybind11/cast.h:13,
from /usr/local/lib/python3.5/dist-packages/torch/lib/include/pybind11/attr.h:13,
from /usr/local/lib/python3.5/dist-packages/torch/lib/include/pybind11/pybind11.h:43,
from /usr/local/lib/python3.5/dist-packages/torch/lib/include/torch/torch.h:6,
from neural_renderer/cuda/load_textures_cuda.cpp:1:
<command-line>:0:37: error: expected initializer before '.' token
/usr/local/lib/python3.5/dist-packages/torch/lib/include/pybind11/detail/common.h:212:47: note: in definition of macro 'PYBIND11_CONCAT'
#define PYBIND11_CONCAT(first, second) first##second
^
neural_renderer/cuda/load_textures_cuda.cpp:33:1: note: in expansion of macro 'PYBIND11_MODULE'
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
^
neural_renderer/cuda/load_textures_cuda.cpp:33:17: note: in expansion of macro 'TORCH_EXTENSION_NAME'
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
^
<command-line>:0:37: error: expected initializer before '.' token
/usr/local/lib/python3.5/dist-packages/torch/lib/include/pybind11/detail/common.h:171:51: note: in definition of macro 'PYBIND11_PLUGIN_IMPL'
extern "C" PYBIND11_EXPORT PyObject *PyInit_##name()
^
neural_renderer/cuda/load_textures_cuda.cpp:33:1: note: in expansion of macro 'PYBIND11_MODULE'
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
^
neural_renderer/cuda/load_textures_cuda.cpp:33:17: note: in expansion of macro 'TORCH_EXTENSION_NAME'
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
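The errors above come from the extension name being pasted into `PyInit_##name`, which has to be a valid C identifier. For a module imported as foo.bar, Python looks for an init function named after the last component only (PyInit_bar), so one possible workaround is to pass only that last component as TORCH_EXTENSION_NAME while keeping the full dotted name for packaging. The helper below is a hypothetical sketch of that derivation, not something the build scripts in this thread provide, and `macro_name_for` is a made-up name.

```python
def macro_name_for(extension_name):
    """Return the last dotted component, checked to be usable in PyInit_<name>.

    Note: str.isidentifier() follows Python's identifier rules, which are
    slightly broader than C's; for plain ASCII names the two agree.
    """
    last = extension_name.split(".")[-1]
    if not last.isidentifier():
        raise ValueError(f"{last!r} cannot be used in PyInit_<name>")
    return last


print(macro_name_for("foo.bar"))
```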
It would be nice if there were more examples, especially examples with different types of inputs. Currently, the only input type is at::Tensor, but what about the other types? Specifically, I had a lot of trouble using at::Scalar and at::IntList as inputs and instead used int and std::vector<int>. This might be an issue with pybind11 rather than PyTorch extensions specifically, but for users with limited knowledge of pybind11 and ATen, examples can be very helpful.
This tutorial helped me write a single-layer unit in CUDA.
But how do I write CUDA code for multilayer units, like torch/nn/_functions/rnn.py, line 281?
output, hy, cy, reserve, new_weight_buf = torch._cudnn_rnn(
input, weight_arr, weight_stride0,
flat_weight,
hx, cx,
mode, hidden_size, num_layers,
batch_first, dropout, train, bool(bidirectional),
list(batch_sizes.data) if variable_length else (),
dropout_ts)
I have achieved the same results by using the template of AutogradRNN, i.e., torch/nn/_functions/rnn.py 212.
def AutogradRNN(mode, input_size, hidden_size, num_layers=1, batch_first=False,
dropout=0, train=True, bidirectional=False, variable_length=False,
dropout_state=None, flat_weight=None):
But GPU utilization was too low and it ran too slowly, perhaps because each single-layer unit is called individually, and each call involves launching a CUDA kernel. So I want to rewrite the multilayer units in CUDA and fuse particular groups of single layers. Can you provide a boilerplate?
I am running it on Google Colab, and python grad_check.py cuda does not pass; the others (py, cpp) pass with no issues.
This is an issue with PyTorch master and not 0.4.0, so I'm not putting it directly in a PR, but it might be good to know.
This line (and the next) makes compilation fail when using a master build (and, I believe, nightly builds) because the function definition has changed.
There's apparently a lot of refactoring going on in PyTorch right now, so it may change again in the future, but the correct way to write this line is now
#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, " must be contiguous")
see this commit for explanation.
My environment is as follows:
win10
pytorch1.2.0
cuda10.0
I get the following error. How can I solve this problem?
Connected to pydev debugger (build 182.4323.49)
Using C:\Users\86969\AppData\Local\Temp\torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
F:\Anaconda3\envs\pytorch1.2.0\lib\site-packages\torch\utils\cpp_extension.py:189: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the specified file.
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Emitting ninja build file C:\Users\86969\AppData\Local\Temp\torch_extensions\lltm_cuda\build.ninja...
INFO: Could not find files for the given pattern(s).
Traceback (most recent call last):
File "D:\pycharm-community-2018.2.3\PyCharm Community Edition 2018.2.3\helpers\pydev\pydevd.py", line 1664, in <module>
main()
File "D:\pycharm-community-2018.2.3\PyCharm Community Edition 2018.2.3\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\pycharm-community-2018.2.3\PyCharm Community Edition 2018.2.3\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\pycharm-community-2018.2.3\PyCharm Community Edition 2018.2.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "F:/git/extension-cpp/benchmark.py", line 28, in <module>
from cuda.lltm import LLTM
File "F:\git\extension-cpp\cuda\lltm.py", line 6, in <module>
from cuda.jit import lltm_cuda
File "F:\git\extension-cpp\cuda\jit.py", line 3, in <module>
'lltm_cuda', ['F:\git\extension-cpp\cuda\lltm_cuda.cpp', 'F:\git\extension-cpp\cuda\lltm_cuda_kernel.cu'], verbose=True)
File "F:\Anaconda3\envs\pytorch1.2.0\lib\site-packages\torch\utils\cpp_extension.py", line 658, in load
is_python_module)
File "F:\Anaconda3\envs\pytorch1.2.0\lib\site-packages\torch\utils\cpp_extension.py", line 827, in _jit_compile
with_cuda=with_cuda)
File "F:\Anaconda3\envs\pytorch1.2.0\lib\site-packages\torch\utils\cpp_extension.py", line 876, in _write_ninja_file_and_build
with_cuda=with_cuda)
File "F:\Anaconda3\envs\pytorch1.2.0\lib\site-packages\torch\utils\cpp_extension.py", line 1089, in _write_ninja_file
'cl']).decode().split('\r\n')
File "F:\Anaconda3\envs\pytorch1.2.0\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "F:\Anaconda3\envs\pytorch1.2.0\lib\subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.
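The final CalledProcessError is raised because `where cl` found nothing, which matches the earlier warning about checking the compiler version for cl. A quick way to confirm this before calling load() is to check, from the same environment, whether the compiler is on PATH. The snippet is only a sketch and `find_compiler` is a hypothetical helper name.

```python
import shutil


def find_compiler(candidates=("cl", "c++", "g++", "clang++")):
    """Return (name, path) of the first compiler found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


print(find_compiler(("cl",)))
```

If this prints None on Windows, starting Python from a Visual Studio developer command prompt, or otherwise adding the directory that contains cl.exe to PATH, is the usual remedy.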
I tried to build the CUDA version on PyTorch 0.4, but it failed with:
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /groups/wall2-ilabt-iminds-be/cmsearch/users/amir/miniconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/groups/wall2-ilabt-iminds-be/cmsearch/users/amir/miniconda3/lib/python3.7/site-packages/torch/lib/include -I/groups/wall2-ilabt-iminds-be/cmsearch/users/amir/miniconda3/lib/python3.7/site-packages/torch/lib/include/TH -I/groups/wall2-ilabt-iminds-be/cmsearch/users/amir/miniconda3/lib/python3.7/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/groups/wall2-ilabt-iminds-be/cmsearch/users/amir/miniconda3/include/python3.7m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.7/lltm_cuda.o -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
lltm_cuda.cpp:1:29: fatal error: torch/extension.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
Do you have any idea how I can resolve this issue? (Updating to PyTorch 1.x would fix it, but I need a 0.4.x version.)
I work on Ubuntu 16.04 with CUDA 9.0 and PyTorch 1.0.
When I run your example code cuda/setup.py, I get:
Traceback (most recent call last):
File "setup.py", line 9, in <module>
'lltm_cuda_kernel.cu',
File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 476, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 549, in library_paths
paths.append(_join_cuda_home(lib_dir))
File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 1121, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
EnvironmentError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
However, I am sure CUDA 9.0 is installed correctly on my computer.
How can I fix this problem?
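A small check can show what the build is likely to see. The sketch below assumes a lookup order of the CUDA_HOME environment variable, then nvcc on PATH, then /usr/local/cuda. That order is an assumption made for illustration and may not match what cpp_extension.py does in your PyTorch version, and `guess_cuda_home` is a hypothetical helper name.

```python
import os
import shutil


def guess_cuda_home(environ=None, default="/usr/local/cuda"):
    """Return a candidate CUDA install root, or None if nothing is found."""
    environ = os.environ if environ is None else environ
    home = environ.get("CUDA_HOME")
    if home:
        return home
    nvcc = shutil.which("nvcc", path=environ.get("PATH"))
    if nvcc:
        # nvcc lives in <root>/bin/nvcc, so go up two directory levels.
        return os.path.dirname(os.path.dirname(nvcc))
    return default if os.path.isdir(default) else None


print(guess_cuda_home())
```

If this prints None, exporting CUDA_HOME to the toolkit root before running setup.py (for example /usr/local/cuda-9.0, assuming that is where the toolkit was installed) should get past this particular error.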
Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/lltm_cuda/build.ninja...
Building extension module lltm_cuda...
[1/3] /usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=lltm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-9.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /home/youmi/PycharmProjects/roi_pooling-master/lltm_cuda_kernel.cu -o lltm_cuda_kernel.cuda.o
FAILED: lltm_cuda_kernel.cuda.o
/usr/local/cuda-9.0/bin/nvcc -DTORCH_EXTENSION_NAME=lltm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-9.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /home/youmi/PycharmProjects/roi_pooling-master/lltm_cuda_kernel.cu -o lltm_cuda_kernel.cuda.o
/bin/sh: 1: /usr/local/cuda-9.0/bin/nvcc: not found
[2/3] c++ -MMD -MF lltm_cuda.o.d -DTORCH_EXTENSION_NAME=lltm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-9.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/youmi/PycharmProjects/roi_pooling-master/lltm_cuda.cpp -o lltm_cuda.o
FAILED: lltm_cuda.o
c++ -MMD -MF lltm_cuda.o.d -DTORCH_EXTENSION_NAME=lltm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /usr/local/lib/python3.6/dist-packages/torch/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.6/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.6/dist-packages/torch/include/THC -isystem /usr/local/cuda-9.0/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/youmi/PycharmProjects/roi_pooling-master/lltm_cuda.cpp -o lltm_cuda.o
/home/youmi/PycharmProjects/roi_pooling-master/lltm_cuda.cpp:78:16: error: expected constructor, destructor, or type conversion before ‘(’ token
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
^
ninja: build stopped: subcommand failed.
I installed CUDA 10.1, so why does building lltm_cuda_kernel.cu automatically use CUDA 9.0?
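The log shows the build invoking /usr/local/cuda-9.0/bin/nvcc, which the shell reports as not found, so it may help to see which toolkit directories actually exist and which of them contain nvcc. The sketch below assumes toolkits are installed under /usr/local in directories named cuda*, which is common on Linux but not guaranteed; `list_cuda_roots` is a hypothetical helper name.

```python
import glob
import os


def list_cuda_roots(prefix="/usr/local"):
    """Return [(root, has_nvcc)] for every cuda* directory under prefix."""
    roots = []
    for root in sorted(glob.glob(os.path.join(prefix, "cuda*"))):
        if os.path.isdir(root):
            has_nvcc = os.path.isfile(os.path.join(root, "bin", "nvcc"))
            roots.append((root, has_nvcc))
    return roots


for root, has_nvcc in list_cuda_roots():
    print(root, "nvcc found" if has_nvcc else "no nvcc")
```

If a cuda-9.0 directory without nvcc shows up next to a working cuda-10.1, a stale CUDA_HOME or symlink pointing at the old directory would be one plausible explanation.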
Hi, I tried to build the extension using your sc. I managed to build from cpp/setup.py and was able to import it. However, I just cannot build from cuda/setup.py. I tried the solution mentioned in other posts i.e. fmax to ::fmax, casting etc, none of them worked.
The error message:
E:/Miniconda/envs/pointnetpy/lib/site-packages/torch/include\torch/csrc/jit/argument_spec.h(161): error: member "torch::jit::ArgumentSpecCreator::DEPTH_LIMIT" may not be initialized
1 error detected in the compilation of "C:/Users/XIEYUA~1/AppData/Local/Temp/tmpxft_00002f00_00000000-10_lltm_cuda_kernel.cpp1.ii".
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\nvcc.exe' failed with exit status 1
Any thoughts or suggestions would be greatly appreciated!
p.s.
FYI, I failed to compile using jit.py for both cpp and cuda version.
The error log:
Using C:\Users\XIEYUA~1\AppData\Local\Temp\torch_extensions as PyTorch extensions root...
Emitting ninja build file C:\Users\XIEYUA~1\AppData\Local\Temp\torch_extensions\lltm_cpp\build.ninja...
Building extension module lltm_cpp...
[1/2] cl /showIncludes -DTORCH_EXTENSION_NAME=lltm_cpp -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\TH -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\THC -IE:\Miniconda\envs\pointnetpy\Include -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c D:\Github\extension-cpp\cpp\lltm.cpp /Folltm.o
FAILED: lltm.o
cl /showIncludes -DTORCH_EXTENSION_NAME=lltm_cpp -DTORCH_API_INCLUDE_EXTENSION_H -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\torch\csrc\api\include -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\TH -IE:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\THC -IE:\Miniconda\envs\pointnetpy\Include -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c D:\Github\extension-cpp\cpp\lltm.cpp /Folltm.o
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
cl : Command line warning D9002 : ignoring unknown option '-fPIC'
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
E:\Miniconda\envs\pointnetpy\lib\site-packages\torch\include\torch\csrc\api\include\torch/cuda.h(5): fatal error C1083: Cannot open include file: 'cstddef': No such file or directory
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "E:\Miniconda\envs\pointnetpy\lib\site-packages\torch\utils\cpp_extension.py", line 949, in _build_extension_module
check=True)
File "E:\Miniconda\envs\pointnetpy\lib\subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "jit.py", line 2, in <module>
lltm_cpp = load(name="lltm_cpp", sources=["lltm.cpp"], verbose=True)
File "E:\Miniconda\envs\pointnetpy\lib\site-packages\torch\utils\cpp_extension.py", line 644, in load
is_python_module)
File "E:\Miniconda\envs\pointnetpy\lib\site-packages\torch\utils\cpp_extension.py", line 813, in _jit_compile
with_cuda=with_cuda)
File "E:\Miniconda\envs\pointnetpy\lib\site-packages\torch\utils\cpp_extension.py", line 866, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "E:\Miniconda\envs\pointnetpy\lib\site-packages\torch\utils\cpp_extension.py", line 962, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension 'lltm_cpp'
I am following this tutorial to create a C++ extension for PyTorch. My C++ code gives the following error:
test.cpp:3:10: fatal error: torch/torch.h: No such file or directory
#include <torch/torch.h>
How do I get the torch.h header file? Is there some pytorch-dev version?
running install
running bdist_egg
running egg_info
writing lltm_cuda.egg-info/PKG-INFO
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing top-level names to lltm_cuda.egg-info/top_level.txt
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
gcc -pthread -B /home/guhongyang/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include/THC -I:/usr/local/cuda-9.0/include -I/home/guhongyang/anaconda3/include/python3.7m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.7/lltm_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
:/usr/local/cuda-9.0/bin/nvcc -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include/TH -I/home/guhongyang/anaconda3/lib/python3.7/site-packages/torch/include/THC -I:/usr/local/cuda-9.0/include -I/home/guhongyang/anaconda3/include/python3.7m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.7/lltm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++11
unable to execute ':/usr/local/cuda-9.0/bin/nvcc': No such file or directory
error: command ':/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1
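The nvcc path in the log starts with a colon (':/usr/local/cuda-9.0/bin/nvcc'), and the include flag shows the same prefix ('-I:/usr/local/cuda-9.0/include'). That pattern would be consistent with CUDA_HOME having been set with a PATH-style append, such as export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-9.0 while the variable was still empty, although the log alone cannot confirm that. A sketch of a sanity check for POSIX-style values, with `clean_cuda_home` as a hypothetical helper:

```python
def clean_cuda_home(value, sep=":"):
    """Return the single directory named by a CUDA_HOME value.

    CUDA_HOME is expected to name one directory, not a PATH-style list,
    so empty pieces are dropped and anything but one piece is an error.
    The default separator assumes a POSIX shell.
    """
    parts = [piece for piece in value.split(sep) if piece]
    if len(parts) != 1:
        raise ValueError(f"CUDA_HOME should be one directory, got {value!r}")
    return parts[0]


print(clean_cuda_home(":/usr/local/cuda-9.0"))
```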
python benchmark.py cpp
Segmentation fault
Feature request:
Add an example of PyTorch's canonical error checking in the CUDA example
I think there is a typo in the example code in the tutorial on the PyTorch website:
import math
import torch
# Our module!
import lltm
class LLTMFunction(torch.nn.Function): # <-------------- Here
@staticmethod
def forward(ctx, input, weights, bias, old_h, old_cell):
outputs = lltm.forward(input, weights, bias, old_h, old_cell)
new_h, new_cell = outputs[:2]
variables = outputs[1:] + [weights, old_cell]
ctx.save_for_backward(*variables)
return new_h, new_cell
torch.nn does not have a Function class; it should be torch.autograd.Function.
For the C++ extension, I got the following errors:
lltm.obj : error LNK2001: unresolved external symbol __imp_THPVariableClass
lltm.obj : error LNK2001: unresolved external symbol "struct _object * __cdecl THPVariable_Wrap(struct torch::autograd::Variable)" (?THPVariable_Wrap@@YAPEAU_object@@UVariable@autograd@torch@@@z)
For the CUDA extension, I got the following error:
error: Don't know how to compile lltm_cuda_kernel.cu to build\temp.win-amd64-3.6\Release\lltm_cuda_kernel.obj
I stepped into build_extension() and found that wrap_compile was never called.
-------------update---------------------
issue information:
OS: windows 10 enterprise build 14393
PyTorch version: 0.4.0a0+59d1d17
How you installed PyTorch (conda, pip, source): source
Python version: anaconda 5.1 , python 3.6.4
CUDA/cuDNN version: 9.1.85/7.1
GPU models and configuration: GTX 1050Ti
GCC version (if compiling from source): visual studio 2017 toolset 14.11
additional information:
I modified the setup.py in the cpp directory as follows:
setup(
name='lltm_cpp',
ext_modules=[
CppExtension(name='lltm_cpp',
sources = ['lltm.cpp'],
library_dirs=[r'C:\ProgramData\Anaconda3\Lib\site-packages\torch\lib'],
libraries=['ATen', 'shm']),
],
cmdclass={
'build_ext': BuildExtension
})
Hello,
this is more of a discussion thread than a real issue, but I've been working on the readability of the CUDA kernels.
PyTorch actually provides a very nice way of presenting tensor data to kernels as if it were still a multidimensional array.
See here for a working prototype: https://github.com/ClementPinard/extension-cpp/blob/deviceTensorExperiments/cuda/lltm_cuda_kernel.cu
Essentially, I designed a simple converter from at::Tensor to THCDeviceTensor<scalar_t, 2, size_t, RestrictPtrTraits>.
The conversion is not very pretty, but it allows us to write more readable memory accesses in kernels while ultimately doing exactly the same thing (even the __restrict__ keyword is kept).
Let's look at the current code for forward:
template <typename scalar_t>
__global__ void lltm_cuda_forward_kernel(
    const scalar_t* __restrict__ gates,
    const scalar_t* __restrict__ old_cell,
    scalar_t* __restrict__ new_h,
    scalar_t* __restrict__ new_cell,
    scalar_t* __restrict__ input_gate,
    scalar_t* __restrict__ output_gate,
    scalar_t* __restrict__ candidate_cell,
    size_t state_size) {
  const int column = blockIdx.x * blockDim.x + threadIdx.x;
  const int index = blockIdx.y * state_size + column;
  const int gates_row = blockIdx.y * (state_size * 3);
  if (column < state_size) {
    input_gate[index] = sigmoid(gates[gates_row + column]);
    output_gate[index] = sigmoid(gates[gates_row + state_size + column]);
    candidate_cell[index] = elu(gates[gates_row + 2 * state_size + column]);
    new_cell[index] =
        old_cell[index] + candidate_cell[index] * input_gate[index];
    new_h[index] = tanh(new_cell[index]) * output_gate[index];
  }
}
The column and index values are kind of hard to figure out. The kernel relies on the fact that gridDim.y is the batch size, so blockIdx.y is the batch index. column is then the index within the state, and index is batch_idx * batch_stride + column, while gates_row is the first index of the gates for that particular element of the batch, because its batch stride is three times as large.
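The index arithmetic described above can be checked on the host with a few lines of Python. This is only a sketch: it assumes gates is a contiguous [batch, 3 * state_size] buffer and the other tensors are contiguous [batch, state_size] buffers, and the helper name flat_indices is made up for illustration.

```python
def flat_indices(batch_idx, column, state_size):
    """Mirror the per-thread index arithmetic of the forward kernel."""
    # Offset into a contiguous [batch, state_size] buffer.
    index = batch_idx * state_size + column
    # Start of this batch element inside the [batch, 3 * state_size] gates buffer.
    gates_row = batch_idx * (state_size * 3)
    return index, gates_row


state_size = 4
batch_idx, column = 2, 1
index, gates_row = flat_indices(batch_idx, column, state_size)

# Offsets of the three gate values (input, output, candidate) read by this thread.
gate_offsets = [gates_row + k * state_size + column for k in range(3)]
print(index, gates_row, gate_offsets)
```

The three offsets are exactly state_size apart, which is why the kernel adds state_size and 2 * state_size to gates_row + column.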
Now my proposed code:
template <typename scalar_t>
__global__ void lltm_cuda_forward_kernel(
    const dTensor2R gates,
    const dTensor2R old_cell,
    dTensor2R new_h,
    dTensor2R new_cell,
    dTensor2R input_gate,
    dTensor2R output_gate,
    dTensor2R candidate_cell,
    size_t state_size) {
  const int n = blockIdx.y; // batch index
  CUDA_KERNEL_LOOP(c, state_size) {
    input_gate[n][c] = sigmoid((scalar_t) gates[n][c]);
    output_gate[n][c] = sigmoid((scalar_t) gates[n][c + state_size]);
    candidate_cell[n][c] = elu((scalar_t) gates[n][c + 2 * state_size]);
    new_cell[n][c] =
        old_cell[n][c] + candidate_cell[n][c] * input_gate[n][c];
    new_h[n][c] = tanh((scalar_t) new_cell[n][c]) * output_gate[n][c];
  }
}
I use dTensor2R, which is defined as THCDeviceTensor<scalar_t, 2, size_t, RestrictPtrTraits> in a macro above.
Besides using the strided loop CUDA_KERNEL_LOOP (just for the sake of good practice), we now only need to compute n, which is explicitly the batch index, and c, which is the column from above.
Every relevant value can now be accessed with tensor[n][c + shift], making it very similar to an actual 2D array.
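To illustrate why tensor[n][c + shift] does the same thing as the manual flat indexing, here is a small host-side Python sketch. Accessor2D and Row are hypothetical stand-ins written for this example; they are not the THCDeviceTensor API, they only mimic the idea of a view that carries sizes and strides over a flat buffer.

```python
class Row:
    """One row of a strided 2D view over a flat buffer."""

    def __init__(self, data, offset, stride):
        self._data, self._offset, self._stride = data, offset, stride

    def __getitem__(self, c):
        return self._data[self._offset + c * self._stride]

    def __setitem__(self, c, value):
        self._data[self._offset + c * self._stride] = value


class Accessor2D:
    """Hypothetical 2D accessor: view[n][c] -> data[n * stride0 + c * stride1]."""

    def __init__(self, data, sizes, strides):
        self._data, self.sizes, self.strides = data, sizes, strides

    def __getitem__(self, n):
        return Row(self._data, n * self.strides[0], self.strides[1])


batch, state_size = 2, 3
flat_gates = list(range(batch * 3 * state_size))  # values 0..17
gates = Accessor2D(flat_gates, (batch, 3 * state_size), (3 * state_size, 1))

n, c = 1, 2
via_accessor = gates[n][c + state_size]                       # readable form
via_flat = flat_gates[n * (state_size * 3) + state_size + c]  # manual form
print(via_accessor, via_flat)

gates[0][1] = -1  # writes go through to the same flat buffer
```

Both reads land on the same element, so the accessor only changes how the index is spelled, not which memory is touched.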
I tested my code on master (from a few days ago) and it works for both check.py and grad_check.py. It does not need the PyTorch source code, only the compiled binaries and the headers.
Is this proposal legitimate? I feel like it could be a good way of letting people write CUDA for more complicated ND tensors (like 4D tensors for regular feature maps) without all the complex indexing. If so, that could be a good reason to write a more user-friendly method for converting at::Tensor to deviceTHCTensor.
Hello.
I wrote my test code as follows:
test.py
import torch
import test_cuda

a = torch.zeros((1), dtype=torch.int)
a = a.cuda(0)
x = test_cuda.func(a)
print(x)
cuda.cpp
#include <torch/torch.h>

void func_wrapper(int* a);

at::Tensor func(at::Tensor a)
{
    func_wrapper(a.data<int>());
    return a;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("func", &func, "func");
}
cuda_kernel.cu
#include <ATen/ATen.h>
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void func_kernel(int* __restrict__ a)
{
    a[0] = 4;
}

void func_wrapper(int* a)
{
    func_kernel<<<1,1>>>(a);
}
When I used a = a.cuda(0) in test.py, I got the expected result:
tensor([4], device='cuda:0', dtype=torch.int32)
But when I used a = a.cuda(3) (I have multiple GPUs), I got:
tensor([0], device='cuda:3', dtype=torch.int32)
The result tensor was tensor([0]). Why?
Thanks a lot.
OS: Ubuntu 14
PyTorch version: torch-nightly 1.0.0.dev20190219
How you installed PyTorch (conda, pip, source):
I compiled and ran the code in a Docker container. The Docker image was ufoym/deepo:pytorch-py36-cu90
GPU models and configuration: 4 GeForce GTX TITANs
Looks like the link in the README to http://pytorch.org/docs/master/notes/cpp-extensions.html is broken.
Hi @goldsborough, is there a good way to debug .cu code?
Following your great tutorial, I wrote an extension for ROI Align and ROI Pool. The cpp code is pretty easy to debug (just like pure C++ code). However, I find it troublesome to debug the CUDA code (running python setup.py install again and again), and nvcc raises an error about not being able to find the ATen library.
Would you have any advice for debugging CUDA code?
Thank you!
(I found few discussions about C++ extensions in the forums, so I opened an issue here.)
/home/yexiang/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/pytypes.h:1165:206:error: expansion pattern ‘pybind11::detail::negation<std::is_same<pybind11::detail::bools<pybind11::detail::negation<std::is_base_of<pybind11::arg, Args> >::value ..., pybind11::detail::negation<std::is_same<pybind11::detail::kwargs_proxy, Args> >::value ..., true>, pybind11::detail::bools<true, pybind11::detail::negation<std::is_base_of<pybind11::arg, Args> >::value ..., pybind11::detail::negation<std::is_same<pybind11::detail::kwargs_proxy, Args> >::value ...> > >::value’ contains no argument packs
/home/yexiang/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/pytypes.h:1165:215:error: template argument 1 is invalid
/home/yexiang/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/pytypes.h:1165:392:error: expansion pattern ‘pybind11::detail::negation<std::is_same<pybind11::detail::bools<pybind11::detail::negation<std::is_base_of<pybind11::arg, Args> >::value ..., pybind11::detail::negation<std::is_same<pybind11::detail::kwargs_proxy, Args> >::value ..., true>, pybind11::detail::bools<true, pybind11::detail::negation<std::is_base_of<pybind11::arg, Args> >::value ..., pybind11::detail::negation<std::is_same<pybind11::detail::kwargs_proxy, Args> >::value ...> > >::value’ contains no argument packs
/home/yexiang/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/pytypes.h:1165:395:error: template argument 2 is invalid
/home/yexiang/anaconda3/lib/python3.6/site-packages/torch/lib/include/pybind11/pytypes.h:
I can't compile the CUDA version. When torch/extension.h is included in the .cu file, this error is reported.
How can I solve this problem?
Here is the gist of my code, along with the error in the comments.
https://gist.github.com/kris-singh/ce5fbdb8aa242ed3357d27c3e8fdea5f
I tried to debug the issue using lldb. I found that for the *this, src and index tensors, numel is correct, but when I try to get their sizes() it shows 1. I don't really understand why this is happening. Also, the error message should tell which tensor was half constructed.
When submitting a bug report, please include the following information (where relevant):
Also, an unrelated note: do you know how I could get code completion and definition lookup? Right now I have to search the aten directory with ag to find what I want to use.
Thank you, @goldsborough , for this great tutorial, but could you tell me how to call functions from CUDA libraries? I want to use functions from <cusparse.h> and they all need cusparseHandle_t handle argument to be passed. But I don't know how to get it.
Hi Peter,
CMD: python3 jit2.py
There is no result. Please help.
Loading extension module lltm_cuda...
Traceback (most recent call last):
File "jit2.py", line 24, in
verbose=True
File "E:\Python36\lib\site-packages\torch\utils\cpp_extension.py", line 645, in load
is_python_module)
File "E:\Python36\lib\site-packages\torch\utils\cpp_extension.py", line 825, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "E:\Python36\lib\site-packages\torch\utils\cpp_extension.py", line 965, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "E:\Python36\lib\imp.py", line 297, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'lltm_cuda'
cd cuda
python3 setup.py develop
[2/2] /usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.7/site-packages/torch/include -I/usr/local/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.7/site-packages/torch/include/TH -I/usr/local/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c -c /Users/tomheaven/Downloads/extension-cpp-master/cuda/lltm_cuda_kernel.cu -o /Users/tomheaven/Downloads/extension-cpp-master/cuda/build/temp.macosx-10.13-x86_64-3.7/lltm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=sm_70 -std=c++14
FAILED: /Users/tomheaven/Downloads/extension-cpp-master/cuda/build/temp.macosx-10.13-x86_64-3.7/lltm_cuda_kernel.o
/usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.7/site-packages/torch/include -I/usr/local/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.7/site-packages/torch/include/TH -I/usr/local/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/include/python3.7m -c -c /Users/tomheaven/Downloads/extension-cpp-master/cuda/lltm_cuda_kernel.cu -o /Users/tomheaven/Downloads/extension-cpp-master/cuda/build/temp.macosx-10.13-x86_64-3.7/lltm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=sm_70 -std=c++14
/usr/local/lib/python3.7/site-packages/torch/include/c10/util/variant.h(2241): error: parameter pack "Ts" was referenced but not expanded
The error seems to be from pytorch's header file variant.h(2241).
template <typename... Ts>
class variant {
  static_assert(0 < sizeof...(Ts),
                "variant must consist of at least one alternative.");
  static_assert(lib::all<!std::is_array<Ts>::value...>::value,
                "variant can not have an array type as an alternative.");
  static_assert(lib::all<!std::is_reference<Ts>::value...>::value,
                "variant can not have a reference type as an alternative.");
  static_assert(lib::all<!std::is_void<Ts>::value...>::value,
                "variant can not have a void type as an alternative.");

 public:
  template <
      typename Front = lib::type_pack_element_t<0, Ts...>, // Line 2241. The error is in this line.
      lib::enable_if_t<std::is_default_constructible<Front>::value, int> = 0>
  inline constexpr variant() noexcept(
      std::is_nothrow_default_constructible<Front>::value)
      : impl_(in_place_index_t<0>{}) {}
I am learning the internals of PyTorch and need to test some C++ functions that I build with ATen.
I have followed this example: https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/CMakeLists.txt.
I don't want to write a CMakeLists file for every source file that I write. Is there a way to link against the .dylib files using the -l flag? I tried g++ --std=c++11 -l/anaconda3/envs/pytorch1.0/lib/python3.6/site-packages/torch/lib/libtorch.dylib -I/anaconda3/envs/pytorch1.0/lib/python3.6/sitepackages/torch/lib/include example-app.cpp -o example but this fails.
Could you let me know a way to do this, or why it is preferred to write CMake files for small examples?
cc: @goldsborough
Hello,
The compilation of setup.py in cpp is successful, but for /cuda/setup.py I get the following compile error. Do you have any idea what my mistake could be?
Best regards
System:
Error log:
running install
running bdist_egg
running egg_info
writing lltm_cuda.egg-info/PKG-INFO
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing top-level names to lltm_cuda.egg-info/top_level.txt
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
gcc -pthread -B /pizady/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/pizady/anaconda3/include/python3.6m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.6/lltm_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from lltm_cuda.cpp:1:0:
/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include/torch/torch.h:7:2: warning: #warning "Including torch/torch.h for C++ extensions is deprecated. Please include torch/extension.h" [-Wcpp]
#warning \
^~~~~~~
/usr/local/cuda/bin/nvcc -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/TH -I/pizady/anaconda3/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/local/cuda/include -I/pizady/anaconda3/include/python3.6m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.6/lltm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
lltm_cuda_kernel.cu(54): error: calling a __host__ function("std::fmax<double, float> ") from a __global__ function("_NV_ANON_NAMESPACE::lltm_cuda_forward_kernel<float> ") is not allowed
lltm_cuda_kernel.cu(54): error: identifier "std::fmax<double, float> " is undefined in device code
2 errors detected in the compilation of "/tmp/tmpxft_00000f0c_00000000-6_lltm_cuda_kernel.cpp1.ii".
Extension of the CUDA interface:
Are there any requirements for the Visual Studio, Python and CUDA versions?
My Visual Studio version is VS2015, my Python version is 3.5.4, and my CUDA version is 10.0.
I’m trying to build a cpp extension for point cloud iterative closest point using the icp function in pcl-1.7 http://pointclouds.org/documentation/tutorials/iterative_closest_point.php.
The data conversion from at::Tensor to pcl::PointCloud is fine. However, as soon as I declare a new icp object, a segmentation fault occurs.
I also tried adding more arguments to CppExtension, as in https://github.com/strawlab/python-pcl/blob/master/setup.py, but it doesn't help.
To reproduce the bug, you can clone the related files from https://github.com/onlytailei/icp_extension.
PCL and Eigen must be installed on the system:
sudo apt-get install libpcl-all
sudo apt-get install libeigen3-dev
Then build the extension with:
python setup.py install
Comment/uncomment this line in icp_op.cpp:
pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
And rebuild the extension, you will see the difference.
python icp_test.py
The problem only occurs on PyTorch master, because its backprop engine is less permissive. When running benchmark.py cpp (or cuda):
Traceback (most recent call last):
File "benchmark.py", line 43, in <module>
(new_h.sum() + new_C.sum()).backward()
File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function LLTMFunctionBackward returned an invalid gradient at index 2 - expected shape [384] but got [1, 384]
This is because the module's bias parameter has size 3 * state_size, while backward outputs a tensor of size 1 x 3 * state_size. The problem is also present in torch 0.4.0, but there the backprop engine doesn't complain because the number of elements is the same.
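The shape difference can be reproduced without PyTorch. The following is a rough pure-Python sketch using nested lists as stand-ins for tensors; sum_dim0 and shape are helpers invented for this illustration and only imitate what a sum over dimension 0 does with and without keepdim.

```python
def shape(x):
    """Shape of a rectangular nested list, e.g. [[1, 2, 3]] -> [1, 3]."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


def sum_dim0(rows, keepdim):
    """Sum a [batch, n] nested list over the batch dimension."""
    totals = [sum(col) for col in zip(*rows)]
    # keepdim=True keeps a leading dimension of size 1.
    return [totals] if keepdim else totals


state_size = 5
batch_size = 4
d_gates = [[1.0] * (3 * state_size) for _ in range(batch_size)]

bias_shape = [3 * state_size]                 # shape of the module's bias
kept = sum_dim0(d_gates, keepdim=True)        # what backward returns today
dropped = sum_dim0(d_gates, keepdim=False)    # matches the bias shape

print(shape(kept), shape(dropped), bias_shape)
```

Both results hold the same 15 numbers, which is why an engine that only compares element counts accepts either, while one that compares shapes rejects the version with the extra leading dimension.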
So the solution could be to remove keepdim=True from the d_bias computation, e.g. here (it's the same for the Python baseline, cpp and cuda).
But then you get the opposite error message when running check.py and grad_check.py:
Traceback (most recent call last):
File "check.py", line 107, in <module>
check_backward(variables, options.cuda, options.verbose)
File "check.py", line 53, in check_backward
(baseline_values[0] + baseline_values[1]).sum().backward()
File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Function LLTMFunctionBackward returned an invalid gradient at index 2 - expected shape [1, 15] but got [15]
This is because the bias given to the function now has size 1 x 15!
The solution is pretty simple, but we need to decide what to do: either give the bias parameter in every nn module the dimension 1 x ..., or update check.py and grad_check.py and remove the keepdim=True arguments when computing the d_bias sums.
Hi,
do CUDA extensions currently support multiple GPUs? Putting all tensors onto device torch.device('cuda:1') in benchmark.py either yields
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /.../pytorch/aten/src/THC/generic/THCTensorCopy.cpp:70
when trying to print out the resulting tensors (e.g. print(new_h)) or prints out only zeros (e.g. print(new_h.cpu().tolist())).
I cloned the repository, and the CPU version compiles, but I get the following error when running python setup.py install in the cuda folder.
running install
running bdist_egg
running egg_info
creating lltm_cuda.egg-info
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing lltm_cuda.egg-info/PKG-INFO
writing top-level names to lltm_cuda.egg-info/top_level.txt
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
creating build
creating build/temp.linux-x86_64-3.5
gcc -pthread -B /home/mantas/anaconda3/envs/pytorch04/compiler_compat -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/TH -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.0/include -I/home/mantas/anaconda3/envs/pytorch04/include/python3.5m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.5/lltm_cuda.o -DTORCH_EXTENSION_NAME=lltm_cuda -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/usr/local/cuda-9.0/bin/nvcc -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/TH -I/home/mantas/anaconda3/envs/pytorch04/lib/python3.5/site-packages/torch/lib/include/THC -I/usr/local/cuda-9.0/include -I/home/mantas/anaconda3/envs/pytorch04/include/python3.5m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.5/lltm_cuda_kernel.o -DTORCH_EXTENSION_NAME=lltm_cuda --compiler-options '-fPIC' -std=c++11
lltm_cuda_kernel.cu(54): error: calling a host function("std::fmax<double, float> ") from a global function("_NV_ANON_NAMESPACE::lltm_cuda_forward_kernel ") is not allowed
lltm_cuda_kernel.cu(54): error: identifier "std::fmax<double, float> " is undefined in device code
2 errors detected in the compilation of "/tmp/tmpxft_00002819_00000000-6_lltm_cuda_kernel.cpp1.ii".
error: command '/usr/local/cuda-9.0/bin/nvcc' failed with exit status 1
I'm using PyTorch 0.4.0 installed via conda a few weeks ago, Python 3.5, CUDA 9.0, cuDNN 7.1.4, and GCC 6.4.0.
Dear all,
I get an error when compiling torch-scatter from source using .
Here is the link where my issue is described:
pyg-team/pytorch_geometric#307 (comment)
I followed the tutorial to create my cpp_extension. But the module failed to do half-precision computation.
It gives:
RuntimeError: "op" not implemented for 'Half' (operator() at /op_cuda_kernel.cu:146)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fa561676687 in /miniconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
Setup:
Latest OS X and pytorch built from master (no GPU).
The first error comes from torch.cuda not being present: https://github.com/pytorch/pytorch/blob/abd8501020d16e9aa12fa60dfd38ed70b8d7b71e/torch/utils/cpp_extension.py#L45. I manually set it to None.
The next one is related to flags. If I try python setup.py install, I get the following error:
fatal error: 'atomic' file not found
#include <atomic>
^~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1
That can be fixed by passing CFLAGS='-stdlib=libc++'.
The next problem comes when I try to import:
ImportError: dlopen(/Users/michael/miniconda3/lib/python3.6/site-packages/lltm_cpp-0.0.0-py3.6-macosx-10.7-x86_64.egg/lltm_cpp.cpython-36m-darwin.so, 2): Symbol not found: _THPVariableClass
Referenced from: /Users/michael/miniconda3/lib/python3.6/site-packages/lltm_cpp-0.0.0-py3.6-macosx-10.7-x86_64.egg/lltm_cpp.cpython-36m-darwin.so
Expected in: flat namespace
in /Users/michael/miniconda3/lib/python3.6/site-packages/lltm_cpp-0.0.0-py3.6-macosx-10.7-x86_64.egg/lltm_cpp.cpython-36m-darwin.so
Hi, I've been tweaking the repo a bit and wanted to try Half tensor compatibility.
So in the CUDA code, instead of AT_DISPATCH_FLOATING_TYPES here and here, I just changed the dispatch macro to AT_DISPATCH_FLOATING_TYPES_AND_HALF, naively hoping that everything would work without changing anything else.
Unfortunately, I got this error (while dispatching only floating types works):
lltm_cuda_kernel.cu(123): error: identifier "Half" is undefined
lltm_cuda_kernel.cu(157): error: identifier "Half" is undefined
Is there something I forgot to do? Apparently Half is not recognized by the compiler the way float or double are, so maybe I need to include a header? I tried #include <cuda_fp16.h>, #include <ATen/Half.h> and #include <ATen/Type.h>, but it didn't work.
Thanks!
Clément
In general: the CUDA version in cuda/ raises an import error when compiled and installed with setup.py. The JIT version under cuda/ works.
After cloning the source code, go to the cuda/ directory and run
python setup.py build_ext && python setup.py install
Compilation finishes with warnings:
====== Compilation outputs. ======
running build_ext
building 'lltm_cuda' extension
creating build
creating build/temp.linux-x86_64-3.6
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/TH -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/usr/include/python3.6m -I/home/yaoyu/p3pt/include/python3.6m -c lltm_cuda.cpp -o build/temp.linux-x86_64-3.6/lltm_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/local/cuda-10.1/bin/nvcc -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/TH -I/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/usr/include/python3.6m -I/home/yaoyu/p3pt/include/python3.6m -c lltm_cuda_kernel.cu -o build/temp.linux-x86_64-3.6/lltm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
lltm_cuda_kernel.cu: In lambda function:
lltm_cuda_kernel.cu:119:98: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES(gates.type(), "lltm_forward_cuda", ([&] {
^
/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/ATen/Dispatch.h:78:1: note: declared here
inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties &t) {
^~~~~~~~~~~
lltm_cuda_kernel.cu: In lambda function:
lltm_cuda_kernel.cu:152:94: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES(X.type(), "lltm_forward_cuda", ([&] {
^
/home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/ATen/Dispatch.h:78:1: note: declared here
inline at::ScalarType scalar_type(const at::DeprecatedTypeProperties &t) {
^~~~~~~~~~~
creating build/lib.linux-x86_64-3.6
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/lltm_cuda.o build/temp.linux-x86_64-3.6/lltm_cuda_kernel.o -L/usr/local/cuda-10.1/lib64 -lcudart -o build/lib.linux-x86_64-3.6/lltm_cuda.cpython-36m-x86_64-linux-gnu.so
====== End of compilation outputs. ======
When importing lltm_cuda the following error happens
====== Import error. ======
import lltm_cuda
ImportError: /home/yaoyu/p3pt/lib/python3.6/site-packages/lltm_cuda-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail37_typeMetaDataInstance_preallocated_32E
====== End of import error. ======
Try the jit.py in the cuda/ folder, the compilation outputs are as follows:
====== Compilation outputs from cuda/jit.py. ======
Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/lltm_cuda/build.ninja...
Building extension module lltm_cuda...
[1/3] c++ -MMD -MF lltm_cuda.o.d -DTORCH_EXTENSION_NAME=lltm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/TH -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yaoyu/p3pt/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/yaoyu/Projects/DeepLearningModels/extension-cpp/cuda/lltm_cuda.cpp -o lltm_cuda.o
[2/3] /usr/local/cuda-10.1/bin/nvcc -DTORCH_EXTENSION_NAME=lltm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/TH -isystem /home/yaoyu/p3pt/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/yaoyu/p3pt/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /home/yaoyu/Projects/DeepLearningModels/extension-cpp/cuda/lltm_cuda_kernel.cu -o lltm_cuda_kernel.cuda.o
[3/3] c++ lltm_cuda.o lltm_cuda_kernel.cuda.o -shared -L/usr/local/cuda-10.1/lib64 -lcudart -o lltm_cuda.so
====== End of compilation outputs from cuda/jit.py. ======
Then cuda/jit.py will import lltm_cuda automatically, no error happens.
I built an extension based on this tutorial and it used to work. I then did some refactoring and fixes (in the cuda/cpp code), and afterwards it started failing at runtime:
/home/jatentaki/anaconda3/lib/python3.6/site-packages/sort2_cuda-0.0.0-py3.6-linux-x86_64.egg/lltm_cpp.cpython-36m-x86_64-linux-gnu.so: undefined symbol: THPVariableClass
(both for the CUDA and cpp versions). Then I checked whether the original example still worked and, to my surprise, it no longer does.
Timeline:
I believe the error just means I am not linking against some static library, but I don't see when and how I could have introduced that change.
Is there any way to select the backend (CPU or GPU) automatically? That is, to merge the two Function classes into one that chooses the proper backend according to the device in use.
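One possible shape for such a merged entry point is a thin wrapper that inspects the input and forwards to the matching implementation. The sketch below is only an illustration of that idea: forward_cpu and forward_cuda are hypothetical placeholders for the two compiled extensions, and SimpleNamespace objects stand in for tensors. The only assumption about real tensors is that they expose an is_cuda attribute.

```python
from types import SimpleNamespace


def forward_cpu(x):
    # Placeholder for the call into the C++ extension.
    return ("cpu", x.value)


def forward_cuda(x):
    # Placeholder for the call into the CUDA extension.
    return ("cuda", x.value)


def forward(x):
    """Pick the backend from the input's device and forward the call."""
    impl = forward_cuda if x.is_cuda else forward_cpu
    return impl(x)


# Stand-ins for a CPU tensor and a CUDA tensor.
cpu_tensor = SimpleNamespace(is_cuda=False, value=1.0)
gpu_tensor = SimpleNamespace(is_cuda=True, value=2.0)

print(forward(cpu_tensor), forward(gpu_tensor))
```

The same branch could live inside a single autograd Function's forward and backward, so user code never has to name the backend explicitly.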
Compiling the cpp version is OK, but python setup.py install for the cuda version fails.
The output log is quite long so I have attached it, and I put the end of the log below.
Any ideas?
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/pytypes.h:923:28: required from ‘pybind11::str pybind11::str::format(Args&& ...) const [with Args = {pybind11::object&, const pybind11::handle&}]’
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/pybind11.h:1401:51: required from here
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/cast.h:2108:44: error: no matching function for call to ‘collect_arguments(pybind11::object&, const pybind11::handle&)’
return detail::collect_arguments<policy>(std::forward<Args>(args)...).call(derived().ptr());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/cast.h:2087:1: note: candidate: template<pybind11::return_value_policy policy, class ... Args, class> pybind11::detail::simple_collector<policy> pybind11::detail::collect_arguments(Args&& ...)
simple_collector<policy> collect_arguments(Args &&...args) {
^~~~~~~~~~~~~~~~~
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/cast.h:2087:1: note: template argument deduction/substitution failed:
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/cast.h:2094:1: note: candidate: template<pybind11::return_value_policy policy, class ... Args, class> pybind11::detail::unpacking_collector<policy> pybind11::detail::collect_arguments(Args&& ...)
unpacking_collector<policy> collect_arguments(Args &&...args) {
^~~~~~~~~~~~~~~~~
/home/tom/miniconda3/envs/venv/lib/python3.6/site-packages/torch/include/pybind11/cast.h:2094:1: note: template argument deduction/substitution failed:
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
Following is info for my environment:
After cloning the git repo, I run setup.py
$ cd $extension-cpp/cuda
$ python setup.py install
Here is what I got:
running install
running bdist_egg
running egg_info
writing lltm_cuda.egg-info/PKG-INFO
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing top-level names to lltm_cuda.egg-info/top_level.txt
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-3.6/lltm_cuda.cpython-36m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for lltm_cuda.cpython-36m-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/lltm_cuda.py to lltm_cuda.cpython-36.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying lltm_cuda.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying lltm_cuda.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying lltm_cuda.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying lltm_cuda.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.lltm_cuda.cpython-36: module references __file__
creating 'dist/lltm_cuda-0.0.0-py3.6-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing lltm_cuda-0.0.0-py3.6-linux-x86_64.egg
removing '/home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm_cuda-0.0.0-py3.6-linux-x86_64.egg' (and everything under it)
creating /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm_cuda-0.0.0-py3.6-linux-x86_64.egg
Extracting lltm_cuda-0.0.0-py3.6-linux-x86_64.egg to /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages
lltm-cuda 0.0.0 is already the active version in easy-install.pth
Installed /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm_cuda-0.0.0-py3.6-linux-x86_64.egg
Processing dependencies for lltm-cuda==0.0.0
Finished processing dependencies for lltm-cuda==0.0.0
Then I ran:
$ cd ..
$ python benchmark.py cuda
Then this error occurred:
Traceback (most recent call last):
File "benchmark.py", line 43, in <module>
new_h, new_C = rnn(X, (h, C))
File "/home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/xx/MyGit/cuda_test/extension-cpp/cuda/lltm.py", line 45, in forward
return LLTMFunction.apply(input, self.weights, self.bias, *state)
File "/home/xx/MyGit/cuda_test/extension-cpp/cuda/lltm.py", line 14, in forward
outputs = lltm_cuda.forward(input, weights, bias, old_h, old_cell)
RuntimeError: expected 3 dims but tensor has 2 (packed_accessor at /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/lib/include/ATen/core/Tensor.h:223)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f0568445cf5 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: at::PackedTensorAccessor<float, 3ul, at::RestrictPtrTraits, unsigned long> at::Tensor::packed_accessor<float, 3ul, at::RestrictPtrTraits, unsigned long>() const & + 0xd3 (0x7f0552854c59 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x2b49d (0x7f055284c49d in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x2b6e5 (0x7f055284c6e5 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: lltm_cuda_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) + 0x2f0 (0x7f055284cad5 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: lltm_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor) + 0x1c4 (0x7f055283c454 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x23258 (0x7f0552844258 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x27c35 (0x7f0552848c35 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/lltm-0.0.0-py3.6-linux-x86_64.egg/lltm_cuda.cpython-36m-x86_64-linux-gnu.so)
<omitting python frames>
frame #14: THPFunction_apply(_object*, _object*) + 0x579 (0x7f058f9361d9 in /home/xx/anaconda3/envs/mmdet/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #38: __libc_start_main + 0xe7 (0x7f05a17abb97 in /lib/x86_64-linux-gnu/libc.so.6)
The py and cpp versions work fine.
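One detail in the logs above may be worth checking: the install step wrote lltm_cuda-0.0.0-py3.6-linux-x86_64.egg, while the traceback frames load lltm_cuda.cpython-36m-x86_64-linux-gnu.so from lltm-0.0.0-py3.6-linux-x86_64.egg. That may mean an older build is being imported instead of the freshly installed one. Below is a minimal sketch for checking which file Python resolves a module name to, using only the standard library. The helper name module_origin is made up for illustration, and whether a stale egg is the actual cause here is an assumption.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would load `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # On the machine above one would pass "lltm_cuda"; "json" is used here
    # only so that the snippet runs anywhere.
    print(module_origin("json"))
```

If the printed path for lltm_cuda points into the lltm-0.0.0 egg, removing that egg and reinstalling would be a reasonable next step.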
Hello, I want to use PyTorch Geometric, but "pip install --upgrade torch-scatter" failed. I then tried to build extension-cpp, and that failed as well. My system information and the build log follow.
System
Log
running install
running bdist_egg
running egg_info
creating lltm_cuda.egg-info
writing lltm_cuda.egg-info/PKG-INFO
writing dependency_links to lltm_cuda.egg-info/dependency_links.txt
writing top-level names to lltm_cuda.egg-info/top_level.txt
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
reading manifest file 'lltm_cuda.egg-info/SOURCES.txt'
writing manifest file 'lltm_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.7-x86_64/egg
running install_lib
running build_ext
building 'lltm_cuda' extension
creating build
creating build/temp.macosx-10.7-x86_64-3.6
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/kiwee/anaconda3/include -arch x86_64 -I/Users/kiwee/anaconda3/include -arch x86_64 -I/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include -I/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/TH -I/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/Users/kiwee/anaconda3/include/python3.6m -c lltm_cuda.cpp -o build/temp.macosx-10.7-x86_64-3.6/lltm_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=lltm_cuda -std=c++11
In file included from lltm_cuda.cpp:1:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/ATen/ATen.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Allocator.h:6:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Device.h:3:
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/DeviceType.h:48:20: error: explicit specialization of
non-template struct 'hash'
template <> struct hash<c10::DeviceType> {
^ ~~~~~~~~~~~~~~~~~
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/DeviceType.h:50:21: error: expected '(' for
function-style cast or type construction
return std::hash<int>()(static_cast<int>(k));
~~~~~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/DeviceType.h:50:25: error: expected '(' for
function-style cast or type construction
return std::hash<int>()(static_cast<int>(k));
~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/DeviceType.h:50:27: error: expected expression
return std::hash<int>()(static_cast<int>(k));
^
In file included from lltm_cuda.cpp:1:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/ATen/ATen.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Allocator.h:6:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Device.h:5:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/Exception.h:5:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/StringUtil.h:5:
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/string_utils.h:52:12: error: no member named 'stod' in
namespace 'std'
using std::stod;
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/string_utils.h:53:12: error: no member named 'stoi' in
namespace 'std'; did you mean 'atoi'?
using std::stoi;
~~~~~^~~~
atoi
/usr/include/c++/4.2.1/cstdlib:113:11: note: 'atoi' declared here
using ::atoi;
^
In file included from lltm_cuda.cpp:1:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/ATen/ATen.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Allocator.h:6:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Device.h:5:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/Exception.h:5:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/StringUtil.h:5:
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/string_utils.h:54:12: error: no member named 'stoull'
in namespace 'std'
using std::stoull;
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/string_utils.h:55:12: error: no member named
'to_string' in namespace 'std'
using std::to_string;
~~~~~^
In file included from lltm_cuda.cpp:1:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/ATen/ATen.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Allocator.h:6:
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Device.h:103:8: error: explicit specialization of
non-template struct 'hash'
struct hash<c10::Device> {
^ ~~~~~~~~~~~~~
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Device.h:103:8: error: redefinition of 'hash'
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/DeviceType.h:48:20: note: previous definition is here
template <> struct hash<c10::DeviceType> {
^
In file included from lltm_cuda.cpp:1:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/ATen/ATen.h:3:
In file included from /Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/core/Allocator.h:7:
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:42:8: error: no template named
'unique_ptr' in namespace 'std'
std::unique_ptr<void, DeleterFnPtr> ctx_;
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:66:8: error: no template named 'unique_ptr' in namespace 'std'
std::unique_ptr<void, DeleterFnPtr>&& move_context() {
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:67:17: error: no member named 'move' in namespace 'std'
return std::move(ctx_);
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:71:17: error: no member named 'unique_ptr' in namespace 'std'
ctx_ = std::unique_ptr<void, DeleterFnPtr>(ctx_.release(), new_deleter);
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:71:32: error: expected '(' for function-style cast or type construction
ctx_ = std::unique_ptr<void, DeleterFnPtr>(ctx_.release(), new_deleter);
~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:107:54: error: no type named 'nullptr_t' in namespace 'std'
inline bool operator==(const UniqueVoidPtr& sp, std::nullptr_t) noexcept {
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:110:29: error: no type named 'nullptr_t' in namespace 'std'
inline bool operator==(std::nullptr_t, const UniqueVoidPtr& sp) noexcept {
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:113:54: error: no type named 'nullptr_t' in namespace 'std'
inline bool operator!=(const UniqueVoidPtr& sp, std::nullptr_t) noexcept {
~~~~~^
/Users/kiwee/anaconda3/lib/python3.6/site-packages/torch/include/c10/util/UniqueVoidPtr.h:116:29: error: no type named 'nullptr_t' in namespace 'std'
inline bool operator!=(std::nullptr_t, const UniqueVoidPtr& sp) noexcept {
~~~~~^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
error: command 'gcc' failed with exit status 1
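The log above pulls headers from /usr/include/c++/4.2.1 and targets macosx-10.7, which suggests the compiler is using the old libstdc++ rather than libc++, so C++11 names such as std::unique_ptr and std::to_string are missing. A workaround often suggested for building PyTorch extensions on macOS is to set MACOSX_DEPLOYMENT_TARGET=10.9 and compile with clang and clang++. Below is a minimal sketch of preparing such an environment from Python. The helper name macos_build_env is hypothetical, and whether this resolves the error on this particular machine is an assumption.

```python
import os


def macos_build_env(base=None):
    """Return a copy of the environment with the macOS build overrides applied.

    `base` defaults to the current process environment; passing a dict
    makes the helper easy to inspect without touching os.environ.
    """
    env = dict(os.environ if base is None else base)
    env["MACOSX_DEPLOYMENT_TARGET"] = "10.9"  # a target that defaults to libc++
    env["CC"] = "clang"
    env["CXX"] = "clang++"
    return env


# Possible usage, run from the directory containing setup.py (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "install"],
#                  env=macos_build_env(), check=True)
```

The same variables can also be set directly on the command line before python setup.py install.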
As suggested by the title, should we use torch:: instead of at:: in the cpp and cuda modules?
It's suggested here, so maybe this could be updated?
I can do a PR; I'm just checking whether there's a reason not to do it.