Hello,
this is more of a thread discussion than a real issue, but I've been working on CUDA kernel readability, and PyTorch actually provides a very nice way of presenting tensor data to kernels as if it were still a multidimensional array.
See here for a working prototype: https://github.com/ClementPinard/extension-cpp/blob/deviceTensorExperiments/cuda/lltm_cuda_kernel.cu
Essentially, I designed a simple converter from at::Tensor to THCDeviceTensor<scalar_t, 2, size_t, RestrictPtrTraits>.
The conversion is not very pretty, but it lets us write more readable memory accesses in kernels while ultimately doing the exact same thing (even the __restrict__ keyword is kept).
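To give an idea of the glue code involved, here is a minimal sketch of such a converter. The helper name atToDeviceTensor is mine and the actual implementation in the linked branch differs; it only assumes the (data, sizes, strides) constructor of THCDeviceTensor and the public at::Tensor accessors:

#include <ATen/ATen.h>
#include <THC/THCDeviceTensor.cuh>

// Sketch only: build a THCDeviceTensor view over an at::Tensor's data pointer,
// copying its sizes and strides (the real converter in the prototype may differ).
template <typename scalar_t, int Dim>
THCDeviceTensor<scalar_t, Dim, size_t, RestrictPtrTraits>
atToDeviceTensor(at::Tensor t) {
  size_t sizes[Dim];
  size_t strides[Dim];
  for (int i = 0; i < Dim; ++i) {
    sizes[i] = t.size(i);      // extent of dimension i
    strides[i] = t.stride(i);  // stride of dimension i, in elements
  }
  return THCDeviceTensor<scalar_t, Dim, size_t, RestrictPtrTraits>(
      t.data<scalar_t>(), sizes, strides);
}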
Let's look at the current code for the forward pass:
template <typename scalar_t>
__global__ void lltm_cuda_forward_kernel(
    const scalar_t* __restrict__ gates,
    const scalar_t* __restrict__ old_cell,
    scalar_t* __restrict__ new_h,
    scalar_t* __restrict__ new_cell,
    scalar_t* __restrict__ input_gate,
    scalar_t* __restrict__ output_gate,
    scalar_t* __restrict__ candidate_cell,
    size_t state_size) {
  const int column = blockIdx.x * blockDim.x + threadIdx.x;
  const int index = blockIdx.y * state_size + column;
  const int gates_row = blockIdx.y * (state_size * 3);
  if (column < state_size) {
    input_gate[index] = sigmoid(gates[gates_row + column]);
    output_gate[index] = sigmoid(gates[gates_row + state_size + column]);
    candidate_cell[index] = elu(gates[gates_row + 2 * state_size + column]);
    new_cell[index] =
        old_cell[index] + candidate_cell[index] * input_gate[index];
    new_h[index] = tanh(new_cell[index]) * output_gate[index];
  }
}
The column and index variables are kinda hard to figure out. The kernel actually relies on the fact that gridDim.y is the batch size, and thus blockIdx.y is the batch index. column is then the index within the state, and index is batch_idx * batch_stride + column, while gates_row is the first index of the gates for that particular element of the batch, because its batch stride is three times as large.
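For concreteness, here is that arithmetic spelled out for one hypothetical configuration:

// Hypothetical sizes, for illustration only:
//   state_size = 4, blockIdx.y = 1, column = 2
// index     = 1 * 4 + 2   = 6   -> element (1, 2) of a [batch, state_size] tensor
// gates_row = 1 * (4 * 3) = 12  -> start of row 1 of the [batch, 3 * state_size] gates tensor
// gates[gates_row + column]                  -> (1, 2)  input gate pre-activation
// gates[gates_row + state_size + column]     -> (1, 6)  output gate pre-activation
// gates[gates_row + 2 * state_size + column] -> (1, 10) candidate cell pre-activation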
Now here is my proposed code:
template <typename scalar_t>
__global__ void lltm_cuda_forward_kernel(
    const dTensor2R gates,
    const dTensor2R old_cell,
    dTensor2R new_h,
    dTensor2R new_cell,
    dTensor2R input_gate,
    dTensor2R output_gate,
    dTensor2R candidate_cell,
    size_t state_size) {
  const int n = blockIdx.y;  // batch index
  CUDA_KERNEL_LOOP(c, state_size) {
    input_gate[n][c] = sigmoid((scalar_t) gates[n][c]);
    output_gate[n][c] = sigmoid((scalar_t) gates[n][c + state_size]);
    candidate_cell[n][c] = elu((scalar_t) gates[n][c + 2 * state_size]);
    new_cell[n][c] =
        old_cell[n][c] + candidate_cell[n][c] * input_gate[n][c];
    new_h[n][c] = tanh((scalar_t) new_cell[n][c]) * output_gate[n][c];
  }
}
I use dTensor2R, which is defined as THCDeviceTensor<scalar_t, 2, size_t, RestrictPtrTraits> by a macro above.
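For reference, the macro is essentially a shorthand along these lines (the exact definition lives in the linked prototype and may differ):

// Shorthand for the restricted 2D device tensor type (sketch only):
#define dTensor2R THCDeviceTensor<scalar_t, 2, size_t, RestrictPtrTraits>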
Besides using the strided loop CUDA_KERNEL_LOOP (just for the sake of good practice), we now only need to compute n, which is explicitly the batch index, and c, which is the column from above.
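For readers who haven't seen it, CUDA_KERNEL_LOOP is the usual grid-stride loop macro found in Caffe-style CUDA code:

#define CUDA_KERNEL_LOOP(i, n)                        \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
       i < (n);                                       \
       i += blockDim.x * gridDim.x)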
Every relevant value can now be accessed with tensor[n][c + shift], making it very similar to an actual 2D array.
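In other words, for a contiguous [batch, 3 * state_size] gates tensor the two spellings address the same element:

// Proposed access:              gates[n][c + state_size]
// Equivalent raw-pointer index: gates[n * (3 * state_size) + state_size + c]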
I tested my code against master (from a few days ago) and it works for both check.py and grad_check.py. It does not need the PyTorch source code, only the compiled binaries and the headers.
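For completeness, the host side can stay very close to the original tutorial; here is a minimal sketch, assuming the atToDeviceTensor helper sketched above (the actual glue code in the branch may look different), with blocks, threads, and state_size taken from the tutorial's existing launch setup:

AT_DISPATCH_FLOATING_TYPES(gates.type(), "lltm_forward_cuda", ([&] {
  lltm_cuda_forward_kernel<scalar_t><<<blocks, threads>>>(
      atToDeviceTensor<scalar_t, 2>(gates),
      atToDeviceTensor<scalar_t, 2>(old_cell),
      atToDeviceTensor<scalar_t, 2>(new_h),
      atToDeviceTensor<scalar_t, 2>(new_cell),
      atToDeviceTensor<scalar_t, 2>(input_gate),
      atToDeviceTensor<scalar_t, 2>(output_gate),
      atToDeviceTensor<scalar_t, 2>(candidate_cell),
      state_size);
}));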
Is this proposition legit? I feel like it could be a good way of letting people write CUDA kernels with more complicated N-D tensors (like 4D tensors for regular feature maps) without all the complex indexing stuff. And if so, that could be a good reason to write a more user-friendly method for the at::Tensor to THCDeviceTensor conversion.