Comments (9)

ashawkey commented on June 7, 2024

What's your GPU hardware architecture?
Currently the code uses atomicAdd for __half, which is only available on GPUs with architecture >= 70 (compute capability 7.0 or newer).
A temporary workaround is to comment out that function and its call site, and to make sure level_dim is even (a minimum architecture of 60 is still needed for __half2).
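
The thresholds above can be checked before attempting a build. The following is a minimal sketch, not part of torch-ngp: the table and both helper names are hypothetical, and the only library call used is torch.cuda.get_device_capability(), which returns a (major, minor) tuple.

```python
# Minimum compute capability (major, minor) for the half-precision atomics
# mentioned above. The table and helper names are illustrative assumptions.
MIN_CAPABILITY = {
    "atomicAdd(__half)": (7, 0),
    "atomicAdd(__half2)": (6, 0),
}


def supported_half_atomics(capability):
    """Return the half-precision atomics available at this capability."""
    cap = tuple(capability)
    return sorted(name for name, minimum in MIN_CAPABILITY.items() if cap >= minimum)


def detect_capability():
    """Query the current GPU; returns None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()
```

On an architecture 61 card, supported_half_atomics((6, 1)) lists only the __half2 variant, which matches the workaround of removing the __half overload and keeping level_dim even.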

from torch-ngp.

aoliao12138 commented on June 7, 2024

I met a similar error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train_nerf.py", line 3, in <module>
    from nerf.network import NeRFNetwork
  File "/data/new_disk70/wangla/tmp/torch-ngp/nerf/network.py", line 9, in <module>
    from encoding import get_encoder
  File "/data/new_disk70/wangla/tmp/torch-ngp/encoding.py", line 6, in <module>
    from hashencoder import HashEncoder
  File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/__init__.py", line 1, in <module>
    from .hashgrid import HashEncoder
  File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/hashgrid.py", line 8, in <module>
    from .backend import _backend
  File "/data/new_disk70/wangla/tmp/torch-ngp/hashencoder/backend.py", line 6, in <module>
    _backend = load(name='_hash_encoder',
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1124, in load
    return _jit_compile(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1337, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1449, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_hash_encoder': [1/3] :/usr/local/cuda-11.3/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
FAILED: hashencoder.cuda.o 
:/usr/local/cuda-11.3/bin/nvcc  -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/hashencoder.cu -o hashencoder.cuda.o 
/bin/sh: 1: :/usr/local/cuda-11.3/bin/nvcc: not found
[2/3] c++ -MMD -MF bindings.o.d -DTORCH_EXTENSION_NAME=_hash_encoder -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.8/dist-packages/torch/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.8/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.8/dist-packages/torch/include/THC -isystem :/usr/local/cuda-11.3/include -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /data/new_disk70/wangla/tmp/torch-ngp/hashencoder/src/bindings.cpp -o bindings.o 
ninja: build stopped: subcommand failed.

Even if I comment out those 2 lines, the same error still occurs.

More info:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.10.2+cu113'

I am using an RTX 3090.

ashawkey commented on June 7, 2024

@aoliao12138 The error message says "/bin/sh: 1: :/usr/local/cuda-11.3/bin/nvcc: not found". Have you added the CUDA bin directory to your PATH? (e.g., export PATH="/usr/local/cuda/bin:$PATH")
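
One possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the nvcc path starts with a stray ":", which would happen if an environment variable such as CUDA_HOME was set to ":/usr/local/cuda-11.3". PyTorch's extension builder typically derives the nvcc location from CUDA_HOME (or CUDA_PATH). The sketch below uses hypothetical helper names to show what the cleaned-up path would look like.

```python
import os


def clean_cuda_home(value):
    """Strip whitespace and stray ':' separators from a CUDA_HOME-style value."""
    return value.strip().strip(":")


def nvcc_candidates(environ):
    """List the nvcc paths implied by CUDA_HOME / CUDA_PATH after cleaning."""
    candidates = []
    for key in ("CUDA_HOME", "CUDA_PATH"):
        value = environ.get(key)
        if value:
            candidates.append(os.path.join(clean_cuda_home(value), "bin", "nvcc"))
    return candidates
```

Calling nvcc_candidates(os.environ) and checking each result with os.path.isfile shows whether the variable points at a real compiler; shutil.which("nvcc") covers the PATH-based lookup suggested above.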

wangjksjtu commented on June 7, 2024

@ashawkey Thank you for the prompt reply!
My GPU is a GTX 1080 Ti, so the architecture is 61. It seems to work for me when I comment out that atomicAdd function. However, the following errors appear when compiling the fully fused network:

  File "train_nerf.py", line 4, in <module>
    from nerf.network_ff import NeRFNetwork as NeRFNetwork_FF
  File "/home/wangjk/programs/torch-ngp/nerf/network_ff.py", line 10, in <module>
    from ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/__init__.py", line 1, in <module>
    from .ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/ffmlp.py", line 10, in <module>
    from .backend import _backend
  File "/home/wangjk/programs/torch-ngp/ffmlp/backend.py", line 16, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_ffmlp': [1/2] /home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
FAILED: ffmlp.cuda.o 
/home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(577): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: identifier "output_layout" is undefined
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: name followed by "::" must be a class or namespace name
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(60): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: identifier "wmma" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here
....
....
85 errors detected in the compilation of "/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu".
ninja: build stopped: subcommand failed.
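
The "wmma" and namespace errors above are consistent with the build target: the command compiles for compute_61, while the CUDA warp matrix (wmma) API requires compute capability 7.0 or newer. A minimal sketch of guarding the import, assuming the module names seen in the tracebacks (nerf.network and nerf.network_ff); the helper itself is hypothetical and not part of torch-ngp.

```python
# Minimum compute capability for the CUDA warp matrix (wmma) API.
WMMA_MIN_CAPABILITY = (7, 0)


def pick_network_module(capability, want_fully_fused):
    """Return the module to import, falling back when wmma is unavailable."""
    if want_fully_fused and tuple(capability) >= WMMA_MIN_CAPABILITY:
        return "nerf.network_ff"
    return "nerf.network"
```

With this, a script could call importlib.import_module(pick_network_module(torch.cuda.get_device_capability(), True)) so that the _ffmlp extension is never built on an architecture 61 card.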

Full log here:

Traceback (most recent call last):
  File "train_nerf.py", line 4, in <module>
    from nerf.network_ff import NeRFNetwork as NeRFNetwork_FF
  File "/home/wangjk/programs/torch-ngp/nerf/network_ff.py", line 10, in <module>
    from ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/__init__.py", line 1, in <module>
    from .ffmlp import FFMLP
  File "/home/wangjk/programs/torch-ngp/ffmlp/ffmlp.py", line 10, in <module>
    from .backend import _backend
  File "/home/wangjk/programs/torch-ngp/ffmlp/backend.py", line 16, in <module>
    sources=[os.path.join(_src_path, 'src', f) for f in [
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1452, in _write_ninja_file_and_build_library
    error_prefix=f"Error building extension '{name}'")
  File "/home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1733, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_ffmlp': [1/2] /home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
FAILED: ffmlp.cuda.o 
/home/wangjk/anaconda3/envs/torch-ngp/bin/nvcc  -DTORCH_EXTENSION_NAME=_ffmlp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/wangjk/programs/torch-ngp/ffmlp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/TH -isystem /home/wangjk/anaconda3/envs/torch-ngp/lib/python3.7/site-packages/torch/include/THC -isystem /home/wangjk/anaconda3/envs/torch-ngp/include -isystem /home/wangjk/anaconda3/envs/torch-ngp/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_61,code=sm_61 --compiler-options '-fPIC' -O3 -Xcompiler=-mf16c -Xcompiler=-Wno-float-conversion -Xcompiler=-fno-strict-aliasing --extended-lambda --expt-relaxed-constexpr -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -std=c++14 -c /home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu -o ffmlp.cuda.o 
/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(243): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: explicit type is missing ("int" assumed)

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(344): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(577): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: identifier "output_layout" is undefined
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(396): error: name followed by "::" must be a class or namespace name
          detected during instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(60): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: identifier "wmma" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: too few arguments for alias template "std::conditional_t"
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(64): error: expected a ";"
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(67): error: identifier "act_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(68): error: identifier "weights_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(69): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(69): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(69): error: identifier "result_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(88): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(90): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(96): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(101): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(102): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(108): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(119): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(119): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(165): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(168): error: identifier "act_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(169): error: identifier "weights_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(170): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(170): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(170): error: identifier "result_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(213): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(217): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(218): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(219): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(229): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(229): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(252): error: name must be a namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(255): error: identifier "act_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(256): error: identifier "weights_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(257): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(257): error: type name is not allowed
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(257): error: identifier "result_frag" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(280): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(284): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(289): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(290): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(295): error: identifier "output_layout" is undefined
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(295): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(296): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(299): error: name followed by "::" must be a class or namespace name
          detected during:
            instantiation of "void kernel_mlp_fused<WIDTH,BLOCK_DIM_Z,N_ITERS,OUT_T,INFERENCE>(Activation, Activation, const __half *, const __half *, OUT_T *, OUT_T *, uint32_t, uint32_t, uint32_t, uint32_t, int) [with WIDTH=16, BLOCK_DIM_Z=1, N_ITERS=8, OUT_T=__half, INFERENCE=false]" 
(564): here
            instantiation of "void ffmlp_forward_cuda<WIDTH,INFERENCE>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, Activation, __half *, __half *) [with WIDTH=16U, INFERENCE=false]" 
(655): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(616): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: name followed by "::" must be a class or namespace name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: expected an identifier

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: "threads" has already been declared in the current scope

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: "shmem_size" has already been declared in the current scope

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(618): error: expected an identifier

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(619): error: parameter "activation" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(620): error: parameter "grad" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(621): error: variable "weights_second" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(622): error: parameter "backward_buffer" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(623): error: parameter "forward_buffer" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(624): error: parameter "grad_inputs" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(625): error: variable "weights_first" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(626): error: parameter "B" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(627): error: parameter "output_dim" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(628): error: parameter "num_layers" is not a type name

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(628): error: expected a ")"

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(597): warning: variable "weights_first" was declared but never referenced

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(598): warning: variable "weights_second" was declared but never referenced

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=16U]" 
(832): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=32U]" 
(833): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=64U]" 
(834): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=128U]" 
(835): here

/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu(606): warning: variable "threads" was declared but never referenced
          detected during instantiation of "void ffmlp_backward_cuda<WIDTH>(const __half *, const __half *, uint32_t, uint32_t, uint32_t, uint32_t, Activation, const __half *, __half *, __half *) [with WIDTH=256U]" 
(836): here

85 errors detected in the compilation of "/home/wangjk/programs/torch-ngp/ffmlp/src/ffmlp.cu".
ninja: build stopped: subcommand failed.


ashawkey commented on June 7, 2024

@wangjksjtu ffmlp uses cutlass, which also requires architecture >= 70 (here). Maybe you could remove the ffmlp import and its usage in the Python script to avoid building it, as it doesn't help very much now.
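
A minimal sketch of how a script could guard that import, assuming the check is done in Python before ffmlp is loaded. The helper names (`arch_number`, `can_use_ffmlp`, `detect_capability`) are hypothetical and not part of torch-ngp; only `torch.cuda.is_available` and `torch.cuda.get_device_capability` are real PyTorch calls. The threshold of 70 is taken from the comment above.

```python
from typing import Optional, Tuple

# Threshold quoted in this thread: ffmlp needs architecture >= 70.
MIN_FFMLP_ARCH = 70


def arch_number(capability: Tuple[int, int]) -> int:
    """Turn a (major, minor) compute capability into e.g. 61, 70, 86."""
    major, minor = capability
    return major * 10 + minor


def can_use_ffmlp(capability: Optional[Tuple[int, int]]) -> bool:
    """Return True only if the GPU meets the architecture requirement."""
    if capability is None:
        # No CUDA device (or no PyTorch): never try to build ffmlp.
        return False
    return arch_number(capability) >= MIN_FFMLP_ARCH


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query the current CUDA device, or return None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()
```

A script could then compute `use_ffmlp = can_use_ffmlp(detect_capability())` once at startup and only import the ffmlp module when it is true, falling back to a regular PyTorch MLP otherwise.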


wangjksjtu commented on June 7, 2024

Yeah, that is what I am doing now! However, I cannot obtain decent performance. Any thoughts? See issue #5.


ashawkey commented on June 7, 2024

@wangjksjtu Thanks for spotting the bug. I have fixed it!


aoliao12138 commented on June 7, 2024

@ashawkey Thanks for your reply! I solved it.


ashawkey commented on June 7, 2024

Closed for now.

