neural_kernels_code's Issues
RuntimeError: Could not compile function
I used your Docker image to run run_kernel_myrtle5.sh
on an Ubuntu 18.04 workstation with 4x RTX 2080 Ti GPUs, and eventually hit the following error. Could you give me a hint on how to resolve it?
Current Count Is: 1084000
Current Count Is: 1085000
Data_q size start 1085764
0%| | 0/2500000000 [00:00<?, ?it/s]Context already set..
STARTING KERNEL GEN HELP
Layer KWARGS: [{}, {'store_norm': False}, {}, {'store_norm': False}, {'store_norm': False}, {}, {'store_norm': False}, {'store_norm': False}, {'precision': 'float64'}, {'store_norm': False, 'precision': 'float64'}, {'precision': 'float64', 'store_norm': True}, {'store_norm': False, 'precision': 'float64'}]
(each of these lines, including "Context already set..", is printed once per worker process; duplicates omitted)
TC "conv3_input" was not explicitly compiled for inputs of sizes:
torch.Size([8, 30, 30, 1]) torch.Size([8, 30, 30, 1])
....Generate implicit MappingOptions
TC "conv3_input" was not explicitly compiled for inputs of sizes:
torch.Size([8, 30, 30, 1]) torch.Size([8, 30, 30, 1])
....Generate implicit MappingOptions
TC "conv3_input" was not explicitly compiled for inputs of sizes:
torch.Size([8, 30, 30, 1]) torch.Size([8, 30, 30, 1])
....Generate implicit MappingOptions
TC "conv3_input" was not explicitly compiled for inputs of sizes:
torch.Size([8, 30, 30, 1]) torch.Size([8, 30, 30, 1])
....Generate implicit MappingOptions
E0618 20:39:27.390336 39 cuda_rtc.cc:251] Compilation failure for nvrtc(NVRTC_ERROR_INVALID_OPTION):
nvrtc: error: invalid value for --gpu-architecture (-arch)
source:
template<typename T> inline __device__ T floord(T n, T d) {
return n < 0 ? - (-n + d - 1)/d : n / d;
}
#define if_then_else(cond,a,b) ((cond) ? (a) : (b))
#ifndef __CUDACC_RTC__
// Can't include system dependencies with NVRTC
// Can't include cuda_fp16.h with NVRTC due to transitive system dependencies
#include <cuda_fp16.h>
#endif
#define inff __int_as_float(0x7f800000)
#define inf __longlong_as_double(0x7ff0000000000000LL)
// Before CUDA 9, syncwarp is a noop since warps are always synchronized.
#if (!defined(__clang__) && __CUDACC_VER_MAJOR__ < 9) || \
( defined(__clang__) && CUDA_VERSION < 9000)
inline __device__ void __syncwarp(unsigned mask = 0xFFFFFFFF) {}
#endif
extern "C" {
__global__ void conv3_input_8_30_1_30(int B, int H, int P, int W, float* pconv_output, const float* pX, const float* pY) {
int b0 = blockIdx.x; int b1 = blockIdx.y; int b2 = blockIdx.z;
int t0 = threadIdx.x; int t1 = threadIdx.y; int t2 = threadIdx.z;
float (*conv_output)[8][(30 + -2)][(30 + -2)][(30 + -2)][(30 + -2)] = reinterpret_cast<float (*)[8][(30 + -2)][(30 + -2)][(30 + -2)][(30 + -2)]>(pconv_output);
const float (*X)[30][30][1] = reinterpret_cast<const float (*)[30][30][1]>(pX);
const float (*Y)[30][30][1] = reinterpret_cast<const float (*)[30][30][1]>(pY);
for (int c7 = 0; c7 <= 7; c7 += 1) {
for (int c8 = 0; c8 <= 7; c8 += 1) {
for (int c9 = 0; c9 <= 27; c9 += 1) {
for (int c10 = 0; c10 <= 27; c10 += 1) {
for (int c11 = t1; c11 <= 27; c11 += 8) {
conv_output[c7][c8][c9][c10][c11][t0] = (float)0.000000;
for (int c13 = 0; c13 <= 2; c13 += 1) {
for (int c14 = 0; c14 <= 2; c14 += 1) {
conv_output[c7][c8][c9][c10][c11][t0] = (conv_output[c7][c8][c9][c10][c11][t0] + (X[c7][(c9 + c13)][(c10 + c14)][0]*Y[c8][(c11 + c13)][(t0 + c14)][0]));
}
}
}
}
}
}
}
}
}
Process Process-4:
Traceback (most recent call last):
File "/root/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/root/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/neural_kernels_code/kernel_gen.py", line 351, in _kernel_gen_help
kx = dnet.forward(x_b, y_b, gpu=gpu_idx, pp_net=net).cpu().numpy().squeeze()
File "/neural_kernels_code/kernel_gen.py", line 127, in forward
prev_norm = self.layers[0](x_b, x_b, **self.kwargs_list[0])
File "/neural_kernels_code/tc_kernels.py", line 434, in conv3zp_input
return res.conv3_input(x,y)/(3*3)
File "/root/conda/lib/python3.6/site-packages/tensor_comprehensions/__init__.py", line 348, in fun
tc_def_name, *inputs, outputs=outputs, unchecked=unchecked)
File "/root/conda/lib/python3.6/site-packages/tensor_comprehensions/__init__.py", line 398, in __call__
implicit_compile(self, entry_point, *inputs)
File "/root/conda/lib/python3.6/site-packages/tensor_comprehensions/__init__.py", line 392, in implicit_compile
entry_point, inputs, mapping_options)
RuntimeError: Could not compile function
(The identical NVRTC compilation failure, generated kernel source, and "RuntimeError: Could not compile function" traceback are then printed by the other three worker processes, Process-1 through Process-3; duplicates omitted.)
After I interrupt the run with Ctrl+C, the following traceback is printed:
^CTraceback (most recent call last):
File "run_train_eval_exp.py", line 156, in <module>
main()
File "run_train_eval_exp.py", line 51, in main
K_train, K_test = generate_kernels(cfg, X_train, X_test)
File "run_train_eval_exp.py", line 101, in generate_kernels
K_train = kernel_gen.generate_kernel_parallel(cfg.KERNEL, X_train, X_train, num_gpus=cfg.SYSTEM.NUM_GPUS, symmetric=True, batch_size=cfg.SYSTEM.BATCH_SIZE, cache_path=cfg.SYSTEM.CACHE_PATH, float32=cfg.SYSTEM.FLOAT_32, extra_info={"kernel_type": "Train"})
File "/neural_kernels_code/kernel_gen.py", line 438, in generate_kernel_parallel
progress = done_q.get()
File "/root/conda/lib/python3.6/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/root/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/root/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/root/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/root/conda/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
finalizer()
File "/root/conda/lib/python3.6/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/root/conda/lib/python3.6/multiprocessing/queues.py", line 191, in _finalize_join
thread.join()
File "/root/conda/lib/python3.6/threading.py", line 1056, in join
self._wait_for_tstate_lock()
File "/root/conda/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
^C
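For anyone debugging this: nvrtc's "invalid value for --gpu-architecture (-arch)" usually means the CUDA toolkit inside the container is older than the GPU. The RTX 2080 Ti is compute capability 7.5, and NVRTC only accepts compute_75 from CUDA 10.0 onward, so a CUDA 9.x image would fail exactly like this. A quick sanity check from inside the container — a generic sketch using the bundled PyTorch, not anything specific to this repo:

```python
import torch

# CUDA toolkit version this PyTorch (and hence the Tensor
# Comprehensions stack) was built against.
print("CUDA runtime:", torch.version.cuda)

# Compute capability per visible GPU; an RTX 2080 Ti reports (7, 5).
# NVRTC rejects compute_75 with "invalid value for --gpu-architecture"
# whenever the toolkit predates CUDA 10.0.
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}:", torch.cuda.get_device_name(i),
          torch.cuda.get_device_capability(i))
```

If the runtime version printed is below 10.0 while the capability is (7, 5), rebuilding the image against a newer CUDA (or running on an older, pre-Turing GPU) would be the likely fix.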
Request for CIFAR10 dataset with ZCA
The link you provided has expired. Could you kindly provide an alternative link for downloading the dataset? Your assistance is greatly appreciated.
By the way, preprocess.py imports a module named 'random_aug' that does not seem to exist anywhere in the repo.
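In the meantime, ZCA whitening is simple enough to regenerate locally from the raw CIFAR-10 data. A minimal NumPy sketch — this is not the authors' preprocessing code, and the regularizer eps is an assumption, not the paper's value:

```python
import numpy as np

def zca_whiten(X_train, X_test, eps=1e-5):
    """ZCA-whiten images. X_* are float arrays of shape (N, H, W, C)."""
    n = len(X_train)
    Xf = X_train.reshape(n, -1)
    mean = Xf.mean(axis=0)
    Xc = Xf - mean

    # Covariance of the training set and its eigendecomposition.
    cov = Xc.T @ Xc / n
    eigvals, eigvecs = np.linalg.eigh(cov)

    # ZCA transform: rotate into the eigenbasis, rescale, rotate back.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

    whiten = lambda X: ((X.reshape(len(X), -1) - mean) @ W).reshape(X.shape)
    return whiten(X_train), whiten(X_test)
```

Note that the test set is whitened with statistics computed on the training set only.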
Could you put a license in this repo?
Hi,
Could you put a license (e.g., an MIT license) in this repo to make it officially open source? I need to use your code for research purposes.
Thanks a lot!
Explanation of configs?
Hi, could you explain the configs in the configs folder? For example, among myrtle10.yaml, myrtle10_exp.yaml, myrtle10_exp_all_chan.yaml, and myrtle11_exp.yaml, which one gives the most powerful kernel? I'm pretty confused by the naming.
public dataset
Hi, I can't access AWS services at the moment. Could you provide the dataset through another channel, such as Google Drive?
Code for the first row of Table 3
Hi!
Could you please provide the code for the first row of Table 3 (the Gaussian kernel)? Thanks a lot!
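For reference while waiting on the authors: a Gaussian (RBF) kernel on flattened images is only a few lines of NumPy. This is a generic sketch, not the paper's implementation — the bandwidth and any preprocessing (e.g., ZCA) are assumptions:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """K[i, j] = exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    X: (n, d) and Y: (m, d) arrays of flattened images.
    """
    # Squared Euclidean distances via the expansion
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :]
    sq -= 2.0 * X @ Y.T
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))
```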