
neural_kernels_code's People

Contributors

afang-story · sarafridov · vaishaal


neural_kernels_code's Issues

Request for CIFAR10 dataset with ZCA

The link you provided has expired. Could you kindly provide an alternative link for downloading the dataset? Your assistance is greatly appreciated.

By the way, 'preprocess.py' imports a module named 'random_aug', which does not appear to exist in the repository.
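
A hypothetical stopgap until that module is restored, assuming 'random_aug' is only used on an augmentation code path that my run never takes (an assumption I have not verified against the rest of the file):

# In preprocess.py: guard the missing import so the rest of the file loads.
# If anything actually calls into random_aug, this will still fail, which
# is the correct behavior.
try:
    import random_aug
except ImportError:
    random_aug = None  # placeholder for the module absent from the repo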

Explanation of configs?

Hi, could you explain the configs in the configs folder? For example, among myrtle10.yaml, myrtle10_exp.yaml, myrtle10_exp_all_chan.yaml, and myrtle11_exp.yaml, which one is the most powerful kernel? I'm pretty confused by the naming.
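
For anyone else trying to untangle these, a quick way to see what actually differs between the variants is to load and print each file. A sketch, assuming PyYAML is installed and the script is run from the repo root:

import yaml

# Load each config variant named above and print its parsed contents,
# so the differences between the myrtle10/myrtle11 variants are visible.
names = ["myrtle10.yaml", "myrtle10_exp.yaml",
         "myrtle10_exp_all_chan.yaml", "myrtle11_exp.yaml"]
for name in names:
    with open("configs/" + name) as f:
        print(name, "->", yaml.safe_load(f))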

public dataset

Hi, I can't access AWS services at the moment. Could you provide the dataset through another channel, such as Google Drive?

RuntimeError: Could not compile function

I used your Docker image to run run_kernel_myrtle5.sh on an Ubuntu 18.04 workstation with four 2080 Ti GPUs, and eventually hit the following error. Could you give me a hint on how to resolve it?

Current Count Is:  1084000
Current Count Is:  1085000
Data_q size start 1085764
  0%|                               | 0/2500000000 [00:00<?, ?it/s]Context already set..
Context already set..
Context already set..
Context already set..
STARTING KERNEL GEN HELP
Layer KWARGS: [{}, {'store_norm': False}, {}, {'store_norm': False}, {'store_norm': False}, {}, {'store_norm': False}, {'store_norm': False}, {'precision': 'float64'}, {'store_norm': False, 'precision': 'float64'}, {'precision': 'float64', 'store_norm': True}, {'store_norm': False, 'precision': 'float64'}]
[the two lines above are printed once by each of the four worker processes]
TC "conv3_input" was not explicitly compiled for inputs of sizes:
  torch.Size([8, 30, 30, 1]) torch.Size([8, 30, 30, 1])
....Generate implicit MappingOptions
[again printed once per worker process]
E0618 20:39:27.390336    39 cuda_rtc.cc:251] Compilation failure for nvrtc(NVRTC_ERROR_INVALID_OPTION):
nvrtc: error: invalid value for --gpu-architecture (-arch)
 source:
template<typename T> inline __device__ T floord(T n, T d) {
  return n < 0 ? - (-n + d - 1)/d : n / d;
}
#define if_then_else(cond,a,b) ((cond) ? (a) : (b))

#ifndef __CUDACC_RTC__
// Can't include system dependencies with NVRTC
// Can't include cuda_fp16.h with NVRTC due to transitive system dependencies
#include <cuda_fp16.h>
#endif

#define inff __int_as_float(0x7f800000)
#define inf __longlong_as_double(0x7ff0000000000000LL)

// Before CUDA 9, syncwarp is a noop since warps are always synchronized.
#if (!defined(__clang__) && __CUDACC_VER_MAJOR__ < 9) || \
    ( defined(__clang__) && CUDA_VERSION < 9000)
inline __device__ void __syncwarp(unsigned mask = 0xFFFFFFFF) {}
#endif

extern "C" {
__global__ void conv3_input_8_30_1_30(int B, int H, int P, int W, float* pconv_output, const float* pX, const float* pY) {
  int b0 = blockIdx.x; int b1 = blockIdx.y; int b2 = blockIdx.z;
  int t0 = threadIdx.x; int t1 = threadIdx.y; int t2 = threadIdx.z;
  float (*conv_output)[8][(30 + -2)][(30 + -2)][(30 + -2)][(30 + -2)] = reinterpret_cast<float (*)[8][(30 + -2)][(30 + -2)][(30 + -2)][(30 + -2)]>(pconv_output);
  const float (*X)[30][30][1] = reinterpret_cast<const float (*)[30][30][1]>(pX);
  const float (*Y)[30][30][1] = reinterpret_cast<const float (*)[30][30][1]>(pY);
  for (int c7 = 0; c7 <= 7; c7 += 1) {
    for (int c8 = 0; c8 <= 7; c8 += 1) {
      for (int c9 = 0; c9 <= 27; c9 += 1) {
        for (int c10 = 0; c10 <= 27; c10 += 1) {
          for (int c11 = t1; c11 <= 27; c11 += 8) {
            conv_output[c7][c8][c9][c10][c11][t0] = (float)0.000000;
            for (int c13 = 0; c13 <= 2; c13 += 1) {
              for (int c14 = 0; c14 <= 2; c14 += 1) {
                conv_output[c7][c8][c9][c10][c11][t0] = (conv_output[c7][c8][c9][c10][c11][t0] + (X[c7][(c9 + c13)][(c10 + c14)][0]*Y[c8][(c11 + c13)][(t0 + c14)][0]));
              }
            }
          }
        }
      }
    }
  }
}
}
Process Process-4:
Traceback (most recent call last):
  File "/root/conda/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/root/conda/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/neural_kernels_code/kernel_gen.py", line 351, in _kernel_gen_help
    kx = dnet.forward(x_b, y_b, gpu=gpu_idx, pp_net=net).cpu().numpy().squeeze()
  File "/neural_kernels_code/kernel_gen.py", line 127, in forward
    prev_norm = self.layers[0](x_b, x_b, **self.kwargs_list[0])
  File "/neural_kernels_code/tc_kernels.py", line 434, in conv3zp_input
    return res.conv3_input(x,y)/(3*3)
  File "/root/conda/lib/python3.6/site-packages/tensor_comprehensions/__init__.py", line 348, in fun
    tc_def_name, *inputs, outputs=outputs, unchecked=unchecked)
  File "/root/conda/lib/python3.6/site-packages/tensor_comprehensions/__init__.py", line 398, in __call__
    implicit_compile(self, entry_point, *inputs)
  File "/root/conda/lib/python3.6/site-packages/tensor_comprehensions/__init__.py", line 392, in implicit_compile
    entry_point, inputs, mapping_options)
RuntimeError: Could not compile function
[The same NVRTC failure, generated CUDA source, and RuntimeError traceback are then printed verbatim by the remaining workers (Process-2, Process-1, and Process-3); only the timestamps and thread IDs differ.]
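
As an aside, from the generated source it looks like conv3_input computes, for every pair of images and every pair of 3x3 patch positions, the inner product of the two patches (conv3zp_input then divides by 9). A plain NumPy restatement of that computation, as I read the kernel above (my own sketch, not code from the repo):

import numpy as np

def conv3_input_reference(X, Y):
    # X, Y: float arrays of shape (B, H, W, 1), here (8, 30, 30, 1).
    # Returns shape (B, B, H-2, W-2, H-2, W-2): the inner product of every
    # 3x3 patch of each X image with every 3x3 patch of each Y image,
    # matching the loop nest in the generated conv3_input kernel.
    B, H, W, _ = X.shape
    out = np.zeros((B, B, H - 2, W - 2, H - 2, W - 2), dtype=X.dtype)
    for b1 in range(B):
        for b2 in range(B):
            for i in range(H - 2):
                for j in range(W - 2):
                    for k in range(H - 2):
                        for l in range(W - 2):
                            out[b1, b2, i, j, k, l] = np.sum(
                                X[b1, i:i + 3, j:j + 3, 0] *
                                Y[b2, k:k + 3, l:l + 3, 0])
    return out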

After I interrupt it with Ctrl+C, the following traceback is printed:

^CTraceback (most recent call last):
  File "run_train_eval_exp.py", line 156, in <module>
    main()
  File "run_train_eval_exp.py", line 51, in main
    K_train, K_test = generate_kernels(cfg, X_train, X_test)
  File "run_train_eval_exp.py", line 101, in generate_kernels
    K_train = kernel_gen.generate_kernel_parallel(cfg.KERNEL, X_train, X_train, num_gpus=cfg.SYSTEM.NUM_GPUS, symmetric=True, batch_size=cfg.SYSTEM.BATCH_SIZE, cache_path=cfg.SYSTEM.CACHE_PATH, float32=cfg.SYSTEM.FLOAT_32, extra_info={"kernel_type": "Train"})
  File "/neural_kernels_code/kernel_gen.py", line 438, in generate_kernel_parallel
    progress = done_q.get()
  File "/root/conda/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/root/conda/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/root/conda/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/root/conda/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/root/conda/lib/python3.6/multiprocessing/util.py", line 262, in _run_finalizers
    finalizer()
  File "/root/conda/lib/python3.6/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/root/conda/lib/python3.6/multiprocessing/queues.py", line 191, in _finalize_join
    thread.join()
  File "/root/conda/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/root/conda/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
^C
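
For what it's worth, the nvrtc line looks like the root cause: "invalid value for --gpu-architecture" is what NVRTC reports when it is asked to compile for a compute capability it does not know about, and an RTX 2080 Ti is compute capability 7.5 (sm_75), which NVRTC only supports from CUDA 10.0 onward. If the Docker image was built against an older toolkit, it would fail exactly like this. A minimal check, assuming PyTorch is available inside the container:

import torch

# Print the CUDA toolkit version PyTorch was built against and the compute
# capability of each visible GPU. If the toolkit predates the GPU (e.g. a
# CUDA 9.x image with an sm_75 card like the 2080 Ti), NVRTC will reject
# the --gpu-architecture flag exactly as in the log above.
print("CUDA toolkit:", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")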
