jwfromm / riptide Goto Github PK

Simple Training and Deployment of Fast End-to-End Binary Networks

License: Other

Jupyter Notebook 96.94% Python 2.95% Shell 0.02% Dockerfile 0.03% Starlark 0.01% CMake 0.05% Vim Script 0.01%

riptide's Issues

Mismatch between Keras.predict and compiled TVM for imported Relay model

Hello,

I have been able to train binary AlexNet (1A1W) to a top-1 accuracy of 43.2% and a top-5 accuracy of 66.0%, fairly close to the published result. However, I am having trouble getting the compiled model to produce the same (or at least similar) output as Keras model.predict.

My setup:
Tensorflow version: 2.1.0
Keras version: 2.3.1
Target/Host: llvm/llvm (x86)

import tensorflow as tf
import numpy as np
import tvm
import tvm.relay as relay
import tvm.contrib.graph_runtime as runtime
from riptide.get_models import get_model
from riptide.binary.binary_layers import Config, DQuantize, XQuantize
from keras.applications.imagenet_utils import preprocess_input, decode_predictions
from keras.preprocessing import image

config = Config(actQ=DQuantize, weightQ=XQuantize, bits=1, use_act=True, use_bn=False, use_maxpool=True)
with config:
    model = get_model('alexnet')
test_input = tf.keras.Input(shape=[224, 224, 3], batch_size=1, dtype='float32')
output = model(test_input)
model.load_weights('...../alexnet_1A1W/model.ckpt-1003838')

# import from keras
mod, params = relay.frontend.from_keras(
  model,
  shape={'input_1': [1, 3, 224, 224]},
  layout='NCHW')

target = 'llvm'
target_host = 'llvm'
with relay.transform.build_config(opt_level=2):
  graph, lib, params = relay.build(mod, target=target, target_host=target_host, params=params)

ctx = tvm.cpu()

img = image.load_img('cat.jpeg', target_size=(224,224))
x = image.img_to_array(img, data_format="channels_first")
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

x_nhwc = image.img_to_array(img)
x_nhwc = np.expand_dims(x_nhwc, axis=0)
x_nhwc = preprocess_input(x_nhwc)
resnet = tf.keras.applications.ResNet50()

module = runtime.create(graph, lib, ctx)
module.set_input(**params)
module.set_input(model.input_names[0], x)
module.run()

modout = module.get_output(0)
print(modout.shape)

print("Keras prediction: ", decode_predictions(model.predict(x_nhwc)))
print("Keras ResNet50 prediction: ", decode_predictions(resnet.predict(x_nhwc)))
print("TVM prediction: ", decode_predictions(modout.asnumpy()))

This results in the following:

Keras prediction:  [[('n01924916', 'flatworm', 0.48209077), ('n02877765', 'bottlecap', 0.26524794), ('n01734418', 'king_snake', 0.051874492), ('n04209239', 'shower_curtain', 0.032657914), ('n09256479', 'coral_reef', 0.014033702)]]
Keras ResNet50 prediction:  [[('n02123045', 'tabby', 0.7215433), ('n02124075', 'Egyptian_cat', 0.21909083), ('n02123159', 'tiger_cat', 0.047373313), ('n03223299', 'doormat', 0.002071013), ('n02127052', 'lynx', 0.0011057281)]]
TVM prediction:  [[('n01980166', 'fiddler_crab', 0.5203081), ('n01914609', 'sea_anemone', 0.23222995), ('n11939491', 'daisy', 0.09443407), ('n02317335', 'starfish', 0.03952333), ('n02319095', 'sea_urchin', 0.017155001)]]

I should note that if I change the input image, the model.predict output usually changes, but the TVM-compiled prediction always seems to hover around fiddler_crab and sea_anemone. Am I doing something wrong here? Even if the model predicts incorrectly, I would expect Keras and TVM to be "equally incorrect".
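One way to narrow down a mismatch like this is to compare the raw output vectors instead of the decoded labels, and to confirm that the NCHW input is exactly the NHWC input transposed. In the script above, preprocess_input is applied to the channels-first array without a data_format argument; if Keras is configured for channels_last (its default), the two arrays may not be preprocessed identically, which would be worth ruling out. The sketch below uses plain Python lists, and the helper names (nhwc_to_nchw, max_abs_diff, top_k) are hypothetical, not part of Keras or TVM.

```python
def nhwc_to_nchw(batch):
    """Transpose a nested list shaped [N][H][W][C] into [N][C][H][W]."""
    return [
        [[[pixel[c] for pixel in row] for row in image]
         for c in range(len(image[0][0]))]
        for image in batch
    ]


def flatten(nested):
    """Yield the scalar leaves of an arbitrarily nested list or tuple."""
    if isinstance(nested, (list, tuple)):
        for item in nested:
            yield from flatten(item)
    else:
        yield nested


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two same-sized arrays."""
    flat_a, flat_b = list(flatten(a)), list(flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("arrays differ in size")
    return max(abs(x - y) for x, y in zip(flat_a, flat_b))


def top_k(scores, k=5):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

With NumPy arrays the same checks can be written as np.transpose(x_nhwc, (0, 3, 1, 2)) and np.abs(a - b).max(). Preprocessing once in NHWC and transposing the result gives both frameworks the same values, so any remaining difference comes from the model import or compilation.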

Binary Convolution

Hi,

I've trained a binary model using AlexNet, and I'm wondering whether binary convolution uses the xnor-popcount operation in place of floating-point multiply-accumulate during training as well, or only during inference.

Another question: I looked into bitserial_conv2d.py in Riptide/tvm/topi/python/topi/x86:

def _conv(n, co, h, w, vh, vw, vc):
        b1b2 = (b1+b2).astype(out_dtype)
        if unipolar:
            return tvm.sum((tvm.popcount(
                data_vec[n, h, w, ci, vh*HSTR+dh, vw*WSTR+dw, b1].astype(out_dtype) &
                kernel_vec[co, ci, dh, dw, b2, vc].astype(out_dtype))  -
                            tvm.popcount(
                                data_vec[n, h, w, ci, vh*HSTR+dh, vw*WSTR+dw, b1].astype(out_dtype)
                                & ~kernel_vec[co, ci, dh, dw, b2, vc]).astype(out_dtype)) << b1b2,
                           axis=[ci, dh, dw, b1, b2])

        return tvm.sum((tvm.popcount(
            data_vec[n, h, w, ci, vh*HSTR+dh, vw*WSTR+dw, b1] &
            kernel_vec[co, ci, dh, dw, b2, vc])).astype(out_dtype) << b1b2,
                       axis=[ci, dh, dw, b1, b2])

The Riptide paper says that bipolar quantization allows the xnor-popcount operation to replace floating-point multiply-accumulate, but this file uses an and-popcount operation.

Why is there a difference? Or am I looking at the wrong file?
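The two formulations can be related by a small worked example. The sketch below checks two arithmetic identities on bit-packed vectors: when both operands encode {-1, +1} as {0, 1}, the dot product equals 2 * popcount(xnor) - n; when activations are plain {0, 1} values and only the weights encode {-1, +1}, it equals popcount(a & w) - popcount(a & ~w), which has the same form as the unipolar branch quoted above. This only illustrates the arithmetic and is not a statement about how Riptide or TVM chooses between them; all function names are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def pack_bits(bits):
    """Pack a list of 0/1 values into one integer, element i at bit i."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word


def reference_dot(a, w):
    """Plain multiply-accumulate used as ground truth."""
    return sum(x * y for x, y in zip(a, w))


def bipolar_dot_xnor(a_bits, w_bits):
    """Both operands encode {-1, +1} as {0, 1}."""
    n = len(a_bits)
    mask = (1 << n) - 1
    xnor = ~(pack_bits(a_bits) ^ pack_bits(w_bits)) & mask
    return 2 * popcount(xnor) - n


def unipolar_dot_and(a_bits, w_bits):
    """Activations are {0, 1}; weights encode {-1, +1} as {0, 1}."""
    mask = (1 << len(a_bits)) - 1
    a, w = pack_bits(a_bits), pack_bits(w_bits)
    return popcount(a & w) - popcount(a & ~w & mask)


a_bits = [1, 0, 1, 1]
w_bits = [1, 1, 0, 1]
to_bipolar = lambda bits: [1 if b else -1 for b in bits]

assert bipolar_dot_xnor(a_bits, w_bits) == reference_dot(to_bipolar(a_bits), to_bipolar(w_bits))
assert unipolar_dot_and(a_bits, w_bits) == reference_dot(a_bits, to_bipolar(w_bits))
```

In the and-popcount form a zero activation contributes nothing to either popcount, which xnor cannot express because it treats a 0 bit as -1.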

error:missing 1 required positional argument: 'previous_layer'

Hi, this is nice work. I hit an error when using train_imagenet.py to train a binary resnet18 model. The command I used:
python train_imagenet.py --model q_resnet18 --experiment 2A1W --gpus 0 --binary --bits 2 --model_dir ~/models

But I get the following error:

Traceback (most recent call last):
File "scripts/train_imagenet.py", line 196, in <module>
app.run(main)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "scripts/train_imagenet.py", line 192, in main
tf.estimator.train_and_evaluate(classifier, train_spec, eval_spec)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
return executor.run()
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
return self.run_local()
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 374, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1164, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1194, in _train_model_default
features, labels, ModeKeys.TRAIN, self.config)
File "/opt/conda/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1152, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "scripts/train_imagenet.py", line 108, in model_fn
model = get_model(FLAGS.model)
File "/root/riptide/riptide/get_models.py", line 37, in get_model
net = modelsname
File "/root/riptide/riptide/models/resnet18.py", line 30, in __init__
self.block1_bn1 = nn.BatchNormalization()
File "/root/riptide/riptide/binary/binary_layers.py", line 257, in BatchNormalization
return ShiftNormalization(*args, **kwargs)
TypeError: __init__() missing 1 required positional argument: 'previous_layer'

I use TF 2.1 and installed TVM from your repo. Can you guide me on what to do next?
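The last two frames of the traceback show a BatchNormalization factory forwarding its arguments to ShiftNormalization, while the call site in resnet18.py passes none. The sketch below reproduces that pattern with a hypothetical stand-in class; what the real layer expects for previous_layer (a layer object, a name, or something else) is not visible in the traceback and is not assumed here.

```python
class ShiftNormalizationSketch:
    """Hypothetical stand-in for a layer that must know which layer it follows."""

    def __init__(self, previous_layer, **kwargs):
        self.previous_layer = previous_layer


def batch_normalization(*args, **kwargs):
    # Mirrors the forwarding seen in the traceback: arguments pass straight through.
    return ShiftNormalizationSketch(*args, **kwargs)


def try_build(*args, **kwargs):
    """Return (layer, None) on success or (None, error message) on TypeError."""
    try:
        return batch_normalization(*args, **kwargs), None
    except TypeError as exc:
        return None, str(exc)


# Called with no arguments, as in resnet18.py line 30, the constructor fails.
layer, error = try_build()
print(error)

# Supplying the missing argument lets construction succeed.
layer, error = try_build("block1_conv1")
```

If the model definition was written for a plain batch-normalization layer, this kind of error suggests that the model file and the active configuration disagree about which normalization layer is in use.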

erro

git clone --recursive git@github.com:jwfromm/riptide.git
Shouldn't it be like this instead?
git clone --recursive https://@github.com:jwfromm/riptide.git

cd Riptide/tvm && mkdir build && cp cmake/config.cmake build && cd build
cmake ..
make -j4
export TVM_HOME={RiptideLocation}/tvm
export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$PYTHONPATH
This step succeeded.

You should now be able to import Riptide in Python and are ready to train and deploy a binary model!
No, I can't: import Riptide fails with an error.
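A quick way to see which package Python cannot find is to ask the import system directly. Note that the first issue imports the package in lowercase (from riptide.get_models import get_model), and Python module names are case-sensitive, so import Riptide would fail even when import riptide works. The sketch below is a generic check with a hypothetical helper name; it assumes the three module names listed are the ones this setup needs.

```python
import importlib.util


def find_missing(modules=("tvm", "topi", "riptide")):
    """Return the module names that Python cannot locate on its import path."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All modules found.")
```

The exports above only add the TVM and topi directories to PYTHONPATH; if riptide is reported missing, the repository root may also need to be on PYTHONPATH (an assumption, since the install steps quoted here do not say).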

Train bipolar network

The corresponding paper discusses unipolar vs. bipolar quantization methods. How can I train a bipolar network using train_imagenet.py? Thanks!

bitserial_conv2d/bitpack error with other backends

Hi, I am trying to run one of the notebooks for bitserial conv2d:

import tvm
import numpy as np
import topi
from tvm import relay
import topi.testing
from tvm.contrib import graph_runtime
from topi.util import get_const_tuple

batch = 1
in_height = in_width = in_size = 32
in_dim = 32
out_dim = 32
in_channel = 32
num_filter = 32
kernel = 3
stride = (1, 1)
padding = (1, 1)
activation_bits = 1
weight_bits = 1
unipolar = True

input_dtype = 'uint8'
out_dtype = 'int8'

def generate_quantized_np(shape, bits, out_dtype):
    min_val = 0 
    max_val = 1 << bits
    return np.random.randint(min_val, max_val, size=shape).astype(out_dtype)

with tvm.target.create('llvm'):
    #A = tvm.placeholder((batch, in_channel, in_height, in_width), dtype=input_dtype, name='A')
    #W = tvm.placeholder((num_filter, in_channel, kernel, kernel), dtype=input_dtype, name='W')
    #QW = topi.nn.bitpack(W, weight_bits, pack_axis=1, bit_axis=0, pack_type='uint8')
    
    A = tvm.placeholder((batch, in_height, in_width, in_channel), dtype=input_dtype, name='A')
    #W = tvm.placeholder((num_filter, in_channel, kernel, kernel), dtype=input_dtype, name='W')
    W = tvm.placeholder((kernel, kernel, in_channel, num_filter), dtype=input_dtype, name='W')
    
a_shape = get_const_tuple(A.shape)
w_shape = get_const_tuple(W.shape)

a_np = generate_quantized_np(a_shape, activation_bits, input_dtype)
w_np = generate_quantized_np(w_shape, weight_bits, input_dtype)

if unipolar:
    w_ = np.copy(w_np).astype(out_dtype)
    for x in np.nditer(w_, op_flags=['readwrite']):
        x[...] = 1 if x == 1 else -1
    #b_np = topi.testing.conv2d_nchw_python(a_np.astype(out_dtype), w_, stride, padding)
    b_np = topi.testing.conv2d_nhwc_python(a_np.astype(out_dtype), w_, stride, padding)
else:
    b_np = topi.testing.conv2d_nchw_python(a_np, w_np, stride, padding)
    

input_var = relay.var('input', shape=A.shape, dtype=A.dtype)
kernel_var = relay.var('kernel', shape=W.shape, dtype=W.dtype)
q_kernel = relay.nn.bitpack(kernel_var, bits=1, pack_axis=2, bit_axis=4, pack_type=input_dtype)
q_out = relay.nn.bitserial_conv2d(input_var, q_kernel, channels=32, kernel_size=(3,3), padding=(1, 1), data_layout='NHWC', pack_dtype='uint8', out_dtype='int8', kernel_layout="HWIO")

q_func = relay.Function([input_var, kernel_var], q_out)

target='aocl_sw_emu'
ctx = tvm.context(target, 0)

#target='llvm'
#ctx = tvm.cpu()
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(q_func, target=target, target_host='llvm', params={'kernel': w_np})

module = graph_runtime.create(graph, lib, ctx)
module.set_input('input', a_np)
module.set_input(**params)
module.run()

import pdb; pdb.set_trace()
output = module.get_output(0).asnumpy()
tvm.testing.assert_allclose(output, b_np, rtol=1e-5)

With the x86/llvm backend I have no issues, but when I switch over to the aocl_sw_emu backend I run into an error:

** WARNING: [acls10mx_ref0] NOT using DMA to transfer 32768 bytes from host to device because of lack of alignment
**                 host ptr (0x2e23050) and/or dev offset (0x0) is not aligned to 64 bytes
Traceback (most recent call last):

  File "tut.py", line 73, in <module>
    module.run()

  File "/home/chungs31/repos/Riptide/tvm/python/tvm/contrib/graph_runtime.py", line 176, in run
    self._run()

  File "/home/chungs31/repos/Riptide/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 213, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(tvm::runtime::GraphRuntime::Run()+0x5e) [0x7fea69c51416]
  [bt] (7) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(std::function<void ()>::operator()() const+0x32) [0x7fea69194692]
  [bt] (6) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(+0x22e5bac) [0x7fea69c57bac]
  [bt] (5) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(+0x22e31b9) [0x7fea69c551b9]
  [bt] (4) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x3d) [0x7fea692ebeff]
  [bt] (3) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x6d) [0x7fea692ec473]
  [bt] (2) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(+0x227ecbc) [0x7fea69bf0cbc]
  [bt] (1) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(+0x227d910) [0x7fea69bef910]
  [bt] (0) /home/chungs31/repos/Riptide/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x25) [0x7fea69190907]
  File "/home/chungs31/repos/Riptide/tvm/src/runtime/library_module.cc", line 91
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (1 == int32(arg2.shape[3])), Argument arg2.shape[3] has an unsatisfied constraint

It seems the shape of the output is being asserted incorrectly. I am wondering whether something is wrong with the shapes I used in the script, or whether the runtime lacks support for bitserial operations on this backend. How should I go about tackling this issue?
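When debugging a shape assertion like this, it can help to work out by hand what shape a bit-packed tensor should have. The sketch below packs 0/1 values into 8-bit words and computes a packed shape under an assumed rule: the packed axis shrinks by the word width and a bit-plane axis of size bits is inserted at bit_axis. Both the rule and the bit order are assumptions made for illustration, and the function names are hypothetical; they are not taken from the TVM source.

```python
def pack_words(values, word_bits=8):
    """Pack 0/1 values into integers, word_bits values per word, first value in the MSB."""
    if len(values) % word_bits:
        raise ValueError("length must be a multiple of word_bits")
    words = []
    for start in range(0, len(values), word_bits):
        word = 0
        for v in values[start:start + word_bits]:
            word = (word << 1) | (v & 1)
        words.append(word)
    return words


def packed_shape(shape, bits, pack_axis, bit_axis, word_bits=8):
    """Assumed rule: shrink pack_axis by word_bits, then insert a bit-plane axis."""
    out = list(shape)
    if out[pack_axis] % word_bits:
        raise ValueError("packed axis must be a multiple of word_bits")
    out[pack_axis] //= word_bits
    out.insert(bit_axis, bits)
    return tuple(out)


# HWIO kernel from the script above: 3x3 window, 32 input and 32 output channels.
print(packed_shape((3, 3, 32, 32), bits=1, pack_axis=2, bit_axis=4))
```

Comparing a hand-computed shape with the shapes of the arrays actually passed to the module may show which argument the failing constraint on arg2.shape[3] refers to.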

Operator SpecialBatchNormalization is not supported for frontend Keras

Hi,

I am running the riptide Dockerfile (python3 + tf2.1) on CPU/x86 and got to the point where model.predict() works. But when I then run the following to convert the model to Relay format, it errors out with "Operator SpecialBatchNormalization is not supported for frontend Keras".

Thanks!

In [20]: mod, params = relay.frontend.from_keras(
    ...:   model,
    ...:   shape={'input_1': [1, 224, 224, 3]},
    ...:   layout='NHWC')
---------------------------------------------------------------------------
OpNotImplemented                          Traceback (most recent call last)
<ipython-input-20-fbc36025831c> in <module>
      2   model,
      3   shape={'input_1': [1, 224, 224, 3]},
----> 4   layout='NHWC')

~/Riptide/tvm/python/tvm/relay/frontend/keras.py in from_keras(model, shape, layout)
   1115                             op_name = o.name
   1116                 # Add the op to our graph.
-> 1117                 keras_op_to_relay(inexpr, keras_layer, op_name, etab)
   1118     # model._output_coordinates contains out_node(oc[0]), node_index(oc[1]) and tensor_index(oc[2])
   1119     # Get all output nodes in etab using the name made from above values.

~/Riptide/tvm/python/tvm/relay/frontend/keras.py in keras_op_to_relay(inexpr, keras_layer, outname, etab)
   1000     if op_name not in _convert_map:
   1001         raise tvm.error.OpNotImplemented(
-> 1002             'Operator {} is not supported for frontend Keras.'.format(op_name))
   1003     outs = _convert_map[op_name](inexpr, keras_layer, etab)
   1004     outs = _as_list(outs)

OpNotImplemented: Operator SpecialBatchNormalization is not supported for frontend Keras.
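The traceback shows that from_keras looks each op name up in a _convert_map dictionary and raises as soon as one is missing. A small pre-check can list every unsupported layer type at once instead of stopping at the first. The sketch below assumes the op name is the layer's class name, which the error message suggests but the excerpt does not confirm; unsupported_layers and the two stand-in classes are hypothetical.

```python
def unsupported_layers(layers, convert_map):
    """Return the sorted class names of layers that have no converter entry."""
    seen = {type(layer).__name__ for layer in layers}
    return sorted(seen - set(convert_map))


# Stand-in layer classes so the sketch runs without Keras installed.
class Conv2D:
    pass


class SpecialBatchNormalization:
    pass


layers = [Conv2D(), SpecialBatchNormalization(), Conv2D()]
convert_map = {"Conv2D": None}  # converter functions omitted in this sketch
print(unsupported_layers(layers, convert_map))
```

With a real model, the inputs would be model.layers and the _convert_map global from tvm/python/tvm/relay/frontend/keras.py shown in the traceback.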

Version required

Hi,
Interesting work. I have some issues that I cannot fix myself. Could you post the correct versions of TensorFlow, Keras, etc.? Some features are not compatible.

Thanks!
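For comparison, the first issue on this page reports training successfully with TensorFlow 2.1.0 and Keras 2.3.1, and another mentions the Dockerfile using python3 with TF 2.1; neither is an official statement of requirements. The sketch below reports the versions installed locally so they can be compared against those numbers. It relies on importlib.metadata, which needs Python 3.8 or newer; the helper name and the default package list are assumptions.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "keras", "numpy")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```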
