huawei-noah / bolt

Bolt is a deep learning library with high performance and heterogeneous flexibility.

Home Page: https://huawei-noah.github.io/bolt/

License: MIT License


bolt's Introduction

Introduction



Bolt is a light-weight library for deep learning. As a universal deployment tool for all kinds of neural networks, Bolt aims to automate the deployment pipeline and achieve extreme acceleration. Bolt has been widely deployed and used in many departments of HUAWEI, such as the 2012 Laboratory, CBG and HUAWEI Product Lines. If you have questions or suggestions, you can submit an issue. QQ group: 833345709

Why is Bolt what you need?


  • High Performance: 15%+ faster than existing open source acceleration libraries.
  • Rich Model Conversion: supports Caffe, ONNX, TFLite and TensorFlow.
  • Various Inference Precisions: supports FP32, FP16, INT8, 1-BIT.
  • Multiple platforms: ARM CPU(v7, v8, v8.2+, v9), X86 CPU(AVX2, AVX512), GPU(Mali, Qualcomm, Intel, AMD)
  • Bolt is the first to support NLP and also supports common CV applications.
  • Minimize ROM/RAM
  • Rich Graph Optimization
  • Efficient Thread Affinity Setting
  • Auto Algorithm Tuning
  • Time-Series Data Acceleration

See more excellent features and details here

Building Status


Here are some commonly used platforms for inference. More targets can be found in scripts/target.sh. Please choose the one that suits your environment. If you want to build the on-device training module, add the --train option. If you want to use multi-threaded parallelism, add the --openmp option. If you want to build for cortex-M or cortex-A7 with restricted ROM/RAM (sensors, MCUs), see docs/LITE.md.

Bolt links the static library by default, which may cause problems on some platforms. You can use the --shared option to link the shared library instead.
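For example, a build that combines these options might look like the following sketch; android-aarch64 is just one of the targets listed in the table below, and the options can be mixed as your platform requires:

./install.sh --target=android-aarch64 --gpu --openmp --shared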

| target platform | precision | build command | Linux | Windows | MacOS |
| --- | --- | --- | --- | --- | --- |
| Android(armv7) | fp32,int8 | ./install.sh --target=android-armv7 | Build Status | Build Status | Build Status |
| Android(armv8) | fp32,int8 | ./install.sh --target=android-aarch64 --fp16=off | Build Status | Build Status | Build Status |
| Android(armv8.2+) | fp32,fp16,int8,bnn | ./install.sh --target=android-aarch64 | Build Status | Build Status | Build Status |
| Android(armv9) | fp32,fp16,bf16,int8,bnn | ./install.sh --target=android-aarch64_v9 | Build Status | Build Status | Build Status |
| Android(gpu) | fp16 | ./install.sh --target=android-aarch64 --gpu | Build Status | Build Status | Build Status |
| Android(x86_64) | fp32,int8 | ./install.sh --target=android-x86_64 | Build Status | Build Status | Build Status |
| iOS(armv7) | fp32,int8 | ./install.sh --target=ios-armv7 | / | / | Build Status |
| iOS(armv8) | fp32,int8 | ./install.sh --target=ios-aarch64 --fp16=off | / | / | Build Status |
| iOS(armv8.2+) | fp32,fp16,int8,bnn | ./install.sh --target=ios-aarch64 | / | / | Build Status |
| Linux(armv7) | fp32,int8 | ./install.sh --target=linux-armv7_blank | Build Status | / | / |
| Linux(armv8) | fp32,int8 | ./install.sh --target=linux-aarch64_blank --fp16=off | Build Status | / | / |
| Linux(armv8.2+) | fp32,fp16,int8,bnn | ./install.sh --target=linux-aarch64_blank | Build Status | / | / |
| Linux(x86_64) | fp32,int8 | ./install.sh --target=linux-x86_64 | Build Status | / | / |
| Linux(x86_64_avx2) | fp32 | ./install.sh --target=linux-x86_64_avx2 | Build Status | / | / |
| Linux(x86_64_avx512) | fp32,int8 | ./install.sh --target=linux-x86_64_avx512 | Build Status | / | / |
| Windows(x86_64) | fp32,int8 | ./install.sh --target=windows-x86_64 | / | Build Status | / |
| Windows(x86_64_avx2) | fp32 | ./install.sh --target=windows-x86_64_avx2 | / | Build Status | / |
| Windows(gpu) | fp16 | ./install.sh --target=windows-x86_64_avx2 --gpu --fp16=on | / | Build Status | / |
| Windows(x86_64_avx512) | fp32,int8 | ./install.sh --target=windows-x86_64_avx512 | / | Build Status | / |
| Windows(armv8.2+) | fp32,fp16,int8,bnn | ./install.sh --target=windows-aarch64 | / | / | Build Status |
| MacOS(x86_64) | fp32,int8 | ./install.sh --target=macos-x86_64 | / | / | Build Status |
| MacOS(x86_64_avx2) | fp32 | ./install.sh --target=macos-x86_64_avx2 | / | / | Build Status |
| MacOS(x86_64_avx512) | fp32,int8 | ./install.sh --target=macos-x86_64_avx512 | / | / | Build Status |
| MacOS(armv8.2+) | fp32,fp16,int8,bnn | ./install.sh --target=macos-aarch64 | / | / | Build Status |

Quick Start


Two steps to get started with bolt.
  1. Conversion: use X2bolt to convert your model from Caffe, ONNX, TFLite or TensorFlow to a .bolt file;

  2. Inference: run benchmark with the .bolt model and input data to get the inference result.

    For more details about the usage of the X2bolt and benchmark tools, see docs/USER_HANDBOOK.md; a minimal example follows.
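As a sketch of the two steps: the X2bolt options -d, -m and -i (model directory, model name, inference precision) are assumptions based on docs/USER_HANDBOOK.md, the benchmark flags mirror a command shown later on this page, and the file names are placeholders.

# step 1: conversion (assumed X2bolt options, see docs/USER_HANDBOOK.md)
./X2bolt -d ./models/ -m resnet50 -i FP16
# step 2: inference on the converted model
./benchmark -a GPU -w 10 -l 10 -m ./models/resnet50_f16.bolt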

DL Applications in Bolt

Here we show some interesting and useful applications in bolt.

  • Image Classification: android, ios
  • Face Detection: ios, exe
  • Pose Detection: android
  • Semantics Analysis: android
  • Reading Comprehension: android
  • Chinese Speech Recognition: android, ios

Verified Networks


Bolt has shown its high performance in the inference of common CV, NLP and Recommendation neural networks. Some of the representative networks that we have verified are listed below. You can find detailed benchmark information in docs/BENCHMARK.md.

| Application | Models |
| --- | --- |
| CV | Resnet50, Shufflenet, Squeezenet, Densenet, Efficientnet, Mobilenet_v1, Mobilenet_v2, Mobilenet_v3, BiRealNet, ReActNet, Ghostnet, unet, LCNet, Pointnet, hair-segmentation, duc, fcn, retinanet, SSD, Faster-RCNN, Mask-RCNN, Yolov2, Yolov3, Yolov4, Yolov5, ViT, TNT, RepVGG, VitAE, CMT, EfficientFormer ... |
| NLP | Bert, Albert, Tinybert, Neural Machine Translation, Text To Speech (Tacotron, Tacotron2, FastSpeech+hifigan, melgan), Automatic Speech Recognition, DFSMN, Conformer, Tdnn, FRILL, T5, GPT-2, Roberta, Wenet ... |
| Recommendation | NFM, AFM, ONN, wide&deep, DeepFM, MMOE |
| More | DL Tasks ... |

More models than those mentioned above are supported; users are encouraged to explore further.

On-Device Training


On-device training has arrived. It is a beta version that supports LeNet, Mobilenet_v1 and Resnet18 for training on embedded devices and servers. Want more details about on-device training in bolt? See the official training tutorial.
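As a sketch, the beta training module is enabled with the --train build option mentioned in the Building Status section above; the target shown here is only an example:

./install.sh --target=linux-x86_64 --train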

Documentations


Everything you want to know about bolt is recorded in the detailed documentation stored in docs.

Articles


Tutorials


Acknowledgement


Bolt refers to the following projects: caffe, onnx, tensorflow, ncnn, mnn, dabnn.

License


The MIT License (MIT)


bolt's Issues

No third_party install shell script

The shell script for installing third-party libraries mentioned in INSTALL.md is not in the repo, so I have to install and configure all the dependencies manually.

Is full support for binarized convolution networks available?

Bolt supports both XNOR-style and DoReFa-style BNN networks. Just save the binary weights as FP32 in an ONNX model, and X2bolt will automatically convert the storage to 1-bit representations. So far, the floating-point portion of a BNN network can only use FP16 operations, so pass "FP16" as the precision parameter to X2bolt. The number of output channels for BNN convolution layers should be divisible by 32.
What does the FP16 mentioned here mean? Does it mean that support for binarized networks is actually implemented with FP16? And why must the number of output channels be divisible by 32?
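For illustration, a conversion call matching the answer above might look like this; the -d, -m and -i options are assumptions taken from docs/USER_HANDBOOK.md rather than from this page, and the model name is a placeholder:

# assumed X2bolt options: -d model directory, -m model name, -i inference precision
./X2bolt -d ./onnx_models/ -m my_bnn_model -i FP16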

Error while cloning tensorflow

Building with the command

./install.sh -t 12 -c llvm

I get this error:

  • [new tag] v2.1.0-rc1 -> v2.1.0-rc1
  • [new tag] v2.1.0-rc2 -> v2.1.0-rc2
  • [new tag] v2.2.0-rc0 -> v2.2.0-rc0
  • [new tag] v2.2.0-rc1 -> v2.2.0-rc1
    From https://github.com/tensorflow/tensorflow
  • branch master -> FETCH_HEAD
    error: Sparse checkout leaves no entry on working directory

llvm-ranlib problem

$ ./install.sh --target=android-aarch64 --gpu
[ERROR] please install llvm-ranlib tools and set shell environment PATH to find it

The llvm-ranlib tool used in the new version's build process does not exist in NDK r20; the tool there is llvm-ar. You can copy and rename it as a workaround; it is suggested to fix the install.sh script.

Model power-consumption analysis

Is there any power-consumption comparison data for FP16, INT8 and binary under the same network structure?

Are TorchScript-format models supported as input?

From the README and the code directory, there are two ways to convert a PyTorch model:

  1. PyTorch->ONNX
  2. PyTorch->Caffe
    Both of these approaches have significant limitations. For example, with PyTorch->ONNX, some of PyTorch's own Prim operators (IF, LOOP, LISTUNPACK ...) are poorly supported or unsupported by ONNX. So would the current runtime inference framework consider supporting TorchScript as a model input format?

https://pytorch.org/docs/stable/jit.html?highlight=script
https://github.com/pytorch/pytorch/tree/master/torch/csrc/jit

bolt build failed on windows, here's the error report

CMake Error at C:/Program Files/CMake/share/cmake-3.22/Modules/CMakeTestCCompiler.cmake:69 (message):
The C compiler

"D:/mingw64/bin/gcc.exe"

is not able to compile a simple test program.

It fails with the following output:

Change Dir: C:/Users/Q/Desktop/bolt-master/third_party/windows-x86_64/protobuf/protobuf-3.14.0/build/CMakeFiles/CMakeTmp

Run Build Command(s):D:/mingw64/bin/mingw32-make.exe -f Makefile cmTC_4fc87/fast && mingw32-make  -f CMakeFiles\cmTC_4fc87.dir\build.make CMakeFiles/cmTC_4fc87.dir/build
mingw32-make[1]: Entering directory 'C:/Users/Q/Desktop/bolt-master/third_party/windows-x86_64/protobuf/protobuf-3.14.0/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_4fc87.dir/testCCompiler.c.obj
D:\mingw64\bin\gcc.exe    -o CMakeFiles\cmTC_4fc87.dir\testCCompiler.c.obj -c C:\Users\Q\Desktop\bolt-master\third_party\windows-x86_64\protobuf\protobuf-3.14.0\build\CMakeFiles\CMakeTmp\testCCompiler.c
Linking C executable cmTC_4fc87.exe
"C:\Program Files\CMake\bin\cmake.exe" -E cmake_link_script CMakeFiles\cmTC_4fc87.dir\link.txt --verbose=1
"C:\Program Files\CMake\bin\cmake.exe" -E rm -f CMakeFiles\cmTC_4fc87.dir/objects.a
D:\mingw64\bin\ar.exe qc CMakeFiles\cmTC_4fc87.dir/objects.a @CMakeFiles\cmTC_4fc87.dir\objects1.rsp
D:\mingw64\bin\gcc.exe -Wl,--whole-archive CMakeFiles\cmTC_4fc87.dir/objects.a -Wl,--no-whole-archive -o cmTC_4fc87.exe -Wl,--out-implib,libcmTC_4fc87.dll.a -Wl,--major-image-version,0,--minor-image-version,0 @CMakeFiles\cmTC_4fc87.dir\linklibs.rsp
gcc.exe: error: CreateProcess: No such file or directory
mingw32-make[1]: *** [CMakeFiles\cmTC_4fc87.dir\build.make:100: cmTC_4fc87.exe] Error 1
mingw32-make[1]: Leaving directory 'C:/Users/Q/Desktop/bolt-master/third_party/windows-x86_64/protobuf/protobuf-3.14.0/build/CMakeFiles/CMakeTmp'
mingw32-make.exe: *** [Makefile:126: cmTC_4fc87/fast] Error 2

Converting tflite with X2bolt 1.2.1 fails (1.2.0 has no problem)

Version 1.2.1 replaced the schema with the tensorflow master version, and opcode parsing broke. I added some logging at line 260 of tflite_adaptee; all of the printed opcodes are 0:

[INFO] thread 12984 Start to convert ./xxx.tflite...
[parse_file] tfliteModel->operator_codes[0]->builtin_code : 0
[parse_file] tfliteModel->operator_codes[1]->builtin_code : 0
[parse_file] tfliteModel->operator_codes[2]->builtin_code : 0
[parse_file] tfliteModel->operator_codes[3]->builtin_code : 0
[parse_file] tfliteModel->operator_codes[4]->builtin_code : 0
[parse_file] tfliteModel->operator_codes[5]->builtin_code : 0
Segmentation fault

What features do you want to add to bolt?

We want to know what you would like to add to bolt. We will evaluate your suggestions and make a development plan, so please tell us your requirements.

  • add an app to demo bolt's ability
  • add support for mobile GPU
  • add support for fp32
  • add a benchmark tool
  • add support for DaVinci

benchmark issue

I benchmarked bolt with the profile feature disabled at the install stage, looking only at the model's total latency, and found that I cannot reach the performance reported in the article. I'm not sure where I went wrong; please help me take a look.
As shown in the screenshot (WXWorkCapture_16431661593291).
The article reports 3.949 ms for squeezenet 1.1 on a Snapdragon 888 in half precision; on a Xiaomi 11 (Snapdragon 888) I measured the fp16 case at avg_time: 7.443091 ms/data.
To verify, I also tested the resnet50 network mentioned in https://github.com/huawei-noah/bolt/blob/master/docs/USER_HANDBOOK.md, converted with the X2bolt tool. My command was: ./benchmark -a GPU -w 10 -l 10 -m ResNet-50_f16.bolt
(screenshot: WXWorkCapture_16431838464497)
The fp16 latency on the Snapdragon 888 is:
Benchmark Result:
Output Tensor prob desc: dt:DT_F16 memFormat:DF_NCHW stride(1000,1,1) offset(0,0,0) data: 0.000166 0.000330 0.000063 0.000110 0.000000 0.000508 0.000000 0.000000 sum: 0.992770
total_time:305.839355ms(loops=10)
avg_time:30.583936ms/data
min_time:29.903076ms/data
max_time:31.020020ms/data
Is the average latency of 30.58 ms here normal? Could you share resnet50 latency numbers, or provide the resnet50_v2 model file (the official article reports about 25 ms), so I can cross-check?

Raspberry Pi 4 compile error

Is there any documentation for Raspberry Pi 4?

I get the error below when running cmake ..


error: ‘Factory’ was not declared in this scope
     std::shared_ptr<Factory> factory;

The future of bolt

How is bolt positioned inside Huawei, and is there any chance that Hi (HiSilicon) chips will be supported in the future?

Cmake error (Protobuf_SHARED_LIBRARY NOT FOUND) occurs

Environment

  • Ubuntu 18.04
  • cmake version 3.10.2
  • Android(armv8+mali)
  • android-ndk-r20b

Command

./install.sh --target=android-aarch64 --mali -t 6

Log

-- CXXFLAGS: --target=aarch64-linux-android21 -W -Wall -Wextra -O3 -fPIC -fstack-protector-all -Wno-unused-command-line-argument -Wno-unused-parameter -Wno-unused-result -Wno-deprecated-declarations -Wno-unused-variable -pthread -D_USE_JNI -D_USE_ANDROID_LOG -llog -D_USE_GENERAL -D_USE_MALI -D_USE_FP32 -D_USE_NEON -D_USE_FP16 -D_USE_F16_MIX_PRECISION -D_USE_INT8 -march=armv8-a+fp16+dotprod -D_USE_CAFFE -D_USE_ONNX -D_USE_TFLITE -D_USE_TENSORFLOW -std=c++11 -W -Wall -Wextra -O3 -fPIC -fstack-protector-all -Wno-unused-command-line-argument -Wno-unused-parameter -Wno-unused-result -Wno-deprecated-declarations -Wno-unused-variable -pthread -D_USE_JNI -D_USE_ANDROID_LOG -llog -D_USE_GENERAL -D_USE_MALI -D_USE_FP32 -D_USE_NEON -D_USE_FP16 -D_USE_F16_MIX_PRECISION -D_USE_INT8 -march=armv8-a+fp16+dotprod -D_USE_CAFFE -D_USE_ONNX -D_USE_TFLITE -D_USE_TENSORFLOW -Wl,-allow-shlib-undefined -static-libstdc++

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Protobuf_SHARED_LIBRARY
    linked by target "model_tools_caffe" in directory /home/ubuntu/workspace/bolt/model_tools/src/caffe
    linked by target "model_tools_onnx" in directory /home/ubuntu/workspace/bolt/model_tools/src/onnx

-- Configuring incomplete, errors occurred!

An OpenCL bug

I found a bug while testing bolt's OpenCL backend.
Because bolt mixes the NCHW / NCHWC4 data layouts, and for OpenCL the blobs between layers mix buffer, image1d, image2d and image3d, while memory allocation also reuses memory, one of my models triggered a bug at a depth2space_ocl layer: the kernel's argument is declared as a buffer input type, but after memory reuse an image3d object was passed for that argument, so set_arg failed with CL_INVALID_MEM_OBJ. I am still tracking it down in the memory-reuse code.

tensorflow to caffe model, use caffe to infer: which caffe version is used?

[libprotobuf ERROR google/protobuf/text_format.cc:307] Error parsing text-format caffe.NetParameter: 80:14: Message type "caffe.EmbedParameter" has no field named "transpose".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0225 15:25:28.170408 100496 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: tensorflow2caffe/tts/tts_encoder.prototxt
*** Check failure stack trace: ***

How to build on Raspberry?

We have successfully built the bolt inference library without the model converter on a Raspberry Pi 3 Model B (armv7).

#67

export CFLAGS="-march=armv7-a -mfpu=neon-vfpv4 "
export CXXFLAGS="-march=armv7-a -mfpu=neon-vfpv4 "
./install.sh --target=linux-armv7_blank --converter=off -t 4

#benchmark
./install_linux-armv7_blank/example/benchmark -m ./kit/assets/ImageClassification/ghostnet_f32.bolt

You can transfer your bolt model to Raspberry and run inference.

Demo for quick start

In many excellent open source projects, a demo is given as a quick start; however, we can't find a real example that can be run directly after compilation.

An example with input data and real models would be friendlier to beginners. Thanks :)

Typo in STORE_OUTPUT; Adreno GPU deconvolution f2s2 gives wrong results

./test_deconvolution_ocl 24 256 128 1 2 2 2 0

[DEBUG] thread 13883 OCLContext 0x61531c6278 constructor start
[DEBUG] thread 13883 try to dlopen libQUALCOMM_Adreno_660_map.so failed, dlopen failed: library "libQUALCOMM_Adreno_660_map.so" not found, create kernel from source code
[DEBUG] thread 13883 gcl_kernel_source 0xb40000714c3a1250 constructor
[DEBUG] thread 13883 OCLContext 0x61531c6278 constructor end
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_12 runInfo: ls <0 0 0> executeTime = 153.856000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_22 runInfo: ls <0 0 0> executeTime = 130.816000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_32 runInfo: ls <0 0 0> executeTime = 153.088000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_42 runInfo: ls <0 0 0> executeTime = 122.880000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_14 runInfo: ls <0 0 0> executeTime = 143.872000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 102.144000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_34 runInfo: ls <0 0 0> executeTime = 118.016000 us
[DEBUG] thread 13883 enqueue_fill_image runInfo: executeTime = 15.872000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_trans_fltbuf_44 runInfo: executeTime = 5.888000 us
[DEBUG] thread 13883 DATATRANS>>> enqueue_write_buffer runInfo: executeTime = 129.024000 us
[DEBUG] thread 13883 KERNEL>>> unknow_mem_trans_om_nchw_to_nchwc4 runInfo: executeTime = 113.920000 us
[INFO] thread 13883 warm up gpu:
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 102.912000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 100.864000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 98.048000 us
[DEBUG] thread 13883 KERNEL>>> unknow_mem_trans_im_nchwc4_to_nchw runInfo: executeTime = 51.968000 us
[DEBUG] thread 13883 DATATRANS>>> enqueue_read_buffer runInfo: executeTime = 16.896000 us
[INFO] thread 13883 16bit,         Deonvolution,                                    (1 24 256 128)+(24 1 2 2)/(2 0)=(1 1 512 256),    TIME    0.098ms,        GFLOPS   65.504
abs(diff) >= 1.000000e+00f, number = 23
abs(diff) >= 1.000000e-01f, number = 822
abs(diff) >= 1.000000e-02f, number = 164
abs(diff) >= 1.000000e-03f, number = 1084
abs(diff) >= 1.000000e-04f, number = 85300
abs(diff) >= 1.000000e-05f, number = 3176
abs(diff) >= 0.000000e+00f, number = 40503
maxabs = 1.530273, a = 0.000000, b = 1.530273 @ 428
maxrel = 976.562500, a = -0.000244, b = 0.000244 @ 73386
[DEBUG] thread 13883 OCLContext 0x61531c6278 deconstructor start
[DEBUG] thread 13883 gcl_kernel_source 0xb40000714c3a1250 constructor
[DEBUG] thread 13883 OCLContext 0x61531c6278 deconstructor end

Using the GPU algorithm-selection file to speed up model initialization: a corner case is not accelerated

The GPU algorithm file contains algorithmMap and kernelThreadMap. When a model contains only simple OPs (eltwise, power, etc.), no search over tiling and similar parameters is needed, so algorithmMap is empty, while kernelThreadMap still contains the local-work-size search results for these OPs.

Therefore there is a corner case: algorithmMap.size() == 0 && kernelThreadMap.size() > 0

In that case void saveMapToFile() has a bug, and the local-work-size search results for such a model are not saved to the algorithm file. As a result, the next time the model is initialized, even though this algorithm file is linked, the local search still has to be redone, and the model's first execution is very slow. Concretely, the execution times with -w 0 and -w 1 differ dramatically.
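As a reproduction sketch using the benchmark flags shown elsewhere on this page (the model file name is a placeholder):

# first execution without warm-up vs. with a single warm-up run
./benchmark -a GPU -w 0 -l 1 -m model_f16.bolt
./benchmark -a GPU -w 1 -l 1 -m model_f16.bolt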

batch inference

Does bolt support batch inference? Could I run inference on two or more sentences at the same time?

error adding symbols: file in wrong format

Hello, compiling with your instructions (using cross-compilation) I face this problem. How can I solve it?

[ 74%] Linking CXX executable ../../../image/bin/test_image_processing
/home/<user>/Downloads/gcc-arm-8.3-2019.03-x86_64-aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/8.3.0/../../../../aarch64-linux-gnu/bin/ld: ../../../image/dependency/png/lib/libpng.a(png.o): Relocations in generic ELF (EM: 62)
/home/<user>/Downloads/gcc-arm-8.3-2019.03-x86_64-aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/8.3.0/../../../../aarch64-linux-gnu/bin/ld: ../../../image/dependency/png/lib/libpng.a(png.o): Relocations in generic ELF (EM: 62)
/home/<user>/Downloads/gcc-arm-8.3-2019.03-x86_64-aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/8.3.0/../../../../aarch64-linux-gnu/bin/ld: ../../../image/dependency/png/lib/libpng.a(png.o): Relocations in generic ELF (EM: 62)
/home/<user>/Downloads/gcc-arm-8.3-2019.03-x86_64-aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/8.3.0/../../../../aarch64-linux-gnu/bin/ld: ../../../image/dependency/png/lib/libpng.a: error adding symbols: file in wrong format

Mali GPU errors in install.sh script

Hello,
how can I fix these

CANNOT LINK EXECUTABLE

errors? It is not running on either Kirin 980 or 990.

1: --- Network Test (LeNet)
1: CANNOT LINK EXECUTABLE "/data/local/tmp/uldra/lenet": cannot locate symbol "Mali_G76p_bin" referenced by "/data/local/tmp/uldra/libkernelbin.so"...
1: CANNOT LINK EXECUTABLE "/data/local/tmp/uldra/lenet": cannot locate symbol "Mali_G76p_bin" referenced by "/data/local/tmp/uldra/libkernelbin.so"...
1: [ 20%] /data/local/tmp/uldra/hdr_ocl
1: [ 40%] /data/local/tmp/uldra/hdr_ocl
1: [ 60%] /data/local/tmp/uldra/hdr_ocl
1: [ 80%] /data/local/tmp/uldra/hdr_ocl
1: [100%] /data/local/tmp/uldra/hdr_ocl
1: /home/yury/source/bolt-master/tests/bin/hdr_ocl: 1 file pushed. 5.3 MB/s (324480 bytes in 0.059s)
1:
1:
1: --- GPU Network Test (HDR_OCL)
1:
1: === Input FP16
1: CANNOT LINK EXECUTABLE "/data/local/tmp/uldra/hdr_ocl": cannot locate symbol "Mali_G76p_bin" referenced by "/data/local/tmp/uldra/libkernelbin.so"...
1:
1: === Input UCHAR
1: CANNOT LINK EXECUTABLE "/data/local/tmp/uldra/hdr_ocl": cannot locate symbol "Mali_G76p_bin" referenced by "/data/local/tmp/uldra/libkernelbin.so"...
1/1 Test #1: quick_benchmark .................. Passed 7.88 sec

TFLite not install success

MINGW64 /f/bolt-master
$ ./install.sh --target=android-aarch64
[INFO] use 8 threads to parallel build third party library on windows-x86_64 for target android-aarch64 in directory /f/bolt-master/third_party/android-aarch64...
[INFO] use c language compiler /c/Users/AppData/Local/Android/Sdk/ndk/20.1.5948944/toolchains/llvm/prebuilt/windows-x86_64/bin/clang
[INFO] use c++ language compiler /c/Users/AppData/Local/Android/Sdk/ndk/20.1.5948944/toolchains/llvm/prebuilt/windows-x86_64/bin/clang++
[INFO] generate environment file to /f/bolt-master/third_party/android-aarch64.sh...
[INFO] build TFLite in /f/bolt-master/third_party/android-aarch64/tflite...
[INFO] please source /f/bolt-master/third_party/android-aarch64.sh to use...
[INFO] use /f/bolt-master/third_party/android-aarch64.sh to set environment variable...
[ERROR] TFLite not install success

CANNOT LINK EXECUTABLE

Hi, everyone

Could you help me resolve an issue, please?

I've built bolt as described in INSTALL.md with a Kirin 980 device plugged in.
At the end of the installation I saw:

1: Test command: /root/bolt/quick_benchmark.sh "-b" "/root/bolt/tests/bin" "-p" "/data/local/tmp/uldra" "-l" "/root/bolt/install_llvm/lib"
1: Test timeout computed to be: 10000000
1: [INFO] run test in '/root/bolt/tests/bin'
1: [INFO] test on device directory `/data/local/tmp/uldra'
1: [INFO] use library in /root/bolt/install_llvm/lib
1: /root/bolt/install_llvm/lib/libBoltModel.so: 1 file pushed. 2.5 MB/s (1067120 bytes in 0.413s)
1: /root/bolt/install_llvm/lib/libblas-enhance.so: 1 file pushed. 1.8 MB/s (57456 bytes in 0.031s)
1: /root/bolt/install_llvm/lib/libimage.so: 1 file pushed. 2.5 MB/s (149856 bytes in 0.058s)
1: /root/bolt/install_llvm/lib/libinference.so: 1 file pushed. 2.6 MB/s (682352 bytes in 0.253s)
1: /root/bolt/install_llvm/lib/libmodel-tools.so: 1 file pushed. 2.5 MB/s (246248 bytes in 0.093s)
1: /root/bolt/install_llvm/lib/libmodel-tools_caffe.so: 1 file pushed. 1.9 MB/s (1439040 bytes in 0.710s)
1: /root/bolt/install_llvm/lib/libmodel-tools_onnx.so: 1 file pushed. 2.7 MB/s (486920 bytes in 0.169s)
1: /root/bolt/install_llvm/lib/libmodel-tools_tflite.so: 1 file pushed. 3.2 MB/s (279320 bytes in 0.083s)
1: /root/bolt/install_llvm/lib/libtensor_computing.so: 1 file pushed. 2.3 MB/s (709328 bytes in 0.291s)
1: /root/bolt/tests/bin/test_mmm_int8: 1 file pushed. 2.7 MB/s (131904 bytes in 0.047s)
1: /root/bolt/tests/bin/test_mmm: 1 file pushed. 2.0 MB/s (136400 bytes in 0.066s)
1:  
1: --- Matrix Matrix Multiplication
1: taskset: failed to set 25058's affinity: Invalid argument
1: taskset: failed to set 25061's affinity: Invalid argument
1: /root/bolt/tests/bin/test_convolution: 1 file pushed. 1.5 MB/s (30144 bytes in 0.019s)
1:  
1: --- Conv IC=3
1: taskset: failed to set 25065's affinity: Invalid argument
1: /root/bolt/tests/bin/test_convolution_bnn: 1 file pushed. 1.5 MB/s (30152 bytes in 0.019s)
1: /root/bolt/tests/bin/test_convolution_int8: 1 file pushed. 1.7 MB/s (30232 bytes in 0.017s)
1:  
1: --- Conv 5x5
1: taskset: failed to set 25070's affinity: Invalid argument
1: taskset: failed to set 25073's affinity: Invalid argument
1: taskset: failed to set 25076's affinity: Invalid argument
1:  
1: --- Conv 3x3
1: taskset: failed to set 25079's affinity: Invalid argument
1: taskset: failed to set 25082's affinity: Invalid argument
1: taskset: failed to set 25085's affinity: Invalid argument
1: /root/bolt/tests/bin/test_depthwise_convolution: 1 file pushed. 1.4 MB/s (30264 bytes in 0.021s)
1:  
1: --- Depthwise-Pointwise Conv
1: taskset: failed to set 25089's affinity: Invalid argument
1: /root/bolt/tests/bin/lenet: 1 file pushed. 2.1 MB/s (414384 bytes in 0.185s)
1:  
1:  
1: --- Network Test (LeNet)
1: taskset: failed to set 25093's affinity: Invalid argument
1: taskset: failed to set 25096's affinity: Invalid argument
1/1 Test #1: quick_benchmark ..................   Passed    5.42 sec

100% tests passed, 0 tests failed out of 1

But when I try to run the onnx2bolt binary I see an error:

CANNOT LINK EXECUTABLE "./tools/onnx2bolt": library "libprotobuf.so.11" not found

There was a different error before I exported LD_LIBRARY_PATH=/data/local/tmp/uldra.

Add a benchmark tool

Please add an easy-to-use benchmark tool that can run arbitrary models, so users can see the performance of popular models like MobileNet.

ONNX2Bolt what's available

Hello,

Could you please explain a couple of things that are unclear to me? I use ONNX models and would like to use the onnx2bolt tool. I've deployed a MaskRCNN network in ONNX, and the script fails.

  1. How can I get more information from an unsuccessful run than just a segfault?
  2. Are ROIAlign and NMS available to be converted from ONNX? Which ONNX opset is supported?
  3. And the last one: what is the "skip operators" parameter in the onnx2bolt tool?
