Comments (10)
@laoluani Thank you for sharing the perf experiment with us.
Both PyTorch Mobile and ExecuTorch are using the XNNPACK backend where possible.
I don't see you calling `optimize_for_mobile` for the PyTorch Mobile model, so it seems you are just scripting the model. @kimishpatel or @cccclai can clarify how the old TorchScript-based solution works.
I'm guessing the general optimisations that torch.jit.trace applies to the model are outperforming the specific optimisations of the XNNPACK preprocess step.
You should actually be seeing speedups with ExecuTorch for MV3 compared to TorchScript. Can you share how you lowered to XNNPACK as well?
I realised that I hadn't turned on the release flag for my ExecuTorch builds; once enabled, I got results much closer to PyTorch Mobile.
I ran a test over 500 iterations with new random inputs on every pass:
- PyTorch Mobile: avg 26.05, max 57, min 8, std deviation 11.24
- ExecuTorch: avg 26.32, max 75, min 5, std deviation 10.22
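The shape of that benchmark loop can be sketched as follows (`run_once` is a hypothetical stand-in for one forward pass of either runtime, and the sleep times are arbitrary; the real test ran 500 iterations on-device):

```python
import random
import statistics
import time

def run_once() -> None:
    # Hypothetical stand-in for a single forward pass; replace with the
    # actual PyTorch Mobile or ExecuTorch inference call under test.
    time.sleep(random.uniform(0.0, 0.002))

latencies_ms = []
for _ in range(100):
    start = time.perf_counter()
    run_once()  # new random inputs would be generated on every pass here
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

print(f"avg: {statistics.mean(latencies_ms):.2f}, "
      f"max: {max(latencies_ms):.0f}, min: {min(latencies_ms):.0f}, "
      f"std deviation: {statistics.stdev(latencies_ms):.2f}")
```

Using `time.perf_counter()` around only the forward call keeps input generation out of the measured latency.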
Here's how I lowered the model:

```python
import logging

import torch
import torchvision.models as models
from torch.export import export, ExportedProgram
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import EdgeProgramManager, to_edge

mobilenet_v3 = models.mobilenet_v3_small(
    weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1
).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

exported_program: ExportedProgram = export(mobilenet_v3, sample_inputs)
edge: EdgeProgramManager = to_edge(exported_program)
logging.info(f"Exported graph:\n{edge.exported_program()}")

edge = edge.to_backend(XnnpackPartitioner())
logging.info(f"Lowered graph:\n{edge.exported_program()}")

exec_prog = edge.to_executorch()
with open("mv3-xnnpack.pte", "wb") as file:
    exec_prog.write_to_file(file)
```
Here are the build commands:

```shell
cmake . -DCMAKE_INSTALL_PREFIX=/build \
    -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI="$ANDROID_ABI" \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -B/build

find "/build" -type f -exec sed -i 's/-fno-exceptions/-fexceptions/g' {} +

cmake --build /build -j16 --target install --config Release

cmake extension/android \
    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}"/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI="${ANDROID_ABI}" \
    -DCMAKE_INSTALL_PREFIX=/build \
    -DCMAKE_BUILD_TYPE=Release \
    -B/build/extension/android

cmake --build /build/extension/android -j16 --config Release
```
Are these the expected results, and are there any improvements that can be made?
Thanks for the info. I will try to reproduce the numbers and get back to you.
Hi @laoluani, have you tried running the same model with `cmake-out-android/backends/xnnpack/xnn_executor_runner`? Does it produce the same inference time as your integrated app?
When trying to execute the runner on the device with the command `data/local/tmp/executor_runner --model_path=data/local/tmp/mv3-xnnpack.pte`, I get the error:
```
07-09 09:25:38.781 20240 20240 F DEBUG : Build fingerprint: 'samsung/z3quew/z3q:13/TP1A.220624.014/G988U1UES9HXE2:user/release-keys'
07-09 09:25:38.781 20240 20240 F DEBUG : Revision: '15'
07-09 09:25:38.781 20240 20240 F DEBUG : ABI: 'arm64'
07-09 09:25:38.781 20240 20240 F DEBUG : Processor: '0'
07-09 09:25:38.781 20240 20240 F DEBUG : Timestamp: 2024-07-09 09:25:38.768291343+0100
07-09 09:25:38.781 20240 20240 F DEBUG : Process uptime: 1s
07-09 09:25:38.781 20240 20240 F DEBUG : Cmdline: data/local/tmp/executor_runner --model_path=data/local/tmp/mv3-xnnpack.pte
07-09 09:25:38.781 20240 20240 F DEBUG : pid: 20237, tid: 20237, name: executor_runner >>> data/local/tmp/executor_runner <<<
07-09 09:25:38.781 20240 20240 F DEBUG : uid: 2000
07-09 09:25:38.781 20240 20240 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
07-09 09:25:38.781 20240 20240 F DEBUG : x0 0000000000000000 x1 0000000000004f0d x2 0000000000000006 x3 0000007fdcb23fc0
07-09 09:25:38.781 20240 20240 F DEBUG : x4 00000062cfd7cd08 x5 00000062cfd7cd08 x6 00000062cfd7cd08 x7 b4000013c24db19f
07-09 09:25:38.781 20240 20240 F DEBUG : x8 00000000000000f0 x9 00000073fee38bf8 x10 0000000000000001 x11 00000073fee79870
07-09 09:25:38.782 20240 20240 F DEBUG : x12 0000000000093fa0 x13 0000000000000001 x14 00000062cc878778 x15 00000073fee3c8aa
07-09 09:25:38.782 20240 20240 F DEBUG : x16 00000073feee1d70 x17 00000073feebd5b0 x18 00000074004c4000 x19 0000000000004f0d
07-09 09:25:38.782 20240 20240 F DEBUG : x20 0000000000004f0d x21 00000000ffffffff x22 0000000000093fa0 x23 0000000000000000
07-09 09:25:38.782 20240 20240 F DEBUG : x24 b40000719ee24500 x25 b40000719ee244f0 x26 0000007fdcb241d0 x27 0000000000000001
07-09 09:25:38.782 20240 20240 F DEBUG : x28 b40000719ee244f0 x29 0000007fdcb24040
07-09 09:25:38.782 20240 20240 F DEBUG : lr 00000073fee6a7a8 sp 0000007fdcb23fa0 pc 00000073fee6a7d4 pst 0000000000001000
07-09 09:25:38.782 20240 20240 F DEBUG : backtrace:
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: Function names and BuildId information is missing for some frames due
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: to unreadable libraries. For unwinds of apps, only shared libraries
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: found under the lib/ directory are readable.
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: On this device, run setenforce 0 to make the libraries readable.
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: Unreadable libraries:
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: /data/local/tmp/executor_runner
07-09 09:25:38.782 20240 20240 F DEBUG : #00 pc 00000000000537d4 /apex/com.android.runtime/lib64/bionic/libc.so (abort+168) (BuildId: 870560a8376a70249f9e9a7b480cc02f)
07-09 09:25:38.782 20240 20240 F DEBUG : #01 pc 000000000351d120 /data/local/tmp/executor_runner (BuildId: 28004157a3a2fcb96b5c3816839a8d4cae9d06eb)
07-09 09:25:38.782 20240 20240 F DEBUG : #02 pc 000000000351d0d8 /data/local/tmp/executor_runner (BuildId: 28004157a3a2fcb96b5c3816839a8d4cae9d06eb)
07-09 09:25:38.782 20240 20240 F DEBUG : #03 pc 00000000000a41b8 /data/local/tmp/executor_runner (BuildId: 28004157a3a2fcb96b5c3816839a8d4cae9d06eb)
07-09 09:25:38.782 20240 20240 F DEBUG : #04 pc 000000000004b930 /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+100) (BuildId: 870560a8376a70249f9e9a7b480cc02f)
```
Hi @laoluani, you probably need to use the XNNPACK runner.
Yes, that worked for me! What's the best way to get timing info out? It looks like I'm just getting raw scores.
```
z3q:/data/local/tmp $ /data/local/tmp/xnn_executor_runner --model_path=./mv3-xnnpack.pte
Output 0: tensor(sizes=[1, 1000], [
-0.0297435, -0.116339, 0.234517, -0.115163, 0.28406, 1.33266, -1.2022, -0.418206, -0.861483, 0.962639,
2.05277, 0.0322932, -0.672343, -0.137665, -0.785477, -0.191068, 0.478215, -1.26891, -0.0889226, -0.0928001,
-1.35861, 3.53779, 1.71552, 1.3614, -0.172415, 0.221101, 1.33974, 0.580504, 1.56505, -0.204694,
-1.91138, 0.0594309, -0.480634, -0.466428, 0.812006, -1.7187, 0.209461, -1.64942, 2.64706, -1.81845,
0.0644182, 0.913044, 1.08294, 2.03153, 0.366071, 1.04362, -0.548808, 2.08426, -1.41956, -1.71163,
-1.92737, 0.822696, 1.65539, 0.988034, 0.465189, -1.36933, -0.598565, -0.559072, -0.845721, 1.56357,
0.656557, -0.246826, 0.556268, 1.71325, 0.38631, 0.838485, 1.38649, -0.594456, 1.44696, -0.336695,
0.817791, 2.16338, 0.452428, 2.40616, -0.203028, 1.58678, -0.00467801, 0.391145, 3.28668, 1.32205,
2.26962, 0.525644, 0.393954, 2.02201, -1.76368, -0.793006, 0.871535, 0.615611, -0.240767, 0.662297,
-0.822543, 0.770243, 1.65924, -0.345425, -0.102899, 0.538089, 0.0803897, -1.50696, 0.749941, 0.125525,
...,
0.978575, 2.59865, 3.61896, 0.340622, 0.44494, 0.760727, 2.25554, 0.673822, 0.773279, 0.901685,
1.19766, 0.744912, 1.02076, 0.317082, 1.93721, -0.230186, 2.05909, 1.14174, -0.71805, 2.01184,
1.97479, 2.68669, 0.0470679, 0.85921, -1.01457, 1.66463, 1.62474, 0.308362, -0.830093, 0.904799,
2.00759, -0.694682, -0.0069273, -0.317874, 0.0855309, 0.020355, -0.0538518, -0.501041, 0.918039, -1.4842,
1.68228, -0.304506, 0.818993, -0.442584, -1.71591, 0.721102, -1.11913, -0.560189, 0.792763, 1.72536,
0.0979954, 0.635417, 0.140096, -0.858204, 0.531698, -1.45714, 0.525842, -0.311962, -0.148772, 0.240123,
0.233015, 0.418201, 0.319395, -1.34773, 0.813186, -0.508101, 1.48234, 0.756897, 0.994117, 2.93617,
0.502231, 0.777379, 0.751076, -1.91975, 1.34087, 0.581285, 1.6911, 1.16569, 2.25252, 0.19116,
0.948301, 0.508456, 0.229856, -0.155784, 0.170956, 0.545924, -0.805998, 0.195962, -0.0608273, -0.785059,
-0.438103, -1.82035, -0.877346, -0.129996, -0.643686, 0.254722, -0.689362, -0.311216, 0.469063, 1.01164,
])
```
Does the adb log have timestamps?
So using the Linux `time` command I got the following: `0m00.15s real 0m00.45s user 0m00.04s system`. So it looks like the total time was about 15ms. Is the executor runner doing a single forward pass or multiple?
If this is a single pass, this falls within the range of values I got whilst testing. What's the difference between the executor runner and the actual ExecuTorch runtime?
Correction: this is actually 150ms, so it doesn't fall in the same range as my previous tests. I did a test over 100 iterations and got an average of 130ms.
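The correction comes down to reading the `time` output: `0m00.15s` is 0.15 seconds of wall-clock ("real") time, i.e. 150ms, not 15ms. A small stdlib sketch of the conversion:

```python
import re

def time_to_ms(value: str) -> float:
    """Convert a `time`-style duration such as '0m00.15s' to milliseconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = int(match.group(1)), float(match.group(2))
    return (minutes * 60 + seconds) * 1000.0

print(time_to_ms("0m00.15s"))  # the "real" time from the run above, in ms
```

Note also that wall-clock time for a one-shot runner includes process startup and model loading, not just inference.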