Comments (10)
@laoluani Thank you for sharing the perf experiment with us.
Both PyTorch Mobile and ExecuTorch are using the XNNPACK backend where possible.
I don't see you calling `optimize_for_mobile` for the PyTorch Mobile model, so it seems you are just scripting the model. @kimishpatel or @cccclai can clarify how the old TorchScript-based solution works.
I'm guessing the general optimisations that torch.jit.trace applies to the model are outperforming the specific optimisations of the XNNPACK preprocess step.
You should actually be seeing speedups with ExecuTorch for MV3 compared to TorchScript. Can you share how you lowered to XNNPACK as well?
I realised that I hadn't turned on the release flag for my ExecuTorch builds; once enabled, I got results much closer to PyTorch Mobile.
I ran a test over 500 iterations with new random inputs on every pass:
- PyTorch Mobile: avg 26.05, max 57, min 8, std deviation 11.24
- ExecuTorch: avg 26.32, max 75, min 5, std deviation 10.22
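The shape of that benchmark loop can be sketched as follows (`run_once` is a hypothetical stand-in for one forward pass of either runtime, and the sleep times are arbitrary; the real test ran 500 iterations on-device):

```python
import random
import statistics
import time

def run_once() -> None:
    # Hypothetical stand-in for a single forward pass; replace with the
    # actual PyTorch Mobile or ExecuTorch inference call under test.
    time.sleep(random.uniform(0.0, 0.002))

latencies_ms = []
for _ in range(100):
    start = time.perf_counter()
    run_once()  # new random inputs would be generated on every pass here
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

print(f"avg: {statistics.mean(latencies_ms):.2f}, "
      f"max: {max(latencies_ms):.0f}, min: {min(latencies_ms):.0f}, "
      f"std deviation: {statistics.stdev(latencies_ms):.2f}")
```

Using `time.perf_counter()` around only the forward call keeps input generation out of the measured latency.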
Here's how I lowered the model:

```python
import logging

import torch
import torchvision.models as models
from torch.export import export, ExportedProgram
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import EdgeProgramManager, to_edge

mobilenet_v3 = models.mobilenet_v3_small(
    weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1
).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

exported_program: ExportedProgram = export(mobilenet_v3, sample_inputs)
edge: EdgeProgramManager = to_edge(exported_program)
logging.info(f"Exported graph:\n{edge.exported_program()}")

edge = edge.to_backend(XnnpackPartitioner())
logging.info(f"Lowered graph:\n{edge.exported_program()}")

exec_prog = edge.to_executorch()
with open("mv3-xnnpack.pte", "wb") as file:
    exec_prog.write_to_file(file)
```
Here are the build commands:

```shell
cmake . -DCMAKE_INSTALL_PREFIX=/build \
    -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI="$ANDROID_ABI" \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -B/build

find "/build" -type f -exec sed -i 's/-fno-exceptions/-fexceptions/g' {} +

cmake --build /build -j16 --target install --config Release

cmake extension/android \
    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}"/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI="${ANDROID_ABI}" \
    -DCMAKE_INSTALL_PREFIX=/build \
    -DCMAKE_BUILD_TYPE=Release \
    -B/build/extension/android

cmake --build /build/extension/android -j16 --config Release
```
Are these the expected results, and are there any improvements that can be made?
Thanks for the info. I will try to reproduce the numbers and get back to you.
Hi @laoluani, have you tried running the same model with `cmake-out-android/backends/xnnpack/xnn_executor_runner`? Does it produce the same inference time as your integrated app?
When trying to execute the runner on the device with the command `data/local/tmp/executor_runner --model_path=data/local/tmp/mv3-xnnpack.pte`, I get the error:
```
07-09 09:25:38.781 20240 20240 F DEBUG : Build fingerprint: 'samsung/z3quew/z3q:13/TP1A.220624.014/G988U1UES9HXE2:user/release-keys'
07-09 09:25:38.781 20240 20240 F DEBUG : Revision: '15'
07-09 09:25:38.781 20240 20240 F DEBUG : ABI: 'arm64'
07-09 09:25:38.781 20240 20240 F DEBUG : Processor: '0'
07-09 09:25:38.781 20240 20240 F DEBUG : Timestamp: 2024-07-09 09:25:38.768291343+0100
07-09 09:25:38.781 20240 20240 F DEBUG : Process uptime: 1s
07-09 09:25:38.781 20240 20240 F DEBUG : Cmdline: data/local/tmp/executor_runner --model_path=data/local/tmp/mv3-xnnpack.pte
07-09 09:25:38.781 20240 20240 F DEBUG : pid: 20237, tid: 20237, name: executor_runner >>> data/local/tmp/executor_runner <<<
07-09 09:25:38.781 20240 20240 F DEBUG : uid: 2000
07-09 09:25:38.781 20240 20240 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
07-09 09:25:38.781 20240 20240 F DEBUG : x0 0000000000000000 x1 0000000000004f0d x2 0000000000000006 x3 0000007fdcb23fc0
07-09 09:25:38.781 20240 20240 F DEBUG : x4 00000062cfd7cd08 x5 00000062cfd7cd08 x6 00000062cfd7cd08 x7 b4000013c24db19f
07-09 09:25:38.781 20240 20240 F DEBUG : x8 00000000000000f0 x9 00000073fee38bf8 x10 0000000000000001 x11 00000073fee79870
07-09 09:25:38.782 20240 20240 F DEBUG : x12 0000000000093fa0 x13 0000000000000001 x14 00000062cc878778 x15 00000073fee3c8aa
07-09 09:25:38.782 20240 20240 F DEBUG : x16 00000073feee1d70 x17 00000073feebd5b0 x18 00000074004c4000 x19 0000000000004f0d
07-09 09:25:38.782 20240 20240 F DEBUG : x20 0000000000004f0d x21 00000000ffffffff x22 0000000000093fa0 x23 0000000000000000
07-09 09:25:38.782 20240 20240 F DEBUG : x24 b40000719ee24500 x25 b40000719ee244f0 x26 0000007fdcb241d0 x27 0000000000000001
07-09 09:25:38.782 20240 20240 F DEBUG : x28 b40000719ee244f0 x29 0000007fdcb24040
07-09 09:25:38.782 20240 20240 F DEBUG : lr 00000073fee6a7a8 sp 0000007fdcb23fa0 pc 00000073fee6a7d4 pst 0000000000001000
07-09 09:25:38.782 20240 20240 F DEBUG : backtrace:
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: Function names and BuildId information is missing for some frames due
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: to unreadable libraries. For unwinds of apps, only shared libraries
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: found under the lib/ directory are readable.
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: On this device, run setenforce 0 to make the libraries readable.
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: Unreadable libraries:
07-09 09:25:38.782 20240 20240 F DEBUG : NOTE: /data/local/tmp/executor_runner
07-09 09:25:38.782 20240 20240 F DEBUG : #00 pc 00000000000537d4 /apex/com.android.runtime/lib64/bionic/libc.so (abort+168) (BuildId: 870560a8376a70249f9e9a7b480cc02f)
07-09 09:25:38.782 20240 20240 F DEBUG : #01 pc 000000000351d120 /data/local/tmp/executor_runner (BuildId: 28004157a3a2fcb96b5c3816839a8d4cae9d06eb)
07-09 09:25:38.782 20240 20240 F DEBUG : #02 pc 000000000351d0d8 /data/local/tmp/executor_runner (BuildId: 28004157a3a2fcb96b5c3816839a8d4cae9d06eb)
07-09 09:25:38.782 20240 20240 F DEBUG : #03 pc 00000000000a41b8 /data/local/tmp/executor_runner (BuildId: 28004157a3a2fcb96b5c3816839a8d4cae9d06eb)
07-09 09:25:38.782 20240 20240 F DEBUG : #04 pc 000000000004b930 /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+100) (BuildId: 870560a8376a70249f9e9a7b480cc02f)
```
Hi @laoluani, you probably need to use the XNNPACK runner.
Yes, that worked for me! What's the best way to get timing info out? It looks like I'm just getting raw scores.
```
z3q:/data/local/tmp $ /data/local/tmp/xnn_executor_runner --model_path=./mv3-xnnpack.pte
Output 0: tensor(sizes=[1, 1000], [
-0.0297435, -0.116339, 0.234517, -0.115163, 0.28406, 1.33266, -1.2022, -0.418206, -0.861483, 0.962639,
2.05277, 0.0322932, -0.672343, -0.137665, -0.785477, -0.191068, 0.478215, -1.26891, -0.0889226, -0.0928001,
-1.35861, 3.53779, 1.71552, 1.3614, -0.172415, 0.221101, 1.33974, 0.580504, 1.56505, -0.204694,
-1.91138, 0.0594309, -0.480634, -0.466428, 0.812006, -1.7187, 0.209461, -1.64942, 2.64706, -1.81845,
0.0644182, 0.913044, 1.08294, 2.03153, 0.366071, 1.04362, -0.548808, 2.08426, -1.41956, -1.71163,
-1.92737, 0.822696, 1.65539, 0.988034, 0.465189, -1.36933, -0.598565, -0.559072, -0.845721, 1.56357,
0.656557, -0.246826, 0.556268, 1.71325, 0.38631, 0.838485, 1.38649, -0.594456, 1.44696, -0.336695,
0.817791, 2.16338, 0.452428, 2.40616, -0.203028, 1.58678, -0.00467801, 0.391145, 3.28668, 1.32205,
2.26962, 0.525644, 0.393954, 2.02201, -1.76368, -0.793006, 0.871535, 0.615611, -0.240767, 0.662297,
-0.822543, 0.770243, 1.65924, -0.345425, -0.102899, 0.538089, 0.0803897, -1.50696, 0.749941, 0.125525,
...,
0.978575, 2.59865, 3.61896, 0.340622, 0.44494, 0.760727, 2.25554, 0.673822, 0.773279, 0.901685,
1.19766, 0.744912, 1.02076, 0.317082, 1.93721, -0.230186, 2.05909, 1.14174, -0.71805, 2.01184,
1.97479, 2.68669, 0.0470679, 0.85921, -1.01457, 1.66463, 1.62474, 0.308362, -0.830093, 0.904799,
2.00759, -0.694682, -0.0069273, -0.317874, 0.0855309, 0.020355, -0.0538518, -0.501041, 0.918039, -1.4842,
1.68228, -0.304506, 0.818993, -0.442584, -1.71591, 0.721102, -1.11913, -0.560189, 0.792763, 1.72536,
0.0979954, 0.635417, 0.140096, -0.858204, 0.531698, -1.45714, 0.525842, -0.311962, -0.148772, 0.240123,
0.233015, 0.418201, 0.319395, -1.34773, 0.813186, -0.508101, 1.48234, 0.756897, 0.994117, 2.93617,
0.502231, 0.777379, 0.751076, -1.91975, 1.34087, 0.581285, 1.6911, 1.16569, 2.25252, 0.19116,
0.948301, 0.508456, 0.229856, -0.155784, 0.170956, 0.545924, -0.805998, 0.195962, -0.0608273, -0.785059,
-0.438103, -1.82035, -0.877346, -0.129996, -0.643686, 0.254722, -0.689362, -0.311216, 0.469063, 1.01164,
])
```
Does the adb log have timestamps?
So using the Linux `time` command I got the following: `0m00.15s real 0m00.45s user 0m00.04s system`. So it looks like the total time was about 15ms. Is the executor runner doing a single forward pass or multiple?
If this is a single pass, this falls within the range of values I got whilst testing. What's the difference between the executor runner and the actual ExecuTorch runtime?
Correction: this is actually 150ms, so it doesn't fall in the same range as my previous tests. I did a test over 100 iterations and got an average of 130ms.
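The correction comes down to reading the `time` output: `0m00.15s` is 0.15 seconds of wall-clock ("real") time, i.e. 150ms, not 15ms. A small stdlib sketch of the conversion:

```python
import re

def time_to_ms(value: str) -> float:
    """Convert a `time`-style duration such as '0m00.15s' to milliseconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = int(match.group(1)), float(match.group(2))
    return (minutes * 60 + seconds) * 1000.0

print(time_to_ms("0m00.15s"))  # the "real" time from the run above, in ms
```

Note also that wall-clock time for a one-shot runner includes process startup and model loading, not just inference.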