Comments (10)

PatriceVignola commented on July 21, 2024

If you already have a trained python TensorFlow model, you could freeze it into a .pb file and use the tensorflow-directml C API to load it at runtime. Would that be a good solution for you?

Once your model has been converted to a frozen .pb file, you can use the C API to load it with TF_LoadSessionFromSavedModel and then call TF_SessionRun.

I believe this is the most straightforward and fastest way to get your model working with DirectML. If you need help navigating the TensorFlow C API, please let us know!
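
Roughly, that flow with the C API looks like the minimal sketch below (the SavedModel path, tag, op names, and input shape are placeholders, and cleanup is elided):

#include <stdint.h>
#include <stdio.h>
#include <tensorflow/c/c_api.h>

int main() {
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();
  TF_SessionOptions* opts = TF_NewSessionOptions();

  // Placeholder SavedModel directory and tag; adjust to your exported model.
  const char* export_dir = "path/to/saved_model";
  const char* tags[] = {"serve"};
  TF_Session* session = TF_LoadSessionFromSavedModel(
      opts, /*run_options=*/NULL, export_dir, tags, 1, graph,
      /*meta_graph_def=*/NULL, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "Load failed: %s\n", TF_Message(status));
    return 1;
  }

  // Placeholder input/output op names; check your graph for the real ones.
  TF_Output input = {TF_GraphOperationByName(graph, "input"), 0};
  TF_Output output = {TF_GraphOperationByName(graph, "output"), 0};

  // Allocate an input tensor (the shape here is just an example) and fill it.
  int64_t dims[4] = {1, 224, 224, 3};
  TF_Tensor* in_tensor =
      TF_AllocateTensor(TF_FLOAT, dims, 4, 1 * 224 * 224 * 3 * sizeof(float));
  TF_Tensor* out_tensor = NULL;

  TF_SessionRun(session, /*run_options=*/NULL,
                &input, &in_tensor, 1,
                &output, &out_tensor, 1,
                /*target_opers=*/NULL, 0,
                /*run_metadata=*/NULL, status);

  // ... read results out of out_tensor, then delete tensors/session/status ...
  return 0;
}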

Edit: Alternatively, if you are familiar with ONNX and can convert your model to an ONNX model, you could even use onnxruntime instead of TensorFlow, which can use DirectML underneath.
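
For reference, loading such an ONNX model with the DirectML execution provider through the onnxruntime C++ API looks roughly like the sketch below (the model path and adapter index are placeholders):

#include <onnxruntime_cxx_api.h>
#include <dml_provider_factory.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "dml_example");
  Ort::SessionOptions session_options;

  // The DirectML execution provider needs memory patterns disabled and
  // sequential execution.
  session_options.DisableMemPattern();
  session_options.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);

  // Device 0 is a placeholder; pick the adapter you want DirectML to use.
  OrtSessionOptionsAppendExecutionProvider_DML(session_options, 0);

  // Placeholder path to the converted ONNX model.
  Ort::Session session(env, L"model.onnx", session_options);

  // ... create Ort::Value input tensors and call session.Run(...) ...
  return 0;
}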

PatriceVignola commented on July 21, 2024

Hi @MarkHung00,

I created a basic sample over here. The sample goes through the process of loading a frozen squeezenet.pb model, creating a graph from it and finally creating a session. Feel free to extract the parts that are relevant for you and let me know if you run into any issues or have other questions!
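
For anyone finding this later, the steps in the sample boil down to something like the condensed sketch below (error handling is trimmed, and squeezenet.pb stands in for whatever frozen model you use):

#include <stdio.h>
#include <stdlib.h>
#include <tensorflow/c/c_api.h>

// Read the whole frozen .pb file into a TF_Buffer (no error handling here).
static TF_Buffer* ReadGraphDef(const char* path) {
  FILE* f = fopen(path, "rb");
  fseek(f, 0, SEEK_END);
  long size = ftell(f);
  fseek(f, 0, SEEK_SET);
  void* data = malloc(size);
  fread(data, 1, size, f);
  fclose(f);
  TF_Buffer* buf = TF_NewBuffer();
  buf->data = data;
  buf->length = (size_t)size;
  return buf;
}

int main() {
  TF_Status* status = TF_NewStatus();
  TF_Graph* graph = TF_NewGraph();

  // Import the frozen GraphDef into an empty graph.
  TF_Buffer* graph_def = ReadGraphDef("squeezenet.pb");
  TF_ImportGraphDefOptions* import_opts = TF_NewImportGraphDefOptions();
  TF_GraphImportGraphDef(graph, graph_def, import_opts, status);
  if (TF_GetCode(status) != TF_OK) {
    fprintf(stderr, "Import failed: %s\n", TF_Message(status));
    return 1;
  }

  // Create a session on the imported graph; with tensorflow-directml
  // installed, ops should be placed on the DML device by default.
  TF_SessionOptions* sess_opts = TF_NewSessionOptions();
  TF_Session* session = TF_NewSession(graph, sess_opts, status);

  // ... look up input/output ops and call TF_SessionRun, then clean up ...
  TF_DeleteImportGraphDefOptions(import_opts);
  TF_DeleteBuffer(graph_def);
  return 0;
}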

PatriceVignola commented on July 21, 2024

@MarkHung00 tensorflow-directml supports a subset of the operators supported by the default GPU (CUDA) device. To see which data types each operator supports, you can look at the source. For example, for Gather:

TF_CALL_float(DML_REGISTER_KERNELS);

In general, FP32 and FP16 are the data types most commonly supported across DML operators, while int32 is reserved for the CPU instead.

PatriceVignola commented on July 21, 2024
  1. For TensorFlow performance tuning, most of the tools that work for CUDA also work for DirectML. For example, we like to use the chrome tracing format outlined in the post since it shows a good timeline of all operators that are being executed, and is easy to read.
  2. Yes, each operator has a different list of data types that it supports. For example, if you look at the bottom of the Convolution page, you see that it supports float16 and float32.
  3. DirectML performance on Intel heavily depends on the devices, but we're working with them to make sure that DirectML becomes a competitive framework on their platform.

Also, take note that this repository (tensorflow-directml 1.15) is mostly in maintenance mode. We're still doing bug fixes and improving performance, but we're now more focused on the preview of our plugin for TF 2. We don't have a C API for the plugin yet, but it's coming soon!

MarkHung00 commented on July 21, 2024

Judging from the DirectML sample, it is necessary to implement each operator in C++. Is there something similar to Intel OpenVINO, where you can convert the TensorFlow model into an IR through the Model Optimizer and let the C++ program load the IR?
https://miro.medium.com/max/1230/1*c83JJoHVHOXNGapF1JT86Q.png
Because the Windows camera post-processing DMFT runs in a C++ environment, it would be more convenient if we could load the TensorFlow model from C++.

MarkHung00 commented on July 21, 2024

Hi PatriceVignola,
Thank you very much for the information; it's very helpful to us. However, there seems to be no complete sample code for TF_LoadSessionFromSavedModel on the Internet.

Do you have any relevant information or sample demo code we could try? Thanks!

MarkHung00 commented on July 21, 2024

Hi PatriceVignola,

Thank you very much for your assistance. We have successfully run the sample with Visual Studio and will use other models in the next stage.
In addition to the recommended operators mentioned in the link below, are there any restrictions on data types (e.g. does DirectML TensorFlow support FP32/FP16/INT16/INT8)?
https://docs.microsoft.com/en-us/windows/ai/directml/dml-intro

Thanks a lot

MarkHung00 commented on July 21, 2024

Hi PatriceVignola

Thanks a lot. We are currently developing some real-time scenarios, and the inference time on an Nvidia GPU is still satisfactory. We have some questions.

We hope that the inference framework can flexibly use various GPUs (Intel/AMD/Nvidia). DirectML is a good choice, but in the future there may be models that require more computing power and have longer inference times.

MarkHung00 commented on July 21, 2024

Hi PatriceVignola

We highly appreciate your help. In addition, we have two questions:

  • We may have scenarios where multiple models run inference at the same time. With the DirectML TensorFlow C library, we can select the CPU or a DirectML-supported GPU through TF_ImportGraphDefOptionsSetDefaultDevice (a minimal sketch is after these questions). However, if there are two models and only the Nvidia GPU can achieve real-time performance, we'd like to know: if we call TF_SessionRun from multiple threads on the user side, how does DirectML schedule the execution? Is the order FIFO, or are there other scheduling mechanisms or priorities we can use to optimize the execution order?

  • The Conv operator only supports FP16; we'd like to know whether DirectML supports quantization-aware training (QAT).
    Thanks again for your great help!
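
A minimal sketch of the device selection we mean (the device strings are placeholders; we use whatever names tensorflow-directml enumerates):

#include <tensorflow/c/c_api.h>

// Import one copy of a frozen GraphDef pinned to a given device, e.g.
// "/device:DML:0" or "/device:CPU:0" (placeholder names).
static void ImportOnDevice(TF_Graph* graph, TF_Buffer* graph_def,
                           const char* device, TF_Status* status) {
  TF_ImportGraphDefOptions* opts = TF_NewImportGraphDefOptions();
  TF_ImportGraphDefOptionsSetDefaultDevice(opts, device);
  TF_GraphImportGraphDef(graph, graph_def, opts, status);
  TF_DeleteImportGraphDefOptions(opts);
}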

PatriceVignola commented on July 21, 2024
  1. Before trying multithreading, I would suggest looking at the GPU usage for a single model. Ideally, most models should run near 100% GPU usage, in which case multithreading won't help and may even make things slower due to context switches and higher memory usage. If your model has poor GPU usage, we'd like to help you investigate what the problem is in order to make it better!
  2. DirectML has quantization support, but tensorflow-directml doesn't have it at this time.
