
Comments (12)

Jopyth commented on July 17, 2024

@simonmaurer That is correct.

Jopyth commented on July 17, 2024

@simonmaurer Sorry for the long wait on the reply: the conversion and execution with the C++ API now work for our tested models, but we still have a little bit of cleaning up to do regarding building and CI. Good news is we also upgraded the underlying MXNet to 1.4.0, and we should be able to make the release this or next week.

Jopyth commented on July 17, 2024

@simonmaurer Just letting you know that BMXNet with our converter is now available. If you want to use it, please take a look at the Example/Test, especially the dummy forward pass before training (otherwise the model needs additional changes, namely retraining the BatchNorm layers); a rough sketch is below.
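
A minimal sketch of that dummy forward pass, assuming a plain Gluon network (the binary layers from this repo would be added analogously; their exact signatures are not shown here):

    import mxnet as mx
    from mxnet import gluon, nd

    net = gluon.nn.HybridSequential()
    net.add(gluon.nn.Conv2D(channels=64, kernel_size=3, activation='relu'))
    net.add(gluon.nn.BatchNorm())
    # ... binary blocks (QActivation followed by QConv2D) would go here as well
    net.add(gluon.nn.Dense(10))

    net.initialize(mx.init.Xavier())
    net.hybridize()

    # dummy forward pass with the expected input shape, so that all parameters
    # (including the BatchNorm statistics) are created before training starts
    _ = net(nd.zeros((1, 3, 32, 32)))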

Jopyth commented on July 17, 2024
  1. QActivation makes the input binary and always needs to come before a QConvolution (unless the input is already binary for some reason). Since the two belong together so closely, we also added a BinaryConvolution block and, for easier parameterization (e.g. clip_threshold, scaling methods, ...), added activated_conf, which uses a previously stored configuration to create such BinaryConvolution blocks. qconv_kwargs is just for testing different configurations of the binary convolution (with and without padding).
  2. As you like; so far we mostly use it as a standalone tool, I only added it for the test case (basically all lines after 62 are just for testing purposes).
  3. We have not yet implemented a complete example with C++ for this new version, but conversion to float32 would be the way to go.
  4. The model converter currently needs to be used to get the faster inference (note: it replaces the layers used for training with those optimized for inference and also compresses and transforms the weights). However, you can load the deployment model in Python with a SymbolBlock (this is basically done in the test case; see the sketch after this list).
  5. Basically the default way of doing C++ inference in MXNet should still apply to our framework, except of course you need to load the converted binarized model (not yet tested - if you encounter problems, please create issues as needed).
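
To illustrate points 1 and 4, a rough sketch: the binary layer names follow this thread and are assumed to be exposed under gluon.nn in this fork, and the file names are placeholders, so treat this as an outline rather than the tested API.

    import mxnet as mx
    from mxnet import gluon

    # (1) QActivation binarizes its input, so it sits directly in front of the
    # binary convolution; a BinaryConvolution block essentially bundles the two.
    block = gluon.nn.HybridSequential()
    block.add(gluon.nn.QActivation())                 # assumed layer name from this thread
    block.add(gluon.nn.QConv2D(channels=64, kernel_size=3))

    # (4) loading a converted deployment model in Python via SymbolBlock
    deploy_net = gluon.SymbolBlock.imports(
        "binarized_model-symbol.json",                # written by the model converter
        ["data"],
        "binarized_model-0000.params",
        ctx=mx.cpu())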

Jopyth commented on July 17, 2024

After we hybridize to a Symbol, we can also do the inference with C++, since we can export it to the usual symbol.json and .params files. We implemented the necessary functions in C++, but in a more modular way: instead of using a single monolithic C++ QConvolution operator, we now use the normal convolution and apply the functions needed for binarization before/after the default convolution operator. This is also visible in the symbolic graph now. For example, it contains the det_sign functions as additional ops when directly exporting (you could quickly test this with the mnist example; see the sketch below).
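
A quick, hedged sketch of how one could inspect the exported graph for those extra ops (assuming `net` is a hybridizable Gluon network like the one sketched earlier; file names are placeholders, and det_sign is the op name mentioned in this comment):

    import json
    from mxnet import nd

    net.hybridize()
    _ = net(nd.zeros((1, 3, 32, 32)))     # one forward pass so the graph is traced
    net.export("binary_model", epoch=0)   # writes binary_model-symbol.json / -0000.params

    with open("binary_model-symbol.json") as f:
        graph = json.load(f)
    ops = {node["op"] for node in graph["nodes"] if node["op"] != "null"}
    print(ops)  # before conversion this should include the det_sign op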

As for the conversion script, we are currently working on this, but it is not yet finished. It will remove unnecessary operators from the symbol.json and convert/compress the param file similar to the previous version.

Jopyth commented on July 17, 2024

Also, we are implementing a different custom operator, which allows for fast inference again (but is independent of the HybridBlocks used for training). This operator replaces our Gluon convolution blocks when running our conversion script.

simonmaurer commented on July 17, 2024

@Jopyth Ok, thanks. In other words, for fast inference (that is, the custom implementation of GEMM kernels as found in https://github.com/hpi-xnor/BMXNet/tree/master/smd_hpi/src) you are still in the process of rewriting that part?
For now the binary weights are still treated and saved as float32 throughout the Gluon code, and the code for approximated multiplications (using XNOR and bitcount operations) is yet to be reimplemented from BMXNet v1 - is that what your comment

We do not yet support deployment and inference with binary operations and models (please use the first version of BMXNet instead if you need this).

in the README refers to?

simonmaurer commented on July 17, 2024

@Jopyth Overall great job and findings in your paper. I am really interested in your work/BMXNet v1, and for real-time applications I'd like to dig into binarized networks and timing analysis (which is why I'm so eager to be able to run it in C++ including faster inference ;) )
Any news regarding the conversion script?
Also, could you elaborate a bit on what is actually happening during conversion? I still don't quite get why you need to convert the symbol.json and param file when you have already implemented the underlying C/C++ operators (or is the C++ API using different operators? - that might be the reason why even vanilla MXNet 1.4.0 still doesn't support reduced precision, i.e. float16, in the C++ API).
Maybe because you created custom operators, but only in Python?

Jopyth commented on July 17, 2024

Basically we need the conversion script for two reasons. The first one is the same as in the first BMXNet: we need to compress the binary weights with bit-packing. The second one is the one you mentioned: we use different operators for training with Python and for inference with C++. Previously we had the functionality for training and inference (sped up on CPU) in the same layer and chose which version to execute based on the inference setting and device. Now we have split up training and inference: training is done with multiple layers (in Gluon/Hybrid mode), but during inference we only use one layer, our (sped-up) custom convolution.
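
A NumPy-only illustration of the bit-packing idea (this is not the converter's actual code, just why the deployment weights end up roughly 32x smaller):

    import numpy as np

    # during training, each binary weight is kept as a float32 (+1 / -1)
    float_weights = np.sign(np.random.randn(64, 32)).astype(np.float32)

    # for deployment, one bit per weight is enough, so 8 weights fit into a byte
    bits = (float_weights > 0).astype(np.uint8)
    packed = np.packbits(bits, axis=-1)

    print(float_weights.nbytes)  # 8192 bytes before packing
    print(packed.nbytes)         # 256 bytes after packing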

simonmaurer commented on July 17, 2024

@Jopyth Thanks a lot for pointing that out. Looking forward to this useful addition and the upgrade to 1.4.0 - very nice!
Also, there's an interesting discussion regarding the C vs. C++ API in the official MXNet GitHub repo. The C++ API is just a frontend implementation like Python, but according to the discussion it is missing some modules needed to make use of fast float16 inference, see https://github.com/apache/incubator-mxnet/issues/14159#issuecomment-483883108.
So <mxnet/c_predict_api.h> refers to the C API, which is able to do the fast inference, whereas this is not yet true for the C++ API <mxnet-cpp/MxNetCpp.h>.

simonmaurer commented on July 17, 2024

@Jopyth That is great! Also noteworthy that you keep things updated (i.e. MXNet 1.4.1) - much appreciated.

Closing questions I still have:

  1. When you build your models, why does the QActivation come before the QConvolution? Is it a special case that you use **qconv_kwargs in QConv2D - maybe for debugging purposes, as used in the code?
  2. You mentioned the Example/Test: do we just convert the model by calling the converter via subprocess inside the Python code (model conversion is done transparently with export when using QActivation/QConv2D/QDense), i.e.
    output = subprocess.check_output(["build/tools/binary_converter/model-converter", param_file])
    or do we use the binary converter as a standalone tool?
  3. How do you handle your input matrices/images (Python AND C++)? Do you keep them as NDArray uint8 from OpenCV (or equivalent) or convert them to float32/float16?
  4. Is the fast inference (backend operators with fast GEMM) also used when we deploy hybridized models with Python, or only if we use a model produced by the new converter?
  5. We never talked about this: a hint on how one can correctly load the converted model in C/C++, i.e. which API to use for fast inference?

simonmaurer commented on July 17, 2024

Alright, pretty enlightening!

  1. Thanks for pointing it out. I am pretty used to introducing non-linearities after linear combinations. Does that also mean that if I have multiple QConvolutions, I actually wouldn't need an activation layer in front anymore, because the output of the preceding layer (say QConv2D or QDense) is already binarized?
  2. So you tested the converted model with faster inference in Python, I guess? I will gladly provide you with information regarding C inference. Not sure yet if the C++ API (which is also only a wrapper) will work..
