
Comments (12)

Jopyth commented on July 17, 2024

@simonmaurer That is correct.

Jopyth commented on July 17, 2024

@simonmaurer Sorry for the long wait on the reply: the conversion and execution with the C++ API now work for our tested models, but we still have a little bit of cleaning up to do regarding building and CI. Good news is we also upgraded the underlying MXNet to 1.4.0, and we should be able to make the release this or next week.

Jopyth commented on July 17, 2024

@simonmaurer Just letting you know that BMXNet with our converter is now available. If you want to use it, please take a look at the Example/Test, especially the dummy forward pass before training (otherwise the model needs additional changes, namely retraining the BatchNorm layers); a rough sketch is below.
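
A minimal sketch of that dummy forward pass, assuming a plain Gluon network (the binary layers from this repo would be added analogously; their exact signatures are not shown here):

    import mxnet as mx
    from mxnet import gluon, nd

    net = gluon.nn.HybridSequential()
    net.add(gluon.nn.Conv2D(channels=64, kernel_size=3, activation='relu'))
    net.add(gluon.nn.BatchNorm())
    # ... binary blocks (QActivation followed by QConv2D) would go here as well
    net.add(gluon.nn.Dense(10))

    net.initialize(mx.init.Xavier())
    net.hybridize()

    # dummy forward pass with the expected input shape, so that all parameters
    # (including the BatchNorm statistics) are created before training starts
    _ = net(nd.zeros((1, 3, 32, 32)))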

Jopyth commented on July 17, 2024
  1. QActivation makes the input binary and always needs to come before a QConvolution (unless the input is already binary for some reason). Since the two belong together so closely, we also added a BinaryConvolution block and, for easier parameterization (e.g. clip_threshold, scaling methods, ...), added activated_conf, which uses a previously stored configuration to create such BinaryConvolution blocks. qconv_kwargs is just for testing different configurations of the binary convolution (with and without padding).
  2. As you like; so far we mostly use it as a standalone tool, I only added it for the test case (basically all lines after 62 are just for testing purposes).
  3. We have not yet implemented a complete example with C++ for this new version, but conversion to float32 would be the way to go.
  4. The model converter currently needs to be used to get the faster inference (note: it replaces the layers used for training with those optimized for inference and also compresses and transforms the weights). However, you can load the deployment model in Python with a SymbolBlock (this is basically done in the test case; see the sketch after this list).
  5. Basically the default way of doing C++ inference in MXNet should still apply to our framework, except of course you need to load the converted binarized model (not yet tested - if you encounter problems, please create issues as needed).
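
To illustrate points 1 and 4, a rough sketch: the binary layer names follow this thread and are assumed to be exposed under gluon.nn in this fork, and the file names are placeholders, so treat this as an outline rather than the tested API.

    import mxnet as mx
    from mxnet import gluon

    # (1) QActivation binarizes its input, so it sits directly in front of the
    # binary convolution; a BinaryConvolution block essentially bundles the two.
    block = gluon.nn.HybridSequential()
    block.add(gluon.nn.QActivation())                 # assumed layer name from this thread
    block.add(gluon.nn.QConv2D(channels=64, kernel_size=3))

    # (4) loading a converted deployment model in Python via SymbolBlock
    deploy_net = gluon.SymbolBlock.imports(
        "binarized_model-symbol.json",                # written by the model converter
        ["data"],
        "binarized_model-0000.params",
        ctx=mx.cpu())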

Jopyth commented on July 17, 2024

After we hybridize to a Symbol, we can also do the inference with C++, since we can export it to the usual symbol.json and .params files. We implemented the necessary functions in C++, but in a more modular way: instead of using a single monolithic C++ QConvolution operator, we now use the normal convolution and apply the functions needed for binarization before/after the default convolution operator. This is also visible in the symbolic graph now. For example, it contains the det_sign functions as additional ops when directly exporting (you could quickly test this with the mnist example; see the sketch below).
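
A quick, hedged sketch of how one could inspect the exported graph for those extra ops (assuming `net` is a hybridizable Gluon network like the one sketched earlier; file names are placeholders, and det_sign is the op name mentioned in this comment):

    import json
    from mxnet import nd

    net.hybridize()
    _ = net(nd.zeros((1, 3, 32, 32)))     # one forward pass so the graph is traced
    net.export("binary_model", epoch=0)   # writes binary_model-symbol.json / -0000.params

    with open("binary_model-symbol.json") as f:
        graph = json.load(f)
    ops = {node["op"] for node in graph["nodes"] if node["op"] != "null"}
    print(ops)  # before conversion this should include the det_sign op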

As for the conversion script, we are currently working on this, but it is not yet finished. It will remove unnecessary operators from the symbol.json and convert/compress the param file similar to the previous version.

Jopyth commented on July 17, 2024

Also, we are implementing a different custom operator, which allows for fast inference again (but is independent of the HybridBlocks used for training). This operator replaces our Gluon convolution blocks when running our conversion script.

simonmaurer commented on July 17, 2024

@Jopyth Ok, thanks. In other words, for fast inference (that is, the custom implementation of GEMM kernels as found in https://github.com/hpi-xnor/BMXNet/tree/master/smd_hpi/src) you are still in the process of rewriting that part?
For now the binary weights are still treated and saved as float32 throughout the Gluon code, and the code for approximated multiplications (using XNOR and bitcount operations) is yet to be reimplemented from BMXNet v1 - is that what your comment

We do not yet support deployment and inference with binary operations and models (please use the first version of BMXNet instead if you need this).

in the README refers to?

simonmaurer commented on July 17, 2024

@Jopyth Overall great job and findings in your paper. I am really interested in your work/BMXNet v1, and for real-time applications I'd like to dig into binarized networks and timing analysis (which is why I'm so eager to be able to run it in C++ including faster inference ;) )
Any news regarding the conversion script?
Also, could you elaborate a bit on what is actually happening during conversion? I still don't quite get why you need to convert the symbol.json and param file when you have already implemented the underlying C/C++ operators (or is the C++ API using different operators? - that might be the reason why even vanilla MXNet 1.4.0 still doesn't support reduced precision, i.e. float16, in the C++ API).
Maybe because you created custom operators, but only in Python?

Jopyth commented on July 17, 2024

Basically we need the conversion script for two reasons. The first one is the same as in the first BMXNet: we need to compress the binary weights with bit-packing. The second one is the one you mentioned: we use different operators for training with Python and for inference with C++. Previously we had the functionality for training and inference (sped up on CPU) in the same layer and chose which version to execute based on the inference setting and device. Now we have split up training and inference: training is done with multiple layers (in Gluon/Hybrid mode), but during inference we only use one layer, our (sped-up) custom convolution.
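
A NumPy-only illustration of the bit-packing idea (this is not the converter's actual code, just why the deployment weights end up roughly 32x smaller):

    import numpy as np

    # during training, each binary weight is kept as a float32 (+1 / -1)
    float_weights = np.sign(np.random.randn(64, 32)).astype(np.float32)

    # for deployment, one bit per weight is enough, so 8 weights fit into a byte
    bits = (float_weights > 0).astype(np.uint8)
    packed = np.packbits(bits, axis=-1)

    print(float_weights.nbytes)  # 8192 bytes before packing
    print(packed.nbytes)         # 256 bytes after packing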

simonmaurer commented on July 17, 2024

@Jopyth Thanks a lot for pointing that out. Looking forward to this useful addition and the upgrade to 1.4.0 - very nice!
Also, there's an interesting discussion regarding the C vs. C++ API in the official MXNet GitHub repo. The C++ API is just a frontend implementation like Python, but according to the discussion it is missing some modules needed to make use of fast float16 inference, see https://github.com/apache/incubator-mxnet/issues/14159#issuecomment-483883108.
So <mxnet/c_predict_api.h> refers to the C API, which is able to do the fast inference, whereas this is not yet true for the C++ API <mxnet-cpp/MxNetCpp.h>.

simonmaurer commented on July 17, 2024

@Jopyth That is great! Also noteworthy that you keep things updated (i.e. MXNet 1.4.1) - much appreciated.

Closing questions I still have:

  1. When you build your models, why does the QActivation come before the QConvolution? Is it a special case that you use **qconv_kwargs in QConv2D - maybe for debugging purposes, as used in the code?
  2. You mentioned the Example/Test: do we just convert the model by calling the converter via subprocess inside the Python code (model conversion is done transparently with export when using QActivation/QConv2D/QDense), i.e.
    output = subprocess.check_output(["build/tools/binary_converter/model-converter", param_file])
    or do we use the binary converter as a standalone tool?
  3. How do you handle your input matrices/images (Python AND C++)? Do you keep them as NDArray uint8 from OpenCV (or equivalent) or convert them to float32/float16?
  4. Is the fast inference (backend operators with fast GEMM) also used when we deploy hybridized models with Python, or only if we use a model produced by the new converter?
  5. We never talked about this: a hint on how one can correctly load the converted model in C/C++, i.e. which API to use for fast inference?

simonmaurer commented on July 17, 2024

Alright, pretty enlightening!

  1. Thanks for pointing it out. I am pretty used to introducing non-linearities after linear combinations. Does that also mean that if I have multiple QConvolutions, I actually wouldn't need an activation layer in front anymore, because the output of the preceding layer (say QConv2D or QDense) is already binarized?
  2. So you tested the converted model with faster inference in Python, I guess? I will gladly provide you with information regarding C inference. Not sure yet if the C++ API (which is also only a wrapper) will work..
