Comments (9)
QKeras is a library for quantization-aware training. It still uses float parameters and activations during training; it only simulates quantization by restricting each float tensor to the set of values representable in the chosen fixed-point format. That is why you will not see a difference in model size when the model is saved as .h5, and also why training a model in QKeras is slower than training a normal model. QKeras does not currently provide a way to deploy these models to a CPU so that they actually use n-bit parameters. There are, however, tools such as hls4ml that can deploy such a model to an FPGA.
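The simulated quantization described above can be illustrated with a small NumPy sketch. This is not the QKeras implementation: `fake_quantize` and its `bits` and `integer` parameters are hypothetical names chosen for this example, and the rounding and clipping rules are a simplified assumption. It only shows how values can be snapped to a fixed-point grid while the tensor stays float32.

```python
import numpy as np

def fake_quantize(x, bits=4, integer=0):
    """Snap float values onto a signed fixed-point grid, keeping float32 storage.

    bits:    total number of bits, including the sign bit (hypothetical name).
    integer: bits left of the binary point, excluding the sign bit (hypothetical name).
    """
    frac_bits = bits - 1 - integer
    step = 2.0 ** -frac_bits            # spacing between representable values
    lo = -(2.0 ** integer)              # most negative representable value
    hi = 2.0 ** integer - step          # most positive representable value
    q = np.round(np.asarray(x, dtype=np.float32) / step) * step
    return np.clip(q, lo, hi).astype(np.float32)

w = np.array([-1.3, -0.26, 0.0, 0.11, 0.49, 0.93], dtype=np.float32)
w_q = fake_quantize(w, bits=4, integer=0)

print(w_q)                           # values now lie on a grid with step 0.125
print(w_q.dtype)                     # still float32
print(w_q.nbytes == w.nbytes)        # storage size is unchanged
```

The quantized tensor takes at most 2^4 distinct values, yet it occupies exactly as many bytes as the original, which is consistent with the saved .h5 file not shrinking.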
Hi @jurevreca12, thanks for your reply. So, to see the actual difference in inference time and model size (utilization) between the QKeras model and the unquantized baseline Keras model, I need to port the model to an FPGA using hls4ml, am I right? Is there any other way to measure inference time and utilization at the software level without actually porting to an FPGA? For example, are there memory profiling tools that can show the difference between the quantized and unquantized models in terms of inference time and resource utilization?
QKeras can quantize to an arbitrary number of bits. This doesn't map well onto processors, whose ALUs typically support 8-bit, 16-bit, and 32-bit operations. That doesn't mean you can't run a model quantized to, for example, 3 bits on a CPU, but it's really not straightforward. If you want to run your model on a CPU, I suggest you use another quantization library (e.g. https://www.tensorflow.org/model_optimization/guide/quantization/training). Those will quantize to 8 bits, but you will be able to deploy the model on the CPU.
In FPGAs you can make the ALU any size you want, which is why they are a good fit for QKeras.
@jurevreca12 Thanks for your answer. I have one more question. I want to train a binary neural network (BNN) with one-bit weights and activations using QKeras, and I have trained such a model. If I want to measure its inference time and model size, I cannot do that on a CPU, since the CPU architecture is not designed for 1-bit precision. Is that correct? Please correct me if I am wrong. So, to measure inference time for the BNN, or for any other precision that does not fit the CPU architecture, such as ternary or 6-bit, I need to port the model to an FPGA using hls4ml and then measure the model size and inference time there, right?
You can deploy binarized neural networks to a CPU, but QKeras doesn't let you directly generate a CPU implementation. (Binarized networks use the XNOR operation, which CPUs do support.) For binarized neural networks you can also train a network with the Larq library. I believe they even have a CPU deployment engine, although I am not sure if it is openly available (https://docs.larq.dev/compute-engine/).
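To make the XNOR remark concrete, here is a minimal pure-Python sketch of a binarized dot product computed with XNOR and a population count. It is only an illustration of the general technique, not code from QKeras or Larq; `pack_bits` and `binary_dot` are hypothetical helper names, and the encoding (+1 as bit 1, -1 as bit 0) is an assumption of this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one int: bit i is 1 for +1, 0 for -1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a_word, b_word, n):
    """Dot product of two length-n vectors of +1/-1 values packed as ints."""
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask    # bit is 1 where the two signs agree
    matches = bin(xnor).count("1")      # population count
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n

a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

expected = sum(x * y for x, y in zip(a, b))
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(result == expected)
```

On a real CPU the same idea would operate on machine words with a hardware popcount instruction, so many multiply-accumulates collapse into a few bitwise operations.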
Thanks @jurevreca12 for answering all my questions so patiently. I have previously used Larq to train BNNs, but those models cannot be ported to an FPGA with hls4ml, since hls4ml doesn't support Larq. So if I want a comparative analysis of my model in terms of model size and inference time across precisions ranging from 32-bit (baseline) through 16, 8, 6, 4, 3, and 2 bits down to 1 bit using QKeras, I cannot do it on a CPU, and the only way is to port the models to an FPGA using the hls4ml tool flow. Am I correct? Please correct me if I am wrong. Or is there another way you can suggest to get a comparative analysis of model size and inference time for different bit precisions without porting to hardware (FPGA)?
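For the model-size half of that comparison, a rough software-only estimate is simple arithmetic: the number of parameters times the bit width. The sketch below assumes weights are packed tightly with no padding or metadata, and uses a hypothetical parameter count. It gives a theoretical lower bound on weight storage only; it says nothing about inference time or FPGA resource utilization, and it will not match the size of a saved .h5 file, which stores floats.

```python
def packed_weight_bytes(num_params, bits):
    """Bytes needed to store num_params weights at the given bit width, tightly packed."""
    return (num_params * bits + 7) // 8   # round up to whole bytes

NUM_PARAMS = 100_000  # hypothetical model size, for illustration only

sizes = {bits: packed_weight_bytes(NUM_PARAMS, bits)
         for bits in (32, 16, 8, 6, 4, 3, 2, 1)}

for bits, size in sizes.items():
    print(f"{bits:>2}-bit: {size:>7} bytes ({sizes[32] / size:.1f}x smaller than 32-bit)")
```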
Hello, could you tell me which versions of Keras and QKeras you are using? I am using the latest version of TF 2.11 and there is an incompatibility problem.
Hi, I am using QKeras 0.9.0 with TensorFlow 2.12.0 and did not run into any incompatibility problems. If you hit anything, please let us know.
Strangely, I was using TF 2.11.2 at the time, and installing QKeras would uninstall my Keras first. Now that I have tried your TF version, the problem has disappeared.