srddev / model-quantization
Quantization is a technique for reducing the computational and memory costs of inference by representing weights and activations with low-precision data types, such as 8-bit integers (int8), instead of the usual 32-bit floating point (float32).
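To make the idea concrete, here is a minimal sketch of one common scheme, affine (asymmetric) quantization of a list of float values to the int8 range. It is an illustration under stated assumptions, not this repository's implementation: the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, and only the Python standard library is used.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # representable range of a signed 8-bit integer


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to int8 codes using a scale and a zero point (hypothetical helper)."""
    # Widen the observed range to include 0.0 so that zero maps to an exact code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # One integer step corresponds to `scale` in float space.
    # Fall back to 1.0 when every input is zero, to avoid dividing by zero.
    scale = (hi - lo) / (QMAX - QMIN) or 1.0
    # The zero point is the integer code that represents the float value 0.0.
    zero_point = int(round(QMIN - lo / scale))
    # Round to the nearest code, shift by the zero point, and clamp to int8.
    codes = [
        max(QMIN, min(QMAX, int(round(v / scale)) + zero_point))
        for v in values
    ]
    return codes, scale, zero_point


def dequantize_int8(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from int8 codes."""
    return [scale * (c - zero_point) for c in codes]


# Made-up sample weights, for illustration only.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)

# Worst-case reconstruction error introduced by rounding and clamping.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, zero_point, max_error)
```

Each int8 code occupies 1 byte where a float32 value occupies 4, so the stored values shrink by a factor of four, at the cost of a reconstruction error on the order of one quantization step (`scale`).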