Modeling the logic utilization of Haddoc2-generated mappings. We rely on linear models to predict the logic resources (reported in ALMs) used by the SCM (Multipliers) and MOA (Adders) parts. The inputs of these linear models are metrics that are computed directly from the topology and weights of a given CNN. These metrics are:
- `nb_null`: Number of null (zero-valued) weights in a given 3D convolution kernel.
- `nb_pow2`: Number of weights that are equal to a power of two in a given 3D convolution kernel. Multiplications by these weights are implemented with shift registers, which consume fewer resources than multipliers.
- `nb_bit1`: Number of bits that are set to one in a given 3D convolution kernel. Intuitively, the higher this number, the higher the resource utilization.
- `nb_efbw`
: With the metrics above, we were not able to accurately predict the hardware resources, especially for the adder parts, so we came up with this metric. In Haddoc2, the accumulation of partial products is achieved with a MOA whose operands have variable bitwidths. The complexity of such an adder is correlated with the number of inputs, but also with the numerical dynamic of the partial sums. To illustrate this, consider the dot product of a vector `x` with a weight vector `w` such as `w = [2 0 18 256]`, and suppose weights and inputs are represented in an 8-bit fixed-point format.
  - The multiplication by the first coefficient can be implemented with a shift register, and the resulting partial product `p[0] = x[0] * w[0]` requires `8+mcl(2) = 9 bits` to be represented, where `mcl(x) = max(ceil(log2(x)))`.
  - The multiplication by the second coefficient is skipped and does not generate any partial product.
  - The multiplication by the third coefficient requires `8+mcl(18) = 13 bits` to be represented.
  - The multiplication by the last coefficient is implemented with a shift register, and the partial product requires `8+mcl(256) = 16 bits` to be represented.
  - Finally, the accumulation of these partial terms is achieved with a MOA whose inputs are 9, 13 and 16 bits wide, respectively. The complexity of this adder is thus correlated with the number of partial products and their numerical dynamic, which in turn is related to the numerical dynamic of the 3D convolution kernel weights.

  The `nb_efbw` of a given kernel is defined as: `nb_efbw = sum(bw_in + mcl(bw_theta))`.
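As a rough illustration of the four metrics above, the following Python sketch computes them for a flat list of integer weights and reproduces the bitwidths of the worked example. It is not taken from the Haddoc2 code base: the function names (`mcl`, `is_pow2`, `kernel_metrics`) are hypothetical, weights are assumed to be integer fixed-point codes, negative weights are assumed to be handled through their absolute value, and zero weights are assumed to contribute nothing to `nb_efbw` since they generate no partial product.

```python
def mcl(w: int) -> int:
    """ceil(log2(|w|)) for a non-zero integer weight, in exact integer arithmetic."""
    return (abs(w) - 1).bit_length()


def is_pow2(w: int) -> bool:
    """True if |w| is a power of two (1 = 2**0 counts as one here, an assumption)."""
    a = abs(w)
    return a != 0 and (a & (a - 1)) == 0


def kernel_metrics(weights, bw_in=8):
    """Compute the four metrics for a flattened kernel of integer weights.

    bw_in is the input bitwidth (8 in the worked example).
    """
    nonzero = [w for w in weights if w != 0]
    return {
        "nb_null": len(weights) - len(nonzero),
        "nb_pow2": sum(is_pow2(w) for w in nonzero),
        "nb_bit1": sum(bin(abs(w)).count("1") for w in nonzero),
        # One partial product per non-zero weight, each bw_in + mcl(w) bits wide.
        "nb_efbw": sum(bw_in + mcl(w) for w in nonzero),
    }


metrics = kernel_metrics([2, 0, 18, 256])
print(metrics)
# {'nb_null': 1, 'nb_pow2': 2, 'nb_bit1': 4, 'nb_efbw': 38}
```

Here `nb_efbw = 9 + 13 + 16 = 38`, the sum of the three partial-product bitwidths from the example. A real 3D kernel would first be flattened into such a list.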
We found that this `nb_efbw` metric is the most pertinent for modeling the hardware resources, as shown in the following tables, where the `R_squared` scores of the models with different features are reported. `GLM` stands for the Generalized Linear Model in which all four previous features are combined to model the resource usage.
MOA | Alexnet | Squeezenet | Alexnet-Comp. |
---|---|---|---|
nb_null | 0.7345 | | |
nb_pow2 | 0.3722 | 0.3851 | |
nb_bit1 | 0.6589 | 0.5779 | 0.6744 |
nb_efbw | 0.7759 | 0.7109 | 0.7784 |
GLM | 0.8139 | 0.7372 | 0.8105 |

SCM | Alexnet | Squeezenet | Alexnet-Comp. |
---|---|---|---|
nb_null | 0.7345 | | |
nb_pow2 | 0.2230 | 0.2250 | |
nb_bit1 | 0.5884 | 0.5070 | 0.6098 |
nb_efbw | 0.7262 | 0.5906 | 0.7481 |
GLM | 0.8010 | 0.6902 | 0.8328 |
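
For readers unfamiliar with the `R_squared` score, the sketch below shows one way such a score could be obtained for a single-feature linear model, using ordinary least squares in plain Python. It is only an illustration: the helper names (`fit_line`, `r_squared`) are hypothetical, the fitting procedure actually used for the tables is not described here, and the numbers in `toy_efbw` and `toy_alms` are made-up placeholders, not measurements.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Placeholder values for illustration only (NOT measured data):
# one nb_efbw value and one ALM count per kernel.
toy_efbw = [38, 52, 75, 90, 120]
toy_alms = [20, 27, 41, 44, 63]

slope, intercept = fit_line(toy_efbw, toy_alms)
score = r_squared(toy_efbw, toy_alms, slope, intercept)
print(f"R_squared = {score:.4f}")
```

A score close to 1 means the feature explains most of the variance in the resource usage. The `GLM` rows combine all four features, which would require a multi-feature fit rather than the single-feature one sketched here.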