The paper from MaskNet uses LayerNorm; however, the code implementation uses BatchNorm…


Masknet replace batchnorm with layernorm about deeprec HOT 6 OPEN

zippeurfou commented on June 14, 2024

Masknet replace batchnorm with layernorm

from deeprec.

Comments (6)

StevenShi-23 commented on June 14, 2024

Hi Marc,

Thanks for bringing this up! This is indeed a bug, and we are fixing it.


StevenShi-23 commented on June 14, 2024

Hi Marc,

Upon checking, this is not a bug. When BatchNorm is applied on the default axis (the last dimension), it reduces to LayerNorm, and since the size of gamma/beta depends on the shape of the input tensor, the original implementation is still correct.

However, for clarity, we updated the example (ref. PR #816).

Thanks for the comment!
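The gamma/beta point above can be made concrete with a small plain-Python sketch. `param_shape` is a hypothetical helper, not a DeepRec or Keras function; it assumes the Keras convention that both BatchNormalization and LayerNormalization allocate one scale and one offset per element of the normalized `axis`, so with the default axis=-1 both layers get a vector sized to the last input dimension. It only covers parameter shapes, not which axes the mean and variance are averaged over.

```python
def param_shape(input_shape, axis=-1):
    """Hypothetical helper: shape of the gamma/beta variables.

    Assumes the Keras convention that BatchNormalization and
    LayerNormalization both allocate one scale/offset entry per
    element of the normalized `axis`.
    """
    axis = axis % len(input_shape)  # map -1 to the index of the last dim
    return (input_shape[axis],)


# Hypothetical activations: a batch of 32 examples with 64 features.
print(param_shape((32, 64)))      # (64,)
print(param_shape((32, 10, 64)))  # (64,)
```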


zippeurfou commented on June 14, 2024

I am not sure I am following; see this screenshot.

What am I missing?


Duyi-Wang commented on June 14, 2024

Because your code isn't running in training mode.

tf.layers.batch_normalization() calls into the class BatchNormalizationBase.

DeepRec/tensorflow/python/keras/layers/normalization.py

Line 43 in 6bd822e

class BatchNormalizationBase(Layer):

tf.keras.layers.LayerNormalization() calls into the class LayerNormalization.

DeepRec/tensorflow/python/keras/layers/normalization.py

Line 898 in 6bd822e

class LayerNormalization(Layer):

In LayerNormalization, the mean and variance are computed with nn.moments.

DeepRec/tensorflow/python/keras/layers/normalization.py

Line 1025 in 6bd822e

mean, variance = nn.moments(inputs, self.axis, keep_dims=True)

It then calls nn.batch_normalization to produce the result.

DeepRec/tensorflow/python/keras/layers/normalization.py

Lines 1040 to 1046 in 6bd822e

    outputs = nn.batch_normalization(
        inputs,
        mean,
        variance,
        offset=offset,
        scale=scale,
        variance_epsilon=self.epsilon)

BN computes the same thing, apart from its additional features:

DeepRec/tensorflow/python/keras/layers/normalization.py

Lines 643 to 652 in 6bd822e

  def _moments(self, inputs, reduction_axes, keep_dims):
    mean, variance = nn.moments(inputs, reduction_axes, keep_dims=keep_dims)
    # TODO(b/129279393): Support zero batch input in non DistributionStrategy
    # code as well.
    if self._support_zero_size_input():
      inputs_size = array_ops.size(inputs)
      mean = array_ops.where(inputs_size > 0, mean, K.zeros_like(mean))
      variance = array_ops.where(inputs_size > 0, variance,
                                 K.zeros_like(variance))
    return mean, variance

DeepRec/tensorflow/python/keras/layers/normalization.py

Lines 736 to 739 in 6bd822e

    mean, variance = self._moments(
        math_ops.cast(inputs, self._param_dtype),
        reduction_axes,
        keep_dims=keep_dims)

DeepRec/tensorflow/python/keras/layers/normalization.py

Lines 820 to 825 in 6bd822e

    outputs = nn.batch_normalization(inputs,
                                     _broadcast(mean),
                                     _broadcast(variance),
                                     offset,
                                     scale,
                                     self.epsilon)
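Both snippets end in the same two ops, nn.moments followed by nn.batch_normalization; what differs is the list of axes handed to nn.moments. LayerNormalization passes `self.axis` (the `nn.moments(inputs, self.axis, keep_dims=True)` call quoted above), while in the Keras source `BatchNormalizationBase` builds `reduction_axes` from the axes not listed in `axis`, so with the default axis=-1 its training path averages over the batch axis. Below is a minimal pure-Python sketch of that shared pipeline for a 2-D [batch, features] input; `moments` and `batch_normalization` are hypothetical stand-ins that only reproduce the formula of the TensorFlow ops of the same names, not their implementation.

```python
import math


def moments(x, axis):
    """Mean and biased variance of a 2-D list along `axis`, keep_dims style.

    axis=1 reduces over features (one statistic per row, LayerNorm-style);
    axis=0 reduces over the batch (one statistic per column, BN-style).
    """
    rows, cols = len(x), len(x[0])
    if axis == 1:
        mean = [[sum(r) / cols] for r in x]
        var = [[sum((v - m[0]) ** 2 for v in r) / cols] for r, m in zip(x, mean)]
    else:
        mean = [[sum(x[i][j] for i in range(rows)) / rows for j in range(cols)]]
        var = [[sum((x[i][j] - mean[0][j]) ** 2 for i in range(rows)) / rows
                for j in range(cols)]]
    return mean, var


def batch_normalization(x, mean, var, offset, scale, eps):
    """scale * (x - mean) / sqrt(var + eps) + offset, broadcasting size-1 dims."""
    def at(t, i, j):
        return t[i if len(t) > 1 else 0][j if len(t[0]) > 1 else 0]
    return [[scale[j] * (v - at(mean, i, j)) / math.sqrt(at(var, i, j) + eps) + offset[j]
             for j, v in enumerate(row)]
            for i, row in enumerate(x)]


x = [[1.0, 2.0, 3.0],
     [4.0, 6.0, 8.0]]
gamma, beta, eps = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 1e-3

# LayerNorm-style: every row of ln_out has (approximately) zero mean.
ln_out = batch_normalization(x, *moments(x, axis=1), beta, gamma, eps)
# BN training-style: every column of bn_out has (approximately) zero mean.
bn_out = batch_normalization(x, *moments(x, axis=0), beta, gamma, eps)
print(ln_out)
print(bn_out)
```

On this toy input the two outputs differ, since one normalizes each example across its features and the other normalizes each feature across the batch.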

The difference is that outside training, BN replaces the batch mean and variance with its moving averages, moving_mean and moving_variance.

DeepRec/tensorflow/python/keras/layers/normalization.py

Lines 744 to 750 in 6bd822e

    mean = tf_utils.smart_cond(training,
                               lambda: mean,
                               lambda: ops.convert_to_tensor(moving_mean))
    variance = tf_utils.smart_cond(
        training,
        lambda: variance,
        lambda: ops.convert_to_tensor(moving_variance))
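The smart_cond switch can be illustrated with a single-feature sketch in plain Python. `bn_forward` is a hypothetical helper; its defaults assume the Keras initial values (moving mean 0, moving variance 1) and epsilon 1e-3. Note that tf.layers.batch_normalization defaults to training=False, so unless `training=True` is passed, the moving statistics are used. With freshly initialized moving statistics the inference branch barely changes the input, which is why a BN layer evaluated outside training can look very different from a LayerNorm layer.

```python
import math


def bn_forward(x, training, moving_mean=0.0, moving_var=1.0,
               gamma=1.0, beta=0.0, eps=1e-3):
    """Hypothetical single-feature BN step mirroring the smart_cond switch.

    training=True uses the batch statistics of `x`; training=False uses the
    stored moving statistics (defaults assume Keras' initial values 0 and 1).
    """
    if training:
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
    else:
        mean, var = moving_mean, moving_var
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]


x = [1.0, 2.0, 3.0]
train_out = bn_forward(x, training=True)   # centered and scaled using the batch itself
infer_out = bn_forward(x, training=False)  # x / sqrt(1.001), i.e. nearly unchanged
```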


Duyi-Wang commented on June 14, 2024

You can pass moving_mean_initializer='ones' (the default is 'zeros'), and you will see that the output changes.
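The initializer experiment can be reproduced without TensorFlow in a small sketch. `bn_inference` is hypothetical and assumes an untrained layer, so its moving statistics still hold their initial values (moving variance 1, epsilon 1e-3). Changing the initial moving mean from 0 to 1 shifts every inference output by the same amount.

```python
import math


def bn_inference(x, moving_mean, moving_var=1.0, gamma=1.0, beta=0.0, eps=1e-3):
    """Hypothetical single-feature BN inference step using stored statistics only."""
    return [gamma * (v - moving_mean) / math.sqrt(moving_var + eps) + beta for v in x]


x = [1.0, 2.0, 3.0]
out_zeros = bn_inference(x, moving_mean=0.0)  # as with moving_mean_initializer='zeros'
out_ones = bn_inference(x, moving_mean=1.0)   # as with moving_mean_initializer='ones'

# Every output drops by the same amount, 1 / sqrt(1 + eps), roughly 0.9995.
shift = [a - b for a, b in zip(out_zeros, out_ones)]
```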


zippeurfou commented on June 14, 2024

Thanks @Duyi-Wang, that makes sense. I was confused by it as well, but the docs clearly state it. Thanks for pointing out the code.
Adding a screenshot for posterity.

Feel free to close this one.
