The nndl-errata from fkdosilovic

0/1 loss function

Equation (1.7) on page 9 defines the 0/1 loss function $L_i^{(0/1)}$. As I understand the name "0/1 loss", its only non-zero value should be 1. But when $y_i\ne{\rm sign}(\overline W\cdot\overline{X_i})$, equation (1.7) evaluates to 2 instead of 1. So, should we change equation (1.7) to $$L_i^{(0/1)}=\frac{1}{4}(y_i-{\rm sign}(\overline W\cdot\overline{X_i}))^2=\frac{1}{2}(1-y_i\cdot{\rm sign}(\overline W\cdot\overline{X_i}))?$$
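The arithmetic behind this proposal can be checked with a small sketch. The function names below are hypothetical, and the `loss_book` form (coefficient 1/2) is an assumption inferred from the claim that the book's loss evaluates to 2 on a misclassification; it is not copied from the book. The convention sign(0) = +1 is also an assumption.

```python
def sign(z):
    # Assumption: sign(0) is treated as +1; the issue does not specify this case.
    return 1 if z >= 0 else -1


def loss_book(y, score):
    # Assumed form of equation (1.7): (1/2) * (y - sign(score))^2.
    return 0.5 * (y - sign(score)) ** 2


def loss_proposed(y, score):
    # Proposed correction, squared form: (1/4) * (y - sign(score))^2.
    return 0.25 * (y - sign(score)) ** 2


def loss_proposed_alt(y, score):
    # Proposed correction, product form: (1/2) * (1 - y * sign(score)).
    return 0.5 * (1 - y * sign(score))


# Labels are in {-1, +1}; score stands for the dot product W . X_i.
cases = [(1, 0.7), (1, -0.3), (-1, 0.7), (-1, -0.3)]
for y, score in cases:
    print(y, score, loss_book(y, score), loss_proposed(y, score), loss_proposed_alt(y, score))
```

Under these assumptions, the two proposed forms agree on every case and take only the values 0 and 1, while the assumed book form takes the values 0 and 2.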

Update of Tikhonov regularization, and in turn the Tikhonov regularizer

Equation 1.33 on page 26 gives the update formula for the perceptron with Tikhonov regularization. The sum term of the equation suggests that the step size $\alpha$ here is the same as the one used in Equation 1.6 on page 8. The standard gradient descent update for the perceptron (for a single training example) is $\overline W\Leftarrow\overline W-\alpha'\nabla_W L_i$, which is also given in the first line of page 10. However, a careful check shows that, for the two equations to be consistent, the $\alpha$ in the first line of page 10 cannot be the $\alpha$ in Equation 1.6 on page 8. That is why I use a prime in the update formula above. Specifically, $\alpha'=2\alpha$, because the error $E(\overline X)$ is twice as large as $y$ when a prediction error occurs. So, in terms of the $\alpha$ of Equation 1.6, the gradient descent update for the perceptron is $$\overline W\Leftarrow\overline W-2\alpha\nabla_W L_i.\tag{1}$$
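The relation $\alpha'=2\alpha$ can be checked numerically with a minimal sketch. It assumes that Equation 1.6 has the form $\overline W\Leftarrow\overline W+\alpha E(\overline X)\overline X$ with $E(\overline X)=y-{\rm sign}(\overline W\cdot\overline X)$, and that $L_i$ is the perceptron criterion $\max\{0,-y_i(\overline W\cdot\overline{X_i})\}$. All function names and the sample numbers are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(z):
    # Assumption: sign(0) is treated as +1.
    return 1 if z >= 0 else -1


def update_error_form(w, x, y, alpha):
    # Assumed form of Equation 1.6: W <= W + alpha * E(X) * X, E(X) = y - sign(W . X).
    err = y - sign(dot(w, x))
    return [wi + alpha * err * xi for wi, xi in zip(w, x)]


def grad_perceptron_criterion(w, x, y):
    # (Sub)gradient of L_i = max(0, -y * (W . X)): -y * X on a misclassified
    # point, zero otherwise. "Misclassified" uses the same sign convention as above.
    if sign(dot(w, x)) != y:
        return [-y * xi for xi in x]
    return [0.0 for _ in x]


def update_gradient_form(w, x, y, step):
    # Plain gradient descent: W <= W - step * grad L_i.
    g = grad_perceptron_criterion(w, x, y)
    return [wi - step * gi for wi, gi in zip(w, g)]


# Hypothetical misclassified example: W . X = -1.5 but y = +1.
w, x, y, alpha = [0.5, -1.0], [1.0, 2.0], 1, 0.1

from_error_form = update_error_form(w, x, y, alpha)
from_gradient_form = update_gradient_form(w, x, y, 2 * alpha)  # step alpha' = 2 * alpha
print(from_error_form, from_gradient_form)
```

With these assumptions, the two updates coincide only when the gradient step uses $2\alpha$; using $\alpha$ itself moves the weights half as far.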

Now, as Section 1.4.1.1 of the text says, Tikhonov regularization for the perceptron adds the penalty $\lambda||\overline W||^2$ to the loss function, which here is the perceptron criterion. To apply gradient descent, we need the partial derivative of this term with respect to $\overline W$, which is $2\lambda\overline W$. Since the penalty is part of the total loss, it is multiplied by the coefficient $-2\alpha$ from $(1)$, giving $-2\alpha\cdot(2\lambda\overline W)=-4\alpha\lambda\overline W$. Factoring out $\overline W$, its coefficient in Equation 1.33 should be $(1-4\alpha\lambda)$. That is, Equation 1.33 should read $$\overline W\Leftarrow\overline W(1-4\alpha\lambda)+\alpha\sum\limits_{\overline X\in S} E(\overline X)\overline X.\tag{2}$$

But Equation 1.33 is used in later chapters of the book, so instead of making the change in $(2)$, we can keep Equation 1.33 unchanged by replacing $\lambda$ with $\frac{\lambda}{4}$ in the penalty, which cancels the factor of 4. With this adjustment, the Tikhonov regularizer becomes $$\frac{\lambda}{4}||\overline W||^2,$$ which is the change I propose.
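The effect of the penalty coefficient on the weight-decay factor can be checked with a short sketch. It assumes the mini-batch loss is the sum of the per-example perceptron criteria over $S$ plus a single penalty term, that the gradient step is $2\alpha$, and that sign(0) = +1. The function names and the sample batch are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sign(z):
    # Assumption: sign(0) is treated as +1.
    return 1 if z >= 0 else -1


def gradient_step(w, batch, alpha, penalty_coeff):
    # W <= W - 2*alpha * grad( sum_i L_i + penalty_coeff * ||W||^2 ),
    # where L_i is the perceptron criterion and the step 2*alpha is assumed.
    grad = [2 * penalty_coeff * wi for wi in w]
    for x, y in batch:
        if sign(dot(w, x)) != y:  # misclassified: (sub)gradient of L_i is -y * X
            for j, xj in enumerate(x):
                grad[j] += -y * xj
    return [wi - 2 * alpha * gi for wi, gi in zip(w, grad)]


def decay_form(w, batch, alpha, decay):
    # W <= W * (1 - decay) + alpha * sum_X E(X) * X, with E(X) = y - sign(W . X).
    new_w = [wi * (1 - decay) for wi in w]
    for x, y in batch:
        err = y - sign(dot(w, x))
        for j, xj in enumerate(x):
            new_w[j] += alpha * err * xj
    return new_w


# Hypothetical mini-batch S of (X, y) pairs and hyperparameters.
w = [0.5, -1.0]
batch = [([1.0, 2.0], 1), ([2.0, 1.0], -1), ([1.0, 0.0], 1)]
alpha, lam = 0.1, 0.5

# Penalty lam * ||W||^2 corresponds to the decay factor (1 - 4*alpha*lam).
print(gradient_step(w, batch, alpha, lam), decay_form(w, batch, alpha, 4 * alpha * lam))
# Penalty (lam/4) * ||W||^2 corresponds to the decay factor (1 - alpha*lam).
print(gradient_step(w, batch, alpha, lam / 4), decay_form(w, batch, alpha, alpha * lam))
```

Under these assumptions, each printed pair agrees, which is the point of the proposal: rescaling the regularizer to $\frac{\lambda}{4}||\overline W||^2$ reproduces the $(1-\alpha\lambda)$ factor without touching the update equation.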
