Currently the repo contains:

- SVM_Gaussian: SVM and SVM with Gaussian Kernel
- XGBoost: boosting tree regressor
This repository contains an R version and a Python version of an SVM written from scratch. The SVM constructed here covers only the linear SVM without a bias term and the SVM with a Gaussian kernel. The optimization algorithm used is Coordinate Ascent.
The SVM with L2-norm regularization has a loss function based on the hinge loss:

$$\min_w \ \frac{\lambda}{2}\|w\|^2 + \sum_{i=1}^n \max\left(0,\ 1 - y_i\, w^\top x_i\right)$$

where the hinge loss can be represented by a continuous mathematical function:

$$\max\left(0,\ 1 - y_i\, w^\top x_i\right) = \max_{0 \le \alpha_i \le 1} \ \alpha_i \left(1 - y_i\, w^\top x_i\right)$$

So we can minimize the new loss function:

$$\min_w \ \max_{0 \le \alpha_i \le 1} \ \frac{\lambda}{2}\|w\|^2 + \sum_{i=1}^n \alpha_i \left(1 - y_i\, w^\top x_i\right)$$

This is the Primal problem we want to deal with.
By rotating the min and max, we can transform the Primal problem into a Dual problem:

$$\max_{0 \le \alpha_i \le 1} \ \min_w \ \frac{\lambda}{2}\|w\|^2 + \sum_{i=1}^n \alpha_i \left(1 - y_i\, w^\top x_i\right)$$

Solving the inner minimization gives $w = \frac{1}{\lambda}\sum_i \alpha_i y_i x_i$, so the Dual problem becomes

$$\max_{0 \le \alpha_i \le 1} \ \sum_{i=1}^n \alpha_i - \frac{1}{2\lambda} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j\, x_i^\top x_j$$

The kernel trick can be applied here: replace the inner product $x_i^\top x_j$ with a kernel function $K(x_i, x_j)$.
Either way, if you want the linear SVM, let

$$K(x_i, x_j) = x_i^\top x_j$$

and if you want the kernel SVM with the Gaussian kernel, let

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
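As a minimal sketch (the function names here are illustrative, not taken from the repo's R or Python files), these two choices of kernel can be written as:

```python
import numpy as np

def linear_kernel(X1, X2):
    """Linear kernel: K[i, j] = x_i . x_j (plain inner products)."""
    return X1 @ X2.T

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))
```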
Now our loss function becomes

$$D(\alpha) = \sum_{i=1}^n \alpha_i - \frac{1}{2\lambda} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j\, K(x_i, x_j), \qquad 0 \le \alpha_i \le 1$$

To solve this Dual problem, we use Coordinate Ascent: maximize $D(\alpha)$ with respect to one coordinate $\alpha_k$ at a time while holding all the others fixed.
Assume all coordinates except $\alpha_k$ are fixed. Setting $\partial D / \partial \alpha_k = 0$ gives

$$\alpha_k = \frac{\lambda - y_k \sum_{j \ne k} \alpha_j y_j\, K(x_j, x_k)}{K(x_k, x_k)}$$

As a result, the update is this closed-form solution clipped back into the box constraint:

$$\alpha_k \leftarrow \min\big(1,\ \max(0,\ \alpha_k)\big)$$
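A minimal sketch of this loop, assuming a precomputed kernel matrix `K` (the function name `dual_coordinate_ascent` and the parameter `lam` are placeholders, not the repo's actual interface):

```python
import numpy as np

def dual_coordinate_ascent(K, y, lam=1.0, n_iters=100):
    """Maximize D(alpha) = sum(alpha) - (1/(2*lam)) * sum_ij alpha_i alpha_j
    y_i y_j K_ij over the box 0 <= alpha_i <= 1, one coordinate at a time."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iters):
        for k in range(n):
            # s = y_k * sum_{j != k} alpha_j y_j K(x_j, x_k)
            s = y[k] * (alpha * y @ K[:, k] - alpha[k] * y[k] * K[k, k])
            # closed-form maximizer for alpha_k, clipped back into [0, 1]
            alpha[k] = np.clip((lam - s) / K[k, k], 0.0, 1.0)
    return alpha
```

After convergence, the linear SVM's weight vector can be recovered as $w = \frac{1}{\lambda}\sum_i \alpha_i y_i x_i$, while the kernel SVM predicts directly through $\frac{1}{\lambda}\sum_i \alpha_i y_i\, K(x_i, x)$.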
What if we want to include a bias term?

One thing that needs to be taken care of is that if we include a bias, we do not want to penalize it: the bias term carries no information about the relation between the predictor variables and the response variable. The modified loss function is then:

$$\min_{w,\, b} \ \frac{\lambda}{2}\|w\|^2 + \sum_{i=1}^n \max\left(0,\ 1 - y_i\left(w^\top x_i + b\right)\right)$$

We still minimize the inner part first. Because of the bias term, we now have one new constraint, which comes from setting the derivative with respect to $b$ to zero:

$$\sum_{i=1}^n \alpha_i y_i = 0$$

The other parts stay the same, but the optimization process must maintain this constraint at all times. There is a simple way to do that: update a pair of coordinates at every step, moving them together along a direction that keeps $\sum_i \alpha_i y_i$ unchanged (for a pair with equal labels, this means adding to one coordinate exactly what is subtracted from the other). That way the constraint is always maintained.
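A hedged sketch of one such paired move (the helper name `paired_step` and the clipping scheme are assumptions for illustration, not the repo's implementation):

```python
import numpy as np

def paired_step(alpha, y, i, j, d):
    """Shift the pair (alpha[i], alpha[j]) along the direction that keeps
    sum(alpha * y) unchanged, clipping d so both coordinates stay in [0, 1]."""
    s = y[i] * y[j]                          # labels are +1 or -1
    # feasible range of d from 0 <= alpha[i] + d <= 1
    lo, hi = -alpha[i], 1.0 - alpha[i]
    # feasible range of d from 0 <= alpha[j] - s * d <= 1
    if s > 0:
        lo, hi = max(lo, alpha[j] - 1.0), min(hi, alpha[j])
    else:
        lo, hi = max(lo, -alpha[j]), min(hi, 1.0 - alpha[j])
    d = np.clip(d, lo, hi)
    alpha[i] += d
    alpha[j] -= s * d                        # cancels the change in sum(alpha * y)
    return alpha
```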
The boosting technique is also called "matching pursuit". Intuitively, XGBoost is an improvement on gradient boosting. Although XGBoost with regularization has a relatively complicated form, in this repository I will only introduce the form without the regularization terms.
The idea of boosting is to keep training weak classifiers and aggregate them into one strong classifier, so the final result is like a vote among all the selected weak learners. In general we could choose many different kinds of weak learners, but people usually use tree regressors. The objective score, which is used to make the final prediction, then has the form:

$$\hat{y}_i = \sum_{t=1}^{T} f_t(x_i)$$

where each $f_t$ is a regression tree.
We can write the loss function of the XGBoost tree model at round $t$ as

$$L^{(t)} = \sum_{i=1}^n l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right)$$
We can expand the loss function by a Taylor expansion up to the second derivative:

$$L^{(t)} \approx \sum_{i=1}^n \left[\, l\left(y_i,\ \hat{y}_i^{(t-1)}\right) + \frac{\partial\, l\left(y_i,\ \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}\, f_t(x_i) + \frac{1}{2}\, \frac{\partial^2 l\left(y_i,\ \hat{y}_i^{(t-1)}\right)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}\, f_t(x_i)^2 \right]$$

Assume

$$g_i = \frac{\partial\, l\left(y_i,\ \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left(y_i,\ \hat{y}_i^{(t-1)}\right)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}$$

Then, dropping the terms that do not depend on $f_t$,

$$L^{(t)} \approx \sum_{i=1}^n \left[\, g_i\, f_t(x_i) + \frac{1}{2}\, h_i\, f_t(x_i)^2 \right] + \text{const}$$
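For example (a standard illustration, not tied to this repo's code), with the squared-error loss $l(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$ these derivatives are simply

$$g_i = \hat{y}_i^{(t-1)} - y_i, \qquad h_i = 1,$$

so the working target $-g_i / h_i$ below is just the residual $y_i - \hat{y}_i^{(t-1)}$.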
As a result, the original minimization problem becomes:

$$\min_{f_t}\ \sum_{i=1}^n \frac{1}{2}\, h_i \left( f_t(x_i) - \left(-\frac{g_i}{h_i}\right) \right)^2 + \text{const}$$

which is just minimizing a weighted squared loss when fitting the tree regressor. We can see that this is a kind of iteratively reweighted least squares, where the weights are $h_i$ and the working targets are $-g_i / h_i$.
We can also notice that if the second derivative $h_i$ is identically 1, our loss function reduces to the loss function of gradient boosting: each new tree simply fits the negative gradient $-g_i$.
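A minimal sketch of one boosting round under this view, assuming squared-error loss and scikit-learn's `DecisionTreeRegressor` as the weak learner (illustrative only, not the code in the XGBoost folder):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_step(X, y, y_pred, max_depth=3):
    """One XGBoost-style round without regularization: fit a tree to the
    working targets -g/h with sample weights h. For the squared-error loss
    l(y, y_hat) = 0.5 * (y - y_hat)**2 we have g = y_hat - y and h = 1."""
    g = y_pred - y                       # first derivatives of the loss
    h = np.ones_like(y, dtype=float)     # second derivatives of the loss
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, -g / h, sample_weight=h)
    return y_pred + tree.predict(X), tree

# Usage: start from a constant prediction and add trees one by one.
# y_pred = np.full(len(y), y.mean())
# for _ in range(100):
#     y_pred, tree = boost_step(X, y, y_pred)
```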