Support Vector Machine and its Applications in Classification Problems
A slecture by Xing Liu
Partially based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.
Outline of the slecture
- Background in Linear Classifiers
- Support vector machine
- Summary
- References
Background in Linear Classifiers
In this section, we introduce the framework and basic idea of the linear classification problem.
In a linear classification problem, the feature space can be divided into different regions by hyperplanes. In this lecture, we take the two-category case to illustrate. Given training samples $ \textbf{y}_1,\textbf{y}_2,...,\textbf{y}_n \in \mathbb{R}^p $, each $ \textbf{y}_i $ is a p-dimensional vector and belongs to either class $ w_1 $ or $ w_2 $. The goal is to find the maximum-margin hyperplane that separates the points in the feature space that belong to class $ w_1 $ from those that belong to class $ w_2 $. Writing each sample in augmented form $ \textbf{y} = (1, y_1, y_2, ..., y_p)^T $, the discriminant function can be written as

$ g(\textbf{y}) = \textbf{c}\cdot\textbf{y} $
We want to find $ \textbf{c}\in\mathbb{R}^{p+1} $ so that a testing data point $ \textbf{y}_i $ is labelled

$ w_1 $ if $ \textbf{c}\cdot\textbf{y}>0 $

$ w_2 $ if $ \textbf{c}\cdot\textbf{y}<0 $
We can apply a trick here: replace all $ \textbf{y} $'s in class $ w_2 $ by $ -\textbf{y} $; then the above task is equivalent to looking for $ \textbf{c} $ so that

$ \textbf{c}\cdot \textbf{y}>0, \forall \textbf{y} \in $ new sample space.
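To make this concrete, here is a minimal sketch of the augmentation and sign-flip trick (my own illustration in Python/NumPy; the toy data and the names `X`, `labels`, `Y`, and `c` are hypothetical, not from the slecture):

import numpy as np

# Toy data: rows of X are p-dimensional samples, labels are 1 or 2.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
labels = np.array([1, 1, 2, 2])

# Augment each sample with a leading 1, so c = (c_0, c_1, ..., c_p)
# encodes both the offset and the normal of the hyperplane.
Y = np.hstack([np.ones((X.shape[0], 1)), X])

# The "trick": negate every sample from class w_2, so that a separating
# c need only satisfy c . y > 0 for all rows of Y.
Y[labels == 2] *= -1

# Any candidate c can now be checked with a single sign test:
c = np.array([0.0, 1.0, 1.0])        # an illustrative guess
print(np.all(Y @ c > 0))             # True iff c separates the data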
You might have already observed the ambiguity of $ \textbf{c} $ in the above discussion: if $ \textbf{c} $ separates the data, then $ \lambda \textbf{c} $ also separates the data for any $ \lambda > 0 $. One solution might be to set $ |\textbf{c}|=1 $. Another solution is to introduce the concept of a "margin", which we denote by b, and require that

$ \textbf{c}\cdot\textbf{y}\geqslant b > 0, \forall \textbf{y} $.
In this scenario, the hyperplane is defined by $ \textbf{c}\cdot \textbf{y} = b $. The vector $ \textbf{c} $ is normal to the hyperplane, and every sample lies on its positive side. Writing $ b_i = \textbf{c}\cdot\textbf{y}_i $, the quantity $ \frac{b_i}{||\textbf{c}||} $ is the distance from the point $ \textbf{y}_i $ to the hyperplane $ \textbf{c}\cdot\textbf{y} = 0 $.
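Continuing the hypothetical snippet above, the margins $ b_i $ and the corresponding distances can be computed directly (again a sketch, not part of the original material):

# Signed margins b_i = c . y_i for every (sign-adjusted) sample, and the
# distances b_i / ||c|| from each point to the hyperplane c . y = 0.
b_i = Y @ c
distances = b_i / np.linalg.norm(c)
print(distances)   # all positive iff c separates the data, with margin min(b_i)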
The above approach is equivalent to finding a solution of the linear system

$ \textbf{Y}\textbf{c} = \textbf{b} $
where $ \textbf{Y} =\begin{bmatrix} \textbf{y}_1^T \\ \textbf{y}_2^T \\ \vdots \\ \textbf{y}_n^T \end{bmatrix} $ and $ \textbf{b} = (b_1, b_2, ..., b_n)^T $ collects the margins.
In most cases, when n > p, the system is overdetermined and no exact solution for $ \textbf{c} $ exists. An alternative approach is to find the $ \textbf{c} $ that minimizes a criterion function $ J(\textbf{c}) $. There are various forms of criterion functions. For example, we can try to minimize the norm of the error vector between $ \textbf{Y}\textbf{c} $ and $ \textbf{b} $, hence the criterion function can be defined as

$ J(\textbf{c}) = ||\textbf{Y}\textbf{c} - \textbf{b}||^2 $
The solution to the above problem is

$ \textbf{c} = (\textbf{Y}^T\textbf{Y})^{-1}\textbf{Y}^T\textbf{b} $

if $ \det(\textbf{Y}^T\textbf{Y})\neq 0 $;
otherwise, the solution is defined more generally through the pseudoinverse of $ \textbf{Y} $:

$ \textbf{c} = \textbf{Y}^{\dagger}\textbf{b}, \qquad \textbf{Y}^{\dagger} = \lim_{\epsilon \to 0}\,(\textbf{Y}^T\textbf{Y} + \epsilon \textbf{I})^{-1}\textbf{Y}^T. $
This solution is the MSE (minimum squared error) solution to $ \textbf{Y}\textbf{c} = \textbf{b} $ and it always exists.
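A minimal sketch of the full MSE classifier under the same assumptions (NumPy, toy data, and a margin vector of all ones chosen purely for illustration):

import numpy as np

# Training data: augmented, sign-adjusted samples as rows of Y (see above).
Y = np.array([[ 1.0,  2.0,  1.0],
              [ 1.0,  1.5,  2.0],
              [-1.0,  1.0,  1.5],
              [-1.0,  2.0,  0.5]])
b = np.ones(Y.shape[0])   # margin vector; b_i = 1 is a common default

# MSE solution: c = (Y^T Y)^{-1} Y^T b when Y^T Y is invertible,
# and the pseudoinverse solution c = pinv(Y) b in general.
if np.linalg.det(Y.T @ Y) != 0:
    c = np.linalg.inv(Y.T @ Y) @ Y.T @ b
else:
    c = np.linalg.pinv(Y) @ b

# A new point x is assigned to w_1 if c . (1, x) > 0, else to w_2.
x = np.array([1.8, 1.2])
print("w_1" if c @ np.hstack([1.0, x]) > 0 else "w_2")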
Support Vector Machine
The support vector machine is an example of a linear two-class classifier.