Revision as of 14:26, 20 April 2010 by Ilaguna (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

In my research area (error detection in software systems), we typically want to perform 2-class classification. For example, we want to determine whether the condition of a system is normal (class 1) or abnormal (class 2) based on some metrics. However, sometimes we only have training data of one class, typically of normal behavior. In those cases, we cannot use traditional Support Vector Machines (SVM) because they are aimed for 2-class classification problems.

Support Vector Domain Description (SVDD) [1] is a technique that I have found useful for cases when we only have data of one class. It is essentially a modification of SVM to work in one-class scenarios. Here's a brief description of the technique.

Support Vector Domain Description

SVDD has been proposed in [1] for outlier or novelty detection. The basic idea is that it tries to find a hyper-sphere in the feature space which can enclose all (or almost all) the training data with the minimum volume. Mathematically, it is aimed to minimize the following objective $ F(R,a,\xi_i) = R^2 + C \sum_{i=1}^N \xi_i $ where $ a $ is the center of the sphere, $ R $ is the radius, $ C $ gives the trade-off between simplicity (or volume of the sphere) and the number of errors (number of target objects rejected), and $ \xi $'s are slack variables that have the same functionality as in traditional SVM for classification. This has to be minimized under the constraints: $ (x_i - a)^T (x_i - a) \leq R^2 + \xi_i\text{,} \qquad \xi_i \geq 0 $.

By incorporating those constraints into the objective function, we get the Lagrangian[1]:

$ L(R,a,\alpha_i,\xi_i) = R^2 + C \sum_{i=1}^N \xi_i - \sum_{i=1}^N \gamma_i \xi_i - \sum_{i=1}^N \alpha_i \left( R^2 + \xi_i - (x_i - c)^T (x_i - c) \right) $

with Lagrange multipliers $ \alpha_i, \gamma_i \geq 0 $. By setting the derivatives with respect to the primal variables $ c $, $ \xi_i $ and $ R $ to zero, we get $ c = \sum_{i=1}^N \alpha_i x_i $, $ 0\leq \alpha_i \leq C $, and $ \sum_{i=1}^N \alpha_i = 1 $.

Substituting the above formulas into the Lagrangian we obtain the following dual problem where we maximize with respect to $ \alpha_i $:

$ L = \sum_{i=1}^N \alpha_i (x_i^T \cdot x_i) - \sum_{i,j=1}^N \alpha_i \alpha_j (x_i^T \cdot x_i) $

with constraints $ \sum_{i=1}^N \alpha_i = 1 $ and $ 0 \leq \alpha_i \leq C $. Like in SVM, we can replace the inner product $ (x_i^T \cdot x_i) $ by any positive definite kernel $ K(x,x') $. Then, the resulting hyper-sphere will enclose data examples that are mapped into a high dimensional feature space. If we do so, we maximize with respect to $ \alpha_i $:

$ L = \sum_{i=1}^N \alpha_i K(x_i,x_i) - \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i,x_j). $

To test whether a new data point $x$ is within the spherical region, we evaluate the sign of

$ y(x) = \sum_{i=1}^N \alpha_i K(x,x_n) + b $

where the threshold parameter $ b<math> is determined as in traditional SVM by summing <math>\alpha_i K(x',x_i) $ for all the support vectors $ x' $, and then getting an average for the different values of $ b $. A new point $ x $ is said to be in the sphere if $ y(x) \geq 0 $, and outside the sphere if $ y(x) < 0 $.

Some Experiments


References:

[1] D. M. J. Tax and R. P. W. Duin, "Support vector domain description," Pattern Recogn. Lett., 20(11-13):1191–1199, 1999.

Alumni Liaison

Correspondence Chess Grandmaster and Purdue Alumni

Prof. Dan Fleetwood