Revision as of 13:17, 28 March 2008 by Huffmalm (Talk | contribs)

(See Lecture_10_-_Batch_Perceptron_and_Fisher_Linear_Discriminant_OldKiwi)

The following theorem, due to Novikoff (1962), proves the convergence of a perceptron_OldKiwi using linearly-separable samples. This proof was taken from Learning Kernel Classifiers, Theory and Algorithms By Ralf Herbrich

Consider the following definitions: A training set $ z=(x,y)\in\mathbb{Z}^m $

"Functional margin" on example $ (x_i,y_i)\in z $ is defined as $ \tilde{\gamma}_i = y_i\langle x_i,c \rangle $;

"Functional margin" on training set $ z $ is defined as $ \tilde{\gamma}_z = \min_{(x_i,y_i)\in z} \tilde{\gamma}_i $;

"Geometrical margin" on example $ (x_i,y_i)\in z $ is defined as $ \gamma_i = \frac{\tilde{\gamma}_i(c)}{\|c\|} $;

"Geometrical margin" on training set $ z $ is defined as $ \gamma_z = \frac{\tilde{\gamma}_z(c)}{\|c\|} $.

Let $ \psi = \max_{x_i\in x} \|\phi(x_i)\| $, where $ \phi(x_i) $ is the feature vector for $ x_i $.

Now let $ c_k $ be a final solution vector after $ k $ steps. Then the last update of the Perceptron algorithm has

$ c_k = c_{k-1} + y_ix_i $.

For any vector, $ c^* $, we have

$ \langle c^*,c_k \rangle = \langle c^*,c_{k-1} \rangle + y_i \langle c^*,x_i \rangle \geq \langle c^*,c_{k-1}\rangle + \gamma_z(c^*) \geq \ldots \geq k\gamma_z(c^*) $,

where the inequalities follow from repeated applications up to step 0 where we assume $ c_0=0 $. Similarly, by the algorithm definition,

$ \|c_k\|^2 = \|c_{k-1}\|^2 + 2y_i\langle c_{k-1},x_i \rangle + \|x_i\|^2 \leq \|c_{k-1}\|^2 + \psi^2 \leq \ldots \leq k\psi^2. $

Then by the Cauchy-Schwartz inequality, we have

$ k\gamma_z(c^*) \leq \langle c^*,c_k \rangle \leq \|c^*\| \cdot \|c_k\| \leq \sqrt{k} \psi $.

It follows, then, that the number of required iterations of the Perceptron algorithm has a finite upper bound, i.e.

$ k\leq \left( \frac{\psi}{\gamma_z(c^*)}\right)^2 $

Alumni Liaison

Questions/answers with a recent ECE grad

Ryne Rayburn