Derivation of Fisher's Linear Discriminant Old Kiwi - Rhea

In this derivation, we assume the dimension of the space being projected to, k, is k=1.

We attempt to find a linear projection (line that passes through the origin) s.t. the projected data can be "best" separated.

Assume $y_1,..., y_{d_1}$ belongs to Class 1 and $y_{d_1+1}$ ,..., $$ y_d $$ belongs to Class 2.

Consider the sample mean of the projected data for each class

$\tilde{m_1} = \frac{1}{d_1} \sum_{i=1}^{d_1} \pi(y_i) = \frac{1}{d_1}\sum_{i=1}^{d_1} \vec{w} \cdot y_i$ , if $||\omega || =1$

Observation 1

$\tilde{m1} = \vec{w} \frac{1}{d_1} \sum_{i=1}^{d_1} y_i$ , which is the projection of the sample mean for Class 1.

Also, $\tilde{m_2} = \frac{1}{d-d_1}\sum_{i=d_1+1}^{d} \vec{w} \cdot y_i = \vec{w} \cdot m_2$ .

Consider the scatter projections:

$\tilde{S_1}^2 = \sum_{i=1}^{d_1} (\vec{w} \cdot y_i - \tilde{m_1})^2$

$\tilde{S_2}^2 = \sum_{i=d_1+1}^{d} (\vec{w} \cdot y_i - \tilde{m_2})^2$

Define the following cost function, $J(\vec{w}) = \frac{|\tilde{m1} - \tilde{m2}|^2}{\tilde{S_1}^2+ \tilde{S_2}^2}$ .

The above equation is the "Fisher linear discriminant_Old Kiwi".

Observation 2

Demanding $||\tilde{m_1} - \tilde{m_2}||$ to be maximized would not do, since $||\tilde{m_1} - \tilde{m_2}|| = |\vec{w} \cdot m_1 - \vec{w} \cdot m_2 | = | \vec{w} \cdot (m_1 - m_2)| \rightarrow \infty$

Need to fix $|| \omega || = 1$ , resulting in a Constrained Optimization Problem.

By dividing by $\tilde{S_1}^2 + \tilde{S_2}^2$ we demand that $\vec{w} \cdot (m_1 - m_2)$ be large in comparison with the scatter. This allows you to bypass the coordinate scale problem because $J(\vec{w})$ is independent of $\omega$ .

$|\tilde{m_1} - \tilde{m_2}|^2 = | w(m_1 - m_2)|^2 = |w|^2 | \frac{w}{|w|}(m_1-m_2)|^2$

Similarly,

$\tilde{S_1}^2 = \sum (\vec{w}(y_i - m_1))^2 = |w|^2 \sum (\frac{w}{|W|} (y_i - m_1))^2$

and,

$\tilde{S_2}^2 = \sum (\vec{w}(y_i - m_1))^2 = |w|^2 \sum (\frac{w}{|W|} (y_i - m_2))^2$

and,

$J(\vec{w}) = \frac{|m_1-m_2|^2}{S_1^2+S_2^2} =\frac{|w|^2 | \frac{w}{|w|}(m_1-m_2)|^2}{|w|^2(\sum (\frac{w}{|W|} (y_i - m_1))^2 + \sum (\frac{w}{|W|} (y_i - m_2))^2 )}$ is independent of $||\omega||$ .

Importance of scatter:

If we find $\omega$ which makes $$ Jw $$ large, we are guaranteed that the classes are well-separated.

We can write $$ Jw $$ using matrices:

$J(\vec{w}) = \frac{\vec{w}^{\top}S_B \vec{w}}{\vec{w}^{\top}S_w \vec{w}}$ , which is called the "[Generalized Rayleigh Quotient]".

(Lecture 11_Old Kiwi begins here)

Last time we considered $J(\vec{w})=\frac {{\vec{w}}^{T}{S}_{B}\vec{w}} {{\vec{w}}^{T}{S}_{W}\vec{w}}$ , explicit function of $\vec{w}$

one can do this because

(numerator of J) = $\| \tilde{m_1}-\tilde{m_2} \|^2 = \| w(m_1-m_2) \|^2 = \| (m_1-m_2)^T w \|^2$

$$ =w^T (m_1-m_2)(m_1^T-m_2^T)w $$

where, $$ (m_1-m_2)(m_1^T-m_2^T) =S_B $$ is 'between class scatter matrix'.

(denominator of J) = $\tilde{s_1}^2 + \tilde{s_2}^2$

$\tilde{s_1}^2 =\displaystyle \sum_{\vec y \; in \; class} (w y_i- \tilde{m_i} )^2$

$= \displaystyle \sum_{\vec{y} \; in \; class} w^T (y_i-m_i)(y_i^T-m_i^T)w$

$w^T \displaystyle \sum_{\vec{y} \; in \; class} (y_i-m_i)(y_i^T-m_i^T)w$

where, $\displaystyle \sum_{\vec y \; in \; class} (y_i-m_i)(y_i^T-m_i^T) =S_w$ is 'within class scattter matrix'.

Derivation of Fisher's Linear Discriminant Old Kiwi - Rhea

Observation 1

Observation 2

Alumni Liaison