(Algorithms for clustering from feature vector)
(Algorithms for clustering from feature vector)
Line 27: Line 27:
 
<math>S_w= \sum _{i=1, X_i \in S_1} ^d ||X_i - \mu _1||^2 + \sum _{i=1, X_i \in S_2} ^d ||X_i - \mu _2||^2</math>  (2-1)
 
<math>S_w= \sum _{i=1, X_i \in S_1} ^d ||X_i - \mu _1||^2 + \sum _{i=1, X_i \in S_2} ^d ||X_i - \mu _2||^2</math>  (2-1)
  
<math>\mu _1 = \frac{1}{|S_1|} \sun _{X_i \in S_1} X_i</math>, <math>\mu _2 = \frac{1}{|S_2|} \sum _{X_i \in S_2} X_i</math> (2-2)
+
<math>\mu _1 = \frac{1}{|S_1|} \sum _{X_i \in S_1} X_i</math>, <math>\mu _2 = \frac{1}{|S_2|} \sum _{X_i \in S_2} X_i</math> (2-2)
  
 
<math>S_m</math> is "mixture scatter matrix"
 
<math>S_m</math> is "mixture scatter matrix"
Line 93: Line 93:
  
 
Look at evolution of J for c increase (similarity scale)
 
Look at evolution of J for c increase (similarity scale)
 +
 +
<<Picture>>
 +
 +
An important J
 +
 +
"Square error criterion"
 +
 +
<<eqn>>
 +
 +
* Good  when clusters are compact, well separated
 +
* Sensitive to outliers
 +
 +
<<eqn>> is more robust to outliers
 +
 +
Can use other types of similarity measure
 +
 +
<<eqn>>
 +
 +
to speed up optimization of J
 +
 +
Use "Nearest mean reclassification rule"
 +
 +
* choose initial partition <math>S_1, S_2 , \cdots , S_c</math>
 +
 +
* calculate <math>\mu _1 , \mu _2 , \codts , \mu  _c </math>
 +
 +
* reclsssify each <math>X_i</math> to the class of the nearest mean
 +
 +
* If cluster have changed, repeat
 +
 +
Note: <math>X_i</math> class of its nearest mean
 +
 +
is same as choosing the move for <math>X_i</math> that moves <math>\Delta J</math> as negative as possible
 +
because <<eqn>>
 +
 +
If <<eqn>>,
 +
<<eqn>>
 +
 +
 +
<<eqn>> is as negative as possible when <<eqn>>
 +
 +
Can use FORGY, CLUSTER
 +
 +
Observation
 +
 +
<<eqn>>
 +
 +
This is a distance based clustering method.
 +
 +
No need for feature vector

Revision as of 06:18, 16 April 2008

Clustering Methods

Algorithms for clustering from feature vector

Called "Partitional clustering" in Jain and Dude as opposed to "hierarchical clustering"

Clustering feature vectors = finding separation between clusters but these are not known

<<Picture>>

We have a set of points S Want to find subsets $ S_1 , S_2 , \cdots , S_c $ such that $ S=S_1 \cup S_2 \cup \cdots \cup S_c $ and $ S_i \cap S_j = \phi $

need to define "clustering criteria" i.e., a measure of how natural the clustering is.

if c=2,

consider

$ J=tr(S_m ^{-1} S_w) $

where $ S_w $ is "within class scatter matrix"

$ S_w= \sum _{i=1, X_i \in S_1} ^d ||X_i - \mu _1||^2 + \sum _{i=1, X_i \in S_2} ^d ||X_i - \mu _2||^2 $ (2-1)

$ \mu _1 = \frac{1}{|S_1|} \sum _{X_i \in S_1} X_i $, $ \mu _2 = \frac{1}{|S_2|} \sum _{X_i \in S_2} X_i $ (2-2)

$ S_m $ is "mixture scatter matrix"

$ S_m= \sum _{i=1} ^d ||X_i - \mu||^2 $, $ \mu = \frac{1}{d} \sum _{i=1} ^{d} X_i $ (2-3)

Try to find $ S_1 $ and $ S_2 $ that minimize J.

Exhaustive search procedure

Examnple with 6 pattern $ X_1 , X_2 , \cdots , X_6 $

List all partition of 6 points into 2 sets.

<<Someting>>


Evaluate J for each partition

Cansider other J's

$ J=ln|S_m ^{-1} S_w| $ (2-4)

$ J=tr (S_m) - \mu (tr S_w -c) $ (2-5)

C is fixed constant, $ \mu $ : Largrange multiplier

$ J=\frac{tr(S_m)}{tr(S_w)} $ (2-6)

To speed up search "use iterative procedure"

Pick a partition at random

$ S_1={X_1, X_2, X_3}, S_2= {X_4, X_5, X_6} $ (2-7)

Compute J

Consider effect of moving

$ X_1 $ into $ S_2 \Rightarrow \Delta J_{12} $,

$ X_2 $ into $ S_2 \Rightarrow \Delta J_{22} $,

$ X_3 $ into $ S_3 \Rightarrow \Delta J_{33} $,

$ X_4 $ into $ S_1 \Rightarrow \Delta J_{41} $,

$ X_5 $ into $ S_1 \Rightarrow \Delta J_{51} $,

$ X_6 $ into $ S_1 \Rightarrow \Delta J_{61} $

Apply (Simultaneously) all the moves for which $ \Delta J $ is negative, repeat procedure

OR

Apply the move for which $ \Delta J $ is the most negative repeat procedure

Convergence?

If convergence, global minimum? No idea

If c>2, can use similar procedure

If c is unknown, try c=2,3,4, etc (hierarchical clustering)

Look at evolution of J for c increase (similarity scale)

<<Picture>>

An important J

"Square error criterion"

<<eqn>>

  • Good when clusters are compact, well separated
  • Sensitive to outliers

<<eqn>> is more robust to outliers

Can use other types of similarity measure

<<eqn>>

to speed up optimization of J

Use "Nearest mean reclassification rule"

  • choose initial partition $ S_1, S_2 , \cdots , S_c $
  • calculate $ \mu _1 , \mu _2 , \codts , \mu _c $
  • reclsssify each $ X_i $ to the class of the nearest mean
  • If cluster have changed, repeat

Note: $ X_i $ class of its nearest mean

is same as choosing the move for $ X_i $ that moves $ \Delta J $ as negative as possible because <<eqn>>

If <<eqn>>, <<eqn>>


<<eqn>> is as negative as possible when <<eqn>>

Can use FORGY, CLUSTER

Observation

<<eqn>>

This is a distance based clustering method.

No need for feature vector

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett