Latest revision as of 08:38, 17 January 2013

Lecture 17, ECE662: Decision Theory

Lecture notes for ECE662 Spring 2008, Prof. Boutin.

Other lectures: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,

Nearest Neighbor Classification Rule

useful when there are several labels
e.g. fingerprint-based recognition

Problem: Given the labeled training samples: $\vec{X_1}, \vec{X_2}, \ldots, \vec{X_d}$ $\in \mathbb{R}^n$ (or some other feature space) and an unlabeled test point $\vec{X_0}$ $\in \mathbb{R}^n$ .

Classification: Let $\vec{X_i}$ be the closest training point to $\vec{X_0}$ , then we assign the class of $\vec{X_i}$ to $\vec{X_0}$ .

What do we mean by closest?

There are many meaning depending on the metric we choose for the feature space.

Definition A "metric" on a space S is a function

$D: S\times S\rightarrow \mathbb{R}$

that satisfies the following 4 properties:

Non-negativity $D(\vec{x_1},\vec{x_2})\geq 0, \forall \vec{x_1},\vec{x_2}\in S$
Symmetry $D(\vec{x_1},\vec{x_2})=D(\vec{x_2},\vec{x_1}), \forall \vec{x_1},\vec{x_2}\in S$
Reflexivity $D(\vec{x},\vec{x})=0, \forall \vec{x}\in S$
Triangle Inequality $D(\vec{x_1},\vec{x_2})+D(\vec{x_2},\vec{x_3})\geq D(\vec{x_1},\vec{x_3}) , \forall \vec{x_1}, \vec{x_2}, \vec{x_3}\in S$

Illustration of 3 different metrics

Examples of metrics

Euclidean distance: $D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_2}=\sqrt{\sum_{i=1}^n ({x_1}^i-{x_2}^i)^2}$

Manhattan (cab driver) distance: $D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_1}=\sum_{i=1}^n |{x_1}^i-{x_2}^i|$

Minkowski metric: $D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_p}=(\sum_{i=1}^n ({x_1}^i-{x_2}^i)^p)^{\frac{1}{p}}$

Riemannian metric: $D(\vec{x_1},\vec{x_2})=\sqrt{(\vec{x_1}-\vec{x_2})^\top \mathbb{M}(\vec{x_1}-\vec{x_2})}$

Infinite norm: $D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{\infty}=max_i |{x_1}^i-{x_2}^i|$

where M is a symmetric positive definite $n\times n$ matrix. Different choices for M enable associating different weights with different components.

In this way, we see that $\mathbb{R}^n$ , $\mathbb{Z}^n$ , $\mathbb{C}^n$ have many natural metrics, but feature could be in some other set, e.g. a discrete set.

for example,

$$ x_1 $$ ={fever, skinrash, high blodd pressure}

$$ x_2 $$ ={fever, neckstiffness}

Tanimoto metric

$D(set1, set2) = \frac {|set1|+|set2|-2|set1 \bigcap set2| }{|set1|+|set2|-|set1 \bigcap set2|}$

Example: previous approach to shape recognition Given is a set of ordered points in $R_n =(p_1,p_2,\cdots,p_N)$ We want to recognize the shape

Figure 1

Given template (triangle form): (T1,T2,...,TN); We want to assign one of test template to a test (P1,P2,P3) In this case, we should not use Euclidean distance!,

Figure 2

becasue shape defined by point is unchanged (invariant) by rotation and translation of triangles.

Therefore, distance between 2 triangles (or shapes) must be independent on the position and orientation of triangles.

Procrustes metric

$D(p,\bar p)= \sum_{\begin{matrix}i=1 \\ rotation R, translation T \end{matrix}}^n {\begin{Vmatrix} Rp_i+T-\bar p_i \end{Vmatrix}} _{L^2}$

$p=(p_1, p_2, \cdots ,p_N),\bar p = (\bar p_1, \bar p_2, \cdots ,\bar p_N)$

Alternative approach "Use invariant coordinate to repeat $p=(p_1, p_2, \cdots ,p_N)$ "

i.e find $\varphi$ such that

$\varphi : \mathbb{R}^n\rightarrow \mathbb{R}^k$ (where, typically $k \leq n$ )

s.t $\varphi (x) = \varphi (\bar x)$

whenever $\exists$ R, T with $R \bar X + \bar T = X$

Example of phi with triangle (Figure 3):

(p1,p2,p3) -> (new p1, new p2, new p3)

Back to ECE662, Spring 2008, Prof. Boutin

@@ Line 1: / Line 1: @@
-[http://balthier.ecn.purdue.edu/index.php/ECE662 ECE662 Main Page]
+=Lecture 17, [[ECE662]]: Decision Theory=
+Lecture notes for [[ECE662:BoutinSpring08_Old_Kiwi|ECE662 Spring 2008]], Prof. [[user:mboutin|Boutin]].
-[http://balthier.ecn.purdue.edu/index.php/ECE662#Class_Lecture_Notes Class Lecture Notes]
+Other lectures: [[Lecture 1 - Introduction_Old Kiwi|1]],
+[[Lecture 2 - Decision Hypersurfaces_Old Kiwi|2]],
+[[Lecture 3 - Bayes classification_Old Kiwi|3]],
+[[Lecture 4 - Bayes Classification_Old Kiwi|4]],
+[[Lecture 5 - Discriminant Functions_Old Kiwi|5]],
+[[Lecture 6 - Discriminant Functions_Old Kiwi|6]],
+[[Lecture 7 - MLE and BPE_Old Kiwi|7]],
+[[Lecture 8 - MLE, BPE and Linear Discriminant Functions_Old Kiwi|8]],
+[[Lecture 9 - Linear Discriminant Functions_Old Kiwi|9]],
+[[Lecture 10 - Batch Perceptron and Fisher Linear Discriminant_Old Kiwi|10]],
+[[Lecture 11 - Fischer's Linear Discriminant again_Old Kiwi|11]],
+[[Lecture 12 - Support Vector Machine and Quadratic Optimization Problem_Old Kiwi|12]],
+[[Lecture 13 - Kernel function for SVMs and ANNs introduction_Old Kiwi|13]],
+[[Lecture 14 - ANNs, Non-parametric Density Estimation (Parzen Window)_Old Kiwi|14]],
+[[Lecture 15 - Parzen Window Method_Old Kiwi|15]],
+[[Lecture 16 - Parzen Window Method and K-nearest Neighbor Density Estimate_Old Kiwi|16]],
+[[Lecture 17 - Nearest Neighbors Clarification Rule and Metrics_Old Kiwi|17]],
+[[Lecture 18 - Nearest Neighbors Clarification Rule and Metrics(Continued)_Old Kiwi|18]],
+[[Lecture 19 - Nearest Neighbor Error Rates_Old Kiwi|19]],
+[[Lecture 20 - Density Estimation using Series Expansion and Decision Trees_Old Kiwi|20]],
+[[Lecture 21 - Decision Trees(Continued)_Old Kiwi|21]],
+[[Lecture 22 - Decision Trees and Clustering_Old Kiwi|22]],
+[[Lecture 23 - Spanning Trees_Old Kiwi|23]],
+[[Lecture 24 - Clustering and Hierarchical Clustering_Old Kiwi|24]],
+[[Lecture 25 - Clustering Algorithms_Old Kiwi|25]],
+[[Lecture 26 - Statistical Clustering Methods_Old Kiwi|26]],
+[[Lecture 27 - Clustering by finding valleys of densities_Old Kiwi|27]],
+[[Lecture 28 - Final lecture_Old Kiwi|28]],
+----
+----
+'''Nearest Neighbor Classification Rule'''
+* useful when there are several labels
+* e.g. fingerprint-based recognition
-**Nearest Neighbor Classification Rule**
+''Problem'': Given the labeled training samples: <math>\vec{X_1}, \vec{X_2}, \ldots, \vec{X_d}</math> <math>\in \mathbb{R}^n</math> (or some other feature space) and an unlabeled test point <math>\vec{X_0}</math> <math>\in \mathbb{R}^n</math>.
-- useful when there are several labels
-- e.g. fingerprint-based recognition
-.. |X_space_d| image:: tex
+''Classification'': Let <math>\vec{X_i}</math> be the closest training point to <math>\vec{X_0}</math>, then we assign the class of <math>\vec{X_i}</math> to <math>\vec{X_0}</math>.
-:alt: tex: \vec{X_1}, \vec{X_2}, \ldots, \vec{X_d}
-.. |inRn| image:: tex
-:alt: tex: \in \mathbb{R}^n
-.. |X_0| image:: tex
+'''What do we mean by closest?'''
-:alt: tex: \vec{X_0}
+There are many meaning depending on the metric we choose for the feature space.
-.. |X| image:: tex
-:alt: tex: \vec{X}
-Problem: Given the labeled training samples: |X_space_d| |inRn| (or some other feature space) and an unlabeled test point |X_0| |inRn|.
+'''Definition''' A "metric" on a space S is a function
-Classification: Let |X| be the closest training point to |X_0|, then we assign the class of |X| to |X_0|.
+<math>D: S\times S\rightarrow \mathbb{R}</math>
-**What do we mean by closest?**
+that satisfies the following 4 properties:
+* Non-negativity <math>D(\vec{x_1},\vec{x_2})\geq 0, \forall \vec{x_1},\vec{x_2}\in S</math>
+* Symmetry <math>D(\vec{x_1},\vec{x_2})=D(\vec{x_2},\vec{x_1}), \forall \vec{x_1},\vec{x_2}\in S</math>
+* Reflexivity <math>D(\vec{x},\vec{x})=0, \forall \vec{x}\in S</math>
+* Triangle Inequality <math>D(\vec{x_1},\vec{x_2})+D(\vec{x_2},\vec{x_3})\geq D(\vec{x_1},\vec{x_3}) , \forall \vec{x_1}, \vec{x_2}, \vec{x_3}\in S</math>
-.. |D_from_SxS_to_R| image:: tex
+[[Image:distances_Old Kiwi.jpg]]
-:alt: tex: D: S\times S\rightarrow \mathbb{R}
+'''Illustration of 3 different metrics'''
-There are many meaning depending on the metric we choose for the feature space.
-**Definition** A "metric" on a space S is a function
+'''Examples of metrics'''
-|D_from_SxS_to_R|
+Euclidean distance: <math>D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_2}=\sqrt{\sum_{i=1}^n ({x_1}^i-{x_2}^i)^2}</math>
-.. |nonnegat| image:: tex
+Manhattan (cab driver) distance: <math>D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_1}=\sum_{i=1}^n |{x_1}^i-{x_2}^i|</math>
-:alt: tex: D(\vec{x_1},\vec{x_2})\geq 0, \forall \vec{x_1},\vec{x_2}\in S
+Minkowski metric: <math>D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_p}=(\sum_{i=1}^n ({x_1}^i-{x_2}^i)^p)^{\frac{1}{p}}</math>
-.. |symmetry| image:: tex
+Riemannian metric: <math>D(\vec{x_1},\vec{x_2})=\sqrt{(\vec{x_1}-\vec{x_2})^\top \mathbb{M}(\vec{x_1}-\vec{x_2})}</math>
-:alt: tex: D(\vec{x_1},\vec{x_2})=D(\vec{x_2},\vec{x_1}), \forall \vec{x_1},\vec{x_2}\in S
+Infinite norm: <math>D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{\infty}=max_i |{x_1}^i-{x_2}^i|</math>
-.. |reflex| image:: tex
-:alt: tex: D(\vec{x},\vec{x})=0, \forall \vec{x}\in S
+where M is a symmetric positive definite <math>n\times n</math> matrix. Different choices for M enable associating different weights with different components.
-.. |tri-ineq| image:: tex
+In this way, we see that <math> \mathbb{R}^n</math>, <math>\mathbb{Z}^n</math>, <math>\mathbb{C}^n</math> have many natural metrics, but feature could be in some other set, e.g. a discrete set.
-:alt: tex: D(\vec{x_1},\vec{x_2})+D(\vec{x_2},\vec{x_3})\geq D(\vec{x_1},\vec{x_3}) , \forall \vec{x_1}, \vec{x_2}, \vec{x_3}\in S
-that satisfies the following 4 properties:
+for example,
-- Non-negativity |nonnegat|
-- Symmetry |symmetry|
+<math>x_1</math>={fever, skinrash, high blodd pressure}
-- Reflexivity |reflex|
-- Triangle Inequality |tri-ineq|
+<math>x_2</math>={fever, neckstiffness}
+Tanimoto metric
+<math>D(set1, set2) = \frac {|set1|+|set2|-2|set1 \bigcap set2| }{|set1|+|set2|-|set1 \bigcap set2|} </math>
+Example: previous approach to shape recognition
+Given is a set of ordered points in <math>R_n =(p_1,p_2,\cdots,p_N)</math>
+We want to recognize the shape
-.. |euclid| image:: tex
+[[Image:Lec17_fig1_Old Kiwi.JPG]]
-:alt: tex: D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_2}=\sqrt{\sum_{i=1}^n ({x_1}^i-{x_2}^i)^2}
+Figure 1
-.. |manhattan| image:: tex
+Given template (triangle form): (T1,T2,...,TN);
-:alt: tex: D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_1}=\sum_{i=1}^n |{x_1}^i-{x_2}^i|
+We want to assign one of test template to a test (P1,P2,P3)
+In this case, we should not use Euclidean distance!,
-.. |minkowski| image:: tex
+[[Image:Lec17_fig2_Old Kiwi.JPG]]
-:alt: tex: D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_p}=(\sum_{i=1}^n ({x_1}^i-{x_2}^i)^p)^{\frac{1}{p}}
+Figure 2
-.. |riemannian| image:: tex
+becasue shape defined by point is unchanged (invariant) by rotation and translation of triangles.
-:alt: tex: D(\vec{x_1},\vec{x_2})=\sqrt{(\vec{x_1}-\vec{x_2})^\top \mathbb{M}(\vec{x_1}-\vec{x_2})}
-**Examples of metrics**
+Therefore, distance between 2 triangles (or shapes) must be independent on the position and orientation of triangles.
-Euclidean distance: |euclid|
+'''Procrustes metric'''
-Manhattan distance: |manhattan|
+<math>D(p,\bar p)=  \sum_{\begin{matrix}i=1 \\ rotation R, translation T \end{matrix}}^n
+{\begin{Vmatrix} Rp_i+T-\bar p_i \end{Vmatrix}} _{L^2} </math>
-Minkowski metric: |minkowski|
+<math>p=(p_1, p_2, \cdots ,p_N),\bar p = (\bar p_1, \bar p_2, \cdots ,\bar p_N)</math>
-Riemannian metric: |riemannian|
+Alternative approach
+"Use invariant coordinate to repeat <math>p=(p_1, p_2, \cdots ,p_N) </math> "
-.. |nxn| image:: tex
+i.e find <math> \varphi </math> such that
-:alt: tex: n\times n
-where M is a symmetric positive definite |nxn| matrix. Different choices for M enable associating different weights with different components.
-.. |Rn| image:: tex
+<math> \varphi : \mathbb{R}^n\rightarrow \mathbb{R}^k</math>
-:alt: tex: \mathbb{R}^n
+(where, typically <math> k \leq n </math>)
-.. |Zn| image:: tex
+s.t <math> \varphi  (x) = \varphi (\bar x) </math>
-:alt: tex: \mathbb{Z}^n
-.. |Cn| image:: tex
+whenever <math> \exists </math> R, T with <math> R \bar X + \bar T = X</math>
-:alt: tex: \mathbb{C}^n
-In this way, we see that |Rn|, |Zn|, |Cn| have many natural metrics, but feature could be in some other set, e.g. a discrete set.
+Example of phi with triangle (Figure 3):
+[[Image:lec17_rot_tri_Old Kiwi.png]]
-test
+(p1,p2,p3) -> (new p1, new p2, new p3)
+----
+[[ECE662:BoutinSpring08_Old_Kiwi|Back to ECE662, Spring 2008, Prof. Boutin]]
+[[Category:Lecture Notes]]

Difference between revisions of "Lecture 17 - Nearest Neighbors Clarification Rule and Metrics Old Kiwi" - Rhea

Latest revision as of 08:38, 17 January 2013

Lecture 17, ECE662: Decision Theory

Alumni Liaison