Revision as of 19:29, 30 April 2014 by Jmanring (Talk | contribs)


From KNN to Nearest Neighbor Classification

A slecture by ECE student Jonathan Manring

Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.



Post your slecture material here. Guidelines:

  • If you are making a text slecture
    • Type text using wikitext markup languages
    • Type all equations using latex code between <math> </math> tags.
    • You may include links to other Project Rhea pages.


I. Introduction
The K-nearest neighbor (KNN) and nearest neighbor (NN) approaches to data classification are closely related, with the nearest neighbor approach largely being a simplification of KNN. In this tutorial, we will explain first the concept of KNN, secondly the nearest neighbor approach, and thirdly discuss briefly the comparative advantages and disadvantages of each. The focus of this tutorial will be the concepts more than the rigorous mathematical definitions, though this will be touched on as well.

II. The K-Nearest-Neighbor (KNN) Method

The K-nearest neighbors (KNN) approach is one of the so-called non-parametric density estimation methods since it does not assume any particular “parametric” density form. A better category might be local density estimation, since KNN and other methods in this class estimate the density function locally.

The KNN method begins with a set of labeled training data (points) in the feature space. Note that it is essential that the training data reflect the prior probabilities of the possible classes! New points are classified by expanding a window around them until a pre-determined number of training data points, k, are contained within the window. Let Vk(xo) be the volume of the window around unclassified point xo containing k neighbors, and let N be the number of total data points in the training set. Then k (xo) = k - 1NVk(xo). It turns out that this is in fact an unbiased estimate of (xo)!

E[k(xo)] = (xo)

Thus, increasing k, the number of nearest neighbors, will not cause KNN to become more accurate. It is perfectly unbiased already. Increasing k has the effect of reducing the variance of the estimate of (xo), but the expectation of k(xo) is always accurate.

III. Classification with KNN

KNN leads to a very easy and intuitive classification method. The class with the majority (or plurality, as the case may be) of points in the set of k neighbors becomes the class of the point in question. For example, if k is 5 and 3 of the neighbors belong to class 1 and 2 of them belong to class 2, the point xo will be assigned to class 1, based on the majority vote.



IV. The Nearest Neighbor method

The nearest neighbor approach is similar to the KNN approach in that it relies on labeled training data. It is again necessary that the training data accurately reflect the prior probabilities of each class. The class of the “nearest” training point is taken as the class of the point xo. The “nearest” neighbor can be found using a distance metric, D(x1, x2), that must satisfy the following conditions:

1)
$ D(\vec{x_1},\vec{x_2}) \geq 0 $
2)
$ D(\vec{x_1},\vec{x_2}) = D(\vec{x_2},\vec{x_1}) $
3)
$ D(\vec{x},\vec{x}) = 0 $
4)
$ D(\vec{x_1},\vec{x_2}) + D(\vec{x_2},\vec{x_3}) \geq D(\vec{x_1},\vec{x_3}) $


D(x1, x2) 0
2) D(x1, x2) = D(D(x2, x1)
3) D(x, x) = 0
4) D(x1, x2) + D(x2, x3) D(x1, x3)

The last property, known as the triangle inequality, is often the most constrictive. Yet it allows for large data sets to be indexed such that large sections of the data need not even be considered when searching for the nearest neighbor of a new point.

Below are several example metrics for distance:



Euclidean distance:
$ D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_2}=\sqrt{\sum_{i=1}^n ({x_1}^i-{x_2}^i)^2} $

Manhattan (cab driver) distance:
$ D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_1}=\sum_{i=1}^n |{x_1}^i-{x_2}^i| $

Minkowski metric:
$ D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{L_p}=(\sum_{i=1}^n ({x_1}^i-{x_2}^i)^p)^{\frac{1}{p}} $

Riemannian metric:
$ D(\vec{x_1},\vec{x_2})=\sqrt{(\vec{x_1}-\vec{x_2})^\top \mathbb{M}(\vec{x_1}-\vec{x_2})} $

Infinite norm:
$ D(\vec{x_1},\vec{x_2})=||\vec{x_1}-\vec{x_2}||_{\infty}=max_i |{x_1}^i-{x_2}^i| $





(create a question page and put a link below)

Questions and comments

If you have any questions, comments, etc. please post them on this page.


Back to ECE662, Spring 2014

Alumni Liaison

Basic linear algebra uncovers and clarifies very important geometry and algebra.

Dr. Paul Garrett