Difference between revisions of "KnnDensityEstimation" - Rhea

Revision as of 18:48, 24 April 2014

K-Nearest Neighbors Density Estimation

A slecture by CIT student Raj Praveen Selvaraj

Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.

Introduction

This slecture discusses the K-Nearest Neighbors (k-NN) approach to estimating the density of a given distribution. The k-NN approach is very popular in signal and image processing for clustering and classification of patterns. It is a non-parametric density estimation technique that lets the region volume be a function of the training data. We first discuss the basic principle behind using k-NN to estimate the density at a point x, and then build a classifier from the k-NN density estimate.

Basic Principle

The general formulation for density estimation states that, for N observations x₁,x₂,x₃,...,x_N, the density at a point x can be approximated by

<math>p(x) \approx \frac{k}{NV}</math>

where V is the volume of some neighborhood (say A) around x and k denotes the number of observations contained within that neighborhood. The basic idea of k-NN is to extend the neighborhood until the k nearest observations are included. If we consider the neighborhood around x to be a sphere, then for the given N observations we pick an integer,

{an equation goes here}

If x_l is the k-th closest sample point to x, then the radius of the sphere is h_k = ||x_l - x||.

{equation of estimated density at x here}

We approximate the density p(x) by,
{equation here }

Most of the time this estimate is, {equation here}
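
The principle above can be sketched in Python. This is a minimal sketch, not the lecture's own code: it assumes the neighborhood is a Euclidean ball whose radius h_k is the distance to the k-th nearest sample, and it uses the commonly quoted form k/(NV) for the estimate. The names `ball_volume` and `knn_density` are illustrative only.

```python
import math
import numpy as np

def ball_volume(radius, dim):
    """Volume of a dim-dimensional Euclidean ball with the given radius."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * radius ** dim

def knn_density(x, samples, k):
    """k-NN density estimate k / (N * V) at the point x.

    samples must have shape (N, d); x must have d entries.
    Assumption: V is the volume of the ball of radius h_k around x.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float).reshape(-1)
    n, d = samples.shape
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    dists = np.linalg.norm(samples - x, axis=1)
    # h_k: distance from x to its k-th nearest sample.
    h_k = np.partition(dists, k - 1)[k - 1]
    # If h_k is 0 (x coincides with k samples) the estimate is infinite.
    return k / (n * ball_volume(h_k, d))

# Five 1-D samples; the 3rd nearest neighbor of x = 2 is at distance 1,
# so V = 2 and the estimate is 3 / (5 * 2).
data = np.array([0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
density_at_2 = knn_density([2.0], data, k=3)
```

Because the volume adapts to the data, the estimate uses a small ball where samples are dense and a large ball where they are sparse, which is the sense in which the region volume is a function of the training data.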

How to classify data using k-NN Density Estimate

Having seen how the density at a given point x is estimated based on the value of k and the given observations x₁,x₂,x₃,...,x_N, let's discuss how to use the k-NN density estimate for classification.

Method 1:

Let x₀ from Rⁿ be a point to classify.

Given are samples x_i1,x_i2,...,x_in from each class i.

We pick a k_i for each class and a window function, approximate the density at x₀ for each class, and then pick the class with the largest estimated density based on,

{equation here}

If the priors of the classes are unknown, we use ROC curves to estimate the priors, based on,

{equation here}
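
Method 1 can be sketched as follows. The decision rule itself is not shown in the text, so this sketch assumes the usual Bayes rule: score each class by its prior times its class-conditional k-NN density and pick the largest score. It also assumes a Euclidean ball as the window and takes the priors as given inputs rather than estimating them. The names `knn_density`, `classify_method1`, `class_samples`, `ks`, and `priors` are hypothetical.

```python
import math
import numpy as np

def knn_density(x, samples, k):
    """k-NN density estimate k / (N * V) using a Euclidean ball (assumption)."""
    samples = np.asarray(samples, dtype=float)
    x = np.asarray(x, dtype=float).reshape(-1)
    n, d = samples.shape
    dists = np.linalg.norm(samples - x, axis=1)
    h_k = np.partition(dists, k - 1)[k - 1]  # distance to the k-th nearest sample
    volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * h_k ** d
    return k / (n * volume)

def classify_method1(x0, class_samples, ks, priors):
    """Pick the class maximizing prior * class-conditional k-NN density.

    class_samples: dict label -> array of shape (N_i, d)
    ks:            dict label -> k_i chosen for that class
    priors:        dict label -> prior probability (assumed known here)
    """
    scores = {
        label: priors[label] * knn_density(x0, class_samples[label], ks[label])
        for label in class_samples
    }
    return max(scores, key=scores.get), scores

# Two well-separated 1-D classes with equal priors.
class_samples = {
    "a": np.array([[0.0], [0.1], [0.2], [0.3]]),
    "b": np.array([[5.0], [5.1], [5.2], [5.3]]),
}
ks = {"a": 2, "b": 2}
priors = {"a": 0.5, "b": 0.5}
label_near_a, _ = classify_method1([0.15], class_samples, ks, priors)
label_near_b, _ = classify_method1([5.15], class_samples, ks, priors)
```

Each class gets its own k_i and therefore its own ball around x₀, so the volumes differ between classes even though the point being classified is the same.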

Method 2:

Given are samples x_i1,x_i2,...,x_in from a Gaussian mixture. We choose a single value of k and one window function.

We then approximate p(x, w_i) by,
{equation here}

where V_i is the volume of the smallest window that contains k samples and k_i is the number of samples among these k that belong to class i.
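
Method 2 can be sketched in the same way. The approximating equation is not shown in the text, so this sketch assumes the standard joint estimate k_i/(NV), where N is the total number of pooled samples and V is the volume of the smallest Euclidean ball around the query point that contains k of them. Under that assumption, choosing the largest joint estimate is the same as a majority vote among the k nearest samples. The name `classify_method2` and its parameters are hypothetical.

```python
import math
from collections import Counter
import numpy as np

def classify_method2(x0, samples, labels, k):
    """Classify x0 from pooled labeled samples using a single k.

    samples: array of shape (N, d) holding samples from all classes
    labels:  sequence of N class labels, aligned with samples
    Returns the winning label and the assumed joint estimates k_i / (N * V).
    """
    samples = np.asarray(samples, dtype=float)
    x0 = np.asarray(x0, dtype=float).reshape(-1)
    n, d = samples.shape
    dists = np.linalg.norm(samples - x0, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest samples
    h_k = dists[nearest[-1]]                 # radius of the smallest ball holding them
    volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * h_k ** d
    counts = Counter(labels[i] for i in nearest)  # k_i for each class present
    joint = {label: k_i / (n * volume) for label, k_i in counts.items()}
    return max(joint, key=joint.get), joint

pooled = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
pooled_labels = ["a", "a", "a", "b", "b"]
label_left, joint_left = classify_method2([0.1], pooled, pooled_labels, k=3)
label_right, _ = classify_method2([5.05], pooled, pooled_labels, k=3)
```

Since N and V are shared by all classes at a given query point, only the counts k_i matter for the decision, which is why this method reduces to the familiar k-NN voting rule.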



@@ Line 42: / Line 42: @@
 Having seen how the density at a given point x is estimated based on the value of k and the given observations x<sub>1</sub>,x<sub>2</sub>,x<sub>3</sub>,...,x<sub>n</sub>, let's discuss about using the k-NN density estimate for classification. </br>
-<b>Method 1:<b> <br/>
+<b>Method 1:</b> <br/>
 Let x<sub>0</sub> from R<sup>n</sup> be a point to classify.
@@ Line 56: / Line 56: @@
 {equation here}
-<b>Method 2:<b> </br>
+<b>Method 2:</b> </br>
 Given are samples x<sub>i1</sub>,x<sub>x2</sub>,..,x<sub>xn</sub> from a Gaussian Mixture. We choose a single value of k and and one window function, <br/>