Introduction to local (nonparametric) density estimation methods

A slecture by Yu Liu

(partially based on Prof. Mireille Boutin's ECE662 lecture)

1. Introduction

This slecture introduces two local density estimation methods: Parzen density estimation and k-nearest neighbor density estimation. Local density estimation is also referred to as nonparametric density estimation. To make the distinction clear, let’s first look at parametric density estimation. In parametric density estimation, we assume that the density function has a known form determined by a set of parameters. These parameters are estimated from the sample data and are later used in designing the classifier. However, in some practical situations the assumption that the density function has a parametric form does not hold. For example, it is very hard to fit a multimodal probability distribution with a simple function. In this case, we need to estimate the density function in a nonparametric way, which means that the density at each point is estimated locally from a small set of neighboring samples. Because each estimate relies on only a few nearby samples, local (nonparametric) density estimation is generally less accurate than parametric density estimation. In the following text the word “local” is preferred over “nonparametric.”

It is noteworthy that it is very difficult to obtain an accurate local density estimate, especially when the dimension of the feature space is high. So why do we bother using local density estimation? Because our goal is not to obtain an accurate estimate of the density, but rather to use the estimate to design a classifier that performs well. An inaccurate local density estimate does not necessarily lead to a poor decision rule.

2. General Principle

In local density estimation the density function p_n(x) can be approximated by

$$p_n(x) \approx \frac{k_n}{n \, v_n},$$

where v_n is the volume of a small region R around the point x, n is the total number of samples x_i (i = 1, 2, …, n) drawn from the density being estimated, and k_n is the number of samples x_i that fall into R. The approximation is justified because the density does not vary much within a relatively small region, so the probability mass of R is approximately p_n(x) v_n. This mass is in turn estimated by the fraction of samples that fall into R, which is k_n/n, giving p_n(x) v_n ≈ k_n/n.
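
As a rough illustration of the ratio k_n/(n v_n), the following Python sketch computes the estimate for one-dimensional samples. It assumes that the region R is an interval centered at x, so that the "volume" v_n is just the interval width. The function name `local_density_estimate` and the parameter `width` are chosen here for illustration and are not part of the lecture.

```python
import random


def local_density_estimate(samples, x, width):
    """Estimate the density at x as k_n / (n * v_n) in one dimension.

    Assumption for this sketch: the region R is the interval
    [x - width/2, x + width/2], so its volume v_n equals `width`.
    """
    n = len(samples)
    v_n = width
    # k_n: number of samples that fall inside the region R
    k_n = sum(1 for s in samples if abs(s - x) <= width / 2)
    return k_n / (n * v_n)


if __name__ == "__main__":
    # Samples drawn uniformly on [0, 1]; the true density there is 1.
    rng = random.Random(0)
    samples = [rng.uniform(0.0, 1.0) for _ in range(10_000)]
    print(local_density_estimate(samples, x=0.5, width=0.1))
```

With uniform samples on [0, 1] the printed value should be close to 1. Shrinking `width` makes the estimate more local but leaves fewer samples in R, so the result becomes noisier.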
