Revision as of 09:19, 29 April 2014
Parzen Window Density Estimation
A slecture by ECE student Ben Foster
Partly based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.
still working...
Coverage
- Brief introduction to non-parametric density estimation, specifically Parzen windowing
- Brief introduction to the theory that Parzen windowing is based on
- Visualizations of Parzen windows and Parzen window-based classification
- Brief discussion of the advantages and disadvantages of the Parzen windowing method
Introduction/Motivation
So far in our study of pattern recognition and classification we have primarily focused on the use of discriminant functions as a means of classifying data. That is, for a set of classes $ \omega_c $, we classify a sample $ X_i $ to class $ \omega_c $ if that class is the most probable given what we know about the sample.
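The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two classes, their equal priors, and their one-dimensional Gaussian class-conditional densities are hypothetical values chosen for the example, and the helper names `gaussian_pdf` and `classify` are not from the lecture.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def classify(x, classes):
    """Return the label whose prior * likelihood is largest for sample x.

    `classes` maps a label to (prior, mean, std). Since the evidence p(x)
    is the same for every class, comparing prior * likelihood is
    equivalent to comparing posterior probabilities.
    """
    scores = {
        label: prior * gaussian_pdf(x, mean, std)
        for label, (prior, mean, std) in classes.items()
    }
    return max(scores, key=scores.get)

# Hypothetical two-class problem (assumed values, for illustration only).
classes = {
    "omega_1": (0.5, 0.0, 1.0),
    "omega_2": (0.5, 3.0, 1.0),
}

print(classify(0.2, classes))  # omega_1
print(classify(2.9, classes))  # omega_2
```

Note that the rule presupposes that the class-conditional densities are known; estimating them from data is the problem the rest of this slecture addresses.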
The phrase "most probable" implies that we are dealing with probability distributions defined in the usual way, which is correct. As one might guess, the probability distributions used to map samples to classes are not always obvious in form or easy to obtain. Maximum likelihood estimation (MLE) and Bayesian parameter estimation (BPE) are broad families of methods that estimate the parameters of the distributions underlying the data, and the expectation-maximization (EM) algorithm is a commonly used method for computing such estimates.
However, not all density-estimation methods depend on prior knowledge of the distributions of the data to be classified. "Non-parametric" methods avoid assumptions about the distribution of the data to varying degrees, thus circumventing some of the issues associated with parametric methods. Though a number of non-parametric density-estimation methods are widely employed, this lecture will focus on one of the most popular: Parzen window estimation. The following survey of the method is intended to shed some light on the pros and cons of the Parzen window method specifically, as well as the advantages and disadvantages of non-parametric estimation in general. Additionally, a direct application of Parzen window estimation to a classification problem will be presented and discussed.
Parzen Windows
- Convergence to p(X) - Are we actually constructing a p.d.f.?
The motivation for choosing a non-parametric method for estimating a density function largely derives from a lack of prior information about the density function that corresponds to an experimenter's data. Without a substantial amount of information about the distribution of the data (and the conditional distributions of the data belonging to each class), it is nearly impossible to do parametric density estimation, namely maximum likelihood estimation (MLE) and Bayesian parameter estimation (BPE). Recall that for MLE, the estimated parameter vector $ \hat{\theta} $ corresponds to the value of $ \theta $ that maximizes the likelihood function, i.e.:
$ \hat{\theta} = \underset{\theta}{\operatorname{argmax}}\prod\limits_{k=1}^{n}p(\vec{X}_k | \vec{\theta}) $
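To make the recalled maximization concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the data are one-dimensional and normally distributed with a known standard deviation of 1, so that the only unknown parameter is the mean; the sample size, the true mean of 2, and the candidate grid are all hypothetical choices. The product of densities is maximized through its logarithm, which has the same maximizer and avoids numerical underflow.

```python
import math
import random

def log_likelihood(mu, samples, sigma=1.0):
    """Log of prod_k p(x_k | mu) for a normal density with known sigma."""
    return sum(
        -0.5 * ((x - mu) / sigma) ** 2
        - math.log(sigma * math.sqrt(2.0 * math.pi))
        for x in samples
    )

# Synthetic data drawn from an assumed N(2, 1) distribution.
random.seed(0)
samples = [random.gauss(2.0, 1.0) for _ in range(200)]

# Brute-force argmax over a grid of candidate means, 0.00 to 4.00.
candidates = [i / 100.0 for i in range(0, 401)]
mu_hat = max(candidates, key=lambda mu: log_likelihood(mu, samples))

# For this model the maximizer is known in closed form: the sample mean.
sample_mean = sum(samples) / len(samples)
print(mu_hat, sample_mean)
```

The grid estimate agrees with the sample mean up to the grid resolution. The sketch works only because a Gaussian form was assumed in advance, which is exactly the kind of prior information that may be unavailable in practice.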
still working...