
== Introduction ==

This approach consists of using certain models for the clusters and attempting to optimize the fit between the data and the model. In practice, each cluster can be mathematically represented by a parametric distribution, such as a Gaussian. The entire data set is then modelled by a mixture of these distributions.
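Concretely, if the mixture has $ K $ components, the density of a data point can be written as $ p(x) = \sum_{i=1}^{K} P(w_i)\, N(x \mid m_i, \Sigma_i) $, where the mixing probabilities $ P(w_i) $ sum to one and $ m_i $, $ \Sigma_i $ are the mean and covariance of the $ i $-th Gaussian component.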

== Mixture of Gaussians ==

The most widely used clustering method of this kind is the one based on learning a mixture of Gaussians. The algorithm works in this way:

1) It chooses a component (a Gaussian) at random with probability $ P(w_i) $.

2) It samples a point from $ N(m_i, \sigma^2 I) $. In essence, the method tries to choose the clusters so that the average distance between the data vectors and the chosen means (the model parameters) is minimized, which gives a representation of the way the data is distributed in the space.

3) Using the [[Expectation-Maximization_Old Kiwi]] algorithm, we maximise the likelihood function.
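The following is a minimal sketch (not part of the original page) of the procedure above, assuming spherical Gaussians $ N(m_i, \sigma_i^2 I) $ and synthetic two-dimensional data; all names and values are illustrative. It generates data with steps 1 and 2 and then runs EM, as in step 3, to re-estimate the mixing probabilities and means.

<pre>
# Minimal sketch: generative sampling from a Gaussian mixture, then EM fitting.
# Assumes spherical covariances sigma_i^2 * I and made-up example parameters.
import numpy as np

rng = np.random.default_rng(0)

# --- Steps 1 and 2: choose a component with probability P(w_i), sample from N(m_i, sigma^2 I) ---
true_means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
true_sigma = 1.0
true_priors = np.array([0.5, 0.3, 0.2])                      # P(w_i)
labels = rng.choice(3, size=600, p=true_priors)               # step 1
X = true_means[labels] + true_sigma * rng.standard_normal((600, 2))  # step 2

# --- Step 3: EM to maximise the likelihood of a spherical Gaussian mixture ---
K, n, d = 3, *X.shape
means = X[rng.choice(n, K, replace=False)]                    # initialise means from data points
sigmas = np.full(K, X.std())
priors = np.full(K, 1.0 / K)

for _ in range(100):
    # E-step: responsibilities p(w_i | x) for every point and component
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)          # (n, K)
    log_prob = np.log(priors) - d * np.log(sigmas) - sq_dist / (2 * sigmas ** 2)
    log_prob -= log_prob.max(axis=1, keepdims=True)
    resp = np.exp(log_prob)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate P(w_i), m_i and sigma_i from the responsibilities
    Nk = resp.sum(axis=0)
    priors = Nk / n
    means = (resp.T @ X) / Nk[:, None]
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    sigmas = np.sqrt((resp * sq_dist).sum(axis=0) / (d * Nk))

print("estimated mixing probabilities:", np.round(priors, 2))
print("estimated means:\n", np.round(means, 2))
</pre>

With this kind of data the estimated means and mixing probabilities should come out close to the values used to generate it; the responsibilities computed in the E-step also give a soft cluster assignment for each data vector.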
