(New page: The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It assumes that the object attributes form a vector space. The objective it tries t...)
 
 
(One intermediate revision by one other user not shown)
Line 1: Line 1:
 +
== Introduction ==
 +
 
The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the squared error function.
 
The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the squared error function.
  
  
 
+
<math>V=\sum_{i=1}^{k}\sum_{xj\epsilon Sj}^{}{(xj-\mu j)}^{2}</math>
.. image:: kmeans.png
+
 
+
 
+
 
+
.. image:: kmeans2.png
+
  
 
So basically it is trying to choose clusters so that the average distance between the data vectors and the chosen means (basically the model parameter) is a minimum. That is the means are the a good representation of the way the data is distributed in the space.
 
So basically it is trying to choose clusters so that the average distance between the data vectors and the chosen means (basically the model parameter) is a minimum. That is the means are the a good representation of the way the data is distributed in the space.
Line 20: Line 17:
  
 
4) Check whether the average distance is less thansome threshold. If yes we are done ! Else go back to step 1.
 
4) Check whether the average distance is less thansome threshold. If yes we are done ! Else go back to step 1.
 +
 +
== Software ==
 +
 +
* Efficient k-Means Implementation (follow link under [[Tools_OldKiwi]])
 +
 +
== References ==
 +
 +
* An Efficient k-Means Clustering Algorithm: Analysis and Implementation, T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman and, A. Wu, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 24, No. 7, July 2002.

Latest revision as of 08:13, 24 March 2008

Introduction

The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the squared error function.


$ V=\sum_{i=1}^{k}\sum_{xj\epsilon Sj}^{}{(xj-\mu j)}^{2} $

So basically it is trying to choose clusters so that the average distance between the data vectors and the chosen means (basically the model parameter) is a minimum. That is the means are the a good representation of the way the data is distributed in the space.

Usually when we implement the K-means algorithm we use a iterative procedure.

1) Randomly initialize a bunch of means.

2) For every data vector see which mean it is "closest" to and update this fact.

3) Using the information about each data vector being close toa particular mean for each mean and its associated data recompute the new mean and update.

4) Check whether the average distance is less thansome threshold. If yes we are done ! Else go back to step 1.

Software

  • Efficient k-Means Implementation (follow link under Tools_OldKiwi)

References

  • An Efficient k-Means Clustering Algorithm: Analysis and Implementation, T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman and, A. Wu, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 24, No. 7, July 2002.

Alumni Liaison

has a message for current ECE438 students.

Sean Hu, ECE PhD 2009