Difference between revisions of "K-means OldKiwi" - Rhea

Latest revision as of 08:13, 24 March 2008

Introduction

The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the squared error function.

$V=\sum_{i=1}^{k}\sum_{xj\epsilon Sj}^{}{(xj-\mu j)}^{2}$

So basically it is trying to choose clusters so that the average distance between the data vectors and the chosen means (basically the model parameter) is a minimum. That is the means are the a good representation of the way the data is distributed in the space.

Usually when we implement the K-means algorithm we use a iterative procedure.

1) Randomly initialize a bunch of means.

2) For every data vector see which mean it is "closest" to and update this fact.

3) Using the information about each data vector being close toa particular mean for each mean and its associated data recompute the new mean and update.

4) Check whether the average distance is less thansome threshold. If yes we are done ! Else go back to step 1.

Software

Efficient k-Means Implementation (follow link under Tools_OldKiwi)

References

An Efficient k-Means Clustering Algorithm: Analysis and Implementation, T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman and, A. Wu, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 24, No. 7, July 2002.

@@ Line 1: / Line 1: @@
+== Introduction ==
 The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the squared error function.
+<math>V=\sum_{i=1}^{k}\sum_{xj\epsilon Sj}^{}{(xj-\mu j)}^{2}</math>
-.. image:: kmeans.png
-.. image:: kmeans2.png
 So basically it is trying to choose clusters so that the average distance between the data vectors and the chosen means (basically the model parameter) is a minimum. That is the means are the a good representation of the way the data is distributed in the space.
@@ Line 20: / Line 17: @@
 ) Check whether the average distance is less thansome threshold. If yes we are done ! Else go back to step 1.
+== Software ==
+* Efficient k-Means Implementation (follow link under [[Tools_OldKiwi]])
+== References ==
+* An Efficient k-Means Clustering Algorithm: Analysis and Implementation, T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman and, A. Wu, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 24, No. 7, July 2002.

Difference between revisions of "K-means OldKiwi" - Rhea

Latest revision as of 08:13, 24 March 2008

Introduction

Software

References

Alumni Liaison