(Types of Clustering Methods)
(Types of Clustering Methods)
Line 4: Line 4:
 
The partitioning method divides a data set into M different clusters, where each object belongs to one of these clusters. Each cluster has a representative element, or centroid, which describes all the objects contained in that cluster. In some cases, the arithmetic mean of the attribute vectors of all the objects in the cluster is a good representative.
 
The partitioning method divides a data set into M different clusters, where each object belongs to one of these clusters. Each cluster has a representative element, or centroid, which describes all the objects contained in that cluster. In some cases, the arithmetic mean of the attribute vectors of all the objects in the cluster is a good representative.
  
 +
<br /><br />
 
Method:
 
Method:
 
This method determines the clusters of a data set in one pass, and runs in O(NlogN). The disadvantage of this method is that it is highly dependent on the order in which the objects are processed.  
 
This method determines the clusters of a data set in one pass, and runs in O(NlogN). The disadvantage of this method is that it is highly dependent on the order in which the objects are processed.  
Line 22: Line 23:
 
III. If at the end, some objects have not been clustered, repeat Step II. <br />
 
III. If at the end, some objects have not been clustered, repeat Step II. <br />
 
</ol>
 
</ol>
 +
 +
<br /><br />
  
 
== Various Clustering Algorithms  ==
 
== Various Clustering Algorithms  ==
  
 
'''Description:''' This is a link to the Speech and Image Processing group at Joensuu Science Park in Finland. It includes links to many programs and pseudo codes which are intended for clustering data sets with an undetermined number of clusters. These programs two problems commonly faced in clustering applications: determining the number of clusters in the data set, and finding the location of each of these clusters. Since clustering is considered an NP-complete problem, ideal solutions have exponential running time. For this reason, the algorithms presented by this group are sub-optimal, but run in polynomial time. [http://cs.joensuu.fi/sipu/]
 
'''Description:''' This is a link to the Speech and Image Processing group at Joensuu Science Park in Finland. It includes links to many programs and pseudo codes which are intended for clustering data sets with an undetermined number of clusters. These programs two problems commonly faced in clustering applications: determining the number of clusters in the data set, and finding the location of each of these clusters. Since clustering is considered an NP-complete problem, ideal solutions have exponential running time. For this reason, the algorithms presented by this group are sub-optimal, but run in polynomial time. [http://cs.joensuu.fi/sipu/]

Revision as of 14:08, 16 April 2008

Types of Clustering Methods

  1. Partitioning Methods The partitioning method divides a data set into M different clusters, where each object belongs to one of these clusters. Each cluster has a representative element, or centroid, which describes all the objects contained in that cluster. In some cases, the arithmetic mean of the attribute vectors of all the objects in the cluster is a good representative.

    Method: This method determines the clusters of a data set in one pass, and runs in O(NlogN). The disadvantage of this method is that it is highly dependent on the order in which the objects are processed.

    I. Set the 1st object as the centroid for the 1st cluster.
    II. Calculate S (similarity coefficient) for each object with each existing centroid.
    III. If the calculated S for a specific object is higher than a given threshold, add the object to the corresponding cluster, and recalculate the centroid for this cluster. If the S value is lower than the threshold, create a new cluster. If at the end some objects have not been clustered, repeat Step II.

  2. Hierarchical Agglomerative Methods Method: This method determines clusters by adding the closest pair of objects to a single cluster. This algorithm run in N^3 time.

    I. Determine which 2 objects are closest to each other, and insert them into one cluster.
    II. Determine the next 2 closest objects. (Note: In this step, an object can be a single object or a cluster)
    III. If at the end, some objects have not been clustered, repeat Step II.



Various Clustering Algorithms

Description: This is a link to the Speech and Image Processing group at Joensuu Science Park in Finland. It includes links to many programs and pseudo codes which are intended for clustering data sets with an undetermined number of clusters. These programs two problems commonly faced in clustering applications: determining the number of clusters in the data set, and finding the location of each of these clusters. Since clustering is considered an NP-complete problem, ideal solutions have exponential running time. For this reason, the algorithms presented by this group are sub-optimal, but run in polynomial time. [1]

Alumni Liaison

To all math majors: "Mathematics is a wonderfully rich subject."

Dr. Paul Garrett