Line 35: | Line 35: | ||
---- | ---- | ||
− | Graphical Clustering Methods | + | '''Graphical Clustering Methods''' |
* "Similarity Graph Methods" | * "Similarity Graph Methods" | ||
− | Choose distance threshold | + | |
+ | Choose distance threshold <math>t_0</math> | ||
+ | |||
+ | If <math>dis(X_i,X_j)<t_0</math> draw an ege between <math>X_i</math> and <math>X_j</math> | ||
+ | |||
+ | |||
+ | |||
+ | Example) <math>t_0= 1.3</math> | ||
+ | |||
+ | <<Picture>> | ||
+ | |||
+ | Can define clusters s the connected component of the similarity graph | ||
+ | |||
+ | => Same result as "Single linkage algorithms" | ||
+ | |||
+ | <math>X_i ~ X_j</math> if there is chain | ||
+ | |||
+ | <math>X_i ~ X_{i_1} ~ X_{i_2} ~ \cdots X_{i_k} ~ X_{j}</math> complete | ||
+ | |||
+ | Can also define clusters as the maximal subgraphs of similarity graph | ||
+ | |||
+ | <<Picture>> | ||
+ | |||
+ | => More compact, less enlagated clusters | ||
+ | |||
+ | Not good for, say, | ||
+ | |||
+ | <<Picture>> |
Revision as of 10:04, 8 April 2008
Graph Theory Clustering
dataset $ \{x_1, x_2, \dots , x_d\} $ no feature vector given.
given $ dist(x_i , x_j) $
Construct a graph:
- node represents the objects.
- edges are relations between objects.
- edge weights represents distances.
Definitions:
- A complete graph is a graph with $ d(d-1)/2 $ edges.
- A subgraph $ G' $ of a graph $ G=(V,E,f) $ is a graph $ (V',E',f') $ such that $ V'\subset V $ $ E'\subset E $ $ f'\subset f $ restricted to $ E' $
- A path in a graph between $ V_i,V_k \subset V_k $ is an alternating sequence of vertices and edges containing no repeated edges and no repeated vertices and for which $ e_i $ is incident to $ V_i $ and $ V_{i+1} $, for each $ i=1,2,\dots,k-1 $. ($ V_1 e_1 V_2 e_2 V_3 \dots V_{k-1} e_{k-1} V_k $)
- A graph is "connected" if a path exists between any two vertices in the graph
- A component is a maximal connected graph. (i.e. includes as many nodes as possible)
- A maximal complete subgraph of a graph $ G $ is a complete subgraph of $ G $ that is not a proper subgraph of any other complete subgraph of $ G $.
- A cycle is a path of non-trivial length $ k $ that comes back to the node where it started
- A tree is a connected graph with no cycles. The weight of a tree is the sum of all edge weights in the tree.
- A spanning tree is a tree containing all vertices of a graph.
- A minimum spanning tree (MST) of a graph G is tree having minimal weight among all spanning trees of $ G $.
Graphical Clustering Methods
- "Similarity Graph Methods"
Choose distance threshold $ t_0 $
If $ dis(X_i,X_j)<t_0 $ draw an ege between $ X_i $ and $ X_j $
Example) $ t_0= 1.3 $
<<Picture>>
Can define clusters s the connected component of the similarity graph
=> Same result as "Single linkage algorithms"
$ X_i ~ X_j $ if there is chain
$ X_i ~ X_{i_1} ~ X_{i_2} ~ \cdots X_{i_k} ~ X_{j} $ complete
Can also define clusters as the maximal subgraphs of similarity graph
<<Picture>>
=> More compact, less enlagated clusters
Not good for, say,
<<Picture>>