Line 15: | Line 15: | ||
- For all patterns <math>X_i</math>, denote by <math>N_r(X_i)</math>, the neighborhood of size <math>r</math> of <math>X_i</math>. Estimate derivative of density p(x) along direction <math>X_i-X_j \forall X_j\inN_r(X_i)</math> | - For all patterns <math>X_i</math>, denote by <math>N_r(X_i)</math>, the neighborhood of size <math>r</math> of <math>X_i</math>. Estimate derivative of density p(x) along direction <math>X_i-X_j \forall X_j\inN_r(X_i)</math> | ||
− | + | <math>S_i(j)=\frac{p(X_i)-p(X_j)}{||X_i-X_j||_{L_2}}</math> | |
− | + | Let <math>j_{max}=argmax_j S_i(j)</math> | |
− | + | If <math>S_i(j_{max})<0</math> declare <math>i</math> to be a root node. | |
− | + | If <math>S_i(j_{max})>0</math> declare <math>X_{j_{max}}</math> to be parent of <math>X_i</math>. | |
− | + | If <math>S_i(j_{max})=0</math> it depends on the surroundings what link we establish. | |
- If <math>X_i</math> is connected to a root node, then make this root node a parent of <math>X_i</math>, else make <math>X_i</math> a root node. | - If <math>X_i</math> is connected to a root node, then make this root node a parent of <math>X_i</math>, else make <math>X_i</math> a root node. |
Revision as of 10:58, 22 April 2008
Clustering by finding valleys of densities
-- Approach 1: "Bump Hunting" --
- Approximate p(x) - Find peaks of density - Expand cluster boundaries outward until they meet (hopefully in middle!) - Define meeting points = Cluster boundary points
-- Approach 2: "Valley Seeking" --
- Start from valley regions of p(x) - Work uphill to connect data points to density peaks
Graph based implementation
- Estimate p(x) - For all patterns $ X_i $, denote by $ N_r(X_i) $, the neighborhood of size $ r $ of $ X_i $. Estimate derivative of density p(x) along direction $ X_i-X_j \forall X_j\inN_r(X_i) $
$ S_i(j)=\frac{p(X_i)-p(X_j)}{||X_i-X_j||_{L_2}} $
Let $ j_{max}=argmax_j S_i(j) $
If $ S_i(j_{max})<0 $ declare $ i $ to be a root node.
If $ S_i(j_{max})>0 $ declare $ X_{j_{max}} $ to be parent of $ X_i $.
If $ S_i(j_{max})=0 $ it depends on the surroundings what link we establish.
- If $ X_i $ is connected to a root node, then make this root node a parent of $ X_i $, else make $ X_i $ a root node.
- Repeat
When all points are visited, a forest is constructed. Each connected component (=tree of a forest) is a cluster.
- Problem: All methods presented above encounter a serious problem because of the following reasons:
- Density may oscillate a lot - Valleys are not well defined.
- Someone please insert figure here illustrating the above mentioned problem, as drawn in class.
PDE based valley seeking
PDE: Partial Differential Equation
PDE's can be used to minimize energy functionals
- Simple Example
Consider 1-D curves
$ u(x): [0,1] \rightarrow \Re $ with fixed end points $ u(0)=a $, $ u(1)=b $ (3-1)
<<Picture>>
Suppose Energy of curve is $ E(u)=\int _0 ^1 F(u, u')dx $ for some function $ F: \Re ^2 \rightarrow \Re $
e.g.) $ F(u,u')=|u'|^2 $ (3-2)
The curve that minimizes (or maximizes) E(u) satisfies Euler Equation
$ \frac{\partial F} {\partial u} -\frac{d}{dx}(\frac{\partial F}{\partial u'})=0 $ (3-3)
sometimes written as $ E'=0 \Rightarrow \frac{\partial E}{\partial u}=0 $ (3-4)
Similarly if $ E=\int _0 ^1 F(u,u',u'')dx $ (3-5)
e.g.) $ F(u,u',u'')=|u''|^2 $ (3-6)
Then Euler equation is $ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u'}) + \frac{d^2}{dx^2}(\frac{\partial F}{\partial u''})=0 $ (3-7)
Similarly, for surface in $ \Re ^2 $
$ u(x,y): [0,1] \times [0,1] \rightarrow \Re $ (3-8)
Suppose energy is given by
$ E(u)=\int _{surface} F(u,u_x, u_y, u_{xx}, u_{xy},u_{yy})dxdy $ (3-9)
e.g.) $ F={u_x}^2+{u_y}^2 $ (3-10)
Then Euler equation is
$ \frac{\partial F}{\partial u} - \frac{d}{dx}(\frac{\partial F}{\partial u_x}) - \frac{d}{dy}(\frac{\partial F}{\partial u_y}) + \frac{d^2}{dx^2}(\frac{\partial F}{\partial u_{xx}}) + \frac{d^2}{dy^2}(\frac{\partial F}{\partial u_{yy}}) + \frac{d^2}{dxdy}(\frac{\partial F}{\partial u_{xy}})=0 $ (3-11)