Revision as of 23:36, 20 April 2008 by Lbachega (Talk)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Introduction

In order to visualize high dimensional data, one needs a way to represent high dimensional data points in some lower-dimensional space. The MDS or Multidimensional Scaling finds configurations of points projected in low dimensional spaces (typically 2D or 3D) that best preserves the distances between those points in the original high dimensional space.

MDS

Consider a set of n points $ \{x_1,x_2, \cdots , x_n\} \belongsto R^n $, the input space. We want to find the set of projections the input points in a subspace R^m with m << n in such a way that the distances between points are best preserved. Let's define $ d_{ij}=\|y_i-y_j\| $ and $ \delta_{ij} = \|x_i-x_j\| $.

The goal is to find y_i's such that the choice of one of the following cost functions is minimized.


Since the gradient of the functions are easy to compute, one can apply the method of gradient descent to minimize the cost function and find the best configuration of points y_i.

Metric MDS

For the particular case when we use Euclidean distances to measure $ \delta_{ij} $, we can deploy a method very similar to PCA to find the lower-dimensional projection of points that best represent the distances among them.

Alumni Liaison

Have a piece of advice for Purdue students? Share it through Rhea!

Alumni Liaison