Introduction
Visualization of high dimensional data is important in several subareas of pattern recognition including the verification of cluster validity. In order to visualize high dimensional data, one needs a way to represent high dimensional data points in some lower-dimensional space. The MDS or Multidimensional Scaling finds configurations of points projected in low dimensional spaces (typically 2D or 3D) that best preserves the distances between those points in the original high dimensional space. In this article we describe the general aspects of the MDS with focus on the case when dissimilarities between points can be expressed as distances in a metric space, or Metric MDS.
MDS
Consider a set of n points $ \{x_1,x_2, \cdots , x_n\} \in R^n $, the input space. We want to find the set of projections the input points in a subspace R^m with m \ll n in such a way that the distances between points are best preserved. Let's define $ d_{ij}=\|y_i-y_j\| $ and $ \delta_{ij} = \|x_i-x_j\| $.
The goal is to find y_i's such that a cost functions is minimized such as the cost function below:
$ J_{ee} = \frac{\Sum_{i<j}(d_{ij}-\delta_{ij})^2}{\Sum_{i<j}\delta_{ij}^2} $
Other alternatives of cost functions for the MDS can be found in the DHS book 2nd Ed, page 573. Since the gradient of the function is easy to compute, one can apply the method of gradient descent to minimize the cost function and find the best configuration of points y_i.
Metric MDS
For the particular case when we use Euclidean distances to measure $ \delta_{ij} $, we can deploy a method very similar to PCA to find the lower-dimensional projection of points that best represent the distances among them based on eigen-analysis of the cross-product matrix S, which is computed from the distance matrix $ D=[\delta_{ij}] $ as follows:
S= -\frac{1}{2}HDH
where H is the $ n \times n $ centering matrix [1]. S is guaranteed to be positive semi-definite. We can now apply the eigenvalue decomposition to S:
$ S= E \Lambda E^T $
$ \Lambda = diag(\{\lambda_1, \lambda_2, \cdots, \lambda_n\}) $, and $ E=[\vec{e_1}\vec{e_2} \cdots \vec{e_n}] $, where \vec{e_i} is the eigenvector corresponding to the eigenvalue $ \lambda_i $, and $ \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n $.
We now can compute the low-dimensional representations $ y_i $'s in a subspace of m dimensions by making
$ y_ij = \sqrt{\lambda_j}e_ij $, where $ i=1,\cdots, n $ and $ j=1, \cdots, m $, where $ m \ll n $.
Non-Metric MDS
The method described above can't be used if the dissimilarity measure between points is not a distance. Instead, the cost function J has to be minimized using a different approach. Several methods to handle the non-metric case have been proposed including one by Kruskal (1978). The reader may refer to the references below to more information about the non-metric MDS.
References
- Metric Multidimensional Scaling (MDS): Analyzing Distance Matrices, Herve Abdi. [[2]]
- Applied Multivariate Statistical Analysis, Wolfgang Härdle, Léopold Simar[[3]]
- ODL: Mathematical Psychology - Multidimensional Scaling http://www.mathpsyc.uni-bonn.de/doc/delbeke/delbeke.htm