Introduction
In order to visualize high dimensional data, one needs a way to represent high dimensional data points in some lower-dimensional space. The MDS or Multidimensional Scaling finds configurations of points projected in low dimensional spaces (typically 2D or 3D) that best preserves the distances between those points in the original high dimensional space.
MDS
Consider a set of n points $ \{x_1,x_2, \cdots , x_n\} \belongsto R^n $, the input space. We want to find the set of projections the input points in a subspace R^m with m << n in such a way that the distances between points are best preserved. Let's define $ d_{ij}=\|y_i-y_j\| $ and $ \delta_{ij} = \|x_i-x_j\| $.
The goal is to find y_i's such that the choice of one of the following cost functions is minimized.
Since the gradient of the functions are easy to compute, one can apply the method of gradient descent to minimize the cost function and find the best configuration of points y_i.
Metric MDS
For the particular case when we use Euclidean distances to measure $ \delta_{ij} $, we can deploy a method very similar to PCA to find the lower-dimensional projection of points that best represent the distances among them.