PCA, or Principal Component Analysis, is used to find a lower-dimensional space that best represents the data, placing the axes in the directions in which the data varies most.
PCA diagonalizes the maximum likelihood estimate of the covariance matrix (the data $ \vec{x_i} $ are assumed here to have zero mean)
$ C=\frac{1}{n} \sum_{i=1}^{n} \vec{x_i}\vec{x_i}^T $
by solving the eigenvalue equation
$ C\vec{e} = \lambda \vec{e} $
The solutions to this equation are the eigenvalues $ \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m $ and their corresponding eigenvectors. Often only $ k \ll m $ of the eigenvalues will be significantly nonzero, meaning that the inherent dimensionality of the data is $ k $; the remaining $ m-k $ dimensions contain mostly noise.
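As a concrete illustration, here is a minimal NumPy sketch of these two steps (the toy data, variable names, and sizes are illustrative assumptions, not part of the original derivation):

```python
import numpy as np

# Toy data: n samples of dimension m, stored as rows (illustrative only)
rng = np.random.default_rng(0)
n, m = 200, 5
X = rng.standard_normal((n, m))
X -= X.mean(axis=0)                # center so the zero-mean assumption holds

# Maximum likelihood estimate of the covariance: C = (1/n) * sum_i x_i x_i^T
C = (X.T @ X) / n

# Solve C e = lambda e; eigh is appropriate because C is symmetric
eigvals, eigvecs = np.linalg.eigh(C)

# eigh returns eigenvalues in ascending order; re-sort in decreasing order
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]
```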
In order to represent the data in the $ k $-dimensional subspace, we first construct the matrix $ E=[\vec{e_1} \vec{e_2} \cdots \vec{e_k}] $ whose columns are the eigenvectors corresponding to the $ k $ largest eigenvalues. The projection onto the new $ k $-dimensional subspace is then given by the linear transformation
$ \vec{x}^{'} = E^T\vec{x} $
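Continuing the sketch above, the projection itself is a single matrix product (the choice $ k=2 $ is arbitrary, for illustration):

```python
# E = [e_1 e_2 ... e_k]: the eigenvectors with the k largest eigenvalues
k = 2
E = eigvecs[:, :k]                 # shape (m, k)

# x' = E^T x for each sample; with samples stored as rows this is X @ E
X_proj = X @ E                     # shape (n, k)
```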