Discriminant Functions For The Normal Density - Part 1
Introduction to Normal or Gaussian Distribution
Before talking about discriminant functions for the normal density, we first need to know what a normal distribution is and how it is represented, both for a single variable and for a vector variable. Let's begin with the continuous univariate normal or Gaussian density.
$ f_x = \frac{1}{\sqrt{2 \pi} \sigma} \exp \left [- \frac{1}{2} \left ( \frac{x - \mu}{\sigma} \right)^2 \right ] $
for which the expected value of x (writing p(x) for the density $f_x$) is
$ \mu = \mathcal{E}[x] =\int\limits_{-\infty}^{\infty} xp(x)\, dx $
and where the expected squared deviation or variance is
$ \sigma^2 = \mathcal{E}[(x- \mu)^2] =\int\limits_{-\infty}^{\infty} (x- \mu)^2 p(x)\, dx $
The univariate normal density is completely specified by two parameters: its mean μ and its variance $\sigma^2$. The function $f_x$ can be written as $N(\mu, \sigma^2)$, which says that x is distributed normally with mean μ and variance $\sigma^2$. Samples from normal distributions tend to cluster about the mean, with a spread related to the standard deviation σ.
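The definitions above can be checked numerically. The following is a minimal sketch, assuming illustrative values for μ and σ and a simple midpoint rule for the integrals (the helper names `normal_pdf` and `integrate` are made up for this example). It evaluates the density and confirms that it integrates to about 1, with mean close to μ and variance close to $\sigma^2$.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

mu, sigma = 1.5, 2.0                        # illustrative values, not from the text
lo, hi = mu - 10 * sigma, mu + 10 * sigma   # the tails beyond this range are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

print(total, mean, var)   # should be close to 1, mu, and sigma**2
```

The infinite integration limits are replaced by a window of ten standard deviations on each side of the mean, which is an approximation chosen for convenience.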
For the multivariate normal density in d dimensions, fx is written as
$ f_x = \frac{1}{(2 \pi)^ \frac{d}{2} |\boldsymbol{\Sigma}|^\frac{1}{2}} \exp \left [- \frac{1}{2} (\mathbf{x} -\boldsymbol{\mu})^t\boldsymbol{\Sigma}^{-1} (\mathbf{x} -\boldsymbol{\mu}) \right] $
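where x is a d-component column vector, μ is the d-component mean vector, Σ is the d-by-d covariance matrix, and |Σ| and $\boldsymbol{\Sigma}^{-1}$ are its determinant and inverse, respectively. Also, $(\mathbf{x} - \boldsymbol{\mu})^t$ denotes the transpose of (x - μ). Formally, the mean vector is
$ \boldsymbol{\mu} = \mathcal{E}[\mathbf{x}] = \int \mathbf{x}\, p(\mathbf{x})\, d\mathbf{x} $
and the covariance matrix is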
$ \boldsymbol{\Sigma} = \mathcal{E} \left [(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t \right] = \int(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t p(\mathbf{x})\, d\mathbf{x} $
where the expected value of a vector or a matrix is found by taking the expected value of the individual components. That is, if $x_i$ is the ith component of x, $\mu_i$ the ith component of μ, and $\sigma_{ij}$ the ijth component of Σ, then
$ \mu_i = \mathcal{E}[x_i] $
and
$ \sigma_{ij} = \mathcal{E}[(x_i - \mu_i)(x_j - \mu_j)] $
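To make the component-wise definitions concrete, here is a small sketch that approximates $\mu_i$ and $\sigma_{ij}$ by sample averages. Replacing the expectations with averages over random draws is an assumption of this example, as is the particular way the two correlated components are generated.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable

# Draw n samples of a 2-component vector whose components are correlated.
# The construction x2 = x1 + noise is purely illustrative.
n = 20000
samples = []
for _ in range(n):
    x1 = random.gauss(0.0, 2.0)        # variance 4
    x2 = x1 + random.gauss(0.0, 1.0)   # variance 4 + 1 = 5, covariance with x1 is 4
    samples.append((x1, x2))

d = 2

# mu_i = E[x_i], approximated by the sample average of component i
mu = [sum(s[i] for s in samples) / n for i in range(d)]

# sigma_ij = E[(x_i - mu_i)(x_j - mu_j)], approximated the same way
cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
        for j in range(d)]
       for i in range(d)]

print(mu)    # roughly [0, 0]
print(cov)   # roughly [[4, 4], [4, 5]]
```

The estimated matrix has equal off-diagonal entries because the product $(x_i - \mu_i)(x_j - \mu_j)$ does not depend on the order of i and j.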
The covariance matrix Σ is always symmetric and positive semidefinite. We restrict attention to the case in which Σ is positive definite, so that the determinant of Σ is strictly positive. The diagonal elements $\sigma_{ii}$ are the variances of the respective $x_i$ (i.e., $\sigma_i^2$), and the off-diagonal elements $\sigma_{ij}$ are the covariances of $x_i$ and $x_j$. If $x_i$ and $x_j$ are statistically independent, then $\sigma_{ij} = 0$. If all off-diagonal elements are zero, p(x) reduces to the product of the univariate normal densities for the components of x.
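The factorization for a diagonal covariance matrix can be verified directly in two dimensions. The sketch below assumes d = 2 so that the determinant and inverse have simple closed forms. The function names and the numerical values are illustrative only.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mvn_pdf_2d(x, mu, cov):
    """Bivariate normal density; cov is a symmetric positive definite 2x2 nested list."""
    (a, b), (c, e) = cov
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]   # closed-form 2x2 inverse
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^t Sigma^{-1} (x - mu)
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # (2 pi)^(d/2) equals 2 pi when d = 2
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

x = [0.3, -1.2]
mu = [1.0, 0.5]

# Diagonal covariance: the joint density equals the product of the marginals.
cov_diag = [[2.0, 0.0], [0.0, 0.5]]
joint = mvn_pdf_2d(x, mu, cov_diag)
product = normal_pdf(x[0], mu[0], 2.0) * normal_pdf(x[1], mu[1], 0.5)
print(joint, product)

# Non-zero covariance: the joint density no longer matches that product.
cov_full = [[2.0, 0.6], [0.6, 0.5]]
joint_corr = mvn_pdf_2d(x, mu, cov_full)
print(joint_corr)
```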
Discriminant Functions
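Discriminant functions give a convenient way to implement the decision rule that minimizes the probability of error: a feature vector Y is assigned to the state of nature $w_i$ whose discriminant function $g_i(\mathbf{Y})$ is largest. For a problem with feature vector Y and states of nature $w_i$, the minimum-error-rate discriminant function can be written as: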
$ g_i(\mathbf{Y}) = \ln p(\mathbf{Y}|w_i) + \ln P(w_i) $
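where, as defined in previous essays, $p(\mathbf{Y}|w_i)$ is the conditional probability density function for Y given that the state of nature is $w_i$, and $P(w_i)$ is the prior probability that nature is in state $w_i$. Suppose each $p(\mathbf{Y}|w_i)$ is a multivariate normal density, that is, $p(\mathbf{Y}|w_i) = N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$. Substituting the density into the expression above gives
$ g_i(\mathbf{Y}) = - \frac{1}{2} (\mathbf{Y} - \boldsymbol{\mu}_i)^t \boldsymbol{\Sigma}_i^{-1} (\mathbf{Y} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(w_i) $
In the simplest case, where every class has the covariance matrix $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$, the terms that are the same for all classes can be dropped, and the discriminant function becomes
$ g_i(\mathbf{Y}) = - \frac{||\mathbf{Y} - \boldsymbol{\mu}_i||^2}{2 \sigma^2} + \ln P(w_i) $,
where ||.|| denotes the Euclidean norm, that is,
$ ||\mathbf{Y} - \boldsymbol{\mu}_i||^2 = (\mathbf{Y} - \boldsymbol{\mu}_i)^t (\mathbf{Y} - \boldsymbol{\mu}_i) $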
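As a rough illustration of how such a discriminant function is used, the sketch below classifies points in a hypothetical two-class problem. It assumes that both classes share the covariance $\sigma^2 \mathbf{I}$; the means, priors, and function names are invented for the example. It also checks that the simplified discriminant differs from $\ln p(\mathbf{Y}|w_i) + \ln P(w_i)$ only by a constant that is the same for every class, which is why dropping that constant does not change the decision.

```python
import math

def sq_dist(y, mu):
    """Squared Euclidean norm ||y - mu||^2 = (y - mu)^t (y - mu)."""
    return sum((yk - mk) ** 2 for yk, mk in zip(y, mu))

def log_spherical_normal(y, mu, sigma2):
    """ln p(y | w_i) for a d-dimensional normal with covariance sigma2 * I."""
    d = len(y)
    return -sq_dist(y, mu) / (2.0 * sigma2) - 0.5 * d * math.log(2.0 * math.pi * sigma2)

def discriminant(y, mu, sigma2, prior):
    """g_i(y) = -||y - mu_i||^2 / (2 sigma^2) + ln P(w_i), class-independent terms dropped."""
    return -sq_dist(y, mu) / (2.0 * sigma2) + math.log(prior)

# Hypothetical two-class problem (all values are assumptions for illustration).
means = [[0.0, 0.0], [3.0, 3.0]]
priors = [0.7, 0.3]
sigma2 = 1.5

def classify(y):
    """Return the index i of the class with the largest g_i(y)."""
    scores = [discriminant(y, m, sigma2, p) for m, p in zip(means, priors)]
    return scores.index(max(scores))

print(classify([0.5, 0.2]))   # close to the first mean
print(classify([2.8, 3.1]))   # close to the second mean
print(classify([1.5, 1.5]))   # equidistant from both means, so the prior decides

# Difference between the full log form and the simplified form, per class.
y = [1.0, 2.0]
offsets = [log_spherical_normal(y, m, sigma2) + math.log(p) - discriminant(y, m, sigma2, p)
           for m, p in zip(means, priors)]
print(offsets)
```

For a point that is equally far from both means, the squared-distance terms cancel and the class with the larger prior wins.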
Next week, we will look more in depth into discriminant functions for the normal density, looking at the special cases of the covariance.