Discriminant Functions For The Normal Density - Part 1

Introduction to Normal or Gaussian Distribution

Before talking about discriminant functions for the normal density, we first need to know what a normal distribution is and how it is represented for a single variable and for a vector variable. Let's begin with the continuous univariate normal, or Gaussian, density

$p(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp \left [- \frac{1}{2} \left ( \frac{x - \mu}{\sigma} \right)^2 \right ]$

for which the expected value of x is

$\mu = \mathcal{E}[x] =\int\limits_{-\infty}^{\infty} xp(x)\, dx$

and where the expected squared deviation or variance is

$\sigma^2 = \mathcal{E}[(x- \mu)^2] =\int\limits_{-\infty}^{\infty} (x- \mu)^2 p(x)\, dx$

The univariate normal density is completely specified by two parameters: its mean μ and its variance σ². We write x ~ N(μ, σ²) to say that x is distributed normally with mean μ and variance σ². Samples from normal distributions tend to cluster about the mean, with a spread related to the standard deviation σ.
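As a quick numerical check of the univariate formulas, the following sketch evaluates the density on a grid and approximates the integrals for the mean and variance. The function name `normal_pdf`, the values of `mu` and `sigma`, and the grid size are illustrative assumptions, not part of the original essay.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)

mu, sigma = 1.0, 2.0  # illustrative values (assumption)

# Approximate the integrals with a Riemann sum on a fine grid
# covering ten standard deviations on each side of the mean.
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]
p = normal_pdf(x, mu, sigma)

total = np.sum(p) * dx                     # should be close to 1
mean = np.sum(x * p) * dx                  # should be close to mu
var = np.sum((x - mean) ** 2 * p) * dx     # should be close to sigma**2

# Samples cluster about the mean: for a normal distribution, roughly
# 68% of them fall within one standard deviation of mu.
rng = np.random.default_rng(0)
samples = rng.normal(mu, sigma, size=100_000)
frac_within_1sd = np.mean(np.abs(samples - mu) < sigma)
```

Note that `rng.normal` takes the standard deviation, not the variance, as its second argument.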

In d dimensions, the multivariate normal density is written as

$p(\mathbf{x}) = \frac{1}{(2 \pi)^ \frac{d}{2} |\boldsymbol{\Sigma}|^\frac{1}{2}} \exp \left [- \frac{1}{2} (\mathbf{x} -\boldsymbol{\mu})^t\boldsymbol{\Sigma}^{-1} (\mathbf{x} -\boldsymbol{\mu}) \right]$

where x is a d-component column vector, μ is the d-component mean vector, Σ is the d-by-d covariance matrix, and |Σ| and Σ^-1 are its determinant and inverse, respectively. Also, (x - μ)^t denotes the transpose of (x - μ).

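The mean vector and covariance matrix are given by

$\boldsymbol{\mu} = \mathcal{E}[\mathbf{x}] = \int \mathbf{x}\, p(\mathbf{x})\, d\mathbf{x}$

and

$\boldsymbol{\Sigma} = \mathcal{E} \left [(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t \right] = \int(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t p(\mathbf{x})\, d\mathbf{x}$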

where the expected value of a vector or a matrix is found by taking the expected value of its individual components. That is, if x_i is the ith component of x, μ_i the ith component of μ, and σ_ij the ijth component of Σ, then

$\mu_i = \mathcal{E}[x_i]$

and

$\sigma_{ij} = \mathcal{E}[(x_i - \mu_i)(x_j - \mu_j)]$
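The component-wise definitions can be illustrated by replacing the expectations with sample averages. This is only a sketch: the parameter values `true_mu` and `true_Sigma` and the sample size are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not from the essay).
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])

# Each row of X is one sample vector x.
X = rng.multivariate_normal(true_mu, true_Sigma, size=200_000)

# mu_i = E[x_i]: average each component separately.
mu_hat = X.mean(axis=0)

# Sigma = E[(x - mu)(x - mu)^t]: average the outer products of deviations.
D = X - mu_hat
Sigma_hat = (D.T @ D) / X.shape[0]

# A single entry, sigma_01 = E[(x_0 - mu_0)(x_1 - mu_1)], computed directly.
sigma_01 = np.mean((X[:, 0] - mu_hat[0]) * (X[:, 1] - mu_hat[1]))
```

The matrix form and the entry-by-entry form agree, which is what "taking the expected value of the individual components" means in practice.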

The covariance matrix Σ is always symmetric and positive semidefinite. We restrict our attention to the case in which Σ is positive definite, so that the determinant of Σ is strictly positive and the inverse exists. The diagonal elements σ_ii are the variances of the respective x_i (i.e., σ_i²), and the off-diagonal elements σ_ij are the covariances of x_i and x_j. If x_i and x_j are statistically independent, then σ_ij = 0. If all off-diagonal elements are zero, p(x) reduces to the product of the univariate normal densities for the components of x.
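The properties in the preceding paragraph can be checked with a small sketch. The helper names `mvn_pdf` and `normal_pdf` and all numeric values are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate normal density (sigma is the standard deviation)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (np.sqrt(2.0 * np.pi) * sigma)

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density for a single d-component vector x."""
    d = mu.shape[0]
    diff = x - mu
    # (x - mu)^t Sigma^{-1} (x - mu), using a linear solve instead of an explicit inverse.
    maha = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm_const

# Diagonal covariance: the joint density factors into univariate densities.
mu = np.array([0.5, -1.0, 2.0])
variances = np.array([1.0, 4.0, 0.25])
Sigma_diag = np.diag(variances)
x = np.array([0.0, 0.5, 2.2])

joint = mvn_pdf(x, mu, Sigma_diag)
product = np.prod(normal_pdf(x, mu, np.sqrt(variances)))

# A non-diagonal example: symmetric, with positive eigenvalues,
# hence positive definite with a strictly positive determinant.
Sigma_full = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
eigvals = np.linalg.eigvalsh(Sigma_full)
det_full = np.linalg.det(Sigma_full)
```

Here `joint` and `product` agree up to floating-point rounding, and every entry of `eigvals` is positive.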

Discriminant Functions

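Discriminant functions are used to build classifiers that minimize the probability of error in decision-making problems: a feature vector is assigned to the class whose discriminant function is largest. In a problem with feature vector Y (playing the role of x in the densities above) and states of nature w_i, we can represent the discriminant function as: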

$g_i(\mathbf{Y}) = \ln p(\mathbf{Y}|w_i) + \ln P(w_i)$

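where, as in previous essays, p(Y|w_i) is the conditional probability density function for Y given that the state of nature is w_i, and P(w_i) is the prior probability that nature is in state w_i. If the densities p(Y|w_i) are multivariate normal, that is, if p(Y|w_i) = N(μ_i, Σ_i), then substituting the multivariate normal density gives

$g_i(\mathbf{Y}) = -\frac{1}{2} (\mathbf{Y} - \boldsymbol{\mu}_i)^t \boldsymbol{\Sigma}_i^{-1} (\mathbf{Y} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(w_i)$

In the simplest case, where every class has the same covariance Σ_i = σ²I, the terms that do not depend on i can be dropped and the discriminant function becomes

$g_i(\mathbf{Y}) = - \frac{||\mathbf{Y} - \boldsymbol{\mu}_i||^2}{2 \sigma^2} + \ln P(w_i)$ ,

where ||.|| denotes the Euclidean norm, that is,

$||\mathbf{Y} - \boldsymbol{\mu}_i||^2 = (\mathbf{Y} - \boldsymbol{\mu}_i)^t (\mathbf{Y} - \boldsymbol{\mu}_i)$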

Next week, we will look in more depth at discriminant functions for the normal density, focusing on special cases of the covariance matrix.
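To make the discriminant functions of this section concrete, here is a minimal sketch that evaluates g_i(Y) = ln p(Y|w_i) + ln P(w_i) for normal class-conditional densities and picks the class with the largest value. The function names `discriminant` and `classify`, the two-class setup, and every numeric value are assumptions for illustration only.

```python
import numpy as np

def discriminant(y, mu, Sigma, prior):
    """g_i(Y) = ln p(Y|w_i) + ln P(w_i) for a multivariate normal p(Y|w_i)."""
    d = mu.shape[0]
    diff = y - mu
    # Quadratic form (Y - mu)^t Sigma^{-1} (Y - mu).
    maha = diff @ np.linalg.solve(Sigma, diff)
    # slogdet returns (sign, log|Sigma|); the log is numerically safer than log(det).
    _, logdet = np.linalg.slogdet(Sigma)
    log_likelihood = -0.5 * maha - 0.5 * d * np.log(2.0 * np.pi) - 0.5 * logdet
    return log_likelihood + np.log(prior)

def classify(y, mus, Sigmas, priors):
    """Return the index of the class with the largest discriminant value."""
    scores = [discriminant(y, m, S, P) for m, S, P in zip(mus, Sigmas, priors)]
    return int(np.argmax(scores))

# Two hypothetical classes in two dimensions.
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[2.0, 0.5],
                               [0.5, 1.0]])]
priors = [0.6, 0.4]

label_a = classify(np.array([0.2, -0.1]), mus, Sigmas, priors)  # near the first mean
label_b = classify(np.array([2.8, 3.1]), mus, Sigmas, priors)   # near the second mean

# Special case Sigma = sigma^2 I: the quadratic form is a scaled Euclidean distance.
sigma2 = 1.5
y = np.array([1.0, 2.0])
g_general = discriminant(y, mus[1], sigma2 * np.eye(2), priors[1])
g_norm = (-np.sum((y - mus[1]) ** 2) / (2.0 * sigma2)
          - np.log(2.0 * np.pi * sigma2) + np.log(priors[1]))
```

With these values, the point near the first mean is assigned to class 0 and the point near the second mean to class 1. The last lines check that, when the covariance is σ²I, the general expression matches the form based on the squared Euclidean distance once the constant terms are kept.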
