
Revision as of 15:23, 5 April 2013

Discriminant Functions For The Normal Density


Introduction to Normal or Gaussian Distribution

      Before discussing discriminant functions for the normal density, we first need to know what a normal distribution is and how it is represented for a single variable and for a vector variable. Let's begin with the continuous univariate normal, or Gaussian, density:

$ f_x = \frac{1}{\sqrt{2 \pi} \sigma} \exp \left [- \frac{1}{2} \left ( \frac{x - \mu}{\sigma} \right)^2 \right ] $


for which the expected value of x is

$ \mu = \mathcal{E}[x] =\int\limits_{-\infty}^{\infty} x f_x(x)\, dx $

and where the expected squared deviation or variance is

$ \sigma^2 = \mathcal{E}[(x- \mu)^2] =\int\limits_{-\infty}^{\infty} (x- \mu)^2 f_x(x)\, dx $

       The univariate normal density is completely specified by two parameters: its mean μ and its variance σ². The function fx can be written as N(μ, σ²), which says that x is normally distributed with mean μ and variance σ². Samples from a normal distribution tend to cluster about the mean, with a spread related to the standard deviation σ.
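As a quick numerical check of these definitions, the sketch below evaluates the univariate density and approximates the three integrals (total probability, mean, variance) with a simple midpoint rule. The helper names `normal_pdf` and `integrate`, the parameter values, and the truncation of the infinite range to ten standard deviations on either side of the mean are all illustrative assumptions, not part of the original essay.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width

# Illustrative parameter values (assumed for this example only).
mu, sigma = 1.5, 2.0

# Finite stand-in for the range (-inf, inf); the tails beyond it are negligible.
lo, hi = mu - 10 * sigma, mu + 10 * sigma

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
variance = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma), lo, hi)

print(total, mean, variance)
```

Under these assumptions the three printed values should be very close to 1, μ = 1.5, and σ² = 4 respectively.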

For the multivariate normal density in d dimensions, fx is written as

$ f_x = \frac{1}{(2 \pi)^ \frac{d}{2} |\boldsymbol{\Sigma}|^\frac{1}{2}} \exp \left [- \frac{1}{2} (\mathbf{x} -\boldsymbol{\mu})^t\boldsymbol{\Sigma}^{-1} (\mathbf{x} -\boldsymbol{\mu}) \right] $

where x is a d-component column vector, μ is the d-component mean vector, Σ is the d-by-d covariance matrix, and |Σ| and Σ⁻¹ are its determinant and inverse, respectively. Also, (x - μ)ᵗ denotes the transpose of (x - μ). The mean vector is μ = E[x], and the covariance matrix is

$ \boldsymbol{\Sigma} = \mathcal{E} \left [(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t \right] = \int(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^t f_x(\mathbf{x})\, d\mathbf{x} $

where the expected value of a vector or a matrix is found by taking the expected values of its individual components. That is, if xᵢ is the i-th component of x, μᵢ the i-th component of μ, and σᵢⱼ the ij-th component of Σ, then

$ \mu_i = \mathcal{E}[x_i] $

and

$ \sigma_{ij} = \mathcal{E}[(x_i - \mu_i)(x_j - \mu_j)] $
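The component-wise definitions can be illustrated by replacing each expectation with an average over a finite sample. The sketch below does this for a small made-up data set; the sample values and the function names `mean_vector` and `covariance_matrix` are assumptions for illustration, and dividing by n (rather than n - 1) is a choice made here so that the code mirrors the expectation formulas directly.

```python
def mean_vector(samples):
    """Component-wise average: entry i approximates mu_i = E[x_i]."""
    n = len(samples)
    d = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(d)]

def covariance_matrix(samples):
    """Entry (i, j) approximates sigma_ij = E[(x_i - mu_i)(x_j - mu_j)]."""
    n = len(samples)
    d = len(samples[0])
    mu = mean_vector(samples)
    return [
        [sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / n
         for j in range(d)]
        for i in range(d)
    ]

# Made-up two-dimensional sample, for illustration only.
samples = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0)]

mu_hat = mean_vector(samples)
sigma_hat = covariance_matrix(samples)

print(mu_hat)     # [5.0, 3.0]
print(sigma_hat)  # [[5.0, 3.5], [3.5, 3.5]]
```

Note that the resulting matrix is symmetric, since swapping i and j does not change the product inside the average.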

The covariance matrix Σ is always symmetric and positive semidefinite. We restrict our attention to the case in which Σ is positive definite, so that the determinant of Σ is strictly positive. The diagonal elements σᵢᵢ are the variances of the respective xᵢ (i.e., σᵢ²), and the off-diagonal elements σᵢⱼ are the covariances of xᵢ and xⱼ. If xᵢ and xⱼ are statistically independent, then σᵢⱼ = 0. If all off-diagonal elements are zero, the multivariate density reduces to the product of the univariate normal densities for the components of x.
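The factorization for a diagonal covariance matrix can be checked numerically. The sketch below specializes the multivariate density to d = 2 (so the determinant and inverse can be written out by hand) and compares it with the product of two univariate densities. The function names, the parameter values, and the restriction to two dimensions are assumptions made to keep the example short.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

def bivariate_normal_pdf(x, mu, cov):
    """Multivariate normal density specialized to d = 2."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    # Closed-form inverse of a 2-by-2 matrix.
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^t Sigma^{-1} (x - mu).
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    # For d = 2, (2 pi)^(d/2) is simply 2 pi.
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Illustrative values: variances 4 and 9, zero covariance.
mu = [1.0, -2.0]
cov = [[4.0, 0.0], [0.0, 9.0]]
x = [0.5, 1.0]

joint = bivariate_normal_pdf(x, mu, cov)
product = normal_pdf(x[0], mu[0], 2.0) * normal_pdf(x[1], mu[1], 3.0)

print(joint, product)
```

With zero off-diagonal entries the two printed numbers should agree up to floating-point rounding; with a nonzero covariance they generally would not.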


Discriminant Functions

      Discriminant functions are used to make decisions that minimize the probability of error in decision making problems. In a problem with feature vector Y and state of nature variable w, we can represent the discriminant function for state wᵢ as:

$ g_i(\mathbf{Y}) = \ln p(\mathbf{Y}|w_i) + \ln P(w_i) $

where, from previous essays (Bayesian Decision Theory - Continuous Features), p(Y|wᵢ) is the conditional probability density function for Y given that the state of nature is wᵢ, and P(wᵢ) is the prior probability that nature is in state wᵢ. If we take each p(Y|wᵢ) to be a multivariate normal density, that is, p(Y|wᵢ) = N(μᵢ, Σᵢ), then substituting the multivariate normal density into gᵢ(Y) and taking the logarithm gives:

$ g_i(\mathbf{Y}) = - \frac{1}{2} \left (\mathbf{Y} - \boldsymbol{\mu}_i \right)^t\boldsymbol{\Sigma}_i^{-1} \left (\mathbf{Y} - \boldsymbol{\mu}_i \right) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(w_i) $
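To make the normal-density discriminant concrete, here is a minimal sketch that evaluates it for two hypothetical classes in two dimensions and assigns a feature vector to the class with the largest value. The class parameters, the dictionary layout, and the names `gaussian_discriminant` and `classify` are assumptions for illustration; the 2-by-2 inverse is written out by hand to avoid any external library.

```python
import math

def gaussian_discriminant(y, mu, cov, prior):
    """Discriminant g_i(Y) for a normal class-conditional density with d = 2."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = [[s22 / det, -s12 / det], [-s21 / det, s11 / det]]
    diff = [y[0] - mu[0], y[1] - mu[1]]
    # Quadratic form (Y - mu_i)^t Sigma_i^{-1} (Y - mu_i).
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    d = 2
    return (-0.5 * quad
            - 0.5 * d * math.log(2.0 * math.pi)
            - 0.5 * math.log(det)
            + math.log(prior))

def classify(y, classes):
    """Return the index of the class whose discriminant is largest at y."""
    scores = [gaussian_discriminant(y, c["mu"], c["cov"], c["prior"])
              for c in classes]
    return max(range(len(classes)), key=lambda i: scores[i])

# Two hypothetical states of nature with equal priors (illustrative values).
classes = [
    {"mu": [0.0, 0.0], "cov": [[1.0, 0.0], [0.0, 1.0]], "prior": 0.5},
    {"mu": [3.0, 3.0], "cov": [[2.0, 0.5], [0.5, 1.0]], "prior": 0.5},
]

label_near_origin = classify([0.2, -0.1], classes)
label_near_second = classify([2.8, 3.1], classes)

print(label_near_origin, label_near_second)  # 0 1
```

Because the term involving ln 2π is the same for every class, it could be dropped without changing the decision; it is kept here so that the code matches the formula term by term.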


