Revision as of 18:48, 14 April 2014
Upper Bounds for Bayes Error
(partially based on Prof. Mireille Boutin's ECE 662 lecture)
1. Introduction
This slecture describes theoretical upper bounds for the Bayes error. First, in chapter 2, the error is expressed in terms of the Bayes classifier. This expression contains a min function which, by a lemma, can be replaced to yield a computable upper bound on the error. In chapter 3, we derive the Chernoff bound specifically for normally distributed data, along with the Chernoff distance for that case. In section 3.2, some examples of the Chernoff bound are provided.
The materials given in this lecture are based on the lecture notes and the discussions that were shared in Prof. Boutin's ECE662 Spring 2014 course at Purdue University. Examples were designed to reflect the theories taught in the class.
2. Upper Bounds for Bayes Error
2.1 Classifier using Bayes rule
To classify a dataset of feature vectors into c classes, we choose the class $ \omega_{i} $ given by

$ \arg\max_{\omega_{i} \in \big\{\omega_{1}, \cdots ,\omega_{c}\big\} } Prob \big( \omega_{i} \mid x \big). \text{......Eq.(2.1)} $
However, it is difficult to directly estimate $ Prob \big( \omega_{i} \mid x \big) $. So instead of solving eq.(2.1), we use the Bayes rule to change the problem to

$ \arg\max_{\omega_{i} \in \big\{\omega_{1}, \cdots ,\omega_{c}\big\} } \rho \big(x \mid \omega _{i}\big) Prob \big( \omega _{i}\big). \text{......Eq.(2.2)} $
As can be seen from eq.(2.2), applying the Bayesian classifier requires knowing the distributions and the priors of all classes in the dataset. If the distributions and priors are all known, the Bayesian classifier is an optimal classifier, since a decision made following the Bayes rule minimizes the probability of error.
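As an illustration, the decision rule of eq.(2.2) can be sketched in a few lines of Python. The two Gaussian class-conditional densities and the priors below are made-up values for demonstration, not part of the original lecture.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Assumed example setup: two classes w_1, w_2 with Gaussian
# class-conditional densities rho(x | w_i) and known priors Prob(w_i).
priors = [0.6, 0.4]
params = [(0.0, 1.0), (2.0, 1.0)]   # (mean, std) for each class

def bayes_decide(x):
    """Pick the class index maximizing rho(x | w_i) * Prob(w_i), as in eq.(2.2)."""
    scores = [gauss_pdf(x, mu, s) * p for (mu, s), p in zip(params, priors)]
    return scores.index(max(scores))
```

For instance, `bayes_decide(-1.0)` returns class 0 here, since the first density dominates at that point.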
2.2 Upper Bounds for Bayes Error
For a two-category problem, the expected error of a decision made following the Bayes rule can be computed as below.
$ E \big(error\big) = \int_\Re Prob \big(error \mid x\big) \rho \big(x\big) dx \text{......Eq.(2.3)} $
$ = \int_\Re \min \big( Prob \big( \omega _{1} \mid x\big), Prob \big( \omega _{2} \mid x\big) \big) \rho \big(x\big) dx \text{......Eq.(2.4)} $
$ = \int_\Re \min \big( \rho \big( x \mid \omega _{1}\big) Prob \big(\omega _{1}\big) , \rho \big( x \mid \omega _{2}\big) Prob \big(\omega _{2}\big) \big) dx \text{......Eq.(2.5)} $
And we can obtain an upper bound for eq.(2.5) using the following lemma:

$ \min \big\{a, b\big\} \leq a^{\beta} b^{1-\beta} \text{......Eq.(2.6)} $
where $ a, b \geq 0 $ and $ 0 \leq \beta \leq 1 $. Therefore, from eq.(2.5) and eq.(2.6), we can derive the following error bound.
$ E \big(error\big) \leq \int_\Re \big( \rho \big(x \mid \omega _{1}\big) Prob \big(\omega_{1}\big) \big) ^ \beta \big( \rho \big(x \mid \omega _{2}\big) Prob \big(\omega_{2}\big) \big) ^{1- \beta } dx = \varepsilon _ \beta \text{......Eq.(2.7)} $
Eq.(2.7) is called the Chernoff bound. This is the error bound that can be computed given full knowledge of the class-conditional densities and priors of each class.
Since priors are independent of x, we can take priors out of the integral.
$ \varepsilon _ \beta = Prob \big(\omega_{1}\big) ^\beta Prob \big(\omega_{2}\big) ^{1-\beta} \int_\Re \rho \big(x \mid \omega _{1}\big) ^ \beta \big( \rho \big(x \mid \omega _{2}\big) \big) ^{1- \beta } dx \text{......Eq.(2.8)} $
For the specific case where $ \beta=\frac{1}{2} $, $ \varepsilon_{\frac{1}{2}} $ is called the Bhattacharyya bound.
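The integral in eq.(2.8) rarely has a convenient closed form, but it can be approximated numerically. The sketch below is a hypothetical example using two 1-D Gaussian class-conditional densities; it evaluates $ \varepsilon_\beta $ with a simple trapezoid rule, which for $ \beta = \frac{1}{2} $ and equal-variance Gaussians can be checked against the known closed form.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def epsilon_beta(beta, prior1, prior2, pdf1, pdf2, lo=-20.0, hi=20.0, n=4000):
    """Approximate eps_beta of eq.(2.8) with the trapezoid rule on [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    ys = [pdf1(x) ** beta * pdf2(x) ** (1.0 - beta) for x in xs]
    integral = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return prior1 ** beta * prior2 ** (1.0 - beta) * integral

# Example: N(0,1) vs N(2,1) with equal priors. The Bhattacharyya bound
# (beta = 1/2) is 0.5 * exp(-0.5) ~ 0.303 in closed form for this case.
pdf1 = lambda x: gauss_pdf(x, 0.0, 1.0)
pdf2 = lambda x: gauss_pdf(x, 2.0, 1.0)
eps_half = epsilon_beta(0.5, 0.5, 0.5, pdf1, pdf2)
```

The numerical result agrees with the closed form $ \sqrt{P_1 P_2}\, e^{-(\mu_1-\mu_2)^2/8} $ to several decimal places.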
The smallest such bound over all $ \beta \in \left[ 0,1 \right] $,
$ S := \min_{\beta \in \left[ 0,1 \right] } \big( \varepsilon _{\beta}\big) \text{......Eq.(2.9)} $
is also sometimes called the Chernoff bound. This can be used as an error bound since the Bayes error lies somewhere between 0 and $ \min_{\beta} \big( \varepsilon _{\beta}\big) $.
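A minimal sketch of eq.(2.9): since $ \varepsilon_\beta $ is a one-dimensional function of $ \beta $ on $ [0,1] $, a simple grid search suffices to approximate the minimum. The closed-form $ \varepsilon_\beta $ used below is the known expression for two equal-variance 1-D Gaussians with equal priors, assumed here purely for illustration.

```python
import math

def eps_beta(beta, mu1=0.0, mu2=2.0, sigma=1.0, p1=0.5, p2=0.5):
    """eps_beta for two equal-variance 1-D Gaussians (closed form, assumed example)."""
    chernoff_dist = beta * (1 - beta) * (mu2 - mu1) ** 2 / (2 * sigma ** 2)
    return p1 ** beta * p2 ** (1 - beta) * math.exp(-chernoff_dist)

# Grid search over beta in [0, 1], as in eq.(2.9).
betas = [i / 1000 for i in range(1001)]
best_beta = min(betas, key=eps_beta)
S = eps_beta(best_beta)
```

For this symmetric example the minimum sits at $ \beta = \frac{1}{2} $, so the optimal Chernoff bound coincides with the Bhattacharyya bound.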
The Chernoff bound $ \varepsilon _{\beta} $ bounds the probability of error as a function of $ \beta $. It is usually used for two-category cases; to apply it to multi-category cases, the problem can be split into several two-category cases whose bounds are then summed to form the final error bound.
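The pairwise splitting just described can be sketched as follows. The closed-form pairwise Bhattacharyya bound used here, $ \sqrt{P_i P_j}\, e^{-(\mu_i-\mu_j)^2/(8\sigma^2)} $, is valid for 1-D Gaussians sharing a common variance; that restriction is assumed only to keep the example short.

```python
import math
from itertools import combinations

def pair_bhattacharyya(p_i, p_j, mu_i, mu_j, sigma):
    """Bhattacharyya bound for two 1-D Gaussian classes sharing variance sigma^2."""
    return math.sqrt(p_i * p_j) * math.exp(-(mu_i - mu_j) ** 2 / (8 * sigma ** 2))

def multiclass_error_bound(priors, means, sigma):
    """Sum the pairwise two-category bounds to bound the multi-category error."""
    pairs = combinations(range(len(priors)), 2)
    return sum(pair_bhattacharyya(priors[i], priors[j], means[i], means[j], sigma)
               for i, j in pairs)
```

With two classes this reduces to the single pairwise bound; with more classes it simply accumulates one term per pair of classes.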
3. Chernoff Bound for Normally Distributed Data
3.1
3.2