Bayes Decision Theory - Introduction
Bayesian decision theory is a valuable approach to solving pattern classification problems. It is based on quantifying the tradeoffs between various classification decisions using the probabilities of events and the costs that accompany those decisions. Here, we assume that the problem is posed in probabilistic terms and that all relevant probability values are known (in practice, this is not always the case).
Consider a stack of cards in which each card is either a diamond or a spade. We denote the state of a card by x, with x = x1 for diamonds and x = x2 for spades. Suppose we want to design a system that predicts the suit of the next card to come up. We also know the prior probability P(x1) that the next card is a diamond and the prior probability P(x2) that it is a spade; these two probabilities sum to 1, since there are only two possible states. With no other information, we can use the following decision rule: if P(x1) > P(x2), decide diamonds; otherwise, decide spades. Note that this rule makes the same decision for every card. How well it works depends on how different the two priors are. If P(x1) is much greater than P(x2), deciding diamonds will be correct most of the time, but if P(x1) = P(x2), we have only a 50% chance of being correct. In general, the probability of error of this rule is the smaller of the two priors, min[P(x1), P(x2)].
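The prior-only rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the two states are exhaustive so that P(x2) = 1 - P(x1); the function name `decide_from_priors` and the prior values in the loop are hypothetical. Ties go to spades, matching the "otherwise" branch of the rule.

```python
def decide_from_priors(p_diamond: float) -> tuple[str, float]:
    """Return (decision, probability of error) using the priors alone.

    p_diamond is P(x1). We assume P(x2) = 1 - P(x1), because the two
    states are exhaustive and mutually exclusive.
    """
    if not 0.0 <= p_diamond <= 1.0:
        raise ValueError("P(x1) must lie in [0, 1]")
    p_spade = 1.0 - p_diamond
    # Decide x1 (diamonds) if P(x1) > P(x2), otherwise x2 (spades).
    decision = "diamond" if p_diamond > p_spade else "spade"
    # The rule is wrong exactly when the other state occurs.
    p_error = min(p_diamond, p_spade)
    return decision, p_error


# Hypothetical priors, chosen only to show how the error grows as they converge.
for prior in (0.9, 0.6, 0.5):
    print(prior, decide_from_priors(prior))
```

As the two priors approach each other, the error approaches 0.5, which is the coin-flip case described above.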
In most cases, however, we won't have to make decisions with so little information. For example, suppose we can measure the value of the color of the shape on each card (the value of a color refers to its degree of lightness or darkness). We describe this measurement as a variable y and treat it as a random variable whose distribution depends on the state of the card, expressed as p(y|x). This is called the class-conditional probability density function: the probability density of y given that the state is x. Writing P(x,y) for the joint probability of x and y, the definition of conditional probability gives:
$ P(y|x)= \frac{P(x,y)}{P(x)} \qquad\qquad\qquad\qquad\qquad (1) $
and
$ P(x|y)= \frac{P(x,y)}{P(y)} \qquad\qquad\qquad\qquad\qquad (2) $
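To make equations (1) and (2) concrete, the sketch below uses a small, made-up joint probability table in which the color value is coarsened to two levels, "light" and "dark". This discretization is an assumption for illustration only; in the text, y is a continuous value. Exact fractions avoid rounding noise, and the final assertion checks that eliminating the shared joint term reproduces Bayes formula.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(x, y) for the stack of cards.
joint = {
    ("diamond", "light"): F(3, 10),
    ("diamond", "dark"): F(1, 10),
    ("spade", "light"): F(3, 20),
    ("spade", "dark"): F(9, 20),
}
assert sum(joint.values()) == 1  # a valid distribution sums to 1


def marginal_x(x: str) -> F:
    """P(x): sum the joint probability over all color values."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y: str) -> F:
    """P(y): sum the joint probability over both suits."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def p_y_given_x(y: str, x: str) -> F:
    """Equation (1): P(y|x) = P(x, y) / P(x)."""
    return joint[(x, y)] / marginal_x(x)


def p_x_given_y(x: str, y: str) -> F:
    """Equation (2): P(x|y) = P(x, y) / P(y)."""
    return joint[(x, y)] / marginal_y(y)


print(p_y_given_x("light", "diamond"))  # 3/4
print(p_x_given_y("diamond", "light"))  # 2/3

# Both equations share P(x, y); eliminating it gives Bayes formula.
assert p_x_given_y("diamond", "light") == (
    p_y_given_x("light", "diamond") * marginal_x("diamond") / marginal_y("light")
)
```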
The difference between p(y|x1) and p(y|x2) describes the difference in color values between the diamonds and the spades in the stack. Suppose we know both the prior probabilities P(xj) and the class-conditional densities p(y|xj) for j = 1, 2, and that we measure the color value of a card to be y. Since equations (1) and (2) have the same numerator, the joint probability of x and y, we can solve each equation for it and equate the results to obtain Bayes formula:
$ P(x_j|y)= \frac{p(y|x_j)P(x_j)}{P(y)} \qquad\qquad\qquad\qquad (3) $
The formula above can be expressed in words as
$ posterior = \frac{likelihood \times prior}{evidence} $
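The following is a minimal sketch of Bayes formula for a continuous color value. It assumes, purely for illustration, that each class-conditional density p(y|xj) is Gaussian; the priors, means, and standard deviations are hypothetical. The evidence is computed as the sum of likelihood times prior over both states.

```python
import math


def normal_pdf(y: float, mean: float, std: float) -> float:
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical model (assumed values): priors P(x_j) and the mean and
# standard deviation of a Gaussian class-conditional density p(y|x_j).
PRIORS = {"diamond": 0.4, "spade": 0.6}
DENSITY_PARAMS = {"diamond": (0.7, 0.1), "spade": (0.3, 0.15)}


def posteriors(y: float) -> dict[str, float]:
    """Bayes formula (3): P(x_j|y) = p(y|x_j) P(x_j) / P(y)."""
    # Numerator for each state: likelihood times prior.
    numerators = {
        state: normal_pdf(y, *DENSITY_PARAMS[state]) * prior
        for state, prior in PRIORS.items()
    }
    # Evidence: the sum of the numerators over all states.
    evidence = sum(numerators.values())
    return {state: num / evidence for state, num in numerators.items()}


for y in (0.3, 0.5, 0.7):
    print(y, posteriors(y))
```

Because the evidence is the same for both states, it does not change which posterior is larger; it only rescales the posteriors so that they sum to 1.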
Here the evidence P(y) = p(y|x1)P(x1) + p(y|x2)P(x2) is a scale factor that ensures the posterior probabilities sum to 1. Bayes formula shows that, by observing the value of y, we can convert the prior probability P(xj) into the posterior probability P(xj|y), the probability of state xj given that the feature value y has been measured. If we observe y and find that P(x1|y) is greater than P(x2|y), we decide diamonds; conversely, if P(x2|y) is greater than P(x1|y), we decide spades. To justify this rule, we can calculate the probability of error associated with a decision. Whenever we observe a particular y, the probability of error is
$ P(error|y)= \begin{cases} P(x_1|y) \qquad if\ we\ decide\ x_2\\ P(x_2|y) \qquad if\ we\ decide\ x_1 \end{cases} $
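The decision rule and the conditional error above can be written directly in terms of the posteriors. In this sketch the posterior values are hypothetical, and the helper names `bayes_decide` and `error_if_decide` are illustrative, not from any library.

```python
def bayes_decide(posterior: dict[str, float]) -> str:
    """Choose the state with the largest posterior probability P(x_j|y)."""
    return max(posterior, key=posterior.get)


def error_if_decide(decision: str, posterior: dict[str, float]) -> float:
    """P(error|y) for a decision: the posterior mass of the other state(s)."""
    return sum(p for state, p in posterior.items() if state != decision)


# Hypothetical posteriors for one observed color value y.
posterior = {"diamond": 0.8, "spade": 0.2}
choice = bayes_decide(posterior)
print(choice, error_if_decide(choice, posterior))  # diamond 0.2
print(error_if_decide("spade", posterior))         # 0.8
```

Choosing the state with the larger posterior always leaves the smaller posterior as the error.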
Clearly, for a given value of y, we can minimize the probability of error by deciding x1 when P(x1|y) > P(x2|y) and x2 otherwise, which makes P(error|y) as small as possible. Under this rule, P(error|y) can be rewritten as:
$ P(error|y)= min[P(x_1|y),P(x_2|y)] $
In general, both the prior probabilities and the class-conditional densities of an additional feature are important for making decisions, and Bayes formula combines them into a decision rule that achieves the minimum probability of error.
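To illustrate this claim numerically, the sketch below estimates the average probability of error, $ P(error) = \int P(error|y)\,p(y)\,dy $, for three rules under a hypothetical model with priors P(x1) = 0.4, P(x2) = 0.6 and Gaussian class-conditional densities (all assumed values, not from the text). The integral is approximated with the midpoint rule over a finite range that covers essentially all of the probability mass. The rule names are illustrative.

```python
import math


def normal_pdf(y: float, mean: float, std: float) -> float:
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical model (assumed values, not from the text).
P1, P2 = 0.4, 0.6        # priors P(x1) diamonds, P(x2) spades
MEAN1, STD1 = 0.7, 0.1   # Gaussian p(y|x1)
MEAN2, STD2 = 0.3, 0.15  # Gaussian p(y|x2)


def bayes_rule(y: float, a: float, b: float) -> int:
    # a > b is equivalent to P(x1|y) > P(x2|y), since both share P(y).
    return 1 if a > b else 2


def prior_rule(y: float, a: float, b: float) -> int:
    # Ignores the measurement y entirely.
    return 1 if P1 > P2 else 2


def threshold_rule(y: float, a: float, b: float) -> int:
    # An arbitrary, non-Bayesian cut-off on the color value.
    return 1 if y > 0.5 else 2


def average_error(decide, lo: float = -2.0, hi: float = 3.0, n: int = 50_000) -> float:
    """Midpoint-rule estimate of P(error) = integral of P(error|y) p(y) dy.

    For a given decision, P(error|y) p(y) is the joint density
    p(y|x_other) P(x_other) of the state that was NOT chosen.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        a = normal_pdf(y, MEAN1, STD1) * P1  # p(y|x1) P(x1)
        b = normal_pdf(y, MEAN2, STD2) * P2  # p(y|x2) P(x2)
        total += (b if decide(y, a, b) == 1 else a) * h
    return total


for name, rule in [("priors only", prior_rule),
                   ("threshold 0.5", threshold_rule),
                   ("Bayes", bayes_rule)]:
    print(f"{name:>13}: P(error) ~ {average_error(rule):.4f}")
```

The Bayes rule should report the smallest value: at every y it keeps the smaller of the two joint terms, which is exactly min[P(x1|y), P(x2|y)] times P(y), while the prior-only rule's error stays at min[P(x1), P(x2)].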