MAP Estimation by Landis


The Big Picture

Given an observation $ X $, we want to estimate an unknown parameter $ \theta $ of its distribution $ f_X(X) $ (i.e., $ f_X(X) $ is some function $ g(\theta) $).

Consider three expressions (distributions):

1. Likelihood

$ p(X; \theta) $ (discrete)

$ f_X(X; \theta) $ (continuous)

Used for the maximum likelihood (ML) estimate: $ \overset{\land}\theta_{\mbox{ML}} = \overset{\mbox{argmax}}\theta f_X(X; \theta) $
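The Python below is a minimal sketch of the ML definition. It maximizes the likelihood by brute-force search over a grid of candidate parameter values. It assumes the exponential model $ f_X(X; \lambda) = \lambda e^{-\lambda X} $ used in Example 1 below and a single hypothetical observation. The grid search is an illustrative choice, not a method from these notes.

```python
import math

def exp_likelihood(x, lam):
    """Exponential likelihood f_X(x; lam) = lam * exp(-lam * x), zero for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def ml_estimate(x, grid):
    """Return the grid value of lam that maximizes the likelihood of observation x."""
    return max(grid, key=lambda lam: exp_likelihood(x, lam))

grid = [k / 1000 for k in range(1, 10001)]  # candidate values 0.001, 0.002, ..., 10.0
x_obs = 0.5                                 # hypothetical observation
print(ml_estimate(x_obs, grid))             # 2.0, i.e. 1 / x_obs
```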

2. Prior

$ p(\theta) $ (discrete)

$ f_\theta(\theta) $ (continuous)

Encodes prior knowledge about what $ \theta $ is likely to be. "Prior" means before seeing the observation.

3. Posterior

$ p(\theta | x) $ (discrete)

$ f_{\theta | X}(\theta | X) $ (continuous)

"Posterior" means after seeing the observation. The posterior defines the maximum a posteriori (MAP) estimate:

$ \overset{\land}\theta_{\mbox{MAP}} = \overset{\mbox{argmax}}\theta f_{\theta | X}(\theta | X) $

Using Bayes' Rule, we can expand the posterior $ f_{\theta | X}(\theta | X) $:

$ f_{\theta | X}(\theta | X) = \frac{f_{X|\theta}(X | \theta) f_\theta(\theta)}{f_X(X)} $
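This expansion can be checked numerically. The sketch below assumes the exponential likelihood and exponential prior of Example 1 below, with a hypothetical prior rate $ \Lambda = 1 $. It evaluates the numerator of Bayes' rule on a grid of $ \lambda $ values and approximates the denominator $ f_X(X) = \int f_{X|\lambda}(X | \lambda) f_\lambda(\lambda) d\lambda $ by a Riemann sum. It then confirms that the resulting posterior integrates to about 1.

```python
import math

PRIOR_RATE = 1.0  # hypothetical value of Lambda, the prior's rate parameter

def likelihood(x, lam):
    """f_{X|lambda}(x | lam) = lam * exp(-lam * x) for the exponential model."""
    return lam * math.exp(-lam * x)

def prior(lam):
    """f_lambda(lam) = Lambda * exp(-Lambda * lam) for lam >= 0."""
    return PRIOR_RATE * math.exp(-PRIOR_RATE * lam)

def posterior_on_grid(x, grid, step):
    """Apply Bayes' rule on an evenly spaced grid; return (posterior values, evidence)."""
    joint = [likelihood(x, lam) * prior(lam) for lam in grid]  # numerator of Bayes' rule
    evidence = sum(joint) * step                               # Riemann sum for f_X(x)
    return [j / evidence for j in joint], evidence

step = 0.001
grid = [k * step for k in range(1, 50001)]  # lam in (0, 50]; the tail beyond is negligible
post, evidence = posterior_on_grid(0.5, grid, step)
print(evidence)          # numerical approximation of f_X(0.5)
print(sum(post) * step)  # approximately 1.0
```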

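Since the denominator $ f_X(X) $ does not depend on $ \theta $, it does not change the maximizer, so

$ \overset{\land}\theta_{\mbox{MAP}} = \overset{\mbox{argmax}}\theta f_{X | \theta}(X | \theta) f_\theta(\theta) $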
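Here is a minimal grid-search sketch of this MAP rule. It again assumes Example 1's exponential likelihood and exponential prior, with a hypothetical $ \Lambda = 1 $ and observation $ X = 0.5 $. The helper `map_estimate` is hypothetical and not from the notes. It maximizes only the unnormalized product. Dividing by the evidence, which is a positive constant in $ \lambda $, selects the same grid point.

```python
import math

def map_estimate(x, grid, likelihood, prior):
    """Grid-search MAP: maximize f_{X|theta}(x | theta) * f_theta(theta).

    The evidence f_X(x) is omitted because it does not depend on theta.
    """
    return max(grid, key=lambda theta: likelihood(x, theta) * prior(theta))

def likelihood(x, lam):
    return lam * math.exp(-lam * x)     # exponential likelihood

def prior(lam):
    return 1.0 * math.exp(-1.0 * lam)   # exponential prior, hypothetical Lambda = 1

x_obs = 0.5
grid = [k / 1000 for k in range(1, 10001)]
unnormalized = map_estimate(x_obs, grid, likelihood, prior)

# Same search on the normalized posterior: divide by a numerical estimate of f_X(x)
evidence = sum(likelihood(x_obs, lam) * prior(lam) for lam in grid) / 1000
normalized = max(grid, key=lambda lam: likelihood(x_obs, lam) * prior(lam) / evidence)
print(unnormalized, normalized)  # both 0.667 (the exact maximizer is 2/3)
```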


So What?

So, what does this mean in a nutshell? Essentially, an ML estimator works as follows: "I know that a random variable follows some sort of pattern, and this pattern can be adjusted by changing a variable called a parameter. (I have no clue what this parameter should be.) I run some experiment to get a sample output from the random variable. Based on my experiment, I would like to find the pattern for the random variable that best matches my data. So, I will find the pattern that gives my data the highest chance of occurring."

A MAP estimator works as follows: "I know that a random variable follows some sort of pattern, and this pattern can be adjusted by changing a variable called a parameter. (I have some idea what this parameter should be, so I will treat the parameter as a random variable. Then, the chance that the parameter takes some value is expressed by its PDF/PMF, the prior.) I run some experiment to get a sample output from the random variable. Based on my experiment and on my prior ideas of what the parameter should be, I would like to find the pattern that best matches both my data and my prior knowledge. So, I will find the parameter that maximizes the product of two things: how well the parameter explains the data (the likelihood), and how well the parameter meets my expectations of what it should be (the prior)."


Example 1: Continuous

$ X \sim f_X(X) = \lambda e^{-\lambda X} $

but we don't know the parameter $ \lambda $. Let us assume, however, that $ \lambda $ is actually itself exponentially distributed, i.e.

$ \lambda \sim f_\lambda(\lambda) = \Lambda e^{-\Lambda\lambda} $

where $ \Lambda $ is fixed and known.

Find $ \overset{\land}\lambda_{\mbox{map}} $.

Solution:

$ \overset{\land}\lambda_{\mbox{map}} = \overset{\mbox{argmax}}\lambda f_{\lambda | X}(\lambda | X) $

$ \overset{\land}\lambda_{\mbox{map}} = \overset{\mbox{argmax}}\lambda f_\lambda(\lambda) f_{X|\lambda}(X | \lambda) $

$ \overset{\land}\lambda_{\mbox{map}} = \overset{\mbox{argmax}}\lambda \Lambda e^{-\lambda \Lambda}\lambda e^{-\lambda X} $

Setting the derivative with respect to $ \lambda $ to zero:

$ \frac{d}{d\lambda} \lambda \Lambda e^{-\lambda(\Lambda + X)} = 0 $

$ \Lambda e^{-\lambda(\Lambda + X)} - \lambda \Lambda (\Lambda + X) e^{-\lambda(\Lambda + X)} = 0 $

Dividing through by $ \Lambda e^{-\lambda(\Lambda + X)} > 0 $:

$ 1 - \lambda(\Lambda + X) = 0 $

$ \overset{\land}\lambda_{\mbox{map}} = \frac{1}{\Lambda + X} $
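To confirm that this stationary point is a maximum rather than a minimum, one can check the second derivative of the objective $ h(\lambda) = \Lambda \lambda e^{-\lambda(\Lambda + X)} $. The name $ h $ is introduced here only for this check:

$ h''(\lambda) = \Lambda (\Lambda + X) e^{-\lambda(\Lambda + X)} \left[ \lambda(\Lambda + X) - 2 \right] $

$ h''\left(\frac{1}{\Lambda + X}\right) = -\Lambda (\Lambda + X) e^{-1} < 0 $

So $ \lambda = \frac{1}{\Lambda + X} $ is indeed a maximum. It is also the only stationary point for $ \lambda > 0 $.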

Recall from homework: $ \overset{\land}\lambda_{\mbox{ML}} = \frac{1}{X} $

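The two estimates differ only by the term $ \Lambda $ in the denominator, which comes from the prior $ f_{\lambda}(\lambda) = \Lambda e^{-\lambda \Lambda} $. Since this prior favors small values of $ \lambda $, the MAP estimate is always smaller than the ML estimate. As $ \Lambda \to 0 $ (a flatter prior), it approaches $ \frac{1}{X} $.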
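Here is a short sketch comparing the closed forms above, with a numerical check of the MAP formula. The observation $ X = 0.5 $ and the list of prior rates are hypothetical. The helper `map_grid` is an illustrative brute-force search, not anything from the notes.

```python
import math

def map_closed_form(x, rate):
    """MAP estimate 1 / (Lambda + X): exponential likelihood, exponential prior."""
    return 1.0 / (rate + x)

def ml_closed_form(x):
    """ML estimate 1 / X for a single exponential observation."""
    return 1.0 / x

def map_grid(x, rate, grid):
    """Brute-force maximizer of lam*exp(-lam*x) * rate*exp(-rate*lam) over a grid."""
    return max(grid, key=lambda lam: lam * math.exp(-lam * x) * rate * math.exp(-rate * lam))

x_obs = 0.5
grid = [k / 1000 for k in range(1, 20001)]
for rate in [10.0, 1.0, 0.1, 0.01, 0.001]:
    # The MAP estimate stays below 1 / X and approaches it as the prior flattens
    print(rate, map_closed_form(x_obs, rate), map_grid(x_obs, rate, grid), ml_closed_form(x_obs))
```

The grid column should agree with the closed form to within the grid spacing of 0.001.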

Example 2: Discrete


Back to ECE302 Fall 2008 Prof. Sanghavi
