[[Category:ECE302Fall2008_ProfSanghavi]]
[[Category:probabilities]]
[[Category:ECE302]]
[[Category:lecture notes]]

=MAP Estimation by Landis=
----
==The Big Picture==
Given an observation X used to estimate an unknown parameter <math>\theta</math> of the distribution <math>f_X(X)</math> (i.e., <math>f_X(X)</math> is some function <math>g(\theta)</math> of the parameter).
Consider three expressions (distributions):
===1. Likelihood===
<math>p(X; \theta)</math> (discrete)

<math>f_X(X; \theta)</math> (continuous)

Used for the ML estimate: <math>\hat{\theta}_{ML} = \arg\max_{\theta} f_X(X; \theta)</math>
===2. Prior===
<math>P(\theta)</math> (discrete)

<math>f_\theta(\theta)</math> (continuous)

Indicates some prior knowledge as to what <math>\theta</math> should be. "Prior" refers to before seeing the observation.
===3. Posterior===
<math>p(\theta | X)</math> (discrete)

<math>f_{\theta | X}(\theta | X)</math> (continuous)

"Posterior" refers to after seeing the observation. Use the posterior to define the maximum a posteriori (MAP) estimate:

<math>\hat{\theta}_{MAP} = \arg\max_{\theta} f_{\theta | X}(\theta | X)</math>
+ | |||
+ | Using Bayes' Rule, we can expand the posterior <math>f_{\theta | X}(\theta | X)</math>: | ||
+ | |||
+ | <math>f_{\theta | X}(\theta | X) = \frac{f_{x|\theta}f_\theta(\theta)}{f_X(X)}</math> | ||
+ | |||
+ | <math>\overset{\land}\theta_{\mbox{map}} = \overset{\mbox{argmax}}\theta f_{X | \theta}(X | \theta) F_\theta(\theta)</math> | ||
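This last step, dropping the constant denominator, can be checked numerically. A minimal sketch (Python), using an exponential likelihood and prior as a concrete stand-in; the observation X, the prior rate, and the grid are assumed values for illustration:

```python
import numpy as np

# Assumed stand-ins: exponential likelihood f(X|theta) = theta*exp(-theta*X)
# with one observation X, and an exponential prior with rate LAM on theta.
X, LAM = 2.0, 0.5
theta = np.linspace(1e-3, 5.0, 100001)               # grid of candidate theta values

unnorm = LAM * np.exp(-LAM * theta) * theta * np.exp(-theta * X)  # prior * likelihood
Z = unnorm.sum() * (theta[1] - theta[0])             # numerical stand-in for f_X(X)
posterior = unnorm / Z                               # normalized posterior

# Dividing by the constant Z rescales every grid value equally,
# so the location of the maximum is unchanged.
assert np.argmax(unnorm) == np.argmax(posterior)
print(theta[np.argmax(posterior)])                   # ≈ 0.4
```

The same argument is why the MAP estimate can be computed from likelihood × prior alone, without ever evaluating <math>f_X(X)</math>.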
+ | |||
+ | |||
+ | ===So What?=== | ||
+ | |||
+ | So, what does this mean in a nutshell? Essentially, an <b>ML estimator</b> is as follows: "I know that a random variable follows some sort of pattern, and this pattern can be adjusted by changing a variable called a parameter. (I have <i>no clue</i> what this parameter should be.) I run some experiment to get a sample output from the random variable. Based on my experiment, I would like to find the pattern for the random variable that best matches my data. So, I will find the pattern that gives my data the <i>most likely</i> chance of occuring." | ||
+ | |||
+ | A <b>MAP estimator</b> is as follows: "I know that a random variable follows some sort of pattern, and this pattern can be adjusted by changing a variable called a parameter. (I have <i>some idea</i> what this parameter should be, so I will treat the parameter as a random variable. Then, the chance that the parameter will have some value will be expressed by it's PDF/PMF.) I run some experiment to get a sample output from the random variable. Based on my experiment <i>and</i> on my prior ideas of what the parameter should be, I would like to find the pattern that best matches both my data and my prior knowledge. So, I will find the pattern that maximizes the product of two things: how well the parameter matches the data, and how well the parameter meets my expectations of what the parameter should be." | ||
+ | |||
+ | |||
== Example 1: Continuous ==

<math>X \sim f_X(X) = \lambda e^{-\lambda X}</math>,

but we don't know the parameter <math>\lambda</math>. Let us assume, however, that <math>\lambda</math> is itself exponentially distributed, i.e.

<math>\lambda \sim f_\lambda(\lambda) = \Lambda e^{-\Lambda\lambda}</math>,

where <math>\Lambda</math> is fixed and known.

Find <math>\hat{\lambda}_{MAP}</math>.

Solution:
+ | |||
+ | <math>\overset{\land}\lambda_{\mbox{map}} = \overset{\mbox{argmax}}\lambda f_{\lambda | X}(\lambda | X)</math> | ||
+ | |||
+ | <math>\overset{\land}\lambda_{\mbox{map}} = \overset{\mbox{argmax}}\lambda f_x(\lambda)f_{x|\lambda}(x; \lambda)</math> | ||
+ | |||
+ | <math>\overset{\land}\lambda_{\mbox{map}} = \overset{\mbox{argmax}}\lambda \Lambda e^{-\lambda \Lambda}\lambda e^{-\lambda X}</math> | ||
+ | |||
+ | <math>\frac{d}{d\lambda} \lambda \Lambda e^{-\lambda(\Lambda + X)} = 0</math> | ||
+ | |||
+ | <math>\Lambda e^{-\lambda}(\Lambda + X) - \lambda \Lambda (\Lambda + X) e^{-\lambda(\lambda + X)} = 0</math> | ||
+ | |||
+ | <math>1 - \lambda(\Lambda + X) = 0</math> | ||
+ | |||
+ | <math>\overset{\land}X_{\mbox{map}} = \frac{1}{\Lambda + X}</math> | ||
+ | |||
+ | Recall from homework: <math>\overset{\land}X_{\mbox{ML}} = \frac{1}{X}</math> | ||
+ | |||
+ | Prior <math>f_{\lambda}(\lambda) = \Lambda e^{-\lambda \Lambda}</math> | ||
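As a sanity check on the calculus, the closed-form answer can be compared against a brute-force maximization of prior × likelihood on a fine grid. A sketch (Python); the particular values of <math>\Lambda</math> and X are arbitrary:

```python
import numpy as np

LAM, X = 1.5, 3.0                               # arbitrary example values
lam = np.linspace(1e-4, 2.0, 200001)            # fine grid over lambda

# Unnormalized posterior: prior * likelihood, as in the derivation above.
posterior = LAM * np.exp(-LAM * lam) * lam * np.exp(-lam * X)

lam_map_numeric = lam[np.argmax(posterior)]
lam_map_closed  = 1.0 / (LAM + X)               # the derived formula

print(lam_map_numeric, lam_map_closed)          # both ≈ 0.2222
```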
+ | |||
+ | == Example 2: Discrete == | ||
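The notes leave this section empty. As a hypothetical illustration (not from the original lecture) of how the discrete case works: the parameter takes one of finitely many values, the prior is a PMF over those values, and the MAP estimate is the value maximizing <math>p(X | \theta) P(\theta)</math>. A Python sketch, with all numbers invented:

```python
# Hypothetical example: a coin whose bias theta is known to be either
# 0.5 (fair) or 0.8 (loaded), with prior beliefs P(theta=0.5) = 0.9 and
# P(theta=0.8) = 0.1. We observe k heads in n flips.
from math import comb

def map_bias(k, n):
    priors = {0.5: 0.9, 0.8: 0.1}
    # Score each candidate by binomial likelihood * prior, take the argmax.
    score = lambda th: comb(n, k) * th**k * (1 - th)**(n - k) * priors[th]
    return max(priors, key=score)

print(map_bias(6, 10))   # 0.5 -- with 6/10 heads, the strong prior wins
print(map_bias(9, 10))   # 0.8 -- with 9/10 heads, the data overwhelm the prior
```

Because the parameter space is finite, no calculus is needed: the maximization is a direct comparison of likelihood × prior across the candidates.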
----
[[Main_Page_ECE302Fall2008sanghavi|Back to ECE302 Fall 2008 Prof. Sanghavi]]
Latest revision as of 08:23, 10 May 2013