=BPE FOR MULTIVARIATE GAUSSIAN=
for [[ECE662:BoutinSpring08_Old_Kiwi|ECE662: Decision Theory]]
  
Complement to  [[Lecture_7_-_MLE_and_BPE_OldKiwi|Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation]], [[ECE662]], Spring 2010, Prof. Boutin
----
== Estimation of mean, given a known covariance ==
 
Consider a set of iid samples <math>\{X_i\}_{i=1}^N</math> where <math>X_i \in\mathbb{R}^n</math> is such that <math>X_i \sim N(\mu,\Sigma)</math>. Suppose we know <math>\Sigma</math>, but wish to estimate <math>\mu</math> using BPE. If we assume that the unknown mean is a priori distributed as a Gaussian random variable, we obtain a posterior distribution for the mean which is also Gaussian, i.e. <math>p(\mu|X_1,X_2,\ldots,X_N) = N(\mu_N,\Sigma_N)</math>, where <math>\mu_N</math> and <math>\Sigma_N</math> combine our prior knowledge of <math>\mu</math> with the information in the samples <math>\{X_i\}_{i=1}^N</math>. Fukunaga p. 391 derives the parameters <math>\mu_N</math> and <math>\Sigma_N</math> as follows:
  
<math>\mu_N = \frac{\Sigma}{N}(\Sigma_\mu +  \frac{\Sigma}{N})^{-1}\mu_0 + \Sigma_\mu(\Sigma_\mu + \frac{\Sigma}{N})^{-1}\left(\frac1N\sum_{i=1}^NX_i\right)</math>,
  
where <math>\mu_0</math> is the initial "guess" for the mean <math>\mu</math>, and <math>\Sigma_\mu</math> expresses the "confidence" in that guess (a smaller <math>\Sigma_\mu</math> corresponds to greater confidence). In other words, we can consider <math>N(\mu_0,\Sigma_\mu)</math> to be the prior distribution for <math>\mu</math> that we would assume without seeing any samples. For the covariance of the posterior, we have
  
<math>\Sigma_N = \Sigma_\mu(\Sigma_\mu+\frac{\Sigma}{N})^{-1}\frac{\Sigma}{N}</math>.
  
We find that as the number of samples increases, the effect of the prior knowledge (<math>\mu_0</math>, <math>\Sigma_\mu</math>) decreases, so that
  
<math>\lim_{N\rightarrow\infty}\mu_N = \frac1N\sum_{i=1}^NX_i</math>, and <math>\lim_{N\rightarrow\infty}\Sigma_N = 0</math>.
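
For concreteness, here is a minimal numerical sketch of these updates (assuming NumPy; the function name <code>posterior_mean_params</code> and the toy data are illustrative choices, not from Fukunaga):

<pre>
# Minimal sketch of the BPE update for the mean with known covariance.
# Names mu0, Sigma_mu, Sigma mirror the symbols mu_0, Sigma_mu, Sigma above.
import numpy as np

def posterior_mean_params(X, mu0, Sigma_mu, Sigma):
    """Return (mu_N, Sigma_N) for iid rows of X drawn from N(mu, Sigma), Sigma known."""
    N = X.shape[0]
    x_bar = X.mean(axis=0)                 # sample mean (1/N) * sum_i X_i
    S_N = Sigma / N
    A = np.linalg.inv(Sigma_mu + S_N)      # (Sigma_mu + Sigma/N)^{-1}
    mu_N = S_N @ A @ mu0 + Sigma_mu @ A @ x_bar
    Sigma_N = Sigma_mu @ A @ S_N           # posterior covariance of the mean
    return mu_N, Sigma_N

# Toy check: as N grows, mu_N approaches the sample mean and Sigma_N shrinks toward 0.
rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -1.0], cov=Sigma, size=1000)
mu_N, Sigma_N = posterior_mean_params(X, np.zeros(2), np.eye(2), Sigma)
</pre>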
  
== Estimation of covariance, given a known mean ==
Again, given iid samples <math>\{X_i\}_{i=1}^N</math>, <math>X_i \in\mathbb{R}^n</math>, <math>X_i \sim N(\mu,\Sigma)</math>, let us now estimate <math>\Sigma</math> with <math>\mu</math> known. As in Fukunaga p. 392, we assume that the density of <math>X</math> conditioned on <math>\Sigma</math> is normal (i.e. <math>p(X|\Sigma) = N(\mu,\Sigma)</math>), and it can be shown that the sample covariance matrix then follows a Wishart distribution. Fukunaga p. 392 gives the distribution <math>p(K|\Sigma_0,N_0)</math>, where <math>K = \Sigma^{-1}</math>, the parameter <math>\Sigma_0</math> represents the initial "guess" for <math>\Sigma</math>, and <math>N_0</math> represents how many samples were used to compute <math>\Sigma_0</math>. Note that we work with the distribution of <math>K = \Sigma^{-1}</math> rather than <math>\Sigma</math> directly, since it is the inverse covariance matrix that appears in the density of a normal distribution. It can be shown, then, that
  
<math>p(K|\Sigma_0,N_0) = c(n,N_0)\left|\frac12N_0\Sigma_0\right|^{(N_0-1)/2}|K|^{(N_0-n-2)/2}\exp(-\frac12\mathrm{trace}(N_0\Sigma_0K))</math>,
  
where <math>c(n,N_0) = \left\{\pi^{n(n-1)/4}\prod_{i=1}^n\Gamma\left(\frac{N_0-i}{2}\right)\right\}^{-1}</math>.
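
The density above can be transcribed directly into a log-density for numerical use; the following is a sketch assuming NumPy/SciPy, written in Fukunaga's parameterization (not the parameterization used by <code>scipy.stats.wishart</code>):

<pre>
# Direct transcription of p(K | Sigma_0, N_0) above into a log-density.
# Requires N_0 > n so that every Gamma((N_0 - i)/2) is defined.
import numpy as np
from scipy.special import gammaln

def log_c(n, N0):
    """log of the normalizing constant c(n, N_0)."""
    return -(n * (n - 1) / 4.0) * np.log(np.pi) \
           - sum(gammaln((N0 - i) / 2.0) for i in range(1, n + 1))

def wishart_logpdf(K, Sigma0, N0):
    """log p(K | Sigma_0, N_0) for the density given above."""
    n = K.shape[0]
    _, logdet_S = np.linalg.slogdet(0.5 * N0 * Sigma0)   # log|N_0 Sigma_0 / 2|
    _, logdet_K = np.linalg.slogdet(K)                    # log|K|
    return (log_c(n, N0)
            + 0.5 * (N0 - 1) * logdet_S
            + 0.5 * (N0 - n - 2) * logdet_K
            - 0.5 * np.trace(N0 * Sigma0 @ K))
</pre>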
  
== Simultaneous estimation of unknown mean and covariance ==
Finally, given iid samples <math>\{X_i\}_{i=1}^N</math>, <math>X_i \in\mathbb{R}^n</math>, <math>X_i \sim N(\mu,\Sigma)</math>, we now wish to estimate both <math>\mu</math> and <math>\Sigma</math> (or, equivalently, <math>K = \Sigma^{-1}</math>). Fukunaga p. 393 gives the joint distribution, which follows the Gauss-Wishart distribution:

<math>p(\mu,K|\mu_0,\Sigma_0,\mu_{\Sigma},N_0) = (2\pi)^{-n/2}|\mu_{\Sigma} K|^{1/2}\exp\left(-\frac12\mu_{\Sigma}(\mu-\mu_0)^TK(\mu-\mu_0) \right)\times c(n,N_0)\left|\frac12N_0\Sigma_0\right|^{(N_0-1)/2}|K|^{(N_0-n-2)/2}\exp\left(-\frac12\mathrm{trace}(N_0\Sigma_0K)\right)</math>,
where <math>\mu_0</math>, <math>\Sigma_0</math>, <math>N_0</math>, and <math>c(n,N_0)</math> are as above, and the scalar <math>\mu_{\Sigma}</math> plays the role of the confidence in the initial guess <math>\mu_0</math> for the mean.
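
As a companion sketch, the log of this joint density can be evaluated by combining a Gaussian term with the Wishart term, reusing <code>wishart_logpdf</code> from the previous snippet and treating <math>\mu_{\Sigma}</math> as a scalar weight (as the formula suggests); this is an illustrative transcription, not code from Fukunaga:

<pre>
# Sketch of the Gauss-Wishart log-density above.
# Assumes mu_Sigma is a scalar and reuses wishart_logpdf from the previous snippet.
import numpy as np

def gauss_wishart_logpdf(mu, K, mu0, Sigma0, mu_Sigma, N0):
    n = K.shape[0]
    d = mu - mu0
    _, logdet_muK = np.linalg.slogdet(mu_Sigma * K)       # log|mu_Sigma * K|
    log_normal = (-0.5 * n * np.log(2.0 * np.pi)
                  + 0.5 * logdet_muK
                  - 0.5 * mu_Sigma * (d @ K @ d))
    return log_normal + wishart_logpdf(K, Sigma0, N0)
</pre>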
----
Back to  [[Lecture_7_-_MLE_and_BPE_OldKiwi|Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation]], [[ECE662]], Spring 2010, Prof. Boutin