Difference between revisions of "Bayersian Parameter Estimation" - Rhea

Latest revision as of 07:31, 29 April 2014

Bayesian Parameter Estimation: Gaussian Case

Loosely based on the ECE662 Spring 2014 lecture material of Prof. Mireille Boutin.

Introduction: Bayesian Estimation

According to Chapter #3.3 (Duda's book), although the answers we get by BPE will generally be nearly identical to those obtained by maximum likelihood estimation, the conceptual difference is significant. For maximum likelihood estimation, the parameter $\theta$ is a fixed while in Bayersian estimation $\theta$ is considered to be a random variable.

By definition, given samples class $\mathcal{D}$ , Bayes' formula then becomes:

$P(w_i|x,D) = \frac{p(x|w_i,D)P(w_i|D)}{\sum_{j = 1}^c p(x|w_j,D)P(w_j|D)}$

As the above equation suggests, we can use the information provided by the training data to help determine both the class-conditional densities and the priori probabilities.

Furthermore, since we are treating supervised case, we can separate the training samples by class into c subsets $\mathcal{D}_1, \mathcal{D}_2, ..., \mathcal{D}_c$ , accordingly:

$P(w_i|x,D) = \frac{p(x|w_i,D_i)P(w_i)}{\sum_{j = 1}^c p(x|w_j,D_j)P(w_j)}$

Now, assume $$ p(x) $$ has a parameter form. We are given a set of $$ N $$ independent samples $\mathcal{D} = \{x_1, x_2, ... , x_N \}$ . View $\theta$ as a random variable. Consider more specifically in continuous case:

$$ p(x|D) $$ can be computed as:

$p(x|D) = \int p(x, \theta|D)d\theta = \int p(x|\theta)p(\theta|D)d\theta$

Bayesian Parameter Estimation: General Theory

We first start with a generalized approach which can be applied to any situation in which the unknown density can be parameterized. The basic assumptions are as follows:

1. The form of the density $p(x|\theta)$ is assumed to be known, but the value of the parameter vector $\theta$ is not known exactly.

2. The initial knowledge about $\theta$ is assumed to be contained in a known a priori density $p(\theta)$ .

3. The rest of the knowledge about $\theta$ is contained in a set $\mathcal{D}$ of n samples $$ x_1, x_2, ... , x_n $$ drawn independently according to the unknown probability density $$ p(x) $$ .

Accordingly, already know:

p(x|D) = \int p(x|\theta)p(\theta|D)d\theta

and By Bayes Theorem,

p(\theta|D) = \frac{p(D|\theta)p(\theta)}{\int p(D|\theta)p(\theta|D)d\theta}

Now, since we are attempting to transform the equation to be based on samples $$ x_k $$ , by independent assumption,

p(D|\theta) = \prod_{k = 1}^n p(x_k|\theta)

Hence, if a sample $\mathcal{D}$ has n samples, we can denote the sample space as: $\mathcal{D}^n = \{x_1, x_2, ... x_n\}$ .

Combine the sample space definition with the equation above:

p(D^n|\theta) = p(D^{n-1}|\theta)p(x_n|\theta)

Using this equation, we can transform the Bayesian Parameter Estimation to:

p(\theta|D^n) = \frac{p(x_n|\theta)p(\theta|D^{n-1})}{\int p(x_n|\theta)p(\theta|D^{n-1})d\theta}

Bayesian Parameter Estimation: Gaussian Case

The Univariate Case: $p(\mu|\mathcal{D})$

Consider the case where $\mu$ is the only unknown parameter. For simplicity we assume:

p(x|\mu) \sim N(\mu, \sigma^2)

and

p(\mu) \sim N(\mu_0, \sigma_0^2)

From the previous section, the following expression could be easily obtained using Bayes' formula:

p(\mu|D) = \alpha \prod_{k = 1}^n p(x_k|\mu)p(\mu)

Where $\alpha$ is a factorization factor independent of $\mu$ .

Now, substitute $p(x_k|\mu)$ and $$ p(u) $$ with:

p(x_k|\mu) = \frac{1}{(2\pi\sigma^2)^{1/2}}exp[-\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}]

p(u) = \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}]

The equation has now become:

p(\mu|D) = \alpha \prod_{k = 1}^n \frac{1}{(2\pi\sigma^2)^{1/2}}exp[-\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}] \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}]

p(\mu|D) = \alpha \prod_{k = 1}^n \frac{1}{(2\pi\sigma^2)^{1/2}} \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2} -\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}]

Update the scaling factor to $\alpha'$ and $\alpha''$ correspondingly,

p(\mu|D) = \alpha' exp \sum_{k=1}^n(-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2} -\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2})

p(\mu|D) = \alpha'' exp [-\frac{1}{2}(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2})\mu^2 -2(\frac{1}{\sigma^2}\sum_{k=1}^nx_k + \frac{\mu_0}{\sigma_0^2})\mu]

With the knowledge of Gaussian distribution:

p(u|D) = \frac{1}{(2\pi\sigma_n^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_n}{\sigma_n})^{2}]

Finally, the estimate of $$ u_n $$ can be obtained:

\mu_n = (\frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2})\bar{x_n} + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\mu_0

Where $\bar{x_n}$ is defined as sample means and $$ n $$ is the sample size.

In order to form a Gaussian distribution, the variance $\sigma_n^2$ associated with $\mu_n$ could also be obtained correspondingly as:

\sigma_n^2 = \frac{\sigma_0^2 \sigma^2}{n\sigma_0^2 + \sigma^2}

Observation:

With

N \to \infty

,

\sigma_D \to 0

$p(\mu|D)$ becomes more sharply peaked around $\mu_D$

The Univariate Case: $p(x|\mathcal{D})$

Having obtained the posteriori density for the mean $$ u_n $$ of set $\mathcal{D}$ , the remaining of the task is to estimate the "class-conditional" density for $$ p(x|D) $$ .

Based on the text Duda's chatpter #3.4.2 and Prof. Mimi's notes:

p(x|\mathcal{D}) = \int p(x|\mu)p(\mu|\mathcal{D})d\mu

p(x|\mathcal{D}) = \int \frac{1}{\sqrt{2 \pi } \sigma} \exp[{-\frac{1}{2} (\frac{x-\mu}{\sigma})^2}] \frac{1}{\sqrt{2 \pi } \sigma_n} \exp[{-\frac{1}{2} (\frac{\mu-\mu_n}{\sigma_n})^2}] d\mu

p(x|\mathcal{D}) = \frac{1}{2\pi\sigma\sigma_n} exp [-\frac{1}{2} \frac{(x-\mu)}{\sigma^2 + \sigma_n^2}]f(\sigma,\sigma_n)

Where $f(\sigma, \sigma_n)$ is defined as:

f(\sigma,\sigma_n) = \int exp[-\frac{1}{2}\frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n ^2}(\mu - \frac{\sigma_n^2 x+\sigma^2 \mu_n}{\sigma^2+\sigma_n^2})^2]d\mu

Hence, $$ p(x|D) $$ is normally distributed as:

p(x|D) \sim N(\mu_n, \sigma^2 + \sigma_n^2)

References

[1]. Mireille Boutin, "ECE662: Statistical Pattern Recognition and Decision Making Processes," Purdue University, Spring 2014.

[2]. R. Duda, P. Hart, Pattern Classification. Wiley-Interscience. Second Edition, 2000.

Difference between revisions of "Bayersian Parameter Estimation" - Rhea

Latest revision as of 07:31, 29 April 2014

Contents

Introduction: Bayesian Estimation

Bayesian Parameter Estimation: General Theory

Bayesian Parameter Estimation: Gaussian Case

The Univariate Case: $p(\mu|\mathcal{D})$

The Univariate Case: $p(x|\mathcal{D})$

References

Alumni Liaison

@@ Line 2: / Line 2: @@
 [[Category:ECE662]]
-<center><font size= 4>
+<center><font size= 5>
-[[ECE662_Bayesian_Parameter_Estimation_S14_MH|Bayesian Parameter Estimation]]
+[[ECE662_Bayesian_Parameter_Estimation_S14_SF|Bayesian Parameter Estimation: Gaussian Case]]
 </font size>
-A [https://www.projectrhea.org/learning/slectures.php slecture] by [[ECE]] student [[user:Mhossain | Shaobo Fang]]
+A [https://www.projectrhea.org/learning/slectures.php slecture] by [[ECE]] student [[user:SFang | Shaobo Fang]]
 Loosely based on the [[2014_Spring_ECE_662_Boutin|ECE662 Spring 2014 lecture]] material of [[user:mboutin|Prof. Mireille Boutin]].
 </center>
-[[Media:slecture_ECE662_Whitening_and_Coloring_Transforms_S14_MH.pdf|PDF version]]
 ----
 ----
-== '''Introduction''' ==
+== '''Introduction: Bayesian Estimation''' ==
-Bayesian Parameter Estimation: General Theory
+According to Chapter #3.3 (Duda's book), although the answers we get by BPE will generally be nearly identical to those obtained by maximum likelihood estimation, the conceptual difference is significant. For maximum likelihood estimation, the parameter <math>\theta</math> is a fixed while in Bayersian estimation <math>\theta</math> is considered to be a random variable.
-Fundamentals of Bayesian Estimation
+By definition, given samples class <math>\mathcal{D}</math>, Bayes' formula then becomes:
+<center>
+<math>P(w_i|x,D) = \frac{p(x|w_i,D)P(w_i|D)}{\sum_{j = 1}^c p(x|w_j,D)P(w_j|D)}</math>
+</center>
-As stated in Duda's book, the conceptual difference between maximum likelihood estimation and Bayersian learning is that in MLE <math>\theta</math> is a fixed vector while in Bayersian estimation <math>\theta</math> is considered to be a random variable.
+As the above equation suggests, we can use the information provided by the training data to help determine both the class-conditional densities and the priori probabilities.
+Furthermore, since we are treating supervised case, we can separate the training samples by class into c subsets <math>\mathcal{D}_1, \mathcal{D}_2, ..., \mathcal{D}_c</math>, accordingly:
-By definition, given samples <math>\mathcal{D}</math> Bayer's formula is defined as
+<center>
+<math>P(w_i|x,D) = \frac{p(x|w_i,D_i)P(w_i)}{\sum_{j = 1}^c p(x|w_j,D_j)P(w_j)}</math>
+</center>
+Now, assume <math>p(x)</math> has a parameter form. We are given a set of <math>N</math> independent samples <math>\mathcal{D} = \{x_1, x_2, ... , x_N \}</math>. View <math>\theta</math> as a random variable. Consider more specifically in continuous case:
-<math>P(w_i|x,D) = \frac{p(x|w_i,D)P(w_i|D)}{\sum_{j = 1}^c p(x|w_j,D)P(w_j|D)}</math>
+<math>p(x|D)</math> can be computed as:
+<center>
+<math>p(x|D) = \int p(x, \theta|D)d\theta = \int p(x|\theta)p(\theta|D)d\theta</math>
+</center>
+----
-Furthermore, <math>p(x|D)</math> can be computed as:
+== '''Bayesian Parameter Estimation: General Theory''' ==
-<math>p(x|D) = \int p(x|\theta)p(\theta|D)d\theta</math>
+We first start with a generalized approach which can be applied to any situation in which the unknown density can be parameterized. The basic assumptions are as follows:
+. The form of the density <math>p(x|\theta)</math> is assumed to be known, but the value of the parameter vector <math>\theta</math> is not known exactly.
-Bayesian Parameter Estimation: General Theory
-It is important to know that:
-. The form of the density $p(x|\theta)$ is assumed to be known, but the value of the parameter vector <math>\theta</math> is not known exactly.
 . The initial knowledge about <math>\theta</math> is assumed to be contained in a known a priori density <math>p(\theta)</math>.
@@ Line 48: / Line 54: @@
 . The rest of the knowledge about <math>\theta</math> is contained in a set <math>\mathcal{D}</math> of n samples <math>x_1, x_2, ... , x_n</math> drawn independently according to the unknown probability density <math>p(x)</math>.
-Accordingly, based on:
+Accordingly, already know:
+<center><math>p(x|D) = \int p(x|\theta)p(\theta|D)d\theta</math></center>
-<math>p(x|D) = \int p(x|\theta)p(\theta|D)d\theta</math>
+and By Bayes Theorem,
-and Bayes Theorem,
+<center><math>p(\theta|D) = \frac{p(D|\theta)p(\theta)}{\int p(D|\theta)p(\theta|D)d\theta}</math></center>
-<math>p(\theta|D) = \frac{p(D|\theta)p(\theta)}{\int p(D|\theta)p(\theta|D)d\theta}</math>
+Now, since we are attempting to transform the equation to be based on samples <math>x_k</math>, by independent assumption,
-Now, since we are attempting to transform the equation to be based on samples <math>x_i</math>, by independent assumption,
+<center><math>p(D|\theta) = \prod_{k = 1}^n p(x_k|\theta)</math></center>
-<math>p(D|\theta) = \prod_{k = 1}^n p(x_i|\theta)</math>
 Hence, if a sample <math>\mathcal{D}</math> has n samples, we can denote the sample space as:
 <math>\mathcal{D}^n = \{x_1, x_2, ... x_n\}</math>.
-<math>p(D^n|\theta) = p(D^{n-1}|\theta)p(x_n|\theta)</math>
+Combine the sample space definition with the equation above:
+                                <center><math> p(D^n|\theta) = p(D^{n-1}|\theta)p(x_n|\theta) </math></center>
 Using this equation, we can transform the Bayesian Parameter Estimation to:
-<math>p(\theta|D^n) = \frac{p(x_n|\theta)p(\theta|D^{n-1})}{\int p(x_n|\theta)p(\theta|D^{n-1})d\theta}</math>
+<center><math>p(\theta|D^n) = \frac{p(x_n|\theta)p(\theta|D^{n-1})}{\int p(x_n|\theta)p(\theta|D^{n-1})d\theta}</math></center>
+----
+== '''Bayesian Parameter Estimation: Gaussian Case''' ==
+== ''The Univariate Case: <math>p(\mu|\mathcal{D})</math>'' ==
-Investigation of Estimator's Accuracy of Bayesian Parameter Estimation: Gaussian Case
+Consider the case where <math>\mu</math> is the only unknown parameter. For simplicity we assume:
-The Univariate Case: <math>p(\mu|\mathcal{D})</math>
-Assumptions:
-<math>p(x|\mu) \sim N(\mu, \sigma^2)</math>
+                                        <center><math>p(x|\mu) \sim N(\mu, \sigma^2)</math></center> and
-<math>p(\mu) \sim N(\mu_0, \sigma_0^2)</math>
+<center><math>p(\mu) \sim N(\mu_0, \sigma_0^2)</math></center>
-From the previous section, the following expression could be easily obtained:
+From the previous section, the following expression could be easily obtained using Bayes' formula:
-<math>p(\mu|D) = \alpha \prod_{k = 1}^n p(x_k|\mu)p(\mu)</math>
+<center><math>p(\mu|D) = \alpha \prod_{k = 1}^n p(x_k|\mu)p(\mu)</math></center>
 Where <math>\alpha</math> is a factorization factor independent of <math>\mu</math>.
-Now, substitute $p(x_k|\mu)$ and $p(u)$ with:
-$$p(x_k|\mu) = \frac{1}{(2\pi\sigma^2)^{1/2}}exp[-\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}]$$
-$$p(u) = \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}]$$
-Hence, accordingly,
-$$p(\mu|D) = \alpha \prod_{k = 1}^n \frac{1}{(2\pi\sigma^2)^{1/2}}exp[-\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}] \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}]$$
-$$p(\mu|D) = \alpha \prod_{k = 1}^n \frac{1}{(2\pi\sigma^2)^{1/2}} \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}  -\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}]$$
+Now, substitute <math>p(x_k|\mu)</math> and <math>p(u)</math> with:
-Update the scaling factor to $\beta$,
+<center><math>p(x_k|\mu) = \frac{1}{(2\pi\sigma^2)^{1/2}}exp[-\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}]</math></center>
-$$p(\mu|D) = \beta exp \sum_{k=1}^n(-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}  -\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2})$$
+<center><math>p(u) = \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}]</math></center>
-$$p(\mu|D) = \gamma exp [-\frac{1}{2}(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2})\mu^2  -2(\frac{1}{\sigma^2}\sum_{k=1}^nx_k + \frac{\mu_0}{\sigma_0^2})\mu]$$
+The equation has now become:
-Furthermore, since
+<center><math>p(\mu|D) = \alpha \prod_{k = 1}^n \frac{1}{(2\pi\sigma^2)^{1/2}}exp[-\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}] \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}]</math></center>
-$$p(u|D) = \frac{1}{(2\pi\sigma_n^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_n}{\sigma_n})^{2}]$$
-Finally, the estimate of $u_n$ can be obtained:
-$$\mu_n = (\frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2})\bar{x_n} + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\mu_0$$
-Where $\bar{x_n}$ is defined as sample means and $n$ is the sample size.
-In order to form a Gaussian distribution, the variance $\sigma_n^2$ associated with $\mu_n$ is defined as:
-$$\sigma_n^2 = \frac{\sigma_0^2 \sigma^2}{n\sigma_0^2 + \sigma^2}$$
-\subsection{The Univariate Case: $p(x|\mathcal{D})$}
+<center><math>p(\mu|D) = \alpha \prod_{k = 1}^n \frac{1}{(2\pi\sigma^2)^{1/2}} \frac{1}{(2\pi\sigma_0^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}  -\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2}]</math></center>
-Having obtained the posteriori density for the mean $u_n$ of set $\mathcal{D}$, the remaining of the task is to estimate the "class-conditional" density for $p(x|D)$.
-Based on the text by \textbf{Duda's},
+Update the scaling factor to <math>\alpha'</math> and <math>\alpha''</math> correspondingly,
-$$p(x|\mathcal{D}) = \int p(x|\mu)p(\mu|\mathcal{D})d\mu$$
-$$p(x|\mathcal{D}) = \frac{1}{2\pi\sigma\sigma_n} exp [-\frac{1}{2} \frac{(x-\mu)}{\sigma^2 + \sigma_n^2}]f(\sigma,\sigma_n)$$
-Where $f(\sigma, \sigma_n)$ is defined as:
-$$f(\sigma,\sigma_n) = \int exp[-\frac{1}{2}\frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n ^2}(\mu - \frac{\sigma_n^2 x+\sigma^2 \mu_n}{\sigma^2+\sigma_n^2})^2]d\mu$$
-Hence, $p(x|D)$ is normally distributed as:
-$$p(x|D) \sim N(\mu_n, \sigma^2 + \sigma_n^2)$$
+<center><math>p(\mu|D) = \alpha' exp \sum_{k=1}^n(-\frac{1}{2}(\frac{\mu-\mu_0}{\sigma_0})^{2}  -\frac{1}{2}(\frac{x_k-\mu}{\sigma})^{2})</math></center>
-\subsection{Experiment of Bayesian Parameter Estimation}
+<center><math>p(\mu|D) = \alpha'' exp [-\frac{1}{2}(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2})\mu^2  -2(\frac{1}{\sigma^2}\sum_{k=1}^nx_k + \frac{\mu_0}{\sigma_0^2})\mu]</math></center>
-\paragraph{Design}
+With the knowledge of Gaussian distribution:
-Assume n samples were obtained from the class $\mathcal{D}$ of unknown mean $\mu$ (known $\sigma$). Assume,
-$$p(x|\mu) \sim N(\mu, \sigma^2)$$
-$$p(\mu) \sim N(\mu_0, \sigma_0^2)$$
-While $\sigma = \sigma_0 = constant$, and $\mu_0 = 0$ (It does not matter what $\mu_0$ it was assumed to be, this will be verified shortly after). Based on the sample data $x_i \in \mathcal{D}, i = 1,2,3,...,n$, $\mu$ is desired to be estimated.
-The following results will be obtained:
+<center><math>p(u|D) = \frac{1}{(2\pi\sigma_n^2)^{1/2}}exp[-\frac{1}{2}(\frac{\mu-\mu_n}{\sigma_n})^{2}]</math></center>
-\begin{enumerate}
-\item The impact of $\mu_0$ on estimated $\hat{\mu}$
-\item The impact of sample size $n$ have on estimation accuracy
-\end{enumerate}
-\paragraph{Results}
+Finally, the estimate of <math>u_n</math> can be obtained:
-\begin{center}
-\includegraphics[scale=1]{BPE_1.png}
-Figure 21. The impact of $\mu_0$ on estimated $\hat{\mu}$ averaged over 50 samples
+<center><math>\mu_n = (\frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2})\bar{x_n} + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\mu_0</math></center>
-\includegraphics[scale=1]{BPE_2.png}
+Where <math>\bar{x_n}</math> is defined as sample means and <math>n</math> is the sample size.
-Figure 22. The impact of $\mu_0$ on the variance of estimated $\hat{\mu}$ over 50 samples
+In order to form a Gaussian distribution, the variance <math>\sigma_n^2</math> associated with <math>\mu_n</math> could also be obtained correspondingly as:
+<center><math>\sigma_n^2 = \frac{\sigma_0^2 \sigma^2}{n\sigma_0^2 + \sigma^2}</math></center>
-\end{center}
-\paragraph{Conclusion} The estimated mean is shifting up with $\mu_0$ increasing. \textbf{Based on the experiment it can be concluded that the most 'accurate' estimate could be obtained if <math>\mu_0 = \mu</math>. But, according to the plot, even if the $\mu_0$ is different different from $\mu$, the error of estimation is still acceptable. (In our case, within [-0.1,+0.06] region)} However, the variance of estimated mean could be assumed to be identical as the \textbf{real empirical mean}.
+Observation:
+With <math>N \to \infty</math>, <center><math>\sigma_D \to 0</math></center>
+<math>p(\mu|D)</math> becomes more sharply peaked around <math>\mu_D</math>
-\begin{center}
+== ''The Univariate Case: <math>p(x|\mathcal{D})</math>'' ==
-\includegraphics[scale=0.7]{e23456.png}
-Figure 23. The impact of sample size $n$ have on estimation shape accuracy (sample sizes = 2,3,4,5,6)
+Having obtained the posteriori density for the mean <math>u_n</math> of set <math>\mathcal{D}</math>, the remaining of the task is to estimate the "class-conditional" density for <math>p(x|D)</math>.
-\includegraphics[scale=0.6]{ece662_14.png}
+Based on the text Duda's chatpter #3.4.2 and Prof. Mimi's notes:
-Figure 24. The impact of sample size $n$ have on estimation shape accuracy (sample sizes = 4,10,20,50,100)
+<center><math>p(x|\mathcal{D}) = \int p(x|\mu)p(\mu|\mathcal{D})d\mu</math></center>
+<center><math>p(x|\mathcal{D}) = \int \frac{1}{\sqrt{2 \pi } \sigma} \exp[{-\frac{1}{2} (\frac{x-\mu}{\sigma})^2}]  \frac{1}{\sqrt{2 \pi } \sigma_n} \exp[{-\frac{1}{2} (\frac{\mu-\mu_n}{\sigma_n})^2}] d\mu</math></center>
+<center><math>p(x|\mathcal{D}) = \frac{1}{2\pi\sigma\sigma_n} exp [-\frac{1}{2} \frac{(x-\mu)}{\sigma^2 + \sigma_n^2}]f(\sigma,\sigma_n)</math></center>
+Where <math>f(\sigma, \sigma_n)</math> is defined as:
+<center><math>f(\sigma,\sigma_n) = \int exp[-\frac{1}{2}\frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n ^2}(\mu - \frac{\sigma_n^2 x+\sigma^2 \mu_n}{\sigma^2+\sigma_n^2})^2]d\mu</math></center>
+Hence, <math>p(x|D)</math> is normally distributed as:
+<center><math>p(x|D) \sim N(\mu_n, \sigma^2 + \sigma_n^2)</math></center>
+----
+----
-\end{center}
+== '''References''' ==
-\paragraph{Conclusion} Figure 23. and Figure 24. have demonstrated that with insufficient sample size the result would be really poor regarding prediction of points distribution.
+[1]. [https://engineering.purdue.edu/~mboutin/ Mireille Boutin], "ECE662: Statistical Pattern Recognition and Decision Making Processes," Purdue University, Spring 2014.
-<center>[[Image:fig3_summ_mh.png|680px|thumb|left|Fig 3: Summary diagram of whitening and coloring process.]]</center>
+[2]. R. Duda, P. Hart, ''Pattern Classification''. Wiley-Interscience. Second Edition, 2000.

Difference between revisions of "Bayersian Parameter Estimation" - Rhea

Latest revision as of 07:31, 29 April 2014

Contents

Introduction: Bayesian Estimation

Bayesian Parameter Estimation: General Theory

Bayesian Parameter Estimation: Gaussian Case

The Univariate Case: $ p(\mu|\mathcal{D}) $

The Univariate Case: $ p(x|\mathcal{D}) $

References

Alumni Liaison

The Univariate Case: $p(\mu|\mathcal{D})$

The Univariate Case: $p(x|\mathcal{D})$