[[Category:2010 Spring ECE 662 mboutin]]
  
Below are the notes from the lecture.
=Details of Lecture 11, [[ECE662]] Spring 2010=

In Lecture 11, we continued our discussion of parametric density estimation techniques. We discussed the Maximum Likelihood Estimation (MLE) method and looked at a couple of one-dimensional examples for the case where a feature in the dataset follows a Gaussian distribution. First, we looked at the case where the mean parameter was unknown but the variance parameter was known. Then we followed with another example where both the mean and the variance were unknown. Finally, we looked at the slight "bias" problem that arises when calculating the variance.
  
== Maximum Likelihood Estimation (MLE) ==
Notes for this lecture can be found [[noteslecture11ECE662S10|here]].
----
Previous: [[Lecture10ECE662S10|Lecture 10]]

Next: [[Lecture12ECE662S10|Lecture 12]]
----
[[ 2010 Spring ECE 662 mboutin|Back to 2010 Spring ECE 662 mboutin]]
----

'''General Principles'''

Given vague knowledge about a situation and some training data (i.e., feature vector values for which the class is known)

<math>\vec{x}_l, \qquad l=1,\ldots,\text{hopefully a large number}</math>

we want to estimate

<math>p(\vec{x}|\omega_i), \qquad i=1,\ldots,k</math>

# Assume a parametric form for <math>p(\vec{x}|\omega_i), \qquad i=1,\ldots,k</math>
# Use the training data to estimate the parameters of <math>p(\vec{x}|\omega_i)</math>; e.g., if you assume <math>p(\vec{x}|\omega_i)=\mathcal{N}(\mu,\Sigma)</math>, then you need to estimate <math>\mu</math> and <math>\Sigma</math>.
# Hope that as the cardinality of the training set increases, the estimates of the parameters converge to the true parameters.
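The three steps above can be sketched in code. This is a minimal illustration only, assuming a 1-D Gaussian class density; the "true" parameters <code>true_mu</code> and <code>true_sigma</code> are hypothetical values chosen for the demo:

```python
import random

random.seed(0)
true_mu, true_sigma = 2.0, 1.5  # hypothetical "true" parameters of the assumed Gaussian form

def gaussian_mle(samples):
    """MLE for a 1-D Gaussian: sample mean and the (biased) sample variance.

    Note the division by n, not n - 1 -- this is the "bias" issue
    mentioned in the lecture summary.
    """
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat

# Step 3: as the training set grows, the estimates should approach
# the true parameters (mu = 2.0, sigma^2 = 2.25).
for n in (10, 100, 10000):
    data = [random.gauss(true_mu, true_sigma) for _ in range(n)]
    mu_hat, var_hat = gaussian_mle(data)
    print(n, round(mu_hat, 3), round(var_hat, 3))
```

With the seed fixed, the printed estimates should wander toward (2.0, 2.25) as <math>n</math> grows, though convergence is only probabilistic.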
 
 
Let <math>\mathcal{D}_i</math> be the training set for class <math>\omega_i, \qquad i=1,\ldots,k</math>. Assume the elements of <math>\mathcal{D}_i</math> are i.i.d. with density <math>p(\vec{x}|\omega_i)</math>. Choose a parametric form for <math>p(\vec{x}|\omega_i)</math>:

<math>p(\vec{x}|\omega_i, \vec{\Theta}_i)</math>, where <math>\vec{\Theta}_i</math> is the vector of parameters.

'''How to estimate <math>\vec{\Theta}_i</math>?'''

* Consider each class separately:
** <math>\vec{\Theta}_i \to \vec{\Theta}</math>
** <math>\mathcal{D}_i \to \mathcal{D}</math>
** Let <math>N = \vert \mathcal{D} \vert</math>
** The samples <math>\vec{x}_1,\vec{x}_2,\ldots,\vec{x}_N</math> are independent.

So <math>p(\mathcal{D}|\vec{\Theta})=\prod_{j=1}^N p(\vec{x}_j|\vec{\Theta})</math>
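As a toy numeric check of this product form (a 1-D Gaussian density is assumed; the sample values and the two candidate parameter settings below are made up), a parameter value near the data yields a larger likelihood:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood(data, mu, sigma):
    """p(D | Theta): product of per-sample densities, since samples are i.i.d."""
    p = 1.0
    for x in data:
        p *= gaussian_pdf(x, mu, sigma)
    return p

data = [1.9, 2.1, 2.0, 1.8, 2.2]  # hypothetical toy sample, clustered near 2
print(likelihood(data, mu=2.0, sigma=1.0))  # larger: mu matches the data
print(likelihood(data, mu=5.0, sigma=1.0))  # much smaller: mu far from the data
```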

'''Definition:''' The maximum likelihood estimate of <math>\vec{\Theta}</math> is the value <math>\hat{\Theta}</math> that maximizes <math>p(\mathcal{D}|\vec{\Theta})</math>.

Observe that <math>\hat{\Theta}</math> also maximizes <math>l(\vec{\Theta})=\ln p(\mathcal{D}|\vec{\Theta}) = \sum_{l=1}^N \ln p(\vec{x}_l|\vec{\Theta})</math>, where <math>\ln</math> denotes the natural logarithm.
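One practical reason to work with the log: the raw product of many densities underflows in floating point for large <math>N</math>, while the sum of logs stays finite. A sketch (the 2000 identical toy samples below are made up for the demo):

```python
import math

def pdf(x):
    """Density of the standard normal N(0, 1) at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

data = [0.1] * 2000  # 2000 identical toy samples

product = 1.0
for x in data:
    product *= pdf(x)  # about 0.397^2000: far below the smallest positive double

log_sum = sum(math.log(pdf(x)) for x in data)  # roughly 2000 * (-0.924)

print(product)   # prints 0.0 due to floating-point underflow
print(log_sum)   # a finite negative number
```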

<math>l(\vec{\Theta})</math> is called the ''log-likelihood''.

<math>\hat{\Theta}=\arg\max_{\vec{\Theta}} l(\vec{\Theta})</math>
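To see this argmax concretely, a brute-force sketch (assuming a 1-D Gaussian with known <math>\sigma = 1</math>; the toy data and the search grid are hypothetical) recovers the sample mean as the maximizer of the log-likelihood:

```python
import math

def log_likelihood(data, mu, sigma=1.0):
    """Log-likelihood of a 1-D Gaussian with known sigma, as a function of mu."""
    return sum(
        -((x - mu) ** 2) / (2 * sigma ** 2) - math.log(sigma * math.sqrt(2 * math.pi))
        for x in data
    )

data = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical toy sample

# Brute-force argmax over a grid of candidate mu values in [-2, 4].
grid = [i / 1000 for i in range(-2000, 4001)]
mu_hat = max(grid, key=lambda mu: log_likelihood(data, mu))

print(mu_hat)                 # the grid maximizer
print(sum(data) / len(data))  # close to the sample mean, as theory predicts
```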

If <math>p(\mathcal{D}|\vec{\Theta})</math> is a differentiable function of <math>\vec{\Theta}</math>, then <math>\hat{\Theta}</math> is among the critical points satisfying <math>\nabla_{\vec{\Theta}}\, l(\vec{\Theta}) = 0</math>.
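For instance, in the known-variance 1-D Gaussian example mentioned in the lecture summary, setting the derivative of the log-likelihood to zero recovers the sample mean:

<math>l(\mu) = \sum_{l=1}^N \left( -\frac{(x_l-\mu)^2}{2\sigma^2} - \ln\left(\sigma\sqrt{2\pi}\right) \right)</math>

<math>\frac{dl}{d\mu} = \sum_{l=1}^N \frac{x_l-\mu}{\sigma^2} = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{1}{N}\sum_{l=1}^N x_l</math>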
 
--[[User:Gmodeloh|Gmodeloh]] 13:37, 21 April 2010 (UTC)