Line 1: Line 1:
(See Also: [[Lecture 7_Old Kiwi]] and [[Bayesian_Parameter_Estimation_Old_Kiwi]])
+
[[Bayesian_Parameter_Estimation_Old_Kiwi]])
 
+
Advantages of MLE :
+
  
 +
==Advantages of MLE ==
 +
Complement to [[Lecture_7_-_MLE_and_BPE_OldKiwi|Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation]], [[ECE662]], Spring 2010, Prof. Boutin
 +
----
 +
MLE
 
# Always have good convergence properties as number of training samples increases.
 
# Always have good convergence properties as number of training samples increases.
 
# MLE is often simpler than other methods of parameter estimation.
 
# MLE is often simpler than other methods of parameter estimation.
Line 51: Line 53:
  
 
<math>\hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2.</math>
 
<math>\hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2.</math>
 
+
----
[[MLE Examples: Exponential and Geometric Distributions_Old Kiwi]]
+
==See Also==
 
+
*[[MLE Examples: Exponential and Geometric Distributions_Old Kiwi|MLE Examples: Exponential and Geometric Distributions]]
[[MLE Examples: Binomial and Poisson Distributions_Old Kiwi]]
+
*[[MLE Examples: Binomial and Poisson Distributions_Old Kiwi|MLE Examples: Binomial and Poisson Distributions]]
 +
----
 +
Back to [[Lecture_7_-_MLE_and_BPE_OldKiwi|Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation]], [[ECE662]], Spring 2010, Prof. Boutin

Revision as of 09:29, 20 May 2013

Bayesian_Parameter_Estimation_Old_Kiwi)

Advantages of MLE

Complement to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin


MLE

  1. Always have good convergence properties as number of training samples increases.
  2. MLE is often simpler than other methods of parameter estimation.

Parameter Estimation by MLE

Example 1: The Gaussian Case: Unknown $ \mu $

Suppose the samples are drawn from a multivariate normal population with mean $ \mu $ and covariance matrix $ \sigma $. For this example only mean is unknown. Let $ x_k $ be sample point.

$ \ln p(x_k|\mu) = -\frac{1}{2} \ln (2\pi)^d|\Sigma| - \frac{1}{2} (x_k - \mu)^t \Sigma^{-1} (x_k - \mu)) $

$ \nabla_{\mu} \ln p(x_k|\mu) = \Sigma^{-1}(x_k-\mu) $

Thus differentiating above equation and equating to 0, we get

$ \sum_{k=1}^n \Sigma^{-1} (x_k-\hat{\mu}) = 0 $

Multiplying by $ \Sigma $ and rearranging, we obtain

$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $

Thus the MLE for the unknown population mean is the arithmetic average of the training samples called *the sample mean*

Example 2: The Gaussian Case: Unknown $ \mu $ and $ \sigma $

In this example both mean $ \mu $ and covariance matrix $ \sigma $ are unknown. These unknown parameters constitute the components of the parameter vector $ \theta $. Consider univariate case with $ \theta_1 = \mu $ and $ \theta_2 = \sigma^2 $.

$ \ln p(x_k|\theta) = -\frac{1}{2} \ln 2\pi\theta_2 - \frac{1}{2\theta_2}(x_k - \theta_1)^2 $

Taking derivative of above equation

$ \nabla_{\theta}l = \nabla_{\theta} \ln p(x_k|\theta) = [ \frac{1}{\theta_2}(x_k - \theta_1) ; -\frac{1}{2\theta_2} +\frac{(x_k-\theta_1)^2}{2\theta_2^2}]. $

Equating the above equation to 0, we get

$ \sum_{k=1}^n \frac{1}{\hat{\theta_2}}(x_k-\hat{\theta_1}) = 0 $

and

$ -\sum_{k=-1}^{n} \frac{1}{\hat{\theta_2}} + \sum_{k=1}^n \frac{(x_k-\hat{\theta_1})^2}{\hat{\theta_2}^2} = 0 $

where $ \hat{\theta_1} $ and $ \hat{\theta_2} $ are maximum likelihood estimates for $ \theta_1 $ and $ \theta_2 $ respectively. Substituting $ \hat{\mu} = \hat{\theta_1} $ and $ \hat{\sigma} = \hat{\theta_2} $, we obtain

$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $

and

$ \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2. $


See Also


Back to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin

Alumni Liaison

Correspondence Chess Grandmaster and Purdue Alumni

Prof. Dan Fleetwood