Advantages of MLE
Complement to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin
MLE
- The MLE generally has good convergence properties as the number of training samples increases.
- MLE is often simpler than other methods of parameter estimation.
Parameter Estimation by MLE
Example 1: The Gaussian Case: Unknown $ \mu $
Suppose the samples are drawn from a multivariate normal population with mean $ \mu $ and covariance matrix $ \Sigma $. For this example, only the mean is unknown. Let $ x_k $ be the $k$-th sample point.
$ \ln p(x_k|\mu) = -\frac{1}{2} \ln\left((2\pi)^d|\Sigma|\right) - \frac{1}{2} (x_k - \mu)^t \Sigma^{-1} (x_k - \mu) $
$ \nabla_{\mu} \ln p(x_k|\mu) = \Sigma^{-1}(x_k-\mu) $
Setting the gradient of the full-sample log-likelihood, $ \sum_{k=1}^n \ln p(x_k|\mu) $, equal to zero, we get
$ \sum_{k=1}^n \Sigma^{-1} (x_k-\hat{\mu}) = 0 $
Multiplying by $ \Sigma $ and rearranging, we obtain
$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $
Thus the MLE for the unknown population mean is the arithmetic average of the training samples, called *the sample mean*.
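As a complement to the derivation, here is a minimal numerical sketch checking that the sample mean maximizes the Gaussian log-likelihood. The synthetic data, the fixed covariance matrix, and all variable names are assumptions made only for illustration and are not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                          # assumed known covariance
X = rng.multivariate_normal(true_mu, Sigma, size=500)   # n training samples
Sigma_inv = np.linalg.inv(Sigma)

def log_likelihood(mu):
    """Sum of ln p(x_k | mu) over all samples, constants included."""
    d = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)
    return np.sum(-0.5 * np.log((2 * np.pi) ** d * np.linalg.det(Sigma)) - 0.5 * quad)

mu_hat = X.mean(axis=0)   # the MLE: the arithmetic average of the samples
print("sample mean:", mu_hat)
print("log-likelihood at sample mean :", log_likelihood(mu_hat))
print("log-likelihood at perturbation:", log_likelihood(mu_hat + 0.1))
```

In this sketch the log-likelihood evaluated at the sample mean exceeds its value at any nearby perturbation, consistent with the closed-form result above.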
Example 2: The Gaussian Case: Unknown $ \mu $ and $ \Sigma $
In this example, both the mean $ \mu $ and the covariance matrix $ \Sigma $ are unknown. These unknown parameters constitute the components of the parameter vector $ \theta $. Consider the univariate case with $ \theta_1 = \mu $ and $ \theta_2 = \sigma^2 $.
$ \ln p(x_k|\theta) = -\frac{1}{2} \ln 2\pi\theta_2 - \frac{1}{2\theta_2}(x_k - \theta_1)^2 $
Taking the gradient of the above equation with respect to $ \theta $,
$ \nabla_{\theta} l = \nabla_{\theta} \ln p(x_k|\theta) = \begin{bmatrix} \frac{1}{\theta_2}(x_k - \theta_1) \\ -\frac{1}{2\theta_2} + \frac{(x_k-\theta_1)^2}{2\theta_2^2} \end{bmatrix}. $
Summing over all $n$ samples and equating each component to zero, we get
$ \sum_{k=1}^n \frac{1}{\hat{\theta_2}}(x_k-\hat{\theta_1}) = 0 $
and
$ -\sum_{k=1}^{n} \frac{1}{\hat{\theta_2}} + \sum_{k=1}^n \frac{(x_k-\hat{\theta_1})^2}{\hat{\theta_2}^2} = 0 $
where $ \hat{\theta_1} $ and $ \hat{\theta_2} $ are the maximum likelihood estimates for $ \theta_1 $ and $ \theta_2 $, respectively. Substituting $ \hat{\mu} = \hat{\theta_1} $ and $ \hat{\sigma}^2 = \hat{\theta_2} $, we obtain
$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $
and
$ \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2. $
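A minimal sketch of these univariate formulas, assuming synthetic data and illustrative names not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)   # assumed samples with mu = 3, sigma = 2

mu_hat = x.mean()                        # (1/n) * sum(x_k)
sigma2_hat = np.mean((x - mu_hat) ** 2)  # (1/n) * sum((x_k - mu_hat)^2), note the 1/n factor

print("mu_hat    :", mu_hat)
print("sigma2_hat:", sigma2_hat)
print("np.var(x) :", np.var(x))          # NumPy's default ddof=0 matches the MLE
```

Note that the MLE for the variance divides by $n$ rather than $n-1$, so it is a biased (though consistent) estimator of $\sigma^2$.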
See Also
- MLE Examples: Exponential and Geometric Distributions
- MLE Examples: Binomial and Poisson Distributions
Back to Lecture 7: Maximum Likelihood Estimation and Bayesian Parameter Estimation, ECE662, Spring 2010, Prof. Boutin