(See Also: [[Lecture 7_Old Kiwi]] and [[Bayesian_Parameter_Estimation_Old_Kiwi]])
Advantages of MLE:

  1. MLE estimates have good convergence properties as the number of training samples increases.
  2. MLE is often simpler than other methods of parameter estimation.

Parameter Estimation by MLE

Example 1: The Gaussian Case: Unknown $ \mu $

Suppose the samples are drawn from a multivariate normal population with mean $ \mu $ and covariance matrix $ \Sigma $. For this example, only the mean is unknown. Let $ x_k $ be the $ k $-th sample point.

$ \ln p(x_k|\mu) = -\frac{1}{2} \ln\left[(2\pi)^d|\Sigma|\right] - \frac{1}{2} (x_k - \mu)^t \Sigma^{-1} (x_k - \mu) $

$ \nabla_{\mu} \ln p(x_k|\mu) = \Sigma^{-1}(x_k-\mu) $

Since the log-likelihood of the full sample is the sum of the per-sample terms, summing this gradient over all $ n $ samples and equating to zero gives

$ \sum_{k=1}^n \Sigma^{-1} (x_k-\hat{\mu}) = 0 $

Multiplying by $ \Sigma $ and rearranging, we obtain

$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $

Thus the MLE for the unknown population mean is the arithmetic average of the training samples, called *the sample mean*.
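
As a quick numerical sanity check of this result, here is a minimal sketch; the dimension, true mean, covariance, and sample size are arbitrary illustrative choices, not part of the original notes:

```python
import numpy as np

# Draw n samples from a multivariate normal with known (illustrative)
# parameters, then compute the MLE of the mean: the sample mean.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_cov = np.array([[2.0, 0.3],
                     [0.3, 1.0]])
samples = rng.multivariate_normal(true_mu, true_cov, size=10_000)

mu_hat = samples.mean(axis=0)  # (1/n) * sum_k x_k
print(mu_hat)                  # approaches true_mu as n grows
```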

Example 2: The Gaussian Case: Unknown $ \mu $ and $ \Sigma $

In this example both the mean $ \mu $ and the covariance matrix $ \Sigma $ are unknown. These unknown parameters constitute the components of the parameter vector $ \theta $. Consider the univariate case with $ \theta_1 = \mu $ and $ \theta_2 = \sigma^2 $.

$ \ln p(x_k|\theta) = -\frac{1}{2} \ln (2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2 $

Taking the derivative of the above equation with respect to $ \theta $ gives

$ \nabla_{\theta}l = \nabla_{\theta} \ln p(x_k|\theta) = \begin{bmatrix} \frac{1}{\theta_2}(x_k - \theta_1) \\ -\frac{1}{2\theta_2} + \frac{(x_k-\theta_1)^2}{2\theta_2^2} \end{bmatrix} $
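
Both components of this gradient can be checked symbolically; the following sketch uses SymPy, which is my own choice of tool and not part of the original notes:

```python
import sympy as sp

# Symbolically differentiate ln p(x_k | theta), with theta1 = mu, theta2 = sigma^2.
x, t1 = sp.symbols('x_k theta_1')
t2 = sp.symbols('theta_2', positive=True)
log_p = -sp.Rational(1, 2) * sp.log(2 * sp.pi * t2) - (x - t1)**2 / (2 * t2)

expected1 = (x - t1) / t2
expected2 = -1 / (2 * t2) + (x - t1)**2 / (2 * t2**2)
print(sp.simplify(sp.diff(log_p, t1) - expected1))  # 0: matches first component
print(sp.simplify(sp.diff(log_p, t2) - expected2))  # 0: matches second component
```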

Summing over all $ n $ samples and equating each component to zero, we get

$ \sum_{k=1}^n \frac{1}{\hat{\theta_2}}(x_k-\hat{\theta_1}) = 0 $

and

$ -\sum_{k=1}^{n} \frac{1}{\hat{\theta_2}} + \sum_{k=1}^n \frac{(x_k-\hat{\theta_1})^2}{\hat{\theta_2}^2} = 0 $

where $ \hat{\theta_1} $ and $ \hat{\theta_2} $ are the maximum likelihood estimates for $ \theta_1 $ and $ \theta_2 $, respectively. Substituting $ \hat{\mu} = \hat{\theta_1} $ and $ \hat{\sigma}^2 = \hat{\theta_2} $, we obtain

$ \hat{\mu} = \frac{1}{n} \sum_{k=1}^n x_k $

and

$ \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^n(x_k - \hat{\mu})^2. $
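
A minimal numerical sketch of these two estimates; the true parameter values and sample size are arbitrary choices for illustration:

```python
import numpy as np

# Draw n univariate normal samples and compute the two MLEs derived above.
rng = np.random.default_rng(1)
mu_true, sigma_true = 3.0, 2.0           # true mean and standard deviation
x = rng.normal(mu_true, sigma_true, size=50_000)

mu_hat = x.mean()                        # (1/n) * sum_k x_k
sigma2_hat = ((x - mu_hat) ** 2).mean()  # divides by n, not n - 1
print(mu_hat, sigma2_hat)                # close to 3.0 and 4.0 for large n
```

Note that the ML variance estimate divides by $ n $ rather than $ n-1 $, so it is slightly biased for finite sample sizes.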

[[MLE Examples: Exponential and Geometric Distributions_Old Kiwi]]

[[MLE Examples: Binomial and Poisson Distributions_Old Kiwi]]
