Maximum Likelihood Estimation (MLE) Analysis for various Probability Distributions
A slecture by Hariharan Seshadri
What would be the learning outcome from this slecture?
- Basic Theory behind Maximum Likelihood Estimation (MLE)
- Derivations for Maximum Likelihood Estimates for parameters of Exponential Distribution, Geometric Distribution, Binomial Distribution, Poisson Distribution, and Uniform Distribution
Introduction
The maximum likelihood estimate (MLE) is the value $ \hat{\theta} $ that maximizes the likelihood function $ L(\theta) = f(x_{1}, x_{2}, \ldots, x_{n} \mid \theta) $, where $ f $ is the joint probability density function for continuous random variables or the joint probability mass function for discrete random variables, and $ \theta $ is the parameter being estimated.
In other words, $ \hat{\theta} = \arg\max_{\theta} L(\theta) $, where $ \hat{\theta} $ is the best estimate of the parameter $ \theta $ in the maximum likelihood sense. Thus, we choose the parameter value that maximizes the probability density of the observed sample (for continuous random variables) or its probability mass (for discrete random variables).
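If the random variables $ X_{1}, X_{2}, \ldots, X_{n} \in \mathbb{R} $ are independent and identically distributed (i.i.d.), then
$ L(\theta) = \prod_{i=1}^{n} f(x_{i} \mid \theta) $. We need to find the value $ \hat{\theta} \in \Theta $, where $ \Theta $ is the set of admissible parameter values, that maximizes this function.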
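As a minimal sketch of this definition, the Python snippet below evaluates the log-likelihood of an i.i.d. sample as a sum of log-densities and picks the maximizer from a grid of candidate values. The data, the grid, the helper names `log_likelihood` and `grid_mle`, and the choice of a Normal density with known unit variance are all assumptions made for illustration; they are not part of the original derivation.

```python
import math

def log_likelihood(log_pdf, data, theta):
    """Log-likelihood of i.i.d. data: the sum of log f(x_i | theta)."""
    return sum(log_pdf(x, theta) for x in data)

def grid_mle(log_pdf, data, grid):
    """Return the grid value of theta with the largest log-likelihood."""
    return max(grid, key=lambda theta: log_likelihood(log_pdf, data, theta))

def normal_log_pdf(x, mu):
    """Log-density of a Normal distribution with mean mu and variance 1 (assumed known)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2

data = [2.1, 1.9, 3.2, 2.8, 2.5]           # made-up observations
grid = [i / 100 for i in range(0, 501)]    # candidate values 0.00, 0.01, ..., 5.00
mu_hat = grid_mle(normal_log_pdf, data, grid)

# The grid maximizer should land on (or next to) the sample mean.
print(mu_hat, sum(data) / len(data))
```

Summing logarithms instead of multiplying densities does not change the maximizer, because the logarithm is strictly increasing, and it avoids numerical underflow when n is large.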
In the Purdue course ECE 662, Pattern Recognition and Decision Making Processes, we have already derived the MLE for the parameters of the Normal distribution:
$ \widehat{\mu} = \frac{\sum_{i=1}^{n}x_{i}}{n} $
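$ \hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\widehat{\mu})^{2} $
where
the $ X_{i} \in \mathbb{R} $ are Normal (Gaussian) random variables,
$ n $ is the number of samples, and
$ \widehat{\mu} $ and $ \hat{\sigma}^{2} $ are the estimated mean and estimated variance. Note that the MLE of the variance divides by $ n $; dividing by $ n-1 $ instead gives the unbiased sample variance, which is not the maximum likelihood estimate.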
1. Exponential Distribution
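Let $ X_{1}, X_{2}, \ldots, X_{n} \in \mathbb{R} $ be a random sample from the exponential distribution with p.d.f.
$ f(x;\theta) = \frac{1}{\theta}\exp\left(-\frac{x}{\theta}\right) $, where $ x \geq 0 $ and $ \theta > 0 $.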
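The likelihood function $ L(\theta) $ depends on the observed values $ x_{1}, x_{2}, \ldots, x_{n} $:
$ L(\theta) = \frac{1}{\theta}\exp\left(-\frac{x_{1}}{\theta}\right) \cdot \frac{1}{\theta}\exp\left(-\frac{x_{2}}{\theta}\right) \cdots \frac{1}{\theta}\exp\left(-\frac{x_{n}}{\theta}\right) $
$ L(\theta) = \frac{1}{\theta^{n}}\exp\left(-\frac{1}{\theta}\sum_{i=1}^{n}x_{i}\right) $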
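We need to maximize $ L(\theta) $. Because the logarithm is strictly increasing, $ \ln L(\theta) $ attains its maximum at the same $ \theta $ and is easier to work with:
$ \ln L(\theta) = -n\ln(\theta) - \frac{1}{\theta}\sum_{i=1}^{n}x_{i} $
Setting its derivative to zero, we have:
$ \frac{d}{d\theta}\ln L(\theta) = -\frac{n}{\theta} + \frac{1}{\theta^{2}}\sum_{i=1}^{n}x_{i} = 0 $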
which implies that
$ \hat{\theta} = \frac{\sum_{i=1}^{n}x_{i}}{n} = \overline{X} $
that is, the sample mean of $ x_{1}, x_{2}, \ldots, x_{n} $, and this is the maximum likelihood estimate.
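The result can be sanity-checked numerically. The sketch below, which assumes a simulated sample with a hypothetical true mean of 2.0 and uses illustrative function names, computes the closed-form estimate and compares the log-likelihood there with its value at two nearby points. Note that Python's `random.expovariate` is parameterized by the rate $ 1/\theta $ rather than by the mean $ \theta $ used in this section.

```python
import math
import random

def exp_log_likelihood(theta, data):
    """ln L(theta) = -n ln(theta) - (1/theta) * sum(x_i) for the mean-theta exponential."""
    n = len(data)
    return -n * math.log(theta) - sum(data) / theta

def exp_mle(data):
    """Closed-form MLE: the sample mean."""
    return sum(data) / len(data)

random.seed(0)
true_theta = 2.0  # assumed value, used only to simulate data
# random.expovariate takes the rate (1 / theta), not the mean.
data = [random.expovariate(1.0 / true_theta) for _ in range(10_000)]

theta_hat = exp_mle(data)
# The log-likelihood at theta_hat should be no smaller than at nearby values.
for theta in (0.9 * theta_hat, theta_hat, 1.1 * theta_hat):
    print(f"theta = {theta:.4f}, ln L = {exp_log_likelihood(theta, data):.2f}")
```

With a sample of this size the estimate should fall close to the value used for simulation, and the middle line of the printout should show the largest log-likelihood of the three.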
2. Geometric Distribution
Let $ X_{1}, X_{2}, \ldots, X_{n} $ be a random sample from the geometric distribution with p.m.f.
$ f(x;p) = (1-p)^{x-1}p $
where $ x = 1, 2, 3, \ldots $ and $ 0 < p \leq 1 $
The likelihood function is given by:
$ L(p) = (1-p)^{x_{1}-1}p \cdot (1-p)^{x_{2}-1}p \cdot (1-p)^{x_{3}-1}p \cdots (1-p)^{x_{n}-1}p $
$ L(p) = p^{n}(1-p)^{\sum_{i=1}^{n}x_{i}-n} $
The log-likelihood is:
$ \ln L(p) = n\ln(p) + \left(\sum_{i=1}^{n}x_{i}-n\right)\ln(1-p) $
Setting its derivative to zero, we have:
$ \frac{d}{dp}\ln L(p) = \frac{n}{p} - \frac{\sum_{i=1}^{n}x_{i}-n}{1-p} = 0 $
which implies that
$ \hat{p} = \frac{n}{\sum_{i=1}^{n}x_{i}} = \frac{1}{\overline{X}} $
which is the maximum likelihood estimate. Like the likelihood function, it is a function of the observed values $ x_{1}, x_{2}, \ldots, x_{n} $.
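A second-order check, which is not part of the derivation above, suggests that this stationary point is indeed a maximum. Differentiating the log-likelihood once more gives:
$ \frac{d^{2}}{dp^{2}}\ln L(p) = -\frac{n}{p^{2}} - \frac{\sum_{i=1}^{n}x_{i}-n}{(1-p)^{2}} $
Since every $ x_{i} \geq 1 $, we have $ \sum_{i=1}^{n}x_{i}-n \geq 0 $, so the second derivative is strictly negative for $ 0 < p < 1 $. The log-likelihood is therefore concave on that interval, and the point where its derivative vanishes is its maximizer.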
This can be checked intuitively as well. The geometric distribution models a random variable X that counts the number of trials up to and including the first success. The random variables $ X_{1}, X_{2}, \ldots, X_{n} $ therefore account for n successes in $ X_{1} + X_{2} + \cdots + X_{n} $ trials, so the estimate of p is the number of successes divided by the total number of trials.
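This counting argument can be illustrated with a small simulation. The sketch below assumes a hypothetical true success probability of 0.3 and uses illustrative helper names; each sample is generated by repeating Bernoulli trials until the first success, so the estimate is literally the number of successes divided by the total number of trials.

```python
import random

def sample_geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:   # a failure occurs with probability 1 - p
        trials += 1
    return trials

def geometric_mle(data):
    """Closed-form MLE: n successes divided by the total number of trials."""
    return len(data) / sum(data)

rng = random.Random(0)
true_p = 0.3  # assumed value, used only to simulate data
data = [sample_geometric(true_p, rng) for _ in range(10_000)]

p_hat = geometric_mle(data)
print(f"p_hat = {p_hat:.4f} (value used for simulation: {true_p})")
```

For a sample of this size the printed estimate should be close to the simulated probability, though the exact digits depend on the random seed.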