Revision as of 08:08, 5 April 2014

Tutorial on Maximum Likelihood Estimation: A Parametric Density Estimation Method





Motivation


Suppose one wishes to determine just how biased an unfair coin is. Let p denote the probability of
tossing HEADS. The goal then is to determine p.

Also suppose the coin is tossed 80 times, i.e., the sample might be something like $ x_1 = H $,
$ x_2 = T $, …, $ x_{80} = T $, and the number of HEADS, H, is observed.

The probability of tossing TAILS is then 1 − p. Suppose the outcome is 49 HEADS and 31 TAILS,
and suppose the coin was taken from a box containing three coins: one that gives HEADS
with probability p = 1/3, one that gives HEADS with probability p = 1/2, and one that
gives HEADS with probability p = 2/3. The coins have lost their labels, so which one was tossed is
unknown. The number of HEADS follows a binomial distribution with sample size 80; evaluating its
probability mass function at 49 successes for each candidate value of p gives:

$ \Pr(H = 49 \mid p = 1/3) = \binom{80}{49}(1/3)^{49}(1 - 1/3)^{31} \approx 0.000 $

$ \Pr(H = 49 \mid p = 1/2) = \binom{80}{49}(1/2)^{49}(1 - 1/2)^{31} \approx 0.012 $

$ \Pr(H = 49 \mid p = 2/3) = \binom{80}{49}(2/3)^{49}(1 - 2/3)^{31} \approx 0.054 $

Based on the above equations, the observed 49 HEADS are most probable under p = 2/3, so the coin
with p = 2/3 is the maximum likelihood choice: it best explains the observations we were given.
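The comparison above can be checked numerically. The Python sketch below is added here for illustration; the names `binomial_pmf` and `candidates` are hypothetical. It computes the binomial probability of 49 HEADS in 80 tosses exactly using `fractions.Fraction`, then picks the candidate p with the largest value.

```python
from fractions import Fraction
from math import comb

N_TOSSES = 80  # number of coin tosses in the motivating example
N_HEADS = 49   # observed number of HEADS

def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Exact binomial probability Pr(H = k | p) for n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The three unlabeled coins in the box.
candidates = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]

# Likelihood of each candidate p given the fixed observation H = 49.
likelihoods = {p: float(binomial_pmf(N_HEADS, N_TOSSES, p)) for p in candidates}

for p, lik in likelihoods.items():
    print(f"p = {p}: Pr(H = {N_HEADS} | p) = {lik:.3f}")

# The maximum likelihood choice is the candidate with the largest likelihood.
p_best = max(likelihoods, key=likelihoods.get)
print("most likely coin: p =", p_best)
```

Running it should print the same rounded values as above (0.000, 0.012 and 0.054) and select p = 2/3.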



Definition


The generic situation is that we observe an n-dimensional random vector X with probability
density (or mass) function f(x | θ). It is assumed that θ is a fixed, unknown constant
belonging to the set $ \Theta \subset \mathbb{R}^{k} $, where k is the number of unknown parameters.

For $ x \in \mathbb{R}^{n} $, the likelihood function of θ is defined as

$ L(\theta \mid x) = f(x \mid \theta) $

Here x is regarded as fixed, and θ is regarded as the variable of L. The log-likelihood function
is defined as $ l(\theta \mid x) = \log L(\theta \mid x) $.
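Since log is strictly increasing, L and l have the same maximizer. The logarithm also turns a product of independent densities into a sum, which is easier to differentiate and avoids floating-point underflow for large samples. The minimal Python sketch below checks that both functions peak at the same point. It assumes the binomial coin model from the motivation, evaluated on a hypothetical grid of candidate values p = i/800.

```python
import math

N_TOSSES = 80
N_HEADS = 49

def likelihood(p: float) -> float:
    """Binomial likelihood L(p | H = 49) for 80 tosses."""
    return math.comb(N_TOSSES, N_HEADS) * p**N_HEADS * (1 - p) ** (N_TOSSES - N_HEADS)

def log_likelihood(p: float) -> float:
    """l(p | H = 49) = log L(p | H = 49), written as a sum of logarithms."""
    return (math.log(math.comb(N_TOSSES, N_HEADS))
            + N_HEADS * math.log(p)
            + (N_TOSSES - N_HEADS) * math.log(1 - p))

# Interior grid points only, so that log(p) and log(1 - p) are defined.
grid = [i / 800 for i in range(1, 800)]

p_max_L = max(grid, key=likelihood)
p_max_l = max(grid, key=log_likelihood)

print("argmax of L:", p_max_L)
print("argmax of l:", p_max_l)
```

On this grid both maximizers come out as 0.6125 = 49/80, the observed proportion of HEADS.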

The Maximum Likelihood Estimate (or MLE) is the value $ \hat{\theta} = \hat{\theta}(x) \in \Theta $
maximizing $ L(\theta \mid x) $, provided it exists:

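$ \hat{\theta}(x) = \underset{\theta \in \Theta}{\operatorname{argmax}} \, L(\theta \mid x), \quad \text{i.e.} \quad L(\hat{\theta} \mid x) = \max_{\theta \in \Theta} L(\theta \mid x) $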
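As a worked instance of this definition (an illustration added here), go back to the coin example but let p take any value in Θ = [0, 1], not just the three coin values. With the observation H = 49 in 80 tosses, the log-likelihood and its derivative are:

$ l(p \mid H = 49) = \log\binom{80}{49} + 49 \log p + 31 \log(1 - p) $

$ \frac{dl}{dp} = \frac{49}{p} - \frac{31}{1 - p} = 0 \quad\Longrightarrow\quad 49(1 - p) = 31p \quad\Longrightarrow\quad \hat{p} = \frac{49}{80} = 0.6125 $

$ \frac{d^2 l}{dp^2} = -\frac{49}{p^2} - \frac{31}{(1 - p)^2} < 0 $, so this stationary point is a maximum. The MLE is therefore the observed proportion of HEADS.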



What is the likelihood function?

If the probability of an event X that depends on model parameters p is written as
P(X | p)

then we talk about the likelihood

L(p | X)

that is, the likelihood of the parameters given the data.

For most sensible models, some data are more probable than others. The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most probable. This works because the likelihood of the parameters given the data is defined to be equal to the probability of the data given the parameters: L(p | X) = P(X | p).
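The same formula can be read two ways, and the difference can be made concrete. In the Python sketch below, the helper `prob` and the step count are illustrative choices. First, p is fixed and the binomial probabilities are summed over every possible outcome, which gives exactly 1. Then the observed 49 HEADS are fixed and the formula is integrated over p, which gives 1/(80 + 1) = 1/81. So a likelihood function is not a probability distribution over the parameters.

```python
from fractions import Fraction
from math import comb

N_TOSSES = 80
N_HEADS = 49

def prob(k, p):
    """Pr(H = k | p): binomial probability of k HEADS in N_TOSSES tosses.

    Works with either a Fraction (exact arithmetic) or a float p.
    """
    return comb(N_TOSSES, k) * p**k * (1 - p) ** (N_TOSSES - k)

# Probability view: fix p and vary the outcome k; the probabilities sum to exactly 1.
p_fixed = Fraction(2, 3)
total_probability = sum(prob(k, p_fixed) for k in range(N_TOSSES + 1))

# Likelihood view: fix the observed outcome (49 HEADS) and vary p over [0, 1].
# A midpoint Riemann sum approximates the area under L(p | H = 49).
STEPS = 10_000
area_under_likelihood = sum(
    prob(N_HEADS, (i + 0.5) / STEPS) for i in range(STEPS)
) / STEPS

print("sum over k of Pr(H = k | p = 2/3):", total_probability)
print("integral over p of L(p | H = 49):", area_under_likelihood, "vs 1/81 =", 1 / 81)
```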

If we were in the business of making predictions based on a set of solid assumptions, we would be interested in probabilities: the probability of certain outcomes occurring or not occurring.

However, in data analysis we have already observed all the data. Once observed, the data are fixed and there is no longer anything 'probabilistic' about them (the word data comes from the Latin for 'given'). We are therefore more interested in the likelihood of the model parameters that underlie the fixed data.

The relation between probability and likelihood can be summarized as follows:

Probability:

                  Knowing Parameters $ \rightarrow $  Prediction of Outcomes

Likelihood:

                  Observation of Data $ \rightarrow $  Estimation of Parameters
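The two directions above can be illustrated with a small simulation. This Python sketch assumes a hypothetical "true" coin with p = 2/3 and a hypothetical run of 10,000 tosses. The probability direction simulates outcomes from the known p. The likelihood direction estimates p from the observed count using heads / n, which is the maximum likelihood estimate for a binomial model.

```python
import random

def simulate_heads(p: float, n: int, rng: random.Random) -> int:
    """Probability direction: known parameter p -> predicted outcome (number of HEADS)."""
    return sum(rng.random() < p for _ in range(n))

def estimate_p(heads: int, n: int) -> float:
    """Likelihood direction: observed data -> estimate of p (the binomial MLE heads / n)."""
    return heads / n

rng = random.Random(0)  # fixed seed so the run is reproducible
TRUE_P = 2 / 3          # hypothetical "true" coin, assumed for illustration
N = 10_000              # hypothetical number of tosses

heads = simulate_heads(TRUE_P, N, rng)
p_hat = estimate_p(heads, N)
print(f"simulated {heads} HEADS in {N} tosses; MLE p_hat = {p_hat:.4f}")

# The motivating data: 49 HEADS in 80 tosses.
print("MLE for the observed coin data:", estimate_p(49, 80))
```

With many tosses the estimate lands close to the assumed true value, while the 80-toss data from the motivation give 0.6125.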
