The Comer Lectures on Random Variables and Signals
Topic 18: Stochastic Convergence
Stochastic Convergence
We will now consider infinite sequences of random variables. We will discuss what it means for such a sequence to converge. This will lead to some very important results: the laws of large numbers and the Central Limit Theorem.
Consider a sequence X$ _1 $,X$ _2 $,..., where each X$ _i $ is a random variable on (S,F,P). We will call this a random sequence (or a discrete-time random process).
Notation $ \qquad $ X$ _n $ may refer to either the sequence itself or to the nth element in the sequence. We may also use {X$ _n $} to denote the sequence, or X$ _n $, n ≥ 1.
The sequence X$ _n $ maps S to the set of all sequences of real numbers, so for a fixed $ \omega $ ∈ S, X$ _1(\omega) $,X$ _2(\omega) $,... is a sequence of real numbers.
Before looking at convergence, recall the meaning of convergence of a sequence of real numbers.
Definition $ \qquad $ A sequence of real numbers x$ _1 $,x$ _2 $,... converges to a number x ∈ R if ∀$ \epsilon $ > 0, ∃ an n$ _{\epsilon} $ ∈ N such that
$ |x_n-x|<\epsilon\quad\forall n\geq n_{\epsilon} $
If there is such an x ∈ R, we say
$ x_n\rightarrow x\;\mbox{ as }\;n\rightarrow\infty $
or
$ \lim_{n\rightarrow\infty}x_n=x $
For a random sequence X$ _n $, the issue of convergence is more complicated since X$ _n $ is a function of $ \omega $ ∈ S.
We first look at a motivating example.
Example $ \qquad $ Let X$ _k $ = s + W$ _k $, where s ∈ R and W$ _k $ is a random sequence with E[W$ _k $] = 0 ∀k = 1,2,.... W$ _k $ can be viewed as a noise sequence if we want to know the value of s.
Let
$ Y_n=\frac{1}{n}\sum_{k=1}^n X_k $
Then, E[Y$ _n $] = s ∀n. But Y$ _n $ is a random variable, so we cannot expect Y$ _n $ = s ∀n. However, we intuitively expect Y$ _n $ to be a better estimate of s as n increases. Does Y$ _n $ → s as n → ∞ ? If so, in what sense?
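Before formalizing convergence, we can probe this example numerically. The following is a minimal simulation sketch in Python/NumPy; the value s = 2, the Gaussian noise model, and all variable names are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
s = 2.0                                 # assumed true constant (illustrative)
n_max = 10_000
W = rng.normal(0.0, 1.0, n_max)         # zero-mean noise W_k (assumed Gaussian)
X = s + W                               # observations X_k = s + W_k

# Y_n = (1/n) * sum_{k=1}^n X_k, computed for every n at once
Y = np.cumsum(X) / np.arange(1, n_max + 1)

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:6d}   Y_n = {Y[n - 1]:.4f}")
# the printed values drift toward s = 2.0 as n grows
```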
Types of Convergence
Since X$ _n(\omega) $ is generally a different sequence for every $ \omega $ ∈ S, what does it mean for X$ _n $ to converge? We will discuss different ways in which X$ _n $ can converge.
Definition $ \qquad $ X$ _n $ converges everywhere if the sequence X$ _1(\omega) $,X$ _2(\omega) $,... converges to some value X$ (\omega) $ ∀$ \omega $ ∈ S. We also call this sure convergence or convergence surely.
Notation
$ X_n(\omega)\rightarrow X(\omega)\quad\forall\omega\in S $
Definition $ \qquad $ X$ _n $ converges almost everywhere, or almost surely, if X$ _n(\omega) $ → X$ (\omega) $ ∀$ \omega $ ∈ A for some event A ⊆ S with P(A) = 1. This is also known as convergence with probability 1.
Notation
$ X_n\overset{a.e.}{\longrightarrow}X $
Definition $ \qquad $ X$ _n $ converges in mean square to X if
$ E[|X_n-X|^2]\rightarrow 0 $
as
$ n\rightarrow\infty $
Notation
$ X_n\overset{m.s.}{\longrightarrow}X $
Definition $ \qquad $ X$ _n $ converges in probability to X if ∀$ \epsilon $ > 0
$ P(|X_n-X|>\epsilon)\rightarrow 0 $
as
$ n\rightarrow\infty $
Note that P(|X$ _n $ - X| > $ \epsilon $) is a sequence of real numbers.
Notation
$ X_n\overset{p}{\longrightarrow}X $
Definition $ \qquad $ X$ _n $ converges in distribution to X if
$ F_{X_n}(x)\rightarrow F_X(x) $
as
$ n\rightarrow\infty $
for every x ∈ R at which F$ _X $ is continuous.
Note that for a fixed x ∈ R, F$ _{X_n} $(x) is a sequence of real numbers.
Notation
$ X_n\overset{d}{\longrightarrow}X $
We will generally assume that if X$ _n $ converges in distribution, then the corresponding pmfs (in the discrete case) or pdfs (in the continuous case) also converge, i.e.
$ f_{X_n}(x)\rightarrow f_X(x) $
as
$ n\rightarrow\infty $
although this is not true in every instance, as seen in the next example.
Example $ \qquad $ Let X$ _n $ have pdf
$ f_{X_n}(x)=(1-\cos(2\pi nx))I_{(0,1)}(x) $
Then
$ F_{X_n}(x)=x-\frac{\sin(2\pi nx)}{2\pi n},\quad x\in(0,1) $
Now
$ F_{X_n}(x)\rightarrow F_X(x) $
as
$ n\rightarrow\infty $
where
$ F_X(x)=\begin{cases}0 & x<0 \\ x & 0\leq x<1 \\ 1 & x\geq 1\end{cases} $
is the cdf of a random variable uniformly distributed on (0,1). However, f$ _{X_n} $ does not converge for x ∈ (0,1).
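This behavior is easy to verify numerically. The sketch below assumes the density f$ _{X_n} $(x) = 1 - cos(2πnx) on (0,1) reconstructed above; the evaluation point is arbitrary.

```python
import numpy as np

# assumes f_{X_n}(x) = 1 - cos(2*pi*n*x) on (0,1), as in the example above
def f(n, x):
    return 1.0 - np.cos(2.0 * np.pi * n * x)

def F(n, x):
    # cdf on (0,1): integral of f from 0 to x
    return x - np.sin(2.0 * np.pi * n * x) / (2.0 * np.pi * n)

x = np.sqrt(2.0) - 1.0          # an arbitrary point in (0,1)
for n in (1, 10, 100, 1_000):
    print(f"n = {n:5d}   F_n(x) = {F(n, x):.5f}   f_n(x) = {f(n, x):.5f}")
# F_n(x) -> x ~ 0.41421 (the uniform cdf), while f_n(x) keeps oscillating in [0, 2]
```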
Cauchy Criterion for Convergence
We will now discuss a method for showing that X$ _n $ → X in the mean square sense for some random variable X without knowing what X is. First, consider a sequence of real numbers.
Definition $ \qquad $ A sequence of real numbers x$ _n $ is a Cauchy sequence if
$ |x_{n+m}-x_n|\rightarrow 0 $
as
$ n\rightarrow\infty\quad\forall m>0 $
The sequence x$ _n $ converges iff it is a Cauchy sequence. This is known as the Cauchy criterion for convergence.
The Cauchy Criterion for mean square convergence of X$ _n $ $ \qquad $ It can be shown that a random sequence X$ _n $ converges in the mean square sense iff
$ E[|X_{n+m}-X_n|^2]\rightarrow 0\quad\forall m>0 $
as
$ n\rightarrow\infty $
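As an illustration, the criterion can be checked by Monte Carlo for the sample-mean sequence Y$ _n $ from the earlier example. The sketch below (Python/NumPy; the standard normal X$ _k $'s, the gap m, and the trial count are illustrative assumptions) estimates E[|Y$ _{n+m} $ - Y$ _n $|$ ^2 $] and watches it shrink:

```python
import numpy as np

rng = np.random.default_rng(1)
m, trials = 50, 5_000                   # fixed gap and trial count (illustrative)

for n in (10, 100, 1_000):
    X = rng.normal(0.0, 1.0, (trials, n + m))   # iid X_k, one row per trial
    Y_n  = X[:, :n].mean(axis=1)                # sample mean over first n terms
    Y_nm = X.mean(axis=1)                       # sample mean over n + m terms
    msq = np.mean((Y_nm - Y_n) ** 2)            # Monte Carlo E[|Y_{n+m} - Y_n|^2]
    print(f"n = {n:5d}   E[|Y_(n+m) - Y_n|^2] ~ {msq:.6f}")
# the estimate shrinks toward 0, so Y_n converges in the mean square sense
```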
It can be shown that the following relationships between the types of convergence hold: convergence almost everywhere and convergence in mean square each imply convergence in probability, which in turn implies convergence in distribution. (Fig 3: Relationships implied by the different types of convergence; the box represents all possible random sequences.)
Some Important Random Sequences
First, we need the Chebyshev Inequality:
Let X be a random variable with mean $ \mu $ and variance $ \sigma^2 $. Then,
$ P(|X-\mu|\geq\epsilon)\leq\frac{\sigma^2}{\epsilon^2}\quad\forall\epsilon>0 $

Proof $ \quad $ For a fixed ε > 0, let
$ g_1(x) = I_{\{x\in\mathbb R:|x-\mu|\geq\epsilon\}}(x) $
and
$ g_2(x) = \frac{(x-\mu)^2}{\epsilon^2} $
Then,
$ \begin{align}E[g_1(X)]&=\int_{-\infty}^{\infty}g_1(x)f_X(x)dx \\ &=P(|X-\mu|\geq\epsilon)\end{align} $
and
$ E[g_2(X)]=E\left[\frac{(X-\mu)^2}{\epsilon^2}\right]=\frac{\sigma^2}{\epsilon^2} $
Now consider $ \phi $(x) = g$ _2 $(x) - g$ _1 $(x). (Fig 4: g$ _1 $ and g$ _2 $ as functions of x; note that g$ _2 $(x) ≥ g$ _1 $(x) for all x.)
We have that $ \phi $(x) ≥ 0 ∀x ∈ R. So,
$ 0\leq E[\phi(X)]=E[g_2(X)]-E[g_1(X)]=\frac{\sigma^2}{\epsilon^2}-P(|X-\mu|\geq\epsilon) $
$ \Rightarrow P(|X-\mu|\geq\epsilon)\leq\frac{\sigma^2}{\epsilon^2} $
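The inequality is easy to sanity-check by simulation; the sketch below assumes a standard Gaussian X purely for illustration (Chebyshev itself makes no distributional assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.0, 1.0
X = rng.normal(mu, sigma, 1_000_000)    # Gaussian chosen only for illustration

for eps in (0.5, 1.0, 2.0, 3.0):
    p_emp = np.mean(np.abs(X - mu) >= eps)   # empirical P(|X - mu| >= eps)
    bound = sigma**2 / eps**2                # Chebyshev bound
    print(f"eps = {eps}: P ~ {p_emp:.4f} <= bound = {bound:.4f}")
# every empirical probability sits below its bound (for eps < 1 the bound exceeds 1)
```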
The Laws of Large Numbers
Recall the previous example, where
$ X_k=s+W_k $
with s ∈ R and E[W$ _k $] = 0. Let
$ Y_n=\frac{1}{n}\sum_{k=1}^n X_k $
Does Y$ _n $ converge to s? If so, in what sense? The laws of large numbers address this question.
Theorem $ \qquad $ The Weak Law of Large Numbers
Let X$ _n $ be a sequence of iid random variables with mean $ \mu $ and variance $ \sigma^2 $. Let
$ Y_n=\frac{1}{n}\sum_{k=1}^n X_k $
Y$ _n $ is the sample mean.
Then, for any $ \epsilon $ > 0,
$ P(|Y_n-\mu|\geq\epsilon)\rightarrow 0 $
as
$ n\rightarrow\infty $
i.e., Y$ _n $ → $ \mu $ in probability.
Proof
$ \begin{align}E[Y_n]&=\mu \\ \mbox{Var}(Y_n)&=\frac{\sigma^2}{n}\end{align} $

Using the Chebyshev Inequality, we have that
$ P(|Y_n-\mu|\geq\epsilon)\leq\frac{\mbox{Var}(Y_n)}{\epsilon^2}=\frac{\sigma^2}{\epsilon^2 n}\;\overset{n\rightarrow\infty}{\longrightarrow}\;0 $
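The statement can also be checked empirically. The sketch below (Gaussian X$ _k $'s, ε = 0.1, and the trial count are illustrative assumptions) estimates P(|Y$ _n $ - $ \mu $| ≥ ε) over many independent trials and compares it with the Chebyshev bound σ²/(ε²n):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, eps = 0.0, 1.0, 0.1
trials = 5_000

for n in (10, 100, 1_000):
    Y = rng.normal(mu, sigma, (trials, n)).mean(axis=1)   # sample means Y_n
    p_emp = np.mean(np.abs(Y - mu) >= eps)
    bound = min(1.0, sigma**2 / (eps**2 * n))
    print(f"n = {n:5d}   P(|Y_n - mu| >= eps) ~ {p_emp:.4f}   bound = {bound:.4f}")
# the empirical probability falls toward 0 as n grows, as the weak law asserts
```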
Theorem $ \qquad $ Strong Law of Large Numbers
Let X$ _n $ be a sequence of iid random variables with mean $ \mu $ and variance $ \sigma^2 $. Then
$ Y_n=\frac{1}{n}\sum_{k=1}^nX_k\overset{a.e.}{\longrightarrow}\mu $
as
$ n\rightarrow\infty $
The proof is omitted as it is beyond the scope of this class. Nevertheless, note that the strong law of large numbers implies the weak law of large numbers.
So, for example, with
$ X_k=s+W_k $
if the W$ _k $'s are independent and have the same variance $ \sigma^2 $, then the sample mean Y$ _n $ converges to s with probability 1.
But the laws of large numbers tell us something more fundamental about our axiom-based probability measure. Consider a random variable X and a set A ∈ B(R). Let X$ _1 $,X$ _2 $,... be an iid sequence of random variables with the same distribution as X. Now let
$ Y_k=\begin{cases}1 & \mbox{if}\;X_k\in A \\ 0 & \mbox{if}\;X_k\notin A\end{cases} $
Then
$ E[Y_k] = P(X_k\in A) = P(X\in A) $
The Y$ _k $'s are iid with mean P(X ∈ A), and the relative frequency of the event {X ∈ A} is
$ \frac{1}{n}\sum_{k=1}^nY_k $
The strong law of large numbers tells us that
$ \frac{1}{n}\sum_{k=1}^nY_k\rightarrow P(X\in A) $
with probability 1 as
$ n\rightarrow\infty $
So with probability 1, the relative frequency approach to probability converges to the axiom-based P(X ∈ A).
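A minimal simulation sketch of this fact follows; the choices X ~ N(0,1) and A = [-1,1] are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
X = rng.normal(0.0, 1.0, n)              # iid copies of X ~ N(0,1) (assumed)
Y = (np.abs(X) <= 1.0)                   # Y_k = indicator of the event {X_k in A}

for m in (100, 10_000, 1_000_000):
    print(f"n = {m:8d}   relative frequency = {Y[:m].mean():.4f}")
# converges to P(-1 <= X <= 1) ~ 0.6827, the axiom-based probability of A
```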
The Central Limit Theorem
Let X$ _n $ be a sequence of iid random variables with finite mean $ \mu $ and variance $ \sigma^2 $, and let
$ Z_n = \frac{\sum_{j=1}^nX_j-n\mu}{\sigma\sqrt{n}} $
Then
$ Z_n\overset{d}{\longrightarrow}Z $
as
$ n\rightarrow\infty $
where Z is N(0,1).
i.e.,
$ F_{Z_n}(z)\rightarrow\Phi(z)=\int_{-\infty}^z\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx $
Proof $ \qquad $ We will use the following lemma.
Lemma $ \qquad $ Let Z$ _n $ be a sequence of random variables with cdfs F$ _{Z_n} $ and mgfs $ \Phi_{Z_n} $(s), and let Z be a random variable with cdf F$ _Z $ and mgf $ \Phi_Z $. If $ \Phi_{Z_n} $(s) → $ \Phi_Z $(s) ∀s ∈ C, then F$ _{Z_n} $(z) → F$ _Z $(z) ∀z ∈ R.
So it is sufficient to show that
$ \Phi_{Z_n}(s)\rightarrow e^{\frac{s^2}{2}} $
the mgf of an N(0,1) random variable.
Consider
$ \Phi_X\left(\frac{s}{\sqrt n}\right)=E\left[e^{\frac{sX_k}{\sqrt n}}\right]\quad\forall k $
Now considering the case $ \mu=0 $, $ \sigma^2=1 $, we have
$ \begin{align}\Phi_{Z_n}(s) &= E\left[e^{s\sum_{k=1}^n\frac{X_k}{\sqrt n}}\right] \\ &=E\left[\prod_{k=1}^ne^{s\frac{X_k}{\sqrt n}}\right] \\ &=\prod_{k=1}^nE\left[e^{s\frac{X_k}{\sqrt n}}\right] \\ &=\prod_{k=1}^n\Phi_X\left(\frac{s}{\sqrt n}\right) \\ &=\left(\Phi_X\left(\frac{s}{\sqrt n}\right)\right)^n\end{align} $
where the expectation of the product factors into the product of expectations because the X$ _k $'s are independent.
Let L(s) = log $ \Phi_X $(s). Then, to show that
$ \Phi_{Z_n}(s)\rightarrow e^{\frac{s^2}{2}} $
it is sufficient to show that
$ nL\left(\frac{s}{\sqrt n}\right)=\log\left[\left(\Phi_X\left(\frac{s}{\sqrt n}\right)\right)^n\right]\rightarrow\frac{s^2}{2} $
We will need
$ \begin{align}& L(0)=0 \\ & L'(0)=\mu=0 \\ & L''(0)=E\left[X^2\right]=1\end{align} $
Now
$ \begin{align}\lim_{n\rightarrow\infty} nL\left(\frac{s}{\sqrt n}\right)&=\lim_{n\rightarrow\infty}\frac{L\left(\frac{s}{\sqrt n}\right)}{n^{-1}} \\ &= \lim_{n\rightarrow\infty} \frac{-L'\left(\frac{s}{\sqrt n}\right)n^{-\frac{3}{2}}s}{-2n^{-2}} \quad (\mbox{L'Hopital}) \\ &=\lim_{n\rightarrow\infty} \frac{L'\left(\frac{s}{\sqrt n}\right)s}{2n^{-\frac{1}{2}}} \\ &= \lim_{n\rightarrow\infty} \frac{-L''\left(\frac{s}{\sqrt n}\right)n^{-\frac{3}{2}}s^2}{-2n^{-\frac{3}{2}}} \quad (\mbox{L'Hopital again}) \\ &=\lim_{n\rightarrow\infty} \frac{L''\left(\frac{s}{\sqrt n}\right)s^2}{2} \\ &=\frac{s^2}{2}\end{align} $
For general $ \mu $, $ \sigma^2 $, use the result for $ \mu $ = 0, $ \sigma^2 $ = 1 with
$ Y_k = \frac{X_k-\mu}{\sigma} $
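To round out the discussion, the CLT can be observed numerically. The sketch below (the Uniform(0,1) choice for X$ _k $ and the sample sizes are illustrative assumptions) compares the empirical cdf of Z$ _n $ with Φ at a few points:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)     # mean and std of Uniform(0,1)
n, trials = 100, 50_000

X = rng.random((trials, n))                           # X_k ~ Uniform(0,1)
Z = (X.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))   # Z_n, one per trial

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))      # standard normal cdf
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}   F_Zn(z) ~ {np.mean(Z <= z):.4f}   Phi(z) = {Phi(z):.4f}")
# the empirical cdf of Z_n matches Phi closely even at n = 100
```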
References
- M. Comer. ECE 600. Class Lecture. Random Variables and Signals. Faculty of Electrical Engineering, Purdue University. Fall 2013.
Questions and comments
If you have any questions, comments, etc. please post them on this page