I wanted to understand why picking the most likely outcome is the best you can do (better than choosing randomly from an identical distribution), so I worked it out as follows.

Consider a random experiment with 2 outcomes, 0 and 1.

Let $ \displaystyle E_0 $ be the event that outcome 0 occurs and $ \displaystyle E_1 $ be the event that outcome 1 occurs.

Let $ \displaystyle Pr(E_0) = p $ and $ \displaystyle Pr(E_1) = 1-p $, where $ \displaystyle p $ is some fixed but arbitrary probability.

Assume, without loss of generality, that $ \displaystyle p \ge \frac{1}{2} $ (relabelling the outcomes if necessary).


Consider a second, independent random experiment intended to predict the outcome of the first.

Let $ \displaystyle F_0 $ be the event that outcome 0 is predicted and $ \displaystyle F_1 $ be the event that outcome 1 is predicted.

Let $ \displaystyle Pr(F_0) = q $ and $ \displaystyle Pr(F_1) = 1-q $.


The probability of error is

$ \displaystyle P_{err} = Pr((E_0 \cap F_1) \cup (E_1 \cap F_0)) = Pr(E_0 \cap F_1) + Pr(E_1 \cap F_0) $.

By independence,

$ \displaystyle Pr(E_0 \cap F_1) = Pr(E_0) \cdot Pr(F_1) $

$ \displaystyle Pr(E_1 \cap F_0) = Pr(E_1) \cdot Pr(F_0) $.

So,

$ \displaystyle P_{err} = Pr(E_0) \cdot Pr(F_1) + Pr(E_1) \cdot Pr(F_0) = p(1-q) + (1-p)q = (1-2p)q + p $.


We would like to choose $ \displaystyle q\in[0,1] $ to minimize $ \displaystyle P_{err} $. Since $ \displaystyle P_{err} $ is linear in $ \displaystyle q $, the extrema are at the endpoints. Evaluating at $ \displaystyle q=0 $ and $ \displaystyle q=1 $ gives $ \displaystyle P_{err} = p $ and $ \displaystyle P_{err} = 1-p $ respectively; since $ \displaystyle p \ge \frac{1}{2} $ implies $ \displaystyle 1-p \le p $, the minimal $ \displaystyle P_{err} $ is $ \displaystyle 1-p $, attained at $ \displaystyle q=1 $. Thus the optimal strategy for predicting the outcome of the first experiment is to always (with probability 1) predict the more likely outcome.
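
To double-check this numerically, here is a minimal Python sketch (the helper name p_err is just illustrative) that evaluates $ \displaystyle P_{err}(q) = p(1-q) + (1-p)q $ on a grid of $ \displaystyle q $ values and confirms the minimum sits at the endpoint $ \displaystyle q=1 $ whenever $ \displaystyle p \ge \frac{1}{2} $.

<pre>
# Evaluate P_err(q) = p(1-q) + (1-p)q on a grid of q values and report
# the minimizing q; for any p >= 0.5 the minimum is at q = 1 with value 1 - p.
def p_err(p, q):
    return p * (1 - q) + (1 - p) * q

p = 0.8  # any fixed p >= 0.5
qs = [i / 100 for i in range(101)]
best_q = min(qs, key=lambda q: p_err(p, q))
print(best_q, p_err(p, best_q))  # expect 1.0 and 1 - p = 0.2 (up to rounding)
</pre>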

Furthermore, on the interval $ \displaystyle p\in[\frac{1}{2}, 1] $, $ \displaystyle P_{err} $ is a strictly decreasing function. That is, the closer $ \displaystyle p $ is to $ \displaystyle \frac{1}{2} $, the worse the outcome can be predicted (the higher $ \displaystyle P_{err} $ is), and the farther $ \displaystyle p $ is from $ \displaystyle \frac{1}{2} $ the better it can be predicted. This is consistent with the information theoretic description of entropy (which has its maximum at $ \displaystyle p=\frac{1}{2} $) as the "average uncertainty in the outcome". Clearly the less uncertain the outcome is, the better we should expect to be able to predict it.
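
As a rough numerical illustration of that parallel, the following sketch (the helper name binary_entropy is just illustrative) compares the minimal error $ \displaystyle 1-p $ with the binary entropy $ \displaystyle H(p) = -p\log_2 p - (1-p)\log_2(1-p) $ at a few values of $ \displaystyle p $; both shrink as $ \displaystyle p $ moves away from $ \displaystyle \frac{1}{2} $.

<pre>
# Compare the minimal prediction error 1 - p with the binary entropy H(p);
# both are largest at p = 1/2 and decrease as p approaches 1.
import math

def binary_entropy(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.6, 0.7, 0.8, 0.9, 0.99):
    print(p, round(1 - p, 3), round(binary_entropy(p), 3))
</pre>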

As a concrete example, consider two approaches for predicting an experiment with $ \displaystyle p=.8 $ (i.e. $ \displaystyle E_0 $ occurs with probability .8 and $ \displaystyle E_1 $ occurs with probability .2). In the first approach we always predict $ \displaystyle E_0 $ (hence $ \displaystyle q = Pr(F_0) = 1, Pr(F_1) = 0 $). With this approach we have $ \displaystyle Pr(E_0 \cap F_0) = .8 $, $ \displaystyle Pr(E_0 \cap F_1) = 0 $, $ \displaystyle Pr(E_1 \cap F_0) = .2 $, $ \displaystyle Pr(E_1 \cap F_1) = 0 $. So $ \displaystyle P_{err} = .2 $.

In the second approach we predict randomly according to the distribution of the first experiment (i.e. $ \displaystyle q = Pr(F_0) = .8, Pr(F_1) = .2 $). With this approach we have $ \displaystyle Pr(E_0 \cap F_0) = .64 $, $ \displaystyle Pr(E_0 \cap F_1) = .16 $, $ \displaystyle Pr(E_1 \cap F_0) = .16 $, $ \displaystyle Pr(E_1 \cap F_1) = .04 $. So $ \displaystyle P_{err} = .32 $, substantially worse.
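
As a sanity check on these numbers, here is a minimal Monte Carlo sketch (the helper name error_rate is just illustrative) that simulates both approaches for $ \displaystyle p=.8 $; the estimates should land near .2 and .32 respectively.

<pre>
# Simulate the prediction experiment: draw the true outcome with Pr(0) = p
# and the prediction with Pr(0) = q, then estimate the error rate.
import random

def error_rate(p, q, trials=100000):
    errors = 0
    for _ in range(trials):
        outcome = 0 if random.random() < p else 1
        prediction = 0 if random.random() < q else 1
        errors += (outcome != prediction)
    return errors / trials

print(error_rate(0.8, 1.0))  # always predict outcome 0: about 0.2
print(error_rate(0.8, 0.8))  # predict randomly by the distribution: about 0.32
</pre>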

--Jvaught 19:59, 26 January 2010 (UTC)
