# Probability Theory/Conditional probability

## Basics and multiplication formula

Definition 3.1 (Conditional probability):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space, and let ${\displaystyle A\in {\mathcal {F}}}$ be fixed such that ${\displaystyle P(A)>0}$. If ${\displaystyle B\in {\mathcal {F}}}$ is another event, then the conditional probability of ${\displaystyle B}$ given that ${\displaystyle A}$ has occurred (or occurs with certainty) is defined as

${\displaystyle P_{A}(B):={\frac {P(B\cap A)}{P(A)}}}$.

Using multiplicative notation (writing ${\displaystyle BA}$ for ${\displaystyle B\cap A}$), we could have written

${\displaystyle P_{A}(B):={\frac {P(BA)}{P(A)}}}$.

This definition is intuitive, since the following lemmata are satisfied:

Lemma 3.2:

${\displaystyle A\subseteq B\Rightarrow P_{A}(B)=1}$

Lemma 3.3:

${\displaystyle P_{A}(B+C)=P_{A}(B)+P_{A}(C)}$ for disjoint ${\displaystyle B,C\in {\mathcal {F}}}$ (the ${\displaystyle +}$-notation indicates that the union is disjoint)

Each lemma follows directly from the definition and the axioms holding for ${\displaystyle P}$ (definition 2.1).

From these lemmata, we obtain that for each ${\displaystyle A\in {\mathcal {F}}}$ with ${\displaystyle P(A)>0}$, the triple ${\displaystyle (\Omega ,{\mathcal {F}},P_{A})}$ satisfies the defining axioms of a probability space (definition 2.1).
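As a quick sanity check of definition 3.1 and the two lemmata, here is a minimal sketch in Python; the fair die, the events, and the helper names are illustrative assumptions, not part of the text:

```python
from fractions import Fraction

# Model a fair six-sided die as a finite probability space:
# Omega = {1,...,6} with the uniform measure.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a):
    """P_A(B) = P(B ∩ A) / P(A), defined only when P(A) > 0."""
    pa = prob(a)
    if pa == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(b & a) / pa

even = {2, 4, 6}           # A: the roll is even
at_least_four = {4, 5, 6}  # B: the roll is at least 4

# P_A(B) = P({4,6}) / P({2,4,6}) = (2/6)/(3/6) = 2/3
print(cond_prob(at_least_four, even))  # prints 2/3

# Lemma 3.2: A ⊆ B implies P_A(B) = 1, e.g. with A = {2,4} ⊆ even
print(cond_prob(even, {2, 4}))  # prints 1
```

Lemma 3.3 can be checked the same way, e.g. `cond_prob({1, 2}, even) == cond_prob({1}, even) + cond_prob({2}, even)` for the disjoint events `{1}` and `{2}`.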

With this definition, we have the following theorem:

Theorem 3.4 (Multiplication formula):

${\displaystyle P(A_{1}A_{2}\cdots A_{n})=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1})}$,

where ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ is a probability space and ${\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {F}}}$ satisfy ${\displaystyle P(A_{1}\cdots A_{n-1})>0}$, so that every conditional probability on the right-hand side is defined.

Proof:

From the definition, we have

${\displaystyle P_{A}(B)P(A)=P(AB)}$

for all ${\displaystyle A,B\in {\mathcal {F}}}$ with ${\displaystyle P(A)>0}$. Thus, since ${\displaystyle {\mathcal {F}}}$ is an algebra and hence closed under finite intersections, we obtain by induction:

${\displaystyle {\begin{aligned}P(A_{1}A_{2}\cdots A_{n})&=P((A_{1}A_{2}\cdots A_{n-1})A_{n})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P(A_{1}\cdots A_{n-1})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1}).\end{aligned}}}$ ${\displaystyle \Box }$
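Theorem 3.4 can be verified numerically. The following sketch (the deck and event names are illustrative assumptions) computes the probability of drawing three hearts in a row from a standard 52-card deck via the multiplication formula, and checks it against a brute-force enumeration of all ordered draws:

```python
from fractions import Fraction
from itertools import permutations

# Event A_k: "the k-th card drawn is a heart" (13 hearts in 52 cards).
# Multiplication formula: P(A1 A2 A3) = P(A1) * P_{A1}(A2) * P_{A1 A2}(A3).
chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Brute-force check: enumerate all ordered draws of 3 distinct cards from a
# deck encoded as 13 hearts (True) followed by 39 non-hearts (False).
deck = [True] * 13 + [False] * 39
draws = list(permutations(range(52), 3))
favorable = sum(1 for d in draws if all(deck[i] for i in d))
brute = Fraction(favorable, len(draws))

print(chain == brute)  # True
```

Both computations yield ${\displaystyle 11/850}$, as the multiplication formula predicts.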

## Bayes' theorem

Theorem 3.5 (Theorem of the total probability):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space, and assume

${\displaystyle \Omega =A_{1}+\cdots +A_{n}}$

(note that by using the ${\displaystyle +}$-notation, we assume that the union is disjoint), where ${\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {F}}}$ and ${\displaystyle P(A_{j})>0}$ for each ${\displaystyle j}$. Then

${\displaystyle \forall B\in {\mathcal {F}}:P(B)=\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)}$.

Proof:

${\displaystyle {\begin{aligned}\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)&=\sum _{j=1}^{n}P(A_{j}){\frac {P(A_{j}\cap B)}{P(A_{j})}}\\&=\sum _{j=1}^{n}P(A_{j}B)\\&=P\left(\sum _{j=1}^{n}A_{j}B\right)\\&=P\left(\left(\sum _{j=1}^{n}A_{j}\right)B\right)\\&=P(\Omega B)\\&=P(B),\end{aligned}}}$

where we used that the sets ${\displaystyle A_{1}B,\ldots ,A_{n}B}$ are pairwise disjoint, the distributive law ${\displaystyle \left(\sum _{j=1}^{n}A_{j}\right)B=\sum _{j=1}^{n}A_{j}B}$ and ${\displaystyle \Omega \cap B=B}$.${\displaystyle \Box }$
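Theorem 3.5 in action, as a minimal sketch: choose one of two hypothetical urns with probability ${\displaystyle 1/2}$ each (the partition ${\displaystyle \Omega =A_{1}+A_{2}}$), then draw a ball; the numbers below are illustrative assumptions.

```python
from fractions import Fraction

# Partition: A_1 = "urn 1 chosen", A_2 = "urn 2 chosen".
# Conditional draw probabilities: P_{A_1}(red) = 3/5, P_{A_2}(red) = 1/5.
p_urn = {1: Fraction(1, 2), 2: Fraction(1, 2)}
p_red_given_urn = {1: Fraction(3, 5), 2: Fraction(1, 5)}

# Theorem 3.5: P(B) = sum_j P(A_j) * P_{A_j}(B)
p_red = sum(p_urn[j] * p_red_given_urn[j] for j in p_urn)
print(p_red)  # prints 2/5
```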

Theorem 3.6 (Bayes' theorem, basic version):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space and ${\displaystyle A,B\in {\mathcal {F}}}$ with ${\displaystyle P(A)>0}$ and ${\displaystyle P(B)>0}$. Then

${\displaystyle P_{B}(A)={\frac {P(A)P_{A}(B)}{P(B)}}}$.

Proof:

${\displaystyle {\frac {P(A)P_{A}(B)}{P(B)}}={\frac {P(A){\frac {P(A\cap B)}{P(A)}}}{P(B)}}={\frac {P(A\cap B)}{P(B)}}=P_{B}(A)}$.${\displaystyle \Box }$

This formula may look somewhat abstract, but it has a nice geometric meaning. Suppose we are given two sets ${\displaystyle A,B\in {\mathcal {F}}}$ of positive probability, already know ${\displaystyle P(A)}$, ${\displaystyle P(B)}$ and ${\displaystyle P_{A}(B)}$, and want to compute ${\displaystyle P_{B}(A)}$. Picture ${\displaystyle A}$ and ${\displaystyle B}$ as two overlapping regions whose areas represent their probabilities.

We know the ratio of the size of ${\displaystyle A\cap B}$ to that of ${\displaystyle A}$, but what we actually want to know is how ${\displaystyle A\cap B}$ compares to ${\displaystyle B}$. Hence, we change the reference set by multiplying with ${\displaystyle P(A)}$, the old reference magnitude, and dividing by ${\displaystyle P(B)}$, the new reference magnitude.
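Theorem 3.6 can be checked on a concrete finite space. In the sketch below, ${\displaystyle \Omega =\{1,\ldots ,20\}}$ carries the uniform measure and the two events are illustrative assumptions:

```python
from fractions import Fraction

# Omega = {1,...,20} with the uniform measure, plus two hypothetical events.
omega = set(range(1, 21))
A = set(range(1, 9))    # {1,...,8}
B = set(range(5, 15))   # {5,...,14}, so A ∩ B = {5,6,7,8}

def prob(e):
    return Fraction(len(e & omega), len(omega))

def cond(b, a):
    """P_A(B) = P(A ∩ B) / P(A)."""
    return prob(a & b) / prob(a)

lhs = cond(A, B)                      # P_B(A), computed directly
rhs = prob(A) * cond(B, A) / prob(B)  # Bayes' formula of Theorem 3.6
print(lhs == rhs, lhs)  # prints: True 2/5
```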

Theorem 3.7 (Bayes' theorem):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space, and assume

${\displaystyle \Omega =A_{1}+\cdots +A_{n}}$,

where ${\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {F}}}$ with ${\displaystyle P(A_{j})>0}$ for each ${\displaystyle j}$. Then for all ${\displaystyle B\in {\mathcal {F}}}$ with ${\displaystyle P(B)>0}$

${\displaystyle \forall j\in \{1,\ldots ,n\}:P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}}$.

Proof:

From the basic version of the theorem, we obtain

${\displaystyle P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{P(B)}}}$.

Using the formula of total probability, we obtain

${\displaystyle P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}}$.${\displaystyle \Box }$
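Theorem 3.7 is the usual tool for "inverting" conditional probabilities. Continuing the two-urn sketch from above (the numbers are again illustrative assumptions): having observed ${\displaystyle B}$ = "a red ball was drawn", we update the probabilities of the partition ${\displaystyle \Omega =A_{1}+A_{2}}$.

```python
from fractions import Fraction

# Prior: each urn chosen with probability 1/2.
# Likelihoods: P_{A_1}(red) = 3/5, P_{A_2}(red) = 1/5.
prior = {1: Fraction(1, 2), 2: Fraction(1, 2)}
likelihood = {1: Fraction(3, 5), 2: Fraction(1, 5)}

# Denominator of Theorem 3.7 = P(B) by the theorem of total probability.
total = sum(prior[k] * likelihood[k] for k in prior)

# Posterior: P_B(A_j) = P_{A_j}(B) P(A_j) / sum_k P(A_k) P_{A_k}(B)
posterior = {j: prior[j] * likelihood[j] / total for j in prior}

print(posterior[1], posterior[2])  # prints: 3/4 1/4
```

Observing a red ball shifts the probability toward urn 1, which is richer in red balls; note also that the posterior probabilities sum to ${\displaystyle 1}$, as they must.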