# Probability Theory/Conditional probability

## Basics and multiplication formula

Definition 3.1 (Conditional probability):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space, and let ${\displaystyle A\in {\mathcal {F}}}$ be fixed such that ${\displaystyle P(A)>0}$. If ${\displaystyle B\in {\mathcal {F}}}$ is another event, then the conditional probability of ${\displaystyle B}$ given that ${\displaystyle A}$ has occurred (or occurs with certainty) is defined as

${\displaystyle P_{A}(B):={\frac {P(B\cap A)}{P(A)}}}$.

Using multiplicative notation (writing ${\displaystyle BA}$ for ${\displaystyle B\cap A}$), we could have written

${\displaystyle P_{A}(B):={\frac {P(BA)}{P(A)}}}$.

This definition is intuitive, since the following lemmata are satisfied:

Lemma 3.2:

${\displaystyle A\subseteq B\Rightarrow P_{A}(B)=1}$

Lemma 3.3:

${\displaystyle P_{A}(B+C)=P_{A}(B)+P_{A}(C)}$ for disjoint ${\displaystyle B,C\in {\mathcal {F}}}$ (the ${\displaystyle +}$-notation indicates that the union is disjoint)

Each lemma follows directly from the definition and the axioms holding for ${\displaystyle P}$ (definition 2.1).

From these lemmata, we obtain that for each ${\displaystyle A\in {\mathcal {F}}}$ with ${\displaystyle P(A)>0}$, the triple ${\displaystyle (\Omega ,{\mathcal {F}},P_{A})}$ satisfies the defining axioms of a probability space (definition 2.1).
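As a quick sanity check of definition 3.1 and the two lemmata, here is a minimal sketch in Python; the fair die, the events, and the helper names are illustrative assumptions, not part of the text:

```python
from fractions import Fraction

# Model a fair six-sided die as a finite probability space:
# Omega = {1,...,6} with the uniform measure.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a):
    """P_A(B) = P(B ∩ A) / P(A), defined only when P(A) > 0."""
    pa = prob(a)
    if pa == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(b & a) / pa

even = {2, 4, 6}           # A: the roll is even
at_least_four = {4, 5, 6}  # B: the roll is at least 4

# P_A(B) = P({4,6}) / P({2,4,6}) = (2/6)/(3/6) = 2/3
print(cond_prob(at_least_four, even))  # prints 2/3

# Lemma 3.2: A ⊆ B implies P_A(B) = 1, e.g. with A = {2,4} ⊆ even
print(cond_prob(even, {2, 4}))  # prints 1
```

Lemma 3.3 can be checked the same way, e.g. `cond_prob({1, 2}, even) == cond_prob({1}, even) + cond_prob({2}, even)` for the disjoint events `{1}` and `{2}`.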

With this definition, we have the following theorem:

Theorem 3.4 (Multiplication formula):

${\displaystyle P(A_{1}A_{2}\cdots A_{n})=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1})}$,

where ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ is a probability space and ${\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {F}}}$ satisfy ${\displaystyle P(A_{1}\cdots A_{n-1})>0}$, so that every conditional probability on the right-hand side is defined.

Proof:

From the definition, we have

${\displaystyle P_{A}(B)P(A)=P(AB)}$

for all ${\displaystyle A,B\in {\mathcal {F}}}$ with ${\displaystyle P(A)>0}$. Thus, since ${\displaystyle {\mathcal {F}}}$ is an algebra and hence closed under finite intersections, we obtain by induction:

${\displaystyle {\begin{aligned}P(A_{1}A_{2}\cdots A_{n})&=P((A_{1}A_{2}\cdots A_{n-1})A_{n})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P(A_{1}\cdots A_{n-1})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1}).\end{aligned}}}$ ${\displaystyle \Box }$
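Theorem 3.4 can be verified numerically. The following sketch (the deck and event names are illustrative assumptions) computes the probability of drawing three hearts in a row from a standard 52-card deck via the multiplication formula, and checks it against a brute-force enumeration of all ordered draws:

```python
from fractions import Fraction
from itertools import permutations

# Event A_k: "the k-th card drawn is a heart" (13 hearts in 52 cards).
# Multiplication formula: P(A1 A2 A3) = P(A1) * P_{A1}(A2) * P_{A1 A2}(A3).
chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Brute-force check: enumerate all ordered draws of 3 distinct cards from a
# deck encoded as 13 hearts (True) followed by 39 non-hearts (False).
deck = [True] * 13 + [False] * 39
draws = list(permutations(range(52), 3))
favorable = sum(1 for d in draws if all(deck[i] for i in d))
brute = Fraction(favorable, len(draws))

print(chain == brute)  # True
```

Both computations yield ${\displaystyle 11/850}$, as the multiplication formula predicts.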

## Bayes' theorem

Theorem 3.5 (Theorem of the total probability):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space, and assume

${\displaystyle \Omega =A_{1}+\cdots +A_{n}}$

(note that by using the ${\displaystyle +}$-notation, we assume that the union is disjoint), where ${\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {F}}}$ and ${\displaystyle P(A_{j})>0}$ for each ${\displaystyle j}$. Then

${\displaystyle \forall B\in {\mathcal {F}}:P(B)=\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)}$.

Proof:

${\displaystyle {\begin{aligned}\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)&=\sum _{j=1}^{n}P(A_{j}){\frac {P(A_{j}\cap B)}{P(A_{j})}}\\&=\sum _{j=1}^{n}P(A_{j}B)\\&=P\left(\sum _{j=1}^{n}A_{j}B\right)\\&=P\left(\left(\sum _{j=1}^{n}A_{j}\right)B\right)\\&=P(\Omega B)\\&=P(B),\end{aligned}}}$

where we used that the sets ${\displaystyle A_{1}B,\ldots ,A_{n}B}$ are pairwise disjoint, the distributive law ${\displaystyle \left(\sum _{j=1}^{n}A_{j}\right)B=\sum _{j=1}^{n}A_{j}B}$ and ${\displaystyle \Omega \cap B=B}$.${\displaystyle \Box }$
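Theorem 3.5 in action, as a minimal sketch: choose one of two hypothetical urns with probability ${\displaystyle 1/2}$ each (the partition ${\displaystyle \Omega =A_{1}+A_{2}}$), then draw a ball; the numbers below are illustrative assumptions.

```python
from fractions import Fraction

# Partition: A_1 = "urn 1 chosen", A_2 = "urn 2 chosen".
# Conditional draw probabilities: P_{A_1}(red) = 3/5, P_{A_2}(red) = 1/5.
p_urn = {1: Fraction(1, 2), 2: Fraction(1, 2)}
p_red_given_urn = {1: Fraction(3, 5), 2: Fraction(1, 5)}

# Theorem 3.5: P(B) = sum_j P(A_j) * P_{A_j}(B)
p_red = sum(p_urn[j] * p_red_given_urn[j] for j in p_urn)
print(p_red)  # prints 2/5
```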

Theorem 3.6 (Bayes' theorem, basic version):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space and ${\displaystyle A,B\in {\mathcal {F}}}$ with ${\displaystyle P(A)>0}$ and ${\displaystyle P(B)>0}$. Then

${\displaystyle P_{B}(A)={\frac {P(A)P_{A}(B)}{P(B)}}}$.

Proof:

${\displaystyle {\frac {P(A)P_{A}(B)}{P(B)}}={\frac {P(A){\frac {P(A\cap B)}{P(A)}}}{P(B)}}={\frac {P(A\cap B)}{P(B)}}=P_{B}(A)}$.${\displaystyle \Box }$

This formula may look somewhat abstract, but it has a nice geometric meaning. Suppose we are given two sets ${\displaystyle A,B\in {\mathcal {F}}}$ of positive probability, already know ${\displaystyle P(A)}$, ${\displaystyle P(B)}$ and ${\displaystyle P_{A}(B)}$, and want to compute ${\displaystyle P_{B}(A)}$. Picture ${\displaystyle A}$ and ${\displaystyle B}$ as two overlapping regions whose areas represent their probabilities.

We know the ratio of the size of ${\displaystyle A\cap B}$ to that of ${\displaystyle A}$, but what we actually want to know is how ${\displaystyle A\cap B}$ compares to ${\displaystyle B}$. Hence, we change the reference set by multiplying with ${\displaystyle P(A)}$, the old reference magnitude, and dividing by ${\displaystyle P(B)}$, the new reference magnitude.
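Theorem 3.6 can be checked on a concrete finite space. In the sketch below, ${\displaystyle \Omega =\{1,\ldots ,20\}}$ carries the uniform measure and the two events are illustrative assumptions:

```python
from fractions import Fraction

# Omega = {1,...,20} with the uniform measure, plus two hypothetical events.
omega = set(range(1, 21))
A = set(range(1, 9))    # {1,...,8}
B = set(range(5, 15))   # {5,...,14}, so A ∩ B = {5,6,7,8}

def prob(e):
    return Fraction(len(e & omega), len(omega))

def cond(b, a):
    """P_A(B) = P(A ∩ B) / P(A)."""
    return prob(a & b) / prob(a)

lhs = cond(A, B)                      # P_B(A), computed directly
rhs = prob(A) * cond(B, A) / prob(B)  # Bayes' formula of Theorem 3.6
print(lhs == rhs, lhs)  # prints: True 2/5
```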

Theorem 3.7 (Bayes' theorem):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$ be a probability space, and assume

${\displaystyle \Omega =A_{1}+\cdots +A_{n}}$,

where ${\displaystyle A_{1},\ldots ,A_{n}\in {\mathcal {F}}}$ with ${\displaystyle P(A_{j})>0}$ for each ${\displaystyle j}$. Then for all ${\displaystyle B\in {\mathcal {F}}}$ with ${\displaystyle P(B)>0}$

${\displaystyle \forall j\in \{1,\ldots ,n\}:P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}}$.

Proof:

From the basic version of the theorem, we obtain

${\displaystyle P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{P(B)}}}$.

Using the formula of total probability, we obtain

${\displaystyle P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}}$.${\displaystyle \Box }$
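Theorem 3.7 is the usual tool for "inverting" conditional probabilities. Continuing the two-urn sketch from above (the numbers are again illustrative assumptions): having observed ${\displaystyle B}$ = "a red ball was drawn", we update the probabilities of the partition ${\displaystyle \Omega =A_{1}+A_{2}}$.

```python
from fractions import Fraction

# Prior: each urn chosen with probability 1/2.
# Likelihoods: P_{A_1}(red) = 3/5, P_{A_2}(red) = 1/5.
prior = {1: Fraction(1, 2), 2: Fraction(1, 2)}
likelihood = {1: Fraction(3, 5), 2: Fraction(1, 5)}

# Denominator of Theorem 3.7 = P(B) by the theorem of total probability.
total = sum(prior[k] * likelihood[k] for k in prior)

# Posterior: P_B(A_j) = P_{A_j}(B) P(A_j) / sum_k P(A_k) P_{A_k}(B)
posterior = {j: prior[j] * likelihood[j] / total for j in prior}

print(posterior[1], posterior[2])  # prints: 3/4 1/4
```

Observing a red ball shifts the probability toward urn 1, which is richer in red balls; note also that the posterior probabilities sum to ${\displaystyle 1}$, as they must.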