Probability/Probability Spaces

From Wikibooks, open books for an open world


Concept

We will now proceed to develop a more axiomatic theory of probability, allowing for a simpler mathematical formalism. We shall proceed by developing the concept of a probability space, which will allow us to harness many theorems in mathematical analysis.

Recall that an experiment is any action or process with an outcome that is subject to uncertainty or randomness. A probability space or a probability triple is a mathematical construct that models an experiment and its set of possible outcomes.

Probability space

Before defining probability space, we define several terms used in its definition.

Definition. (Sample space) The sample space, denoted by $\Omega$, is the non-empty set whose elements are all possible outcomes of an experiment.

Remark.

  • the sample space is often not unique, since there are often multiple ways to define the possible outcomes of an experiment, for example by describing them at different levels of detail [1]
  • alternative notations for the sample space include $S$ and $U$ ($U$ stands for 'universal set')
  • an outcome of the experiment is commonly denoted by $\omega$ (the lowercase form of $\Omega$, omega)

Example. A sample space of the numbers coming up from rolling a six-faced die is $\Omega = \{1, 2, 3, 4, 5, 6\}$.

Definition. (Event) An event is a subset of the sample space.

Remark.

  • it follows that the event space $\mathcal{F}$ is a subset of $\mathcal{P}(\Omega)$, the power set of the sample space
  • an event consisting of a single outcome (a singleton) is sometimes referred to as a simple event, and an event consisting of more than one outcome is sometimes referred to as a compound event
  • an event is said to have happened or occurred if the outcome of the experiment is an element of the event

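Example. For instance, $\{1, 3, 5\}$, $\{6\}$ and $\varnothing$ are events from rolling a six-faced die, while $\{7\}$ is not, since $7$ is not an element of the sample space.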

Definition. (Probability space) A probability space is a mathematical triplet $(\Omega, \mathcal{F}, \mathbb{P})$ consisting of the sample space $\Omega$, a set of events (or event space) $\mathcal{F}$, and a probability function $\mathbb{P}$.

Remark.

  • there are multiple ways to define the probability function, as we will see in the following sections; among those definitions, the axiomatic definition is the most widely used and the most general
  • the probability function is sometimes denoted by $P$ or $\Pr$ instead
  • the notation $\mathbb{P}$ is mainly used in this book to distinguish the probability function from other functions named $P$ or $p$
  • a probability space is arbitrary, in the sense that its author ultimately defines which elements $\Omega$, $\mathcal{F}$, and $\mathbb{P}$ will contain
  • the probability function may present a model for a particular class of real-world situations
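
As a rough illustration of the three components, the following Python sketch builds a small probability space for one roll of a six-faced die. The names `omega`, `event_space`, `power_set` and `prob` are chosen here for illustration only, and the probability function assumes that all six outcomes are equally likely, which is one possible modelling choice among many.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one roll of a six-faced die.
omega = frozenset({1, 2, 3, 4, 5, 6})

def power_set(s):
    """Return all subsets of s as a set of frozensets."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}

# Event space: in this sketch, every subset of omega is an event.
event_space = power_set(omega)

def prob(event):
    """Probability function (assumption: equally likely outcomes)."""
    if event not in event_space:
        raise ValueError("not an event of this probability space")
    return Fraction(len(event), len(omega))

print(len(event_space))             # 64, i.e. 2**6 subsets
print(prob(frozenset({2, 4, 6})))   # 1/2
```

Using exact fractions instead of floating-point numbers keeps later equalities between probabilities exact.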

Terminologies

Terminology for sets from set theory also applies to events, since an event is essentially a set. Apart from that terminology, we also have the following extra terminology for events.

Definition. (Exhaustive) Events $E_1, E_2, \dotsc$ are exhaustive if $E_1 \cup E_2 \cup \dotsb = \Omega$.

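Example. When we are rolling a six-faced die, and we are considering the number coming up as the outcome, the events $\{1, 2, 3\}$ and $\{3, 4, 5, 6\}$ are exhaustive (for instance), while the events $\{1, 2\}$ and $\{3, 4\}$ are not exhaustive, since neither contains $5$ or $6$.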

Definition. (Partition) A group of events is a partition of $\Omega$ if the events are both disjoint and exhaustive.

Example. When we are rolling a six-faced die, and we are considering the number coming up as the outcome, the group of events $\{1, 3, 5\}$ and $\{2, 4, 6\}$ is a partition (for instance), while the group of events $\{1, 2, 3, 4\}$ and $\{4, 5, 6\}$ is not a partition, since these events are not disjoint.
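
The two definitions above can be checked mechanically for finite sample spaces. The sketch below is only an illustration: the helper names `is_exhaustive`, `is_disjoint` and `is_partition`, as well as the example events, are chosen here and do not come from the text.

```python
from itertools import combinations

def is_exhaustive(events, omega):
    """True if the union of the events is the whole sample space."""
    return set().union(*events) == set(omega)

def is_disjoint(events):
    """True if every two of the events have an empty intersection."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))

def is_partition(events, omega):
    """A partition is both disjoint and exhaustive."""
    return is_disjoint(events) and is_exhaustive(events, omega)

omega = {1, 2, 3, 4, 5, 6}
odd_even = [{1, 3, 5}, {2, 4, 6}]
overlapping = [{1, 2, 3, 4}, {4, 5, 6}]

print(is_partition(odd_even, omega))      # True
print(is_exhaustive(overlapping, omega))  # True
print(is_partition(overlapping, omega))   # False: both events contain 4
```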

Probability definition

The remaining undefined item in the probability space is the probability function $\mathbb{P}$. We will give various definitions of it, of which the combinatorial (or classical) and axiomatic definitions are the most important.

Definition. (Subjective probability) The probability of an event is a measure of the chance with which we can expect the event to occur. We assign a number between $0$ and $1$ inclusive to the probability of an event. A probability of $1$ means that we are certain the event will occur, and a probability of $0$ means that we are certain the event will not occur.

Example. Amy and Bob assess their probabilities of winning the top prize from a lucky draw using the subjective probability approach.

  • Amy thinks that she is lucky, and thus assigns 0.7 to the probability of winning the top prize
  • Bob thinks that he is unlucky, and thus assigns 0.1 to the probability of winning the top prize

Remark.

  • this illustrates a major problem of subjective probability, namely the probability assigned to an event is often not unique, due to different opinions from different people

Definition. (Combinatorial probability) Assume all outcomes in the sample space are equally likely. Then, the (combinatorial) probability of an event (say $E$) in the sample space is

$$\mathbb{P}(E) = \frac{\text{no. of outcomes in } E}{\text{no. of outcomes in } \Omega}.$$

Remark.

  • it is also called classical probability
  • if the outcomes are not equally likely, we cannot apply this definition
  • by principle of indifference (or insufficient reason), unless there exists evidence showing that the outcomes are not equally likely [2], we should assume that the outcomes are equally likely
  • when the sample space contains infinitely many outcomes, the combinatorial probability is undefined

Example. The probability that the number 1 comes up on both dice when rolling a fair red six-faced die and a fair blue six-faced die is $\frac{1}{36}$.

Proof. The number of pairs of numbers coming up for the two dice is $6 \times 6 = 36$. Since the dice are fair, the 36 outcomes are equally likely, and so we can apply combinatorial probability here.
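
The counting argument can be reproduced by listing all 36 ordered pairs. The sketch below is illustrative only: the names `outcomes` and `combinatorial_prob` are chosen here, each pair is read as (red die, blue die), and two different events involving the number 1 are evaluated so that the effect of the wording of the event is visible.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (red, blue); each pair is assumed equally likely.
outcomes = list(product(range(1, 7), repeat=2))

def combinatorial_prob(event):
    """No. of outcomes in the event over no. of outcomes in the sample space."""
    return Fraction(len(event), len(outcomes))

both_ones = [o for o in outcomes if o == (1, 1)]
at_least_one_one = [o for o in outcomes if 1 in o]

print(len(outcomes))                         # 36
print(combinatorial_prob(both_ones))         # 1/36
print(combinatorial_prob(at_least_one_one))  # 11/36
```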


Exercise.

Suppose the blue die is colored red, so that both dice are red. Calculate the probability again.



Example. (Capture-mark-recapture) We are fishing in a lake containing $N$ fish. First, we catch $M$ fish from the lake (capture), and give each of them a marker (mark). Then, we catch fish from the lake again (recapture), and catch $n$ (with $n \le N$) fish this time. The probability that there are $m$ marked fish among the $n$ fish is $\dfrac{\binom{M}{m}\binom{N-M}{n-m}}{\binom{N}{n}}$.

Proof. We order the fish in the lake notionally (e.g. by assigning them distinct numbers one by one), so that they are now (notionally) distinguishable. Then, we have:

  • $\binom{N}{n}$: the number of outcomes of catching $n$ fish from $N$ fish
  • $\binom{M}{m}$: the number of outcomes of catching $m$ marked fish from $M$ marked fish in the recapture process
  • $\binom{N-M}{n-m}$: the number of outcomes of catching $n-m$ unmarked fish from $N-M$ unmarked fish in the recapture process (this ensures that we catch exactly $m$ marked fish, by ensuring that the remaining caught fish do not contain any marked fish)

Since the $\binom{N}{n}$ outcomes are equally likely, the result follows from combinatorial probability.
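
The three counts in the proof combine into a single ratio, which can be evaluated directly. The following is a minimal sketch; the function name `recapture_prob` and its parameter names are chosen here for illustration, and the numbers in the usage lines are made up.

```python
from fractions import Fraction
from math import comb

def recapture_prob(total, marked, caught, marked_caught):
    """Probability of finding `marked_caught` marked fish among `caught`
    fish, in a lake of `total` fish of which `marked` are marked.
    Assumes 0 <= caught <= total and 0 <= marked <= total."""
    unmarked_caught = caught - marked_caught
    if marked_caught < 0 or unmarked_caught < 0:
        return Fraction(0)
    favourable = comb(marked, marked_caught) * comb(total - marked, unmarked_caught)
    return Fraction(favourable, comb(total, caught))

# Hypothetical numbers: 10 fish, 4 marked, 3 recaptured.
print(recapture_prob(10, 4, 3, 1))                         # 1/2
print(sum(recapture_prob(10, 4, 3, m) for m in range(4)))  # 1
```

The second line checks that the probabilities over all possible numbers of marked fish add up to 1, as they should for exhaustive and disjoint events.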



Exercise. There are 9 balls in a box, consisting of 3 red balls, 2 blue balls and 4 green balls.

1 Calculate the probability that a red ball is drawn from the box if 1 ball is drawn from the box.

none of the above

2 Calculate the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box.

none of the above

3 orange balls are added to the box such that the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box is now . Calculate .

4
8
16
there exist such , and its value is none of the above
there does not exist such

4 Select the correct (in numerical value sense) expression(s) of the probability that red balls are drawn and blue balls are drawn if balls are drawn from the box (, and are of values such that all terms in the following are defined).


Definition. (Frequentist probability) The probability of an event or outcome is the long-term proportion of times the event would occur if the experiment were repeated independently many times. That is, letting $f_n(E)$ be the number of times that event $E$ occurs in $n$ repetitions of the experiment, the probability of $E$ is

$$\mathbb{P}(E) = \lim_{n \to \infty} \frac{f_n(E)}{n}.$$

Example. Suppose we throw a coin 1 million times (i.e. 1000000 times). The number of heads coming up is 700102, the number of tails coming up is 299896, and the number of times that the coin lands on its edge is 2.

Then, the probability that a head comes up is close to $\frac{700102}{1000000} \approx 0.7$.

After that, we may infer that the coin is unfair.
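
The long-run proportion in this definition can be imitated with a pseudo-random simulation. The sketch below is only illustrative: the function `relative_frequency`, the assumed head probability of 0.7, and the seed are choices made here, and a simulated coin is of course not the physical coin of the example.

```python
import random

def relative_frequency(n_trials, p_head, seed=0):
    """Proportion of heads in n_trials simulated tosses of a coin whose
    head probability is p_head (an assumption of this sketch)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_head for _ in range(n_trials))
    return heads / n_trials

# Relative frequency computed from the counts given in the example.
observed = 700102 / 1_000_000
print(observed)

# The simulated proportion tends to settle near 0.7 as n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n, 0.7))
```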

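Definition. (Axiomatic probability) A probability is a set function $\mathbb{P}$ defined on the event space ($\mathcal{F}$). It assigns a real value $\mathbb{P}(E)$ to each event $E$, with the following probability axioms satisfied:

(P1) for each event $E$, $\mathbb{P}(E) \ge 0$ (nonnegativity)
(P2) $\mathbb{P}(\Omega) = 1$ (unitarity)
(P3) for each (countable) infinite sequence of mutually exclusive (or disjoint) events $E_1, E_2, \dotsc$, $\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(E_i)$ (countable additivity)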

Remark.

  • events are mutually exclusive if the intersection of every two of them is the empty set

Example. Based on the probability axioms, the probability of an event cannot be $-0.1$, since this would violate P1.

Example. (Combinatorial probability is probability) Combinatorial probability is a probability since it satisfies all three probability axioms.

Proof.

(P1) it follows from observing that the no. of outcomes in an event is nonnegative
(P2) it follows from observing that the event $\Omega$ contains every outcome in the sample space, so the ratio of the no. of outcomes in $\Omega$ to the no. of outcomes in the sample space is $1$
(P3) it follows from observing that the no. of outcomes in a union of (infinitely many) disjoint sets is the same as the sum of the no. of outcomes in each of the disjoint sets (possibly through the Venn diagram, non-rigorously)
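
For a small finite sample space, the three observations can also be checked by brute force. The sketch below is an illustration under assumptions made here: a four-outcome sample space, every subset treated as an event, and P3 checked only for pairs of disjoint events, which is a finite special case rather than the full countable statement.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset({1, 2, 3, 4})
events = [
    frozenset(c)
    for c in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1)
    )
]

def prob(event):
    """Combinatorial probability on omega."""
    return Fraction(len(event), len(omega))

p1_holds = all(prob(e) >= 0 for e in events)
p2_holds = prob(omega) == 1
p3_holds_for_pairs = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if a.isdisjoint(b)
)

print(p1_holds, p2_holds, p3_holds_for_pairs)  # True True True
```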


With these three axioms only, we can prove many well-known properties of probability.

Properties of probability

Basic properties of probability

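Proposition. (Probability of empty set) $\mathbb{P}(\varnothing) = 0$.

Proof. Let $E_i = \varnothing$ for each positive integer $i$. $E_1, E_2, \dotsc$ are mutually exclusive, since they are all empty sets, and the intersection of each two of them is also the empty set. Also, $\bigcup_{i=1}^{\infty} E_i = \varnothing$. So, by P3,

$$\mathbb{P}(\varnothing) = \mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(\varnothing) = \mathbb{P}(\varnothing) + \sum_{i=2}^{\infty} \mathbb{P}(\varnothing) \implies \sum_{i=2}^{\infty} \mathbb{P}(\varnothing) = 0 \implies \mathbb{P}(\varnothing) \le 0,$$

where the last step holds because each term of the sum is nonnegative by P1, so no single term can exceed the sum.

By P1, $\mathbb{P}(\varnothing) \ge 0$. It follows from these two inequalities that $\mathbb{P}(\varnothing) = 0$.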

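Proposition. (Extended P3) The property of probability in the third axiom of probability (P3) is also valid for a finite sequence of events.

Proof. For each positive integer $n$, suppose that $E_1, \dotsc, E_n$ are disjoint events, and append to these the infinite sequence of events $E_{n+1} = E_{n+2} = \dotsb = \varnothing$. By P3,

$$\mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) = \mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(E_i) = \sum_{i=1}^{n} \mathbb{P}(E_i) + \sum_{i=n+1}^{\infty} \mathbb{P}(\varnothing) = \sum_{i=1}^{n} \mathbb{P}(E_i)$$

since $\mathbb{P}(\varnothing) = 0$.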

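Proposition. (Simplified law of total probability) For each event $A$ and $B$, $\mathbb{P}(B) = \mathbb{P}(B \cap A) + \mathbb{P}(B \setminus A)$.

Proof. Since $B \cap A$ and $B \setminus A$ are disjoint and their union is $B$,

$$\mathbb{P}(B) = \mathbb{P}\big((B \cap A) \cup (B \setminus A)\big) \overset{\text{ext. P3}}{=} \mathbb{P}(B \cap A) + \mathbb{P}(B \setminus A).$$ [3]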

Illustration of simplified law of total probability:

|---------|        
|  B\A    | <----- B
|    |----|-----|
|    |BnA |     |
|----|----|     | <---- A
     |----------|

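Proposition. (Simplified inclusion-exclusion principle) For each event $A$ and $B$, $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$.

Proof. Since events $A$ and $B \setminus A$ are disjoint, by extended P3,

$$\mathbb{P}(A \cup B) = \mathbb{P}\big(A \cup (B \setminus A)\big) = \mathbb{P}(A) + \mathbb{P}(B \setminus A) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$$

since $\mathbb{P}(B \setminus A) = \mathbb{P}(B) - \mathbb{P}(B \cap A)$ by the simplified law of total probability.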

Illustration of simplified inclusion-exclusion principle:

|---------|        
|         | <----- B
| II |----|-----|
|    |AnB |     |
|----|----| I   | <---- A
     |----------|

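Proposition. (Complement rule) For each event $E$, $\mathbb{P}(E^c) = 1 - \mathbb{P}(E)$.

Proof. Since $E$ and $E^c$ are disjoint and $E \cup E^c = \Omega$, by P2 and extended P3,

$$1 = \mathbb{P}(\Omega) = \mathbb{P}(E \cup E^c) = \mathbb{P}(E) + \mathbb{P}(E^c) \implies \mathbb{P}(E^c) = 1 - \mathbb{P}(E).$$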

Illustration of complement rule:

|---------------|
|               |
|      E^c      | <--- Omega (Pr(Omega)=1)
|    |---|      |
|    | E |      |
|    |---|      |
|---------------|

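Proposition. (Numeric bound for probability) For each event $E$, $0 \le \mathbb{P}(E) \le 1$.

Proof. By P1, $\mathbb{P}(E) \ge 0$, and $\mathbb{P}(E^c) \ge 0$. So, by the complement rule, $\mathbb{P}(E) = 1 - \mathbb{P}(E^c) \le 1$.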

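Proposition. (Monotonicity) If $A \subseteq B$, then $\mathbb{P}(A) \le \mathbb{P}(B)$.

Proof. By simplified law of total probability, and since $B \cap A = A$ when $A \subseteq B$,

$$\mathbb{P}(B) = \mathbb{P}(B \cap A) + \mathbb{P}(B \setminus A) = \mathbb{P}(A) + \mathbb{P}(B \setminus A) \ge \mathbb{P}(A),$$

where the inequality follows from P1.

Illustration of monotonicity (when $A \subseteq B$):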

|---------------|
|               |
|               | <--- B 
|    |---|      |
|    | A |      |
|    |---|      |
|---------------|

Example. The probability of winning the championship in a competition is less than or equal to that of entering the final of the competition, by monotonicity.

Proof. Let $A$ and $B$ be the events of winning the championship in the competition and entering the final of the competition respectively. Then, $A \subseteq B$ (whoever wins the championship must have entered the final), and so $\mathbb{P}(A) \le \mathbb{P}(B)$.
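
The basic properties in this section can be spot-checked on a concrete example. The sketch below is only a numerical illustration, not a proof: it uses combinatorial probability on a six-faced die, and the events `a`, `b` and `c` are arbitrary choices made here.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Combinatorial probability on one roll of a fair six-faced die."""
    return Fraction(len(event), len(omega))

a = frozenset({1, 2, 3})
b = frozenset({2, 4, 6})
c = frozenset({2})          # c is a subset of b

# Simplified law of total probability: P(B) = P(B n A) + P(B \ A)
total_probability_ok = prob(b) == prob(b & a) + prob(b - a)

# Simplified inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B)
inclusion_exclusion_ok = prob(a | b) == prob(a) + prob(b) - prob(a & b)

# Complement rule: P(A^c) = 1 - P(A)
complement_ok = prob(omega - a) == 1 - prob(a)

# Monotonicity: c is a subset of b, so P(c) <= P(b)
monotonicity_ok = c <= b and prob(c) <= prob(b)

print(prob(a | b))  # 5/6
print(total_probability_ok, inclusion_exclusion_ok,
      complement_ok, monotonicity_ok)  # True True True True
```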



Exercise.

Select all correct statement(s). All following capital letters are events.

If , then .
.
if and .
if .


More advanced properties of probability

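Theorem. (Inclusion-exclusion principle)

Illustration of inclusion-exclusion principle when $n = 3$

For each event $E_1, \dotsc, E_n$,

$$\mathbb{P}(E_1 \cup \dotsb \cup E_n) = \sum_{i} \mathbb{P}(E_i) - \sum_{i<j} \mathbb{P}(E_i \cap E_j) + \sum_{i<j<k} \mathbb{P}(E_i \cap E_j \cap E_k) - \dotsb + (-1)^{n+1} \mathbb{P}(E_1 \cap \dotsb \cap E_n).$$

Proof. We can prove this by induction.

Recall the simplified inclusion-exclusion principle, which is essentially the inclusion-exclusion principle when $n = 2$. So, we know that the inclusion-exclusion principle is true for $n = 2$, and it remains to prove the case with larger $n$.

The idea of the induction is illustrated as follows: by simplified inclusion-exclusion principle,

$$\mathbb{P}(E_1 \cup \dotsb \cup E_n \cup E_{n+1}) = \mathbb{P}(E_1 \cup \dotsb \cup E_n) + \mathbb{P}(E_{n+1}) - \mathbb{P}\big((E_1 \cap E_{n+1}) \cup \dotsb \cup (E_n \cap E_{n+1})\big),$$

and the induction hypothesis can be applied to each of the two unions of $n$ events on the right-hand side.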

Remark.

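  • we can write the inclusion-exclusion principle more compactly as follows:
    $$\mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \dotsb < i_k \le n} \mathbb{P}(E_{i_1} \cap \dotsb \cap E_{i_k})$$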
  • an alternative and more elegant proof is provided in the chapter about properties of distributions
  • for the intersections of event, each possible distinct combination is involved

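Example. When $n = 3$, for each event $A$, $B$ and $C$,

$$\mathbb{P}(A \cup B \cup C) = \mathbb{P}(A) + \mathbb{P}(B) + \mathbb{P}(C) - \mathbb{P}(A \cap B) - \mathbb{P}(A \cap C) - \mathbb{P}(B \cap C) + \mathbb{P}(A \cap B \cap C).$$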
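
The alternating sum over all combinations of events translates directly into a loop over subsets of indices. The following is a minimal sketch for finite sample spaces with equally likely outcomes; the function name `union_prob_by_inclusion_exclusion` and the three example events are chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Combinatorial probability of an event in the sample space omega."""
    return Fraction(len(event), len(omega))

def union_prob_by_inclusion_exclusion(events, omega):
    """Probability of the union of `events` (a list of frozensets),
    computed term by term from the inclusion-exclusion principle."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)          # + for odd k, - for even k
        for group in combinations(events, k):
            intersection = frozenset.intersection(*group)
            total += sign * prob(intersection, omega)
    return total

omega = frozenset(range(1, 13))
a = frozenset({2, 4, 6, 8, 10, 12})
b = frozenset({3, 6, 9, 12})
c = frozenset({1, 2, 3, 4})

print(union_prob_by_inclusion_exclusion([a, b, c], omega))  # 3/4
print(prob(a | b | c, omega))                               # 3/4
```

The loop visits $2^n - 1$ intersections for $n$ events, so this form is mainly useful for small $n$ or for checking hand calculations.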


Exercise.

Select the correct expression(s) for for each event and .



Example. (Application of inclusion-exclusion principle) Among 160 students,

  • 40% have a major in mathematics
  • 55% have a major in statistics
  • 30% have a major in accounting
  • 20% have a major in statistics and accounting
  • 15% have a major in accounting and mathematics
  • 20% have a major in mathematics and statistics
  • 10% have a major in mathematics, statistics and accounting

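The number of students that do not have any of these majors is $32$.

Proof. We can regard the percentages as combinatorial probabilities, since their definitions match (both are about proportion). Let $M$, $S$ and $A$ be the events that a student among them has a major in mathematics, statistics and accounting respectively. Then,

$$\mathbb{P}(M \cup S \cup A) = 0.4 + 0.55 + 0.3 - 0.2 - 0.15 - 0.2 + 0.1 = 0.8.$$

Thus, the number of students that do not have any of these majors is $160 \times (1 - 0.8) = 32$.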

Alternatively, we can consider the following Venn diagram:

|-------------| <--------- A
|             |
|        |----|----|
|        |    |    |
| 0.05   |0.05|0.15| <---- M
|        |    |    |
|--------|----|----|------|
|        |0.1 |0.1 |      | 
| 0.1    |    |    | 0.25 | <---- S
|        |----|----|      |
|-------------|-----------|

We can see from this diagram that $\mathbb{P}(M \cup S \cup A) = 0.05 + 0.05 + 0.15 + 0.1 + 0.1 + 0.1 + 0.25 = 0.8$, and then we can calculate the desired number.

The steps for constructing the above Venn diagram: [4]
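
The arithmetic of the inclusion-exclusion computation in this example can be double-checked with exact fractions. This is a small sketch; the variable names and the helper `pct` are chosen here, and the percentages are those listed in the example.

```python
from fractions import Fraction

students = 160

def pct(x):
    """Convert a percentage to an exact fraction."""
    return Fraction(x, 100)

# Single majors
p_m, p_s, p_a = pct(40), pct(55), pct(30)
# Pairs: statistics & accounting, accounting & mathematics,
# mathematics & statistics
p_sa, p_am, p_ms = pct(20), pct(15), pct(20)
# All three majors
p_msa = pct(10)

# Inclusion-exclusion principle for three events
p_any = p_m + p_s + p_a - p_sa - p_am - p_ms + p_msa
no_major = students * (1 - p_any)

print(p_any)     # 4/5
print(no_major)  # 32
```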


Exercise.

1 Calculate the percentage of students that have at least two of those three majors.

10%
15%
20%
25%
40%

2 Calculate the percentage of students that have one and only one major.

30%
35%
40%
45%
50%



Remark.

  • this example illustrates that we can apply inclusion-exclusion principle to the no. of outcomes in events
  • to be more precise, we can replace $\mathbb{P}$ with $n$ in the inclusion-exclusion principle, where $n(E)$ is the no. of outcomes in the event $E$

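Lemma. For each event $E_1, E_2, \dotsc$,

$$\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right).$$

Proof. Let $F_1 = E_1$ and $F_i = E_i \setminus (E_1 \cup \dotsb \cup E_{i-1})$ for each $i \ge 2$. Then $F_1, F_2, \dotsc$ are disjoint, $\bigcup_{i=1}^{n} F_i = \bigcup_{i=1}^{n} E_i$ for each positive integer $n$, and $\bigcup_{i=1}^{\infty} F_i = \bigcup_{i=1}^{\infty} E_i$. So, by P3 and extended P3,

$$\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(F_i) = \lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{P}(F_i) = \lim_{n \to \infty} \mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right).$$

Proposition. (Boole's inequality) For each event $E_1, E_2, \dotsc$, $\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) \le \sum_{i=1}^{\infty} \mathbb{P}(E_i)$.

Proof. First, by inclusion-exclusion principle, for each event $A$ and $B$, $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B) \le \mathbb{P}(A) + \mathbb{P}(B)$.

So, by induction, for each positive integer $n$,

$$\mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} \mathbb{P}(E_i).$$

Using the lemma,

$$\mathbb{P}\left(\bigcup_{i=1}^{\infty} E_i\right) = \lim_{n \to \infty} \mathbb{P}\left(\bigcup_{i=1}^{n} E_i\right) \le \lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{P}(E_i) = \sum_{i=1}^{\infty} \mathbb{P}(E_i).$$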
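
Boole's inequality can be illustrated on a finite example with overlapping events, where the sum of the probabilities overshoots the probability of the union. The sketch below uses combinatorial probability on a six-faced die; the three events are arbitrary choices made here, and a finite check like this is not a substitute for the proof.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Combinatorial probability on one roll of a fair six-faced die."""
    return Fraction(len(event), len(omega))

# Overlapping events: outcomes 2 and 3 are each counted twice in the sum.
events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]

union = frozenset().union(*events)
lhs = prob(union)                   # probability of the union
rhs = sum(prob(e) for e in events)  # sum of the individual probabilities

print(lhs, rhs)    # 5/6 7/6
print(lhs <= rhs)  # True
```

Note that the right-hand side exceeds 1 here, so the bound is informative only when the individual probabilities are small.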


  1. e.g. the sample space of throwing a die may include the six numbers, or may only include two outcomes: odd number and even number
  2. e.g. it is given that a coin is biased, such that it is more likely that head comes up
  3. ext. stands for 'extended'
  4. Given: $\mathbb{P}(A \cap M \cap S) = 0.1$
    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |        |    |    | <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |    |      | 
    |        |    |    |      | <---- S
    |        |----|----|      |
    |-------------|-----------|
    

    Calculate $\mathbb{P}(A \cap M \cap S^c) = 0.15 - 0.1 = 0.05$ by observing the Venn diagram and using the given information:

    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |        |0.05|    | <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |    |      | 
    |        |    |    |      | <---- S
    |        |----|----|      |
    |-------------|-----------|
    

    Calculate $\mathbb{P}(M \cap S \cap A^c) = 0.2 - 0.1 = 0.1$ by observing the Venn diagram and using the given information:

    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |        |0.05|    | <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |0.1 |      | 
    |        |    |    |      | <---- S
    |        |----|----|      |
    |-------------|-----------|
    

    Calculate $\mathbb{P}(M \cap S^c \cap A^c) = 0.4 - 0.05 - 0.1 - 0.1 = 0.15$ by observing the Venn diagram and using the given information:

    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |        |0.05|0.15| <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |0.1 |      | 
    |        |    |    |      | <---- S
    |        |----|----|      |
    |-------------|-----------|
    

    Calculate $\mathbb{P}(A \cap S \cap M^c) = 0.2 - 0.1 = 0.1$ using the given information:

    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |        |0.05|0.15| <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |0.1 |      | 
    | 0.1    |    |    |      | <---- S
    |        |----|----|      |
    |-------------|-----------|
    

    Calculate $\mathbb{P}(A \cap M^c \cap S^c) = 0.3 - 0.05 - 0.1 - 0.1 = 0.05$ using the given information and observing the Venn diagram:

    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |0.05    |0.05|0.15| <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |0.1 |      | 
    | 0.1    |    |    |      | <---- S
    |        |----|----|      |
    |-------------|-----------|
    

    Calculate $\mathbb{P}(S \cap A^c \cap M^c) = 0.55 - 0.1 - 0.1 - 0.1 = 0.25$ using the given information and observing the Venn diagram:

    |-------------| <--------- A
    |             |
    |        |----|----|
    |        |    |    |
    |0.05    |0.05|0.15| <---- M
    |        |    |    |
    |--------|----|----|------|
    |        |0.1 |0.1 |      | 
    | 0.1    |    |    | 0.25 | <---- S
    |        |----|----|      |
    |-------------|-----------|