# Probability/Probability Spaces

## Concept

Although we came up with a basic definition of probability in the previous chapter, we will now proceed to develop a more axiomatic theory. This theory avoids the ambiguities of the informal definition and allows for a simpler mathematical formalism. We shall proceed by developing the concept of a probability space, which will allow us to harness many theorems from mathematical analysis.

## Formal Theory

### Set of Outcomes

The set of all possible outcomes is called the sample space, denoted by Ω. For every problem, you must pick an appropriate sample space. This is important because we can't draw any conclusions about the probability of an event until we know exactly which outcomes are possible. In a coin toss the outcomes (also called states) could be “Heads” and “Tails”. For a die there could be one outcome for each side of the die. A probability function specifies the probability of each outcome. Events are sets of outcomes. In the die example an event could be rolling an even number.

### Probability Space Definition

A probability space is a triple (Ω,S,P), where $\Omega$ is a non-empty set called the sample space, whose elements are called the outcomes; $S \subseteq \mbox{Power}(\Omega)$ is a collection of subsets of $\Omega$, called the events; and P is a function $S \rightarrow \mathbb{R}$, called the probability, satisfying the following axioms:

1. S is such that combining events, even countably infinitely many, results in an event, i.e. stays within S (formally, S should be a σ-algebra: it contains Ω and is closed under complements and countable unions);
2. For all $E\in S$, $0\le P(E)\le 1$. This states that for every event E, the probability of E occurring is between 0 and 1 (inclusive).
3. $P(\Omega)=1$. This states that the probability that some outcome in the sample space occurs is 1. (P is a normed measure.)
4. If $\{ E_1,E_2, \ldots \}$ is a countable collection of events and $i\ne j\ \Rightarrow E_i \cap E_j = \emptyset$, then $P(\bigcup_i E_i)=\sum_i P(E_i)$. This states that if you have a group of events (each one denoted by E and a subscript), you can get the probability that some event in the group will occur by summing the individual probabilities of each event. This is guaranteed only when the events are pairwise disjoint.
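For a finite sample space the axioms can be checked mechanically. The following is a minimal sketch, assuming a fair six-sided die and taking S to be the full power set; the helper names `power_set`, `make_probability`, and `satisfies_axioms` are illustrative and not part of any library. Additivity is only checked for pairs of disjoint events, which is enough to illustrate axiom 4 on a finite space.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """Return every subset of omega as a frozenset (the largest possible S)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def make_probability(weights):
    """Build P from per-outcome weights: P(E) is the sum of the weights in E."""
    def P(event):
        return sum((weights[o] for o in event), Fraction(0))
    return P


def satisfies_axioms(omega, events, P):
    """Check axioms 2-4 on a finite space (additivity checked for pairs only)."""
    bounded = all(0 <= P(e) <= 1 for e in events)
    normed = P(frozenset(omega)) == 1
    additive = all(
        P(a | b) == P(a) + P(b)
        for a in events
        for b in events
        if not (a & b)  # only disjoint pairs are covered by axiom 4
    )
    return bounded and normed and additive


# Assumed example: a fair die, so every outcome gets weight 1/6.
omega = frozenset(range(1, 7))
weights = {o: Fraction(1, 6) for o in omega}
events = power_set(omega)
P = make_probability(weights)

print(satisfies_axioms(omega, events, P))
print(P(frozenset({2, 4, 6})))  # probability of rolling an even number
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.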

### Explanation

Ω is called the sample space, and is a set of all the possible outcomes. Outcomes are all the possibilities of what can occur, where only one occurs. S is the set of events. Events are sets of outcomes, and they occur when any of their outcomes occur. For example rolling an even number might be an event, but it will consist of the outcomes 2,4, and 6. The probability function gives a number for each event, and the probability that something will occur is 1.

For example, when tossing a single coin Ω is {H,T} and the possible events are {}, {H}, {T}, and {H,T}. Intuitively, the probability of each of these sets is the chance that one of the outcomes in the set will happen; P({H}) is the chance of tossing a head, P({H,T}) = 1 is the chance of the coin landing either heads or tails, P({}) = 0 is the probability of the coin landing neither heads nor tails, etc.
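The coin example can be tabulated directly. This sketch assumes a fair coin (the text does not fix the value of P({H})); the names `weights`, `prob`, and `table` are illustrative.

```python
from fractions import Fraction

# Assumed fair coin; change the weights to model a biased one.
weights = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# The four events of the single coin toss, written out explicitly.
events = [frozenset(), frozenset({"H"}), frozenset({"T"}), frozenset({"H", "T"})]


def prob(event):
    """P(event) is the sum of the weights of the outcomes in the event."""
    return sum((weights[o] for o in event), Fraction(0))


# Map each event (as a sorted tuple of outcomes) to its probability.
table = {tuple(sorted(e)): prob(e) for e in events}
for outcomes, p in table.items():
    print(set(outcomes) or "{}", p)
```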

### Other Definitions

Mutually exclusive
Two or more events that can NOT occur at the same time, i.e. that have no outcomes in common. If one of n mutually exclusive events occurs, none of the remaining n-1 events can occur, so at most one of the events may occur. Mutually exclusive events A and B satisfy A ∩ B = ∅, and hence Pr(A ∩ B) = 0 and P(A or B) = P(A) + P(B). In statistics, the term describes a set of categories such that an individual or object is included in only one category.
(Collectively) Exhaustive
Events are said to be collectively (or jointly) exhaustive if at least one of them must occur. Equivalently, their union covers the entire sample space. For example, events A and B are collectively exhaustive if A ∪ B = S, where S here denotes the sample space (written Ω above).
Test of Independence
Two events A and B are independent if p(A ∩ B) = p(A)p(B). If p(A ∩ B) is not equal to p(A)p(B), then A and B are said to be associated or dependent. If p(A ∩ B) > p(A)p(B), so that p(A|B) > p(A) and p(B|A) > p(B), then the events are said to be positively associated. If, however, p(A ∩ B) < p(A)p(B), so that p(A|B) < p(A) and p(B|A) < p(B), one says that the events are negatively associated. (The conditional probabilities are only defined when p(A) > 0 and p(B) > 0.)
Simple Random Sampling
In statistics, a simple random sample is a subset of individuals (a sample) chosen from a larger set (a population), with no systematic order governing which individuals are chosen. Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. Simple random sampling can be done with or without replacement, though it is typically done without, i.e., one deliberately avoids choosing any member of the population more than once. When the sample is drawn with replacement, the same individual can be chosen more than once. When the sample is drawn without replacement, the same individual can be chosen no more than once in a given sample. In either case, when one individual is drawn at a time, every individual still available in the population has an equal probability of being drawn.
Independence of random variables
If X is a real-valued random variable and a is a number then the event {X ≤ a} is the set of outcomes that correspond to X being less than or equal to a. Since these are sets of outcomes that have probabilities, it makes sense to refer to events of this sort being independent of other events of this sort.
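The test of independence described above can be sketched for a finite sample space with equally likely outcomes. The example assumes a fair six-sided die, and the function name `association` and the chosen events are illustrative only.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed fair six-sided die


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def association(a, b):
    """Compare P(A and B) with P(A)P(B) and name the relationship."""
    joint = prob(a & b)
    product = prob(a) * prob(b)
    if joint == product:
        return "independent"
    if joint > product:
        return "positively associated"
    return "negatively associated"


even = frozenset({2, 4, 6})
at_most_four = frozenset({1, 2, 3, 4})
at_least_four = frozenset({4, 5, 6})
at_most_three = frozenset({1, 2, 3})

print(association(even, at_most_four))   # joint 1/3, product 1/2 * 2/3
print(association(even, at_least_four))  # joint 1/3, product 1/2 * 1/2
print(association(even, at_most_three))  # joint 1/6, product 1/2 * 1/2
```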

Why are the p and q Bernoulli trial probabilities multiplied together in the binomial formula? The probability of an event can be expressed as a binomial probability if its outcomes can be broken down into two probabilities p and q, where p and q are complementary (i.e. p + q = 1). Binomial probability typically deals with the probability of several successive decisions, each of which has two possible outcomes. The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli distribution.

A Bernoulli process is a discrete-time stochastic process consisting of a sequence of independent random variables taking values over two symbols. Prosaically, a Bernoulli process is flipping a coin several times, possibly with an unfair coin. A variable in such a sequence may be called a Bernoulli variable. In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials. Independence of Bernoulli trials implies the memorylessness property: past trials do not provide any information regarding future outcomes. From any given time, the future trials also form a Bernoulli process independent of the past (the fresh-start property). A sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.

Two events A and B are independent if and only if Pr(A ∩ B) = Pr(A)Pr(B). Here A ∩ B is the intersection of A and B, that is, it is the event that both events A and B occur. This is called the multiplication rule for independent events. Because the trials are independent, the multiplication rule gives any particular sequence of k successes and n - k failures the probability $p^k q^{n-k}$, which is why the p and q factors are multiplied together in the binomial formula.
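As a small illustration of the multiplication rule at work, the sketch below computes binomial probabilities as "number of sequences with k successes" times "probability of one such sequence". The function name `binomial_pmf` and the values p = 1/3, n = 4 are assumptions chosen for the example.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli trials)."""
    q = 1 - p
    # comb(n, k) sequences, each with probability p**k * q**(n - k)
    # by the multiplication rule for independent events.
    return comb(n, k) * p**k * q**(n - k)


p = Fraction(1, 3)  # assumed success probability
n = 4               # assumed number of trials
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, value in enumerate(distribution):
    print(k, value)
print(sum(distribution))  # the probabilities of all k add up to 1
```

With n = 1 the same function reduces to the Bernoulli distribution: `binomial_pmf(1, 1, p)` is p and `binomial_pmf(0, 1, p)` is q.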

Universal Set
The set of all the elements under consideration in a specific discussion, symbolized by U.
Example: U = {A,E,I,O,U}
Intersection
The set of elements that are common to two or more sets, denoted by the symbol ∩.
Union
The set of elements that are in at least one of two or more sets, denoted by the symbol ∪.
Complement
The set of all the elements in the universal set that are not in the original set, denoted by the symbol ′. Sometimes also represented by the symbol ~, meaning "not" (e.g. p(A ∪ ~B) denotes the probability of "A union NOT B").
Example:
U = {1,2,3,4,5,6,7,8,9,0}; A = {1,2,3}; B = {2,3,4,5,6}
A ∩ B = {2,3}
A ∪ B = {1,2,3,4,5,6}
A′ = {4,5,6,7,8,9,0}; B′ = {1,7,8,9,0}
Empty or Null Set
A set that contains no elements, denoted by the symbols { } or ∅. As an event, the empty set is impossible: it cannot occur, and its probability is 0.
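The set operations defined above map directly onto Python's built-in `set` type. This sketch reuses the sets U, A, and B from the example; the variable names for the results are illustrative.

```python
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0}  # universal set
A = {1, 2, 3}
B = {2, 3, 4, 5, 6}

intersection = A & B   # elements in both A and B
union = A | B          # elements in A, in B, or in both
complement_a = U - A   # elements of U that are not in A
complement_b = U - B   # elements of U that are not in B
empty = A & complement_a  # a set and its complement share no elements

print(sorted(intersection))
print(sorted(union))
print(sorted(complement_a))
print(sorted(complement_b))
print(empty == set())
```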

### Further Concepts in Set Theory

Union and intersection: We can indicate the union and intersection of two sets with the symbols ∪ and ∩, respectively. If A and B are two sets, then A ∪ B is the set “A union B”: the set that consists of all the elements that are either in set A or in set B (or in both). Similarly, A ∩ B is read “A intersection B”, which is the set of all elements that are in both A and B.

Set difference: We can “subtract” one set from another using the symbol “−”. If A and B are sets, then A − B is the set consisting of all the elements that are members of A but are not members of B. This is a little different from the subtraction you may be used to in arithmetic.

Sets of whole numbers:

1. Non-negative integers (also called whole numbers): {0, 1, 2, 3, ...}
2. Positive integers: {1, 2, 3, 4, ...}
3. All integers: {..., -3, -2, -1, 0, 1, 2, 3, ...}

AND --> two or more things/events all happen (INTERSECTION). If the events are independent, we MULTIPLY (product) their probabilities together.

OR --> at least one of the events/things occurs (UNION). If the events are mutually exclusive, we ADD (sum) their probabilities together.

Events that are NOT mutually exclusive can occur together on the same draw. For example, if we want to draw a 2 OR an 'M' and a single draw can be both a 2 and an 'M', then these events are NOT mutually exclusive, because they can occur together at the same time.

For events that are NOT mutually exclusive, we must subtract p(A∩B) from p(A) + p(B), so that p(A∪B) = p(A) + p(B) − p(A∩B). If the events are also independent, then p(A∩B) = p(A)p(B).
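The addition rule for events that may overlap can be checked on a small example. This is a sketch assuming a fair six-sided die; `prob` and `prob_union` are illustrative helper names.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed fair six-sided die


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(OMEGA))


def prob_union(a, b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return prob(a) + prob(b) - prob(a & b)


even = frozenset({2, 4, 6})
at_most_four = frozenset({1, 2, 3, 4})
one = frozenset({1})

# Overlapping events: the shared outcomes {2, 4} must not be counted twice.
overlapping = prob_union(even, at_most_four)

# Mutually exclusive events: the intersection is empty, so the rule
# reduces to plain addition.
exclusive = prob_union(even, one)

print(overlapping, prob(even | at_most_four))
print(exclusive, prob(even | one))
```

Each printed pair compares the formula with the probability of the union computed directly from the outcomes.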

Probabilities for independent events often involve exponents (the same probability multiplied repeatedly), and probabilities for dependent events (conditional probability), such as draws without replacement, often involve factorials.

### Permutations and Combinations

Permutations
An arrangement of objects without repetition, where order is important. Permutations using all the objects: n objects, arranged into a group of size n without repetition, with order being important: P(n, n) = n!
Example: Find all permutations of A, B, C. There are 3! = 6 of them: ABC, ACB, BAC, BCA, CAB, CBA.

Permutations of some of the objects: n objects, group size r, order is important. P(n, r) = n!/(n-r)! Example: Find all 2-letter permutations using letters A, B, C. There are 3!/1! = 6 of them: AB, AC, BA, BC, CA, CB.

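Distinguishable permutations
If a word has n letters, of which k are distinct, let n1, n2, n3, ..., nk be the frequency of each of the k distinct letters, so that n1 + n2 + ... + nk = n. The number of distinguishable arrangements of the word is n!/((n1!)(n2!)(n3!)...(nk!))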
Combinations
A selection of objects without repetition, where order is NOT important. A combination of n objects, taken in groups of size r, without repetition, and with order NOT being important: C(n, r) = n!/(r!(n-r)!)

Another way to write a combination of n things, r at a time, is using the binomial coefficient notation $\binom{n}{r}$ (the coefficient that appears in the binomial distribution), sometimes described as "n choose r".
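The definitions above can be explored with Python's standard `itertools` module, which enumerates permutations and combinations directly. This is a sketch; the helper `distinguishable_permutations` and the example word "LEVEL" are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, permutations
from math import factorial

letters = "ABC"

# Order matters: every arrangement of all three letters, then of two letters.
all_perms = ["".join(p) for p in permutations(letters)]
two_perms = ["".join(p) for p in permutations(letters, 2)]

# Order does not matter: AB and BA count as the same selection.
two_combs = ["".join(c) for c in combinations(letters, 2)]


def distinguishable_permutations(word):
    """n! / (n1! n2! ... nk!) for a word whose letters may repeat."""
    count = factorial(len(word))
    for frequency in Counter(word).values():
        count //= factorial(frequency)
    return count


print(all_perms)
print(two_perms)
print(two_combs)
print(distinguishable_permutations("LEVEL"))
```

The formula for repeated letters can be cross-checked by brute force: `len(set(permutations("LEVEL")))` counts the same distinct arrangements.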

### Counting Rules

Rule 1
If any one of K mutually exclusive and exhaustive events can occur on each of N trials, there are $K^N$ different sequences that may result from a set of such trials
Example: Flip a coin three times, finding the number of possible sequences. N=3, K=2, therefore, $K^N = 2^3 = 8$
Rule 2
If K1, K2, ....KN are the numbers of distinct events that can occur on trials 1,....N in a series, the number of different sequences of N events that can occur is (K1)(K2)...(KN)
Example: Flip a coin and roll a die, finding the number of possible sequences. Therefore, (K1)(K2) = (2)(6) = 12
Rule 3
The number of different ways that N distinct things may be arranged in order is N! = (1)(2)(3)....(N-1)(N), where 0! = 1. An arrangement in order is called a permutation, so that the total number of permutations of N objects is N! (the symbol N! is called N-factorial)
Example: Arrange 10 items in order, finding the number of possible ways. Therefore, 10! = 10x9x8x7x6x5x4x3x2x1 = 3628800
Rule 4
The number of ways of selecting and arranging r objects from among N distinct objects is: N!/(N-r)! [nPr]
Example: pick 3 things from 10 items, and arrange them in order. Therefore N=10, r=3, so 10!/(10-3)! = 10!/7! = 720
Rule 5
The total number of ways of selecting r objects from among N distinct objects, irrespective of order (i.e. order NOT important), is: N!/(r!(N-r)!) [nCr]
Example: Pick 3 items from 10 in any order, where N=10, r=3. Therefore, 10!/(3!7!) = 720/6 = 120
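The five counting rules can be reproduced with Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later). The sketch below recomputes each worked example; the variable names are illustrative.

```python
from itertools import product
from math import comb, factorial, perm

# Rule 1: K outcomes on each of N trials gives K**N sequences.
coin_sequences = list(product("HT", repeat=3))  # enumerate them explicitly
rule1 = 2 ** 3

# Rule 2: multiply the number of possibilities of each trial (coin, then die).
rule2 = len(list(product("HT", range(1, 7))))

# Rule 3: N distinct things can be arranged in N! orders.
rule3 = factorial(10)

# Rule 4: select and arrange r of N objects, N!/(N-r)!.
rule4 = perm(10, 3)

# Rule 5: select r of N objects when order is ignored, N!/(r!(N-r)!).
rule5 = comb(10, 3)

print(len(coin_sequences), rule1, rule2, rule3, rule4, rule5)
```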

## Consequences

We can now give some basic theorems using our axiomatic probability space.

### Theorem 1

Given a probability space (Ω,S,P), for events $A,B\in S$:

$P(A\cup B)=P(A)+P(B)-P(A\cap B)$
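A short derivation, sketched here under the assumption that additivity (axiom 4) may be applied to two disjoint events. The symbol $B\setminus A$, meaning the outcomes in B that are not in A, is notation introduced only for this sketch. Since $A$ and $B\setminus A$ are disjoint with union $A\cup B$, and $A\cap B$ and $B\setminus A$ are disjoint with union $B$, additivity gives:

$P(A\cup B)=P(A)+P(B\setminus A)$

$P(B)=P(A\cap B)+P(B\setminus A)$

Subtracting the second equation from the first and moving $P(B)-P(A\cap B)$ to the right-hand side yields the theorem.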

## Axioms of Probability

A probability function has to satisfy three basic axioms or constraints. To each event a, a number P(a), called the probability of event a, is assigned. Here S denotes the sample space (written Ω above). P(a) is subject to the following three axioms:

1. P(a) ≥ 0
2. P(S) = 1
3. If a ∩ b = ∅, then P(a ∪ b) = P(a) + P(b)
Corollaries
• P(∅) = 0
• P(a) = 1 − P(a′) ≤ 1, where a′ is the complement of a
• If a ∩ b ≠ ∅, then P(a ∪ b) = P(a) + P(b) − P(a ∩ b)
• If b ⊂ a, then P(a) = P(b) + P(a ∩ b′) ≥ P(b)
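The corollaries can be spot-checked numerically. The sketch below assumes a loaded six-sided die whose weights were chosen arbitrarily for illustration; a numeric check on one example illustrates the corollaries but does not prove them.

```python
from fractions import Fraction

# Assumed weights for a loaded die; they are hypothetical and sum to 1.
WEIGHTS = {
    1: Fraction(1, 12), 2: Fraction(1, 12),
    3: Fraction(1, 6), 4: Fraction(1, 6),
    5: Fraction(1, 4), 6: Fraction(1, 4),
}
S = frozenset(WEIGHTS)  # the sample space


def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((WEIGHTS[o] for o in event), Fraction(0))


def complement(event):
    """All outcomes of the sample space that are not in the event."""
    return S - event


a = frozenset({2, 4, 6})
b = frozenset({2, 4})  # b is a subset of a
c = frozenset({1, 2})  # c overlaps a in the outcome 2

checks = {
    "P(empty) = 0": P(frozenset()) == 0,
    "P(a) = 1 - P(a')": P(a) == 1 - P(complement(a)),
    "P(a or c) = P(a) + P(c) - P(a and c)": P(a | c) == P(a) + P(c) - P(a & c),
    "b in a: P(a) = P(b) + P(a and b')": P(a) == P(b) + P(a & complement(b)),
    "b in a: P(a) >= P(b)": P(a) >= P(b),
}

for name, holds in checks.items():
    print(name, holds)
```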