Chapter 2: Foundations of Probability Theory

Probability is the mathematical language we use to describe uncertainty and chance. Whether you're predicting the outcome of an experiment, analyzing the likelihood of an event, or making decisions based on incomplete information, understanding probability is essential. This chapter builds the theoretical foundations you'll need to work with random processes and uncertain outcomes.


2.1 Sample Space

What Is a Sample Space?

When we perform an experiment (a process that generates data), we need a way to describe all possible outcomes. This is where the concept of a sample space becomes crucial.

Sample Space: The set of all possible outcomes of a statistical experiment, denoted by the symbol S.

Each individual outcome in the sample space is called an element or a sample point. If the sample space is finite, we can list its elements separated by commas and enclosed in braces.

For example, if we toss a coin, the sample space is S = \{H, T\}, where H represents heads and T represents tails. If we're interested in rolling a die and recording the face-up value, the sample space is S_1 = \{1, 2, 3, 4, 5, 6\}.

Describing Sample Spaces

The way you describe a sample space depends on what outcome you're interested in measuring. Consider tossing a coin twice. If you care about the specific sequence of heads and tails, one sample space is S = \{HH, HT, TH, TT\} (all 4 possible outcomes). But if you only care about the total number of heads, you might use S = \{0, 1, 2\} (just 3 outcomes). The choice matters because it determines what information your sample space captures.
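Both descriptions can be enumerated programmatically. Here is a quick sketch in Python, using the standard library's itertools.product to build the sequence-level sample space and then collapse it to the head-count version:

```python
from itertools import product

# All ordered outcomes of tossing a coin twice: S = {HH, HT, TH, TT}
S_sequences = {"".join(p) for p in product("HT", repeat=2)}

# Coarser sample space if we only record the number of heads: S = {0, 1, 2}
S_head_counts = {outcome.count("H") for outcome in S_sequences}

print(sorted(S_sequences))    # ['HH', 'HT', 'TH', 'TT']
print(sorted(S_head_counts))  # [0, 1, 2]
```

Note that the two sample spaces have different sizes (4 versus 3): collapsing sequences to head counts discards the order information.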

For experiments with many or infinite outcomes, we describe the sample space using a rule method. For instance, if we're interested in all cities with populations over 1 million, we might write:

S = \{x \mid x \text{ is a city with a population over 1 million}\}

This reads as "the set of all x such that x is a city with a population over 1 million."

Visualizing Sample Spaces

Tree diagrams are helpful for visualizing complex sample spaces. When an experiment consists of multiple stages (like flipping a coin, then rolling a die), a tree diagram shows all possible paths through these stages. Each path represents one outcome; the total number of endpoints equals the total number of sample points.

📝 Section Recap: A sample space is the complete set of all possible outcomes of an experiment. You can describe it by listing elements, using a rule, or visualizing it with a tree diagram. The same experiment can have different sample spaces depending on what outcome you're measuring.


2.2 Events

Defining Events

Once you have a sample space, you'll often be interested in specific subsets of outcomes. This is where events come in.

Event: A subset of the sample space. An event consists of all outcomes for which the condition defining the event holds.

For example, if you roll a die, the event "rolling an even number" includes the outcomes \{2, 4, 6\}. The event "rolling a number greater than 3" includes \{4, 5, 6\}.

Two special cases are worth noting: the null event (denoted \emptyset) contains no outcomes and never occurs. The entire sample space S always occurs.

Combining Events

We often want to describe new events by combining existing ones. Three basic operations are fundamental:

Complement of an Event: The complement of event A, denoted A', is the set of all sample points in S that are not in A. If A occurs, then A' does not occur, and vice versa.

Intersection of Events: The intersection of two events A and B, denoted A \cap B, is the event containing all sample points that belong to both A and B. For the intersection to occur, both A and B must occur.

Union of Events: The union of two events A and B, denoted A \cup B, is the event containing all sample points that belong to A or B or both. The union occurs if at least one of the events occurs.

Mutually Exclusive and Disjoint Events

Sometimes two events cannot possibly occur together.

Mutually Exclusive (Disjoint) Events: Two events A and B are mutually exclusive or disjoint if A \cap B = \emptyset, that is, if they have no outcomes in common.

For instance, if you draw one card from a standard deck, the event "the card is a heart" and the event "the card is a spade" are mutually exclusive. Both cannot happen on a single draw.

Visualizing Event Relationships

Venn diagrams provide a clear visual representation of events and their relationships. The sample space is shown as a rectangle, and events are represented as circles (or other regions) within it. The areas of intersection, union, and complement can be easily identified and their relative sizes understood.

📝 Section Recap: Events are subsets of the sample space. We can combine events using complement (what's not in the event), intersection (what's in both), and union (what's in either). Mutually exclusive events cannot occur simultaneously. Venn diagrams help visualize these relationships clearly.


2.3 Counting Sample Points

The Multiplication Rule

Counting the number of outcomes in a sample space can be tedious if we list them all. Fortunately, there's a systematic way to count without listing every element.

Rule 2.1 (Multiplication Rule): If an operation can be performed in n_1 ways, and if for each of these a second operation can be performed in n_2 ways, then the two operations together can be performed in n_1 n_2 ways.

This extends to multiple operations: if you have k sequential operations that can be performed in n_1, n_2, \ldots, n_k ways respectively, the total number of ways to perform all k operations is n_1 n_2 \cdots n_k.
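A small sketch can confirm the rule for a two-stage experiment such as flipping a coin and then rolling a die, where the rule predicts 2 × 6 = 12 combined outcomes:

```python
from itertools import product

coin = ["H", "T"]          # first operation: n1 = 2 ways
die = [1, 2, 3, 4, 5, 6]   # second operation: n2 = 6 ways

# The multiplication rule predicts n1 * n2 = 12 combined outcomes.
outcomes = list(product(coin, die))
assert len(outcomes) == len(coin) * len(die)  # 12
print(outcomes[:3])  # [('H', 1), ('H', 2), ('H', 3)]
```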

Permutations

When we arrange objects where order matters, we're creating permutations.

Permutation: An arrangement of all or part of a set of objects.

The number of ways to arrange n distinct objects is n! = n(n-1)(n-2) \cdots (2)(1), where n! is read as "n factorial." By definition, 0! = 1.

When we select r objects from n distinct objects and arrange them in order, we use the formula:

_n P_r = \frac{n!}{(n-r)!}

For example, if you have 5 people and want to select 2 to stand in line (where order matters), there are _5 P_2 = \frac{5!}{3!} = 20 ways.

Circular permutations occur when objects are arranged in a circle. Since rotation doesn't create a new arrangement, there are (n-1)! distinct circular permutations of n objects.
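Python's math module exposes these counts directly (math.factorial and, in Python 3.8+, math.perm), which makes it easy to check the formulas above:

```python
import math

# All 5 people in a line: 5! = 120 arrangements
assert math.factorial(5) == 120

# Ordered selection of 2 from 5: 5!/(5-2)! = 20
assert math.perm(5, 2) == 20
assert math.perm(5, 2) == math.factorial(5) // math.factorial(5 - 2)

# Circular arrangements of 5 people: (5-1)! = 24
circular = math.factorial(5 - 1)
print(circular)  # 24
```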

Combinations

Sometimes the order doesn't matter—we just care about which objects are selected.

Combination: A selection of objects where order does not matter.

The number of ways to select r objects from n distinct objects is:

\binom{n}{r} = \frac{n!}{r!(n-r)!}

Note that \binom{n}{r} = \binom{n}{n-r} because selecting r objects is the same as leaving behind n-r objects.
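These identities are easy to verify numerically with math.comb (Python 3.8+); the sketch below also checks the symmetry property and the relationship between permutations and combinations:

```python
import math

# Unordered selections of 2 objects from 5: 5!/(2! * 3!) = 10
assert math.comb(5, 2) == 10

# Symmetry: choosing r objects is the same as leaving behind n - r.
n = 5
for r in range(n + 1):
    assert math.comb(n, r) == math.comb(n, n - r)

# Each combination of r objects can be ordered in r! ways,
# so nPr = C(n, r) * r!
assert math.perm(5, 2) == math.comb(5, 2) * math.factorial(2)
```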

Distinguishing Permutations and Combinations

The key difference is whether order matters. Selecting a president and a treasurer from 10 people is a permutation problem (order matters—these are different roles). Selecting 3 people to form a committee is a combination problem (order doesn't matter—they're all equal members).

📝 Section Recap: The multiplication rule counts sequential outcomes systematically. Permutations count arrangements where order matters: _n P_r = \frac{n!}{(n-r)!}. Combinations count selections where order doesn't matter: \binom{n}{r} = \frac{n!}{r!(n-r)!}. Choose the right method based on whether order is important.


2.4 Probability of an Event

What Is Probability?

Probability is a numerical measure of the likelihood that an event will occur. We assign a probability to each sample point such that probabilities are non-negative and sum to 1.

Definition 2.9: The probability of an event A is the sum of the weights (probabilities) of all sample points in A. If S is the sample space:

0 \leq P(A) \leq 1, \quad P(\emptyset) = 0, \quad P(S) = 1

Furthermore, if A_1, A_2, A_3, \ldots is a sequence of mutually exclusive events, then P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots

Equally Likely Outcomes

In many experiments, all outcomes are equally likely. When this is true:

Rule 2.3: If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is P(A) = \frac{n}{N}.

This is the classical approach to probability. It works well for controlled experiments like rolling dice or drawing cards, where symmetry ensures equal likelihood.
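For instance, the probability of rolling an even number with a fair die follows from counting outcomes; the sketch below uses exact fractions to avoid any rounding:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # N = 6 equally likely outcomes
A = {x for x in S if x % 2 == 0}  # event "even": n = 3 outcomes

# Classical probability: P(A) = n / N
P_A = Fraction(len(A), len(S))
assert P_A == Fraction(1, 2)
```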

Alternative Approaches to Probability

Not all experiments have equally likely outcomes. The relative frequency definition (or limiting relative frequency) views probability as the long-run proportion of times an event occurs if an experiment is repeated many times. If we perform an experiment and an event occurs in n out of N trials, we estimate P(A) \approx n/N, and this estimate improves as N increases.
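The following sketch illustrates the relative frequency idea by simulating coin flips (the seed is arbitrary, chosen only for reproducibility). The estimate n/N tends toward the true value 0.5 as the number of trials grows:

```python
import random

random.seed(42)  # arbitrary seed, for reproducible runs

def estimate_heads(trials):
    """Estimate P(heads) as the relative frequency n / N."""
    heads = sum(random.random() < 0.5 for _ in range(trials))
    return heads / trials

# The estimate fluctuates for small N and stabilizes near 0.5 for large N.
print(estimate_heads(100))
print(estimate_heads(100_000))
```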

The subjective definition of probability represents personal belief or opinion about the likelihood of an event. This approach is useful when experiments cannot be repeated or when prior information influences judgment. Though more subjective, it's valuable in Bayesian statistics (discussed in Chapter 18).

📝 Section Recap: Probability measures the likelihood of an event on a scale from 0 to 1. For equally likely outcomes, use P(A) = n/N. Relative frequency interprets probability as the long-run proportion of occurrences. Subjective probability incorporates personal judgment and prior knowledge. All three approaches are valid in different contexts.


2.5 Additive Rules

Calculating Probabilities Using Unions

Often we need to find the probability that one event or another occurs. The additive rule helps us do this.

Theorem 2.7: If A and B are two events, then P(A \cup B) = P(A) + P(B) - P(A \cap B).

The reason we subtract P(A \cap B) is important: when we add P(A) and P(B), we count the overlapping region (where both events occur) twice. Subtracting it once corrects this double-counting.

Corollary 2.1: If A and B are mutually exclusive, then P(A \cup B) = P(A) + P(B).

When two events cannot occur together, there's no overlap to subtract.
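Using the die events from earlier ("even" and "greater than 3"), a short check confirms the inclusion-exclusion formula with exact fractions:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

def prob(event):
    """Classical probability over the equally likely outcomes in S."""
    return Fraction(len(event), len(S))

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
assert lhs == rhs == Fraction(2, 3)
```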

Extensions to Multiple Events

For three or more mutually exclusive events:

Corollary 2.2: If A_1, A_2, \ldots, A_n are mutually exclusive, then P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n).

Corollary 2.3 tells us that if a collection of mutually exclusive events \{A_1, A_2, \ldots, A_n\} partitions the sample space S (meaning every outcome falls into exactly one event), then the sum of their probabilities equals 1:

P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n) = 1

Using Complementary Events

When it's easier to find the probability that something does not occur, use complementary events.

Theorem 2.9: If A and A' are complementary events, then P(A) + P(A') = 1.

Therefore, P(A) = 1 - P(A').

This is particularly useful when calculating probabilities involving "at least one" scenarios. It's often easier to calculate "none" and subtract from 1.
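A classic illustration (a standard exercise, not worked in the text above) is the probability of rolling at least one six in four rolls of a fair die. The complement "no sixes at all" is far easier to compute, treating the rolls as independent, a property formalized in Section 2.6:

```python
from fractions import Fraction

# P(at least one six in four rolls of a fair die)
# Direct counting is awkward; the complement "no sixes" is easy.
p_no_six_per_roll = Fraction(5, 6)
p_none = p_no_six_per_roll ** 4   # four independent rolls
p_at_least_one = 1 - p_none

assert p_at_least_one == Fraction(671, 1296)  # about 0.518
```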

📝 Section Recap: Use the additive rule P(A \cup B) = P(A) + P(B) - P(A \cap B) to find the probability of unions. For mutually exclusive events, simply add probabilities. Partition rules tell us that probabilities within a partition sum to 1. Complementary events help us tackle "at least one" problems by calculating the opposite and subtracting from 1.


2.6 Conditional Probability, Independence, and the Product Rule

Understanding Conditional Probability

Sometimes we have additional information that affects the probability of an event. This is where conditional probability enters.

Definition 2.10: The conditional probability of event B given that event A has occurred, denoted P(B|A), is defined by P(B|A) = \frac{P(A \cap B)}{P(A)}, provided P(A) > 0.

Think of conditional probability as a way to update our understanding of the likelihood of B in light of knowing that A has happened. We're working with a reduced sample space: only the outcomes where A is true.
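Returning to the die example, this sketch shows that conditioning on A = "even" reduces the sample space to \{2, 4, 6\}, and that P(B|A) can be computed either from the definition or by counting within A:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even" -- the known condition
B = {4, 5, 6}   # "greater than 3"

def prob(event):
    return Fraction(len(event), len(S))

# Definition: P(B | A) = P(A ∩ B) / P(A)
p_B_given_A = prob(A & B) / prob(A)
assert p_B_given_A == Fraction(2, 3)

# Equivalently, count directly within the reduced sample space A.
assert p_B_given_A == Fraction(len(A & B), len(A))
```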

Independent Events

Two events are independent if knowing that one occurred doesn't change the probability of the other.

Definition 2.11: Two events A and B are independent if and only if P(B|A) = P(B) or P(A|B) = P(A).

Otherwise, A and B are dependent.

Independence is a powerful property because it simplifies probability calculations significantly. If A and B are independent, P(A \cap B) = P(A)P(B).

The Product Rule (Multiplicative Rule)

The product rule allows us to calculate the probability that multiple events all occur.

Theorem 2.10: If A and B are two events that can both occur, then P(A \cap B) = P(A)P(B|A), provided P(A) > 0.

This is equivalent to rearranging the conditional probability formula. It tells us: the probability that both events occur equals the probability of the first times the conditional probability of the second given the first.

Theorem 2.11: Two events A and B are independent if and only if P(A \cap B) = P(A)P(B).

For independent events, the calculation is straightforward: just multiply the individual probabilities.
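A minimal check on two coin tosses: the events "first toss is heads" and "second toss is heads" satisfy the product criterion, while "first toss is heads" and "both tosses are heads" do not:

```python
from fractions import Fraction

# Two tosses of a fair coin; all four sequences are equally likely.
S = {"HH", "HT", "TH", "TT"}
A = {s for s in S if s[0] == "H"}   # first toss is heads
B = {s for s in S if s[1] == "H"}   # second toss is heads
C = {"HH"}                          # both tosses are heads

def prob(event):
    return Fraction(len(event), len(S))

# A and B are independent: P(A ∩ B) = P(A) P(B)
assert prob(A & B) == prob(A) * prob(C) * 2  # 1/4 = 1/2 * 1/2
assert prob(A & B) == prob(A) * prob(B)

# A and C are dependent: knowing A changes the chance of C.
assert prob(A & C) != prob(A) * prob(C)
```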

Extending to Multiple Events

Theorem 2.12: For multiple events that can all occur, P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2) \cdots P(A_k|A_1 \cap A_2 \cap \cdots \cap A_{k-1}).

If the events are independent, then P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2) \cdots P(A_k).

📝 Section Recap: Conditional probability P(B|A) updates the probability of B given that A has occurred. Independent events have the property that P(B|A) = P(B). The product rule states P(A \cap B) = P(A)P(B|A). For independent events, this simplifies to P(A \cap B) = P(A)P(B). Always check whether events are independent before applying simplifications.


2.7 Bayes' Rule

Total Probability

When a sample space can be partitioned into mutually exclusive events, we can express the probability of any event as a weighted sum.

Theorem 2.13 (Total Probability): If the events B_1, B_2, \ldots, B_k constitute a partition of the sample space S with P(B_i) \neq 0 for i = 1, 2, \ldots, k, then for any event A of S,

P(A) = \sum_{i=1}^{k} P(B_i \cap A) = \sum_{i=1}^{k} P(B_i)P(A|B_i)

This theorem is useful when calculating the probability of an event that can occur through several different paths or causes. You find the probability through each path and add them.
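As a hypothetical illustration (the machines and defect rates below are invented for this sketch), suppose three machines B1, B2, B3 partition a factory's output and A is the event that a randomly chosen item is defective:

```python
from fractions import Fraction

# Hypothetical priors P(Bi): share of output from each machine (sums to 1).
priors = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}
# Hypothetical likelihoods P(A|Bi): defect rate of each machine.
defect_rates = {"B1": Fraction(1, 100), "B2": Fraction(2, 100), "B3": Fraction(3, 100)}

# Total probability: P(A) = sum over i of P(Bi) P(A|Bi)
p_A = sum(priors[b] * defect_rates[b] for b in priors)
assert p_A == Fraction(17, 1000)  # paths: 5/1000 + 6/1000 + 6/1000
```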

Bayes' Rule

Bayes' rule allows us to reverse the direction of conditional probability. If we know P(A|B), we can calculate P(B|A).

Bayes' Rule: If B_1, B_2, \ldots, B_k constitute a partition of the sample space S, and A is any event with P(A) > 0, then P(B_j|A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(B_j)P(A|B_j)}{\sum_{i=1}^{k} P(B_i)P(A|B_i)}

The numerator is the probability of the specific path (partition event B_j together with A). The denominator is the total probability of event A through all paths. This rule is the foundation of Bayesian inference, where we update our beliefs about the causes (the B_i events) based on observing an effect (event A).
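Continuing the hypothetical machine example from the total probability discussion (the numbers are invented for illustration), Bayes' rule tells us which machine most likely produced a defective item:

```python
from fractions import Fraction

# Hypothetical partition: machine shares P(Bi) and defect rates P(A|Bi).
priors = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}
likelihoods = {"B1": Fraction(1, 100), "B2": Fraction(2, 100), "B3": Fraction(3, 100)}

# Denominator: total probability P(A) over all paths.
p_A = sum(priors[b] * likelihoods[b] for b in priors)

# Bayes' rule: P(Bj | A) = P(Bj) P(A|Bj) / P(A)
posteriors = {b: priors[b] * likelihoods[b] / p_A for b in priors}

assert posteriors["B1"] == Fraction(5, 17)
assert posteriors["B2"] == Fraction(6, 17)
assert posteriors["B3"] == Fraction(6, 17)
assert sum(posteriors.values()) == 1  # posteriors form a distribution
```

Notice how the evidence shifts belief: B1 produces half the output but, being the most reliable machine, accounts for less than a third of the defectives.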

📝 Section Recap: The total probability theorem partitions the sample space into mutually exclusive events and expresses P(A) as a sum over all paths. Bayes' rule calculates the posterior probability P(B_j|A) by combining the prior probabilities P(B_j) with the likelihoods P(A|B_j). This framework is essential for updating probabilities based on new evidence.


Summary of Key Concepts

You now understand the foundational building blocks of probability:

  1. Sample spaces organize all possible outcomes of an experiment
  2. Events are subsets of the sample space we're interested in
  3. Counting techniques (permutations and combinations) help us enumerate outcomes without listing them all
  4. Probability measures quantify the likelihood of events using several valid approaches
  5. Additive and multiplicative rules allow us to calculate probabilities of complex events
  6. Conditional probability and independence model how information and dependence affect likelihood
  7. Bayes' rule provides a framework for updating probabilities as new information arrives

These tools form the foundation upon which all statistical inference rests. As you progress through this course, you'll see these concepts applied repeatedly to model real-world uncertainty and make data-driven decisions.