Chapter 2: Foundations of Probability Theory

Probability is the mathematical language we use to describe uncertainty and chance. Whether you're predicting the outcome of an experiment, analyzing the likelihood of an event, or making decisions based on incomplete information, understanding probability is essential. This chapter builds the theoretical foundations you'll need to work with random processes and uncertain outcomes.


2.1 Sample Space

What Is a Sample Space?

When we perform an experiment (a process that generates data), we need a way to describe all possible outcomes. This is where the concept of a sample space becomes crucial.

Sample Space: The set of all possible outcomes of a statistical experiment, denoted by the symbol S.

Each individual outcome in the sample space is called an element or a sample point. If the sample space is finite, we can list its elements separated by commas and enclosed in braces.

For example, if we toss a coin, the sample space is S = \{H, T\}, where H represents heads and T represents tails. If we're interested in rolling a die and recording the face-up value, the sample space is S_1 = \{1, 2, 3, 4, 5, 6\}.

Describing Sample Spaces

The way you describe a sample space depends on what outcome you're interested in measuring. Consider tossing a coin twice. If you care about the specific sequence of heads and tails, one sample space is S = \{HH, HT, TH, TT\} (all 4 possible outcomes). But if you only care about the total number of heads, you might use S = \{0, 1, 2\} (just 3 outcomes). The choice matters because it determines what information your sample space captures.
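Both descriptions can be enumerated programmatically. Here is a quick sketch in Python, using the standard library's itertools.product to build the sequence-level sample space and then collapse it to the head-count version:

```python
from itertools import product

# All ordered outcomes of tossing a coin twice: S = {HH, HT, TH, TT}
S_sequences = {"".join(p) for p in product("HT", repeat=2)}

# Coarser sample space if we only record the number of heads: S = {0, 1, 2}
S_head_counts = {outcome.count("H") for outcome in S_sequences}

print(sorted(S_sequences))    # ['HH', 'HT', 'TH', 'TT']
print(sorted(S_head_counts))  # [0, 1, 2]
```

Note that the two sample spaces have different sizes (4 versus 3): collapsing sequences to head counts discards the order information.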

For experiments with many or infinite outcomes, we describe the sample space using a rule method. For instance, if we're interested in all cities with populations over 1 million, we might write:

S = \{x \mid x \text{ is a city with a population over 1 million}\}

This reads as "the set of all x such that x is a city with a population over 1 million."

Visualizing Sample Spaces

Tree diagrams are helpful for visualizing complex sample spaces. When an experiment consists of multiple stages (like flipping a coin, then rolling a die), a tree diagram shows all possible paths through these stages. Each path represents one outcome; the total number of endpoints equals the total number of sample points.

📝 Section Recap: A sample space is the complete set of all possible outcomes of an experiment. You can describe it by listing elements, using a rule, or visualizing it with a tree diagram. The same experiment can have different sample spaces depending on what outcome you're measuring.


2.2 Events

Defining Events

Once you have a sample space, you'll often be interested in specific subsets of outcomes. This is where events come in.

Event: A subset of the sample space. An event consists of all outcomes for which the condition defining the event holds.

For example, if you roll a die, the event "rolling an even number" includes the outcomes \{2, 4, 6\}. The event "rolling a number greater than 3" includes \{4, 5, 6\}.

Two special cases are worth noting: the null event (denoted \emptyset) contains no outcomes and never occurs. The entire sample space S always occurs.

Combining Events

We often want to describe new events by combining existing ones. Three basic operations are fundamental:

Complement of an Event: The complement of event A, denoted A', is the set of all sample points in S that are not in A. If A occurs, then A' does not occur, and vice versa.

Intersection of Events: The intersection of two events A and B, denoted A \cap B, is the event containing all sample points that belong to both A and B. For the intersection to occur, both A and B must occur.

Union of Events: The union of two events A and B, denoted A \cup B, is the event containing all sample points that belong to A or B or both. The union occurs if at least one of the events occurs.

Mutually Exclusive and Disjoint Events

Sometimes two events cannot possibly occur together.

Mutually Exclusive (Disjoint) Events: Two events A and B are mutually exclusive or disjoint if A \cap B = \emptyset, that is, if they have no outcomes in common.

For instance, if you draw one card from a standard deck, the event "the card is a heart" and the event "the card is a spade" are mutually exclusive. Both cannot happen on a single draw.

Visualizing Event Relationships

Venn diagrams provide a clear visual representation of events and their relationships. The sample space is shown as a rectangle, and events are represented as circles (or other regions) within it. The areas of intersection, union, and complement can be easily identified and their relative sizes understood.

📝 Section Recap: Events are subsets of the sample space. We can combine events using complement (what's not in the event), intersection (what's in both), and union (what's in either). Mutually exclusive events cannot occur simultaneously. Venn diagrams help visualize these relationships clearly.


2.3 Counting Sample Points

The Multiplication Rule

Counting the number of outcomes in a sample space can be tedious if we list them all. Fortunately, there's a systematic way to count without listing every element.

Rule 2.1 (Multiplication Rule): If an operation can be performed in n_1 ways, and if for each of these a second operation can be performed in n_2 ways, then the two operations together can be performed in n_1 n_2 ways.

This extends to multiple operations: if you have k sequential operations that can be performed in n_1, n_2, \ldots, n_k ways respectively, the total number of ways to perform all k operations is n_1 n_2 \cdots n_k.
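A small sketch can confirm the rule for a two-stage experiment such as flipping a coin and then rolling a die, where the rule predicts 2 × 6 = 12 combined outcomes:

```python
from itertools import product

coin = ["H", "T"]          # first operation: n1 = 2 ways
die = [1, 2, 3, 4, 5, 6]   # second operation: n2 = 6 ways

# The multiplication rule predicts n1 * n2 = 12 combined outcomes.
outcomes = list(product(coin, die))
assert len(outcomes) == len(coin) * len(die)  # 12
print(outcomes[:3])  # [('H', 1), ('H', 2), ('H', 3)]
```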

Permutations

When we arrange objects where order matters, we're creating permutations.

Permutation: An arrangement of all or part of a set of objects.

The number of ways to arrange n distinct objects is n! = n(n-1)(n-2) \cdots (2)(1), where n! is read as "n factorial." By definition, 0! = 1.

When we select r objects from n distinct objects and arrange them in order, we use the formula:

_n P_r = \frac{n!}{(n-r)!}

For example, if you have 5 people and want to select 2 to stand in line (where order matters), there are _5 P_2 = \frac{5!}{3!} = 20 ways.

Circular permutations occur when objects are arranged in a circle. Since rotation doesn't create a new arrangement, there are (n-1)! distinct circular permutations of n objects.
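Python's math module exposes these counts directly (math.factorial and, in Python 3.8+, math.perm), which makes it easy to check the formulas above:

```python
import math

# All 5 people in a line: 5! = 120 arrangements
assert math.factorial(5) == 120

# Ordered selection of 2 from 5: 5!/(5-2)! = 20
assert math.perm(5, 2) == 20
assert math.perm(5, 2) == math.factorial(5) // math.factorial(5 - 2)

# Circular arrangements of 5 people: (5-1)! = 24
circular = math.factorial(5 - 1)
print(circular)  # 24
```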

Combinations

Sometimes the order doesn't matter—we just care about which objects are selected.

Combination: A selection of objects where order does not matter.

The number of ways to select r objects from n distinct objects is:

\binom{n}{r} = \frac{n!}{r!(n-r)!}

Note that \binom{n}{r} = \binom{n}{n-r} because selecting r objects is the same as leaving behind n-r objects.
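These identities are easy to verify numerically with math.comb (Python 3.8+); the sketch below also checks the symmetry property and the relationship between permutations and combinations:

```python
import math

# Unordered selections of 2 objects from 5: 5!/(2! * 3!) = 10
assert math.comb(5, 2) == 10

# Symmetry: choosing r objects is the same as leaving behind n - r.
n = 5
for r in range(n + 1):
    assert math.comb(n, r) == math.comb(n, n - r)

# Each combination of r objects can be ordered in r! ways,
# so nPr = C(n, r) * r!
assert math.perm(5, 2) == math.comb(5, 2) * math.factorial(2)
```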

Distinguishing Permutations and Combinations

The key difference is whether order matters. Selecting a president and a treasurer from 10 people is a permutation problem (order matters—these are different roles). Selecting 3 people to form a committee is a combination problem (order doesn't matter—they're all equal members).

📝 Section Recap: The multiplication rule counts sequential outcomes systematically. Permutations count arrangements where order matters: _n P_r = \frac{n!}{(n-r)!}. Combinations count selections where order doesn't matter: \binom{n}{r} = \frac{n!}{r!(n-r)!}. Choose the right method based on whether order is important.


2.4 Probability of an Event

What Is Probability?

Probability is a numerical measure of the likelihood that an event will occur. We assign a probability to each sample point such that probabilities are non-negative and sum to 1.

Definition 2.9: The probability of an event A is the sum of the weights (probabilities) of all sample points in A. If S is the sample space:

0 \leq P(A) \leq 1, \quad P(\emptyset) = 0, \quad P(S) = 1

Furthermore, if A_1, A_2, A_3, \ldots is a sequence of mutually exclusive events, then P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots

Equally Likely Outcomes

In many experiments, all outcomes are equally likely. When this is true:

Rule 2.3: If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is P(A) = \frac{n}{N}.

This is the classical approach to probability. It works well for controlled experiments like rolling dice or drawing cards, where symmetry ensures equal likelihood.
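For instance, the probability of rolling an even number with a fair die follows from counting outcomes; the sketch below uses exact fractions to avoid any rounding:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # N = 6 equally likely outcomes
A = {x for x in S if x % 2 == 0}  # event "even": n = 3 outcomes

# Classical probability: P(A) = n / N
P_A = Fraction(len(A), len(S))
assert P_A == Fraction(1, 2)
```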

Alternative Approaches to Probability

Not all experiments have equally likely outcomes. The relative frequency definition (or limiting relative frequency) views probability as the long-run proportion of times an event occurs if an experiment is repeated many times. If we perform an experiment and an event occurs in n out of N trials, we estimate P(A) \approx n/N, and this estimate improves as N increases.
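The following sketch illustrates the relative frequency idea by simulating coin flips (the seed is arbitrary, chosen only for reproducibility). The estimate n/N tends toward the true value 0.5 as the number of trials grows:

```python
import random

random.seed(42)  # arbitrary seed, for reproducible runs

def estimate_heads(trials):
    """Estimate P(heads) as the relative frequency n / N."""
    heads = sum(random.random() < 0.5 for _ in range(trials))
    return heads / trials

# The estimate fluctuates for small N and stabilizes near 0.5 for large N.
print(estimate_heads(100))
print(estimate_heads(100_000))
```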

The subjective definition of probability represents personal belief or opinion about the likelihood of an event. This approach is useful when experiments cannot be repeated or when prior information influences judgment. Though more subjective, it's valuable in Bayesian statistics (discussed in Chapter 18).

📝 Section Recap: Probability measures the likelihood of an event on a scale from 0 to 1. For equally likely outcomes, use P(A) = n/N. Relative frequency interprets probability as the long-run proportion of occurrences. Subjective probability incorporates personal judgment and prior knowledge. All three approaches are valid in different contexts.


2.5 Additive Rules

Calculating Probabilities Using Unions

Often we need to find the probability that one event or another occurs. The additive rule helps us do this.

Theorem 2.7: If A and B are two events, then P(A \cup B) = P(A) + P(B) - P(A \cap B).

The reason we subtract P(A \cap B) is important: when we add P(A) and P(B), we count the overlapping region (where both events occur) twice. Subtracting it once corrects this double-counting.

Corollary 2.1: If A and B are mutually exclusive, then P(A \cup B) = P(A) + P(B).

When two events cannot occur together, there's no overlap to subtract.
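Using the die events from earlier ("even" and "greater than 3"), a short check confirms the inclusion-exclusion formula with exact fractions:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

def prob(event):
    """Classical probability over the equally likely outcomes in S."""
    return Fraction(len(event), len(S))

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
assert lhs == rhs == Fraction(2, 3)
```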

Extensions to Multiple Events

For three or more mutually exclusive events:

Corollary 2.2: If A_1, A_2, \ldots, A_n are mutually exclusive, then P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n).

Corollary 2.3 tells us that if a collection of mutually exclusive events \{A_1, A_2, \ldots, A_n\} partitions the sample space S (meaning every outcome falls into exactly one event), then the sum of their probabilities equals 1:

P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n) = 1

Using Complementary Events

When it's easier to find the probability that something does not occur, use complementary events.

Theorem 2.9: If A and A' are complementary events, then P(A) + P(A') = 1.

Therefore, P(A) = 1 - P(A').

This is particularly useful when calculating probabilities involving "at least one" scenarios. It's often easier to calculate "none" and subtract from 1.
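A classic illustration (a standard exercise, not worked in the text above) is the probability of rolling at least one six in four rolls of a fair die. The complement "no sixes at all" is far easier to compute, treating the rolls as independent, a property formalized in Section 2.6:

```python
from fractions import Fraction

# P(at least one six in four rolls of a fair die)
# Direct counting is awkward; the complement "no sixes" is easy.
p_no_six_per_roll = Fraction(5, 6)
p_none = p_no_six_per_roll ** 4   # four independent rolls
p_at_least_one = 1 - p_none

assert p_at_least_one == Fraction(671, 1296)  # about 0.518
```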

📝 Section Recap: Use the additive rule P(A \cup B) = P(A) + P(B) - P(A \cap B) to find the probability of unions. For mutually exclusive events, simply add probabilities. Partition rules tell us that probabilities within a partition sum to 1. Complementary events help us tackle "at least one" problems by calculating the opposite and subtracting from 1.


2.6 Conditional Probability, Independence, and the Product Rule

Understanding Conditional Probability

Sometimes we have additional information that affects the probability of an event. This is where conditional probability enters.

Definition 2.10: The conditional probability of event B given that event A has occurred, denoted P(B|A), is defined by P(B|A) = \frac{P(A \cap B)}{P(A)}, provided P(A) > 0.

Think of conditional probability as a way to update our understanding of the likelihood of B in light of knowing that A has happened. We're working with a reduced sample space: only the outcomes where A is true.
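Returning to the die example, this sketch shows that conditioning on A = "even" reduces the sample space to \{2, 4, 6\}, and that P(B|A) can be computed either from the definition or by counting within A:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even" -- the known condition
B = {4, 5, 6}   # "greater than 3"

def prob(event):
    return Fraction(len(event), len(S))

# Definition: P(B | A) = P(A ∩ B) / P(A)
p_B_given_A = prob(A & B) / prob(A)
assert p_B_given_A == Fraction(2, 3)

# Equivalently, count directly within the reduced sample space A.
assert p_B_given_A == Fraction(len(A & B), len(A))
```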

Independent Events

Two events are independent if knowing that one occurred doesn't change the probability of the other.

Definition 2.11: Two events A and B are independent if and only if P(B|A) = P(B) or P(A|B) = P(A).

Otherwise, A and B are dependent.

Independence is a powerful property because it simplifies probability calculations significantly. If A and B are independent, P(A \cap B) = P(A)P(B).

The Product Rule (Multiplicative Rule)

The product rule allows us to calculate the probability that multiple events all occur.

Theorem 2.10: If A and B are two events that can both occur, then P(A \cap B) = P(A)P(B|A), provided P(A) > 0.

This is equivalent to rearranging the conditional probability formula. It tells us: the probability that both events occur equals the probability of the first times the conditional probability of the second given the first.

Theorem 2.11: Two events A and B are independent if and only if P(A \cap B) = P(A)P(B).

For independent events, the calculation is straightforward: just multiply the individual probabilities.
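A minimal check on two coin tosses: the events "first toss is heads" and "second toss is heads" satisfy the product criterion, while "first toss is heads" and "both tosses are heads" do not:

```python
from fractions import Fraction

# Two tosses of a fair coin; all four sequences are equally likely.
S = {"HH", "HT", "TH", "TT"}
A = {s for s in S if s[0] == "H"}   # first toss is heads
B = {s for s in S if s[1] == "H"}   # second toss is heads
C = {"HH"}                          # both tosses are heads

def prob(event):
    return Fraction(len(event), len(S))

# A and B are independent: P(A ∩ B) = P(A) P(B)
assert prob(A & B) == prob(A) * prob(C) * 2  # 1/4 = 1/2 * 1/2
assert prob(A & B) == prob(A) * prob(B)

# A and C are dependent: knowing A changes the chance of C.
assert prob(A & C) != prob(A) * prob(C)
```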

Extending to Multiple Events

Theorem 2.12: For multiple events that can all occur, P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2) \cdots P(A_k|A_1 \cap A_2 \cap \cdots \cap A_{k-1}).

If the events are independent, then P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)P(A_2) \cdots P(A_k).

📝 Section Recap: Conditional probability P(B|A) updates the probability of B given that A has occurred. Independent events have the property that P(B|A) = P(B). The product rule states P(A \cap B) = P(A)P(B|A). For independent events, this simplifies to P(A \cap B) = P(A)P(B). Always check whether events are independent before applying simplifications.


2.7 Bayes' Rule

Total Probability

When a sample space can be partitioned into mutually exclusive events, we can express the probability of any event as a weighted sum.

Theorem 2.13 (Total Probability): If the events B_1, B_2, \ldots, B_k constitute a partition of the sample space S with P(B_i) \neq 0 for i = 1, 2, \ldots, k, then for any event A of S,

P(A) = \sum_{i=1}^{k} P(B_i \cap A) = \sum_{i=1}^{k} P(B_i)P(A|B_i)

This theorem is useful when calculating the probability of an event that can occur through several different paths or causes. You find the probability through each path and add them.
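As a hypothetical illustration (the machines and defect rates below are invented for this sketch), suppose three machines B1, B2, B3 partition a factory's output and A is the event that a randomly chosen item is defective:

```python
from fractions import Fraction

# Hypothetical priors P(Bi): share of output from each machine (sums to 1).
priors = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}
# Hypothetical likelihoods P(A|Bi): defect rate of each machine.
defect_rates = {"B1": Fraction(1, 100), "B2": Fraction(2, 100), "B3": Fraction(3, 100)}

# Total probability: P(A) = sum over i of P(Bi) P(A|Bi)
p_A = sum(priors[b] * defect_rates[b] for b in priors)
assert p_A == Fraction(17, 1000)  # paths: 5/1000 + 6/1000 + 6/1000
```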

Bayes' Rule

Bayes' rule allows us to reverse the direction of conditional probability. If we know P(A|B), we can calculate P(B|A).

Bayes' Rule: If B_1, B_2, \ldots, B_k constitute a partition of the sample space S, and A is any event with P(A) > 0, then P(B_j|A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(B_j)P(A|B_j)}{\sum_{i=1}^{k} P(B_i)P(A|B_i)}

The numerator is the probability of the specific path (partition event B_j together with A). The denominator is the total probability of event A through all paths. This rule is the foundation of Bayesian inference, where we update our beliefs about the causes (the B_i events) based on observing an effect (event A).
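Continuing the hypothetical machine example from the total probability discussion (the numbers are invented for illustration), Bayes' rule tells us which machine most likely produced a defective item:

```python
from fractions import Fraction

# Hypothetical partition: machine shares P(Bi) and defect rates P(A|Bi).
priors = {"B1": Fraction(1, 2), "B2": Fraction(3, 10), "B3": Fraction(1, 5)}
likelihoods = {"B1": Fraction(1, 100), "B2": Fraction(2, 100), "B3": Fraction(3, 100)}

# Denominator: total probability P(A) over all paths.
p_A = sum(priors[b] * likelihoods[b] for b in priors)

# Bayes' rule: P(Bj | A) = P(Bj) P(A|Bj) / P(A)
posteriors = {b: priors[b] * likelihoods[b] / p_A for b in priors}

assert posteriors["B1"] == Fraction(5, 17)
assert posteriors["B2"] == Fraction(6, 17)
assert posteriors["B3"] == Fraction(6, 17)
assert sum(posteriors.values()) == 1  # posteriors form a distribution
```

Notice how the evidence shifts belief: B1 produces half the output but, being the most reliable machine, accounts for less than a third of the defectives.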

📝 Section Recap: The total probability theorem partitions the sample space into mutually exclusive events and expresses P(A) as a sum over all paths. Bayes' rule calculates the posterior probability P(B_j|A) by combining the prior probabilities P(B_j) with the likelihoods P(A|B_j). This framework is essential for updating probabilities based on new evidence.


Summary of Key Concepts

You now understand the foundational building blocks of probability:

  1. Sample spaces organize all possible outcomes of an experiment
  2. Events are subsets of the sample space we're interested in
  3. Counting techniques (permutations and combinations) help us enumerate outcomes without listing them all
  4. Probability measures quantify the likelihood of events using several valid approaches
  5. Additive and multiplicative rules allow us to calculate probabilities of complex events
  6. Conditional probability and independence model how information and dependence affect likelihood
  7. Bayes' rule provides a framework for updating probabilities as new information arrives

These tools form the foundation upon which all statistical inference rests. As you progress through this course, you'll see these concepts applied repeatedly to model real-world uncertainty and make data-driven decisions.