Fundamental Concepts and Applications of Probability and Statistics

Fundamentals of Probability Theory

Probability theory is centered on the study of random experiments and their potential outcomes. The sample space, denoted $S$, is the set of all possible outcomes of an experiment. An outcome is the result of a single trial or observation. An event, denoted $E$, is a collection of one or more outcomes from the sample space, that is, a subset of $S$. When all outcomes are equally likely, the probability of an event is $P(E) = \frac{\text{Successful outcomes}}{\text{Total possible outcomes}}$.

Properties of Probability and Complementary Events

Probability values are governed by several fundamental mathematical properties. The probability of the sample space, $P(S)$, is equal to $1$, signifying a sure event. Conversely, the probability of the null or empty set, $P(\emptyset)$, is equal to $0$, signifying an impossible event. All probability values must fall within the range $0 \le P(A) \le 1$, so a probability is never negative and never exceeds $1$.

Complementary events are defined by the occurrence of an event $A$ versus the non-occurrence of that event, denoted as $A^c$ or "not A". The relationship between the number of outcomes in event $A$ and its complement relative to the sample space is given by $n(A) + n(A^c) = n(S)$ . When normalized by the total number of outcomes in the sample space, this leads to the formula $\frac{n(A)}{n(S)} + \frac{n(A^c)}{n(S)} = 1$ , which represents the principle that the probability of success plus the probability of failure equals one: $P(A) + P(A^c) = 1$ . Consequently, the probability of an event can be calculated as $P(A) = 1 - P(A^c)$ .

The Addition Rules of Probability

The addition rule is applied to determine the probability that at least one of two or more events will occur in a single trial. These rules are categorized based on whether the events are mutually exclusive or non-mutually exclusive. Mutually exclusive events are two or more events that can never happen simultaneously in the same trial. For these events, the probability of $E$ or $F$ occurring is the sum of their individual probabilities: $P(E \text{ or } F) = P(E) + P(F)$ .

Non-mutually exclusive events are two or more events that can happen at the same time in the same trial. To calculate the probability of $E$ or $F$ occurring in this scenario, one must subtract the probability of both events occurring simultaneously to avoid double-counting. The formula is: $P(E \text{ or } F) = P(E) + P(F) - P(E \text{ and } F)$ .
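Both addition rules can be checked by counting outcomes directly. The sketch below uses a single roll of a fair die as an illustrative sample space; the events `E`, `F`, `G`, `H` and the helper `prob` are hypothetical names chosen for this example and do not come from the text.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

S = set(range(1, 7))   # one roll of a fair die (illustrative choice)

# Non-mutually exclusive events: they share the outcomes 4 and 6.
E = {2, 4, 6}          # even number
F = {4, 5, 6}          # number greater than 3
p_rule = prob(E, S) + prob(F, S) - prob(E & F, S)   # subtract the overlap once
p_count = prob(E | F, S)                            # direct count of "E or F"

# Mutually exclusive events: no shared outcomes, so nothing to subtract.
G = {1, 2}
H = {5, 6}
p_exclusive = prob(G, S) + prob(H, S)

print(p_rule, p_count, p_exclusive)
```

The rule and the direct count agree because subtracting $P(E \text{ and } F)$ removes exactly the outcomes that the plain sum counts twice.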

Probability Problem Solving: Discrete Objects

Consider a box containing $8$ red balls, $10$ white balls, and $12$ blue balls. If one ball is drawn at random, the probability that it is not blue can be calculated in two ways. First, by calculating the success outcomes (red and white) over the total: $P(\text{not blue}) = \frac{8 + 10}{30} = \frac{18}{30} = 0.6$ . Alternatively, using the complement rule: $P(\text{not blue}) = 1 - P(\text{blue}) = 1 - \frac{12}{30} = \frac{18}{30} = 0.6$ .
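The two routes to $P(\text{not blue})$ can be compared with exact fractions. This is a minimal sketch; the dictionary `box` is just one assumed way to encode the contents of the box.

```python
from fractions import Fraction

# Assumed encoding of the box: color -> number of balls.
box = {"red": 8, "white": 10, "blue": 12}
total = sum(box.values())

# Direct count: favorable outcomes are the red and white balls.
p_direct = Fraction(box["red"] + box["white"], total)

# Complement rule: 1 - P(blue).
p_complement = 1 - Fraction(box["blue"], total)

print(p_direct, p_complement, float(p_direct))
```

Both expressions reduce to $\frac{3}{5}$, matching the $0.6$ obtained by hand.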

Experiments involving sequential draws, such as drawing from a bag containing $3$ white balls and $5$ red balls, highlight the difference between sampling with and without replacement. If two balls are drawn without replacement and we seek the probability that both are red, the probability of the first being red is $\frac{5}{8}$ and the second is $\frac{4}{7}$ . The joint probability is $\frac{5}{8} \times \frac{4}{7} = \frac{20}{56} \approx 0.357$ . If sampling with replacement, the probability for each draw remains $\frac{5}{8}$ , resulting in a joint probability of $\frac{5}{8} \times \frac{5}{8} = \frac{25}{64} \approx 0.391$ .

When seeking the probability of drawing one red and one white ball (in any order) without replacement from the same bag, we calculate the sum of the sequences: $(R_1 \text{ then } W_2)$ and $(W_1 \text{ then } R_2)$ . This is represented as $(\frac{5}{8} \times \frac{3}{7}) + (\frac{3}{8} \times \frac{5}{7}) = \frac{15}{56} + \frac{15}{56} = \frac{30}{56} \approx 0.536$ . With replacement, the calculation becomes $(\frac{5}{8} \times \frac{3}{8}) + (\frac{3}{8} \times \frac{5}{8}) = \frac{15}{64} + \frac{15}{64} = \frac{30}{64} \approx 0.469$ .
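The four results for the bag of $5$ red and $3$ white balls can be verified by listing every equally likely ordered pair of draws. The following sketch assumes the balls are distinguishable by position in a list; the names `bag`, `prob`, `both_red`, and `one_each` are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

bag = ["R"] * 5 + ["W"] * 3   # 5 red, 3 white

def prob(draws, event):
    """Fraction of equally likely ordered draws that satisfy `event`."""
    hits = sum(1 for d in draws if event(d))
    return Fraction(hits, len(draws))

# Ordered pairs of distinct balls (no replacement): 8 * 7 = 56 pairs.
without = list(permutations(bag, 2))
# Ordered pairs where the first ball is returned: 8 * 8 = 64 pairs.
with_repl = list(product(bag, repeat=2))

def both_red(d):
    return d == ("R", "R")

def one_each(d):
    return set(d) == {"R", "W"}   # one red and one white, in either order

for label, draws in [("without replacement", without), ("with replacement", with_repl)]:
    print(label, prob(draws, both_red), prob(draws, one_each))
```

Enumeration treats each physical ball as a separate outcome, which is why `permutations` yields $56$ pairs even though only two colors appear.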

Probability in Card Games, Committees, and Dice

In a standard deck of $52$ playing cards (including $13$ of each suit: clubs, diamonds, hearts, and spades), finding the probability of drawing a number card (defined here as Ace to $10$ , totaling $40$ cards) or a club ( $13$ cards) requires the non-mutually exclusive addition rule because some clubs are also number cards. There are $10$ number cards that are clubs (Ace through $10$ of clubs). Thus, $P(\text{number or club}) = \frac{40}{52} + \frac{13}{52} - \frac{10}{52} = \frac{43}{52} \approx 0.827$ .
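The overlap of $10$ cards can be confirmed by building the deck explicitly. This is a sketch under the text's definition of a number card (Ace through $10$); the rank and suit labels are arbitrary strings chosen for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))            # 52 (rank, suit) pairs

number_ranks = set(ranks[:10])               # Ace through 10, as defined in the text
numbers = {card for card in deck if card[0] in number_ranks}
clubs = {card for card in deck if card[1] == "clubs"}

# Addition rule for non-mutually exclusive events.
p_rule = Fraction(len(numbers) + len(clubs) - len(numbers & clubs), len(deck))
# Direct count of cards that are a number card or a club.
p_count = Fraction(len(numbers | clubs), len(deck))

print(len(numbers), len(clubs), len(numbers & clubs), p_rule, round(float(p_rule), 3))
```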

When conducting an experiment across three separate $52$ -card decks, if we want a diamond from the first, an ace from the second, and the ace of hearts from the third, the probability is the product of individual events: $P = \frac{13}{52} \times \frac{4}{52} \times \frac{1}{52} = \frac{1}{4} \times \frac{1}{13} \times \frac{1}{52} = \frac{1}{2704} \approx 0.00037$ .

Combinatorial analysis is often required for selection problems. For a committee of $3$ formed from $5$ men and $4$ women, the probability that exactly two are men is calculated using combinations: $P = \frac{\binom{5}{2} \times \binom{4}{1}}{\binom{9}{3}} = \frac{10 \times 4}{84} = \frac{40}{84} \approx 0.47619$ . Similarly, in a class of $12$ boys and $4$ girls, the probability of selecting $3$ boys at random can be found via consecutive probabilities: $\frac{12}{16} \times \frac{11}{15} \times \frac{10}{14} = \frac{11}{28} \approx 0.39286$ , or via combinations: $P = \frac{\binom{12}{3}}{\binom{16}{3}} = \frac{220}{560} = \frac{11}{28} \approx 0.39286$ .
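The committee and class selections can be reproduced with `math.comb` from the Python standard library (available from Python 3.8). The variable names below are illustrative only.

```python
from fractions import Fraction
from math import comb

# Committee of 3 from 5 men and 4 women: exactly two men and one woman.
p_two_men = Fraction(comb(5, 2) * comb(4, 1), comb(9, 3))

# Three boys from a class of 12 boys and 4 girls, computed two ways.
p_boys_comb = Fraction(comb(12, 3), comb(16, 3))
p_boys_seq = Fraction(12, 16) * Fraction(11, 15) * Fraction(10, 14)

print(p_two_men, round(float(p_two_men), 5))
print(p_boys_comb, p_boys_seq, round(float(p_boys_comb), 5))
```

The consecutive-probability product and the ratio of combinations agree because both count unordered selections of three students out of sixteen.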

For independent events, the probability that both occur is the product of the individual probabilities. Suppose Juan solves $90\%$ ($0.9$) and Pedro solves $70\%$ ($0.7$) of tasks, independently of each other. The probability that at least one of them solves a given problem is $P(J) + P(P) - P(J \text{ and } P) = 0.9 + 0.7 - (0.9 \times 0.7) = 0.97$. For contradiction cases, such as A telling the truth in $75\%$ of cases and B in $80\%$, the probability that they contradict each other is the sum of the probabilities that one tells the truth while the other lies: $(0.75 \times 0.20) + (0.25 \times 0.80) = 0.15 + 0.20 = 0.35$.
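Both calculations rely on multiplying probabilities of independent events, which can be sketched with exact fractions. The complement check (`p_via_complement`) is an extra verification step not taken from the text, and all variable names are illustrative.

```python
from fractions import Fraction

# Juan and Pedro, assumed to work independently.
p_juan = Fraction("0.9")
p_pedro = Fraction("0.7")

# Addition rule with the overlap P(J and P) = P(J) * P(P).
p_at_least_one = p_juan + p_pedro - p_juan * p_pedro
# Cross-check: 1 minus the probability that both fail.
p_via_complement = 1 - (1 - p_juan) * (1 - p_pedro)

# A and B contradict each other when exactly one of them tells the truth.
p_a = Fraction("0.75")
p_b = Fraction("0.80")
p_contradict = p_a * (1 - p_b) + (1 - p_a) * p_b

print(float(p_at_least_one), float(p_via_complement), float(p_contradict))
```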

For dice-rolling, a pair of dice has $36$ equally likely ordered outcomes, with sums ranging from $2$ to $12$. A sum of $5$ can occur in $4$ ways ($1-4, 4-1, 2-3, 3-2$) and a sum of $10$ in $3$ ways ($4-6, 6-4, 5-5$). Because a single roll cannot produce both sums, the events are mutually exclusive and the probability of obtaining a sum of $5$ or a sum of $10$ is $P = \frac{4}{36} + \frac{3}{36} = \frac{7}{36} \approx 0.19444$.
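The number of ways to reach each sum can be tallied by enumerating all ordered rolls of two dice. This is a minimal sketch; `rolls` and `sum_counts` are names chosen for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die) of two fair six-sided dice.
rolls = list(product(range(1, 7), repeat=2))

# How many ordered outcomes produce each sum.
sum_counts = Counter(a + b for a, b in rolls)

# A sum of 5 and a sum of 10 are mutually exclusive, so the counts simply add.
p_5_or_10 = Fraction(sum_counts[5] + sum_counts[10], len(rolls))

print(len(rolls), sum_counts[5], sum_counts[10], p_5_or_10)
```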

In a set of $50$ tickets numbered $1$ to $50$, the probability of drawing a ticket that is odd (numbered $1, 3, \dots, 49$, for $25$ total) or greater than $25$ (numbered $26, 27, \dots, 50$, for $25$ total) involves overlapping outcomes. There are $12$ odd numbers greater than $25$ ($27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49$). Applying the non-mutually exclusive addition rule: $P = \frac{25}{50} + \frac{25}{50} - \frac{12}{50} = \frac{38}{50} = 0.76$. Approximating the overlap as $0.5 \times 0.5 = 0.25$, as if the two events were independent, would give $0.75$; the exact count of $12$ overlapping tickets gives $0.76$.
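Counting the tickets directly is a quick way to check the size of the overlap. The sketch below builds the two events as sets; the names `odd`, `high`, and `overlap` are illustrative.

```python
from fractions import Fraction

tickets = set(range(1, 51))                  # tickets numbered 1 to 50
odd = {t for t in tickets if t % 2 == 1}     # 1, 3, ..., 49
high = {t for t in tickets if t > 25}        # 26, 27, ..., 50
overlap = odd & high                         # odd tickets greater than 25

# Addition rule for non-mutually exclusive events.
p_rule = Fraction(len(odd) + len(high) - len(overlap), len(tickets))
# Direct count of tickets that are odd or greater than 25.
p_count = Fraction(len(odd | high), len(tickets))

print(len(odd), len(high), len(overlap), p_rule, float(p_rule))
```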

Repeated Probability (Binomial Distribution)

Repeated probability, specifically the Binomial Distribution, describes the probability that an event will occur exactly $r$ times out of $n$ independent trials, each with the same probability of success. The formula is expressed as $P = \binom{n}{r} S^r F^{n-r}$, where $n$ is the number of trials, $r$ is the number of successes, $S$ is the probability of success, and $F$ is the probability of failure. The identity $S + F = 1$ must always hold.

In a scenario where a couple has $3$ children ( $n=3$ ), the probability of having at least one boy (where $S=0.5$ and $F=0.5$ ) can be calculated by summing the probabilities of having exactly $1$ , $2$ , or $3$ boys: $(\binom{3}{1} \times 0.5^1 \times 0.5^2) + (\binom{3}{2} \times 0.5^2 \times 0.5^1) + (\binom{3}{3} \times 0.5^3 \times 0.5^0) = \frac{7}{8} = 0.875$ . A more efficient method for "at least one" is using the complement rule: $P(\text{at least 1}) = 1 - F^n = 1 - 0.5^3 = 0.875$ .
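The binomial formula and the complement shortcut can be compared numerically. In this sketch, `binomial_pmf` is a hypothetical helper that implements $\binom{n}{r} S^r F^{n-r}$ directly with `math.comb`.

```python
from math import comb

def binomial_pmf(n, r, s):
    """Probability of exactly r successes in n independent trials."""
    f = 1 - s                      # failure probability, since S + F = 1
    return comb(n, r) * s**r * f**(n - r)

n, s = 3, 0.5

# Sum of the probabilities of exactly 1, 2, or 3 boys.
p_sum = sum(binomial_pmf(n, r, s) for r in range(1, n + 1))

# Complement rule: 1 minus the probability of zero boys.
p_complement = 1 - (1 - s) ** n

print(p_sum, p_complement)
```

The complement form needs a single term regardless of $n$, whereas the direct sum grows with the number of trials.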

A tactical example involves NATO forces in Syria using a missile with a hit probability of $S = 0.3$ ($F = 0.7$). To determine the number of missiles ($n$) required to reach at least an $80\%$ probability of hitting the target at least once, we require $1 - 0.7^n \ge 0.8$, that is, $0.7^n \le 0.2$. Solving the boundary case via logarithms gives $n = \frac{\ln 0.2}{\ln 0.7} \approx 4.512$. Because $n$ must be a whole number, it is rounded up: $5$ shots are necessary. Four shots give only $1 - 0.7^4 \approx 0.760$, while five give $1 - 0.7^5 \approx 0.832$.
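The logarithm step and the rounding can be sketched as follows, assuming each shot is independent with the same hit probability. The function name `shots_needed` is hypothetical.

```python
import math

def shots_needed(p_hit, target):
    """Return (exact solution n, smallest whole n) with 1 - (1 - p_hit)**n >= target."""
    p_miss = 1 - p_hit
    # Solve p_miss**n = 1 - target for n using logarithms.
    n_exact = math.log(1 - target) / math.log(p_miss)
    return n_exact, math.ceil(n_exact)

n_exact, n_shots = shots_needed(0.3, 0.8)
print(round(n_exact, 3), n_shots)

# Check the whole numbers on either side of the exact solution.
for n in (4, 5):
    print(n, round(1 - 0.7**n, 4))
```

Rounding up rather than to the nearest integer matters here: the requirement is a lower bound on the hit probability, so any fractional result must be raised to the next whole shot.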

Statistics: Measures of Central Tendency and Dispersion

In statistics, the mean refers to the average, calculated as the sum of all scores divided by the number of values in the set ($\bar{x} = \frac{\sum x}{n}$). The median is the middle number in a set when the values are arranged in ascending or descending order. For an odd number of data points, the median is located at position $\frac{n+1}{2}$. For an even number of data points, the median is the average of the two central numbers, located at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

The mode is the value that occurs most frequently in a given set. A set can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes). The range measures dispersion and is defined as the difference between the largest value (Highest Value, $HV$ ) and the smallest value (Lowest Value, $LV$ ) in the set: $R = HV - LV$ .
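These four measures map directly onto Python's standard `statistics` module. The data in `scores` is made up for illustration and does not come from the text; `statistics.multimode` (Python 3.8 or later) returns every most-frequent value, which covers the unimodal, bimodal, and multimodal cases.

```python
import statistics

scores = [3, 5, 5, 8, 9, 12]          # hypothetical data set, n = 6 (even)

mean = statistics.mean(scores)         # sum of values divided by n
median = statistics.median(scores)     # average of the 3rd and 4th sorted values
modes = statistics.multimode(scores)   # list of all most-frequent values
data_range = max(scores) - min(scores) # R = HV - LV

print(mean, median, modes, data_range)
```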

Statistical Case Study Analysis

Consider a dataset based on $100$ random samples with the following characteristics:

The sum of all $100$ measurements is $81.170$ .
$45$ measurements occur between $0.759$ and $0.800$ .
Smallest value ( $1^{\text{st}}$ measurement) is $0.759$ .
Largest value ( $100^{\text{th}}$ measurement) is $0.858$ .
Specific observations include: $0.801$ (observed three times), $0.802$ (once), $0.803$ (twice), and $0.804$ (four times).
High-end distribution: $45$ measurements are between $0.805$ and $0.858$ .

For this dataset, the mean is calculated as $\frac{81.17}{100} = 0.8117$. To find the median for $n=100$, the location involves the $50^{\text{th}}$ and $51^{\text{st}}$ terms of the sorted data. Counting from the bottom, the first $45$ terms are $\le 0.800$. The $46^{\text{th}}$, $47^{\text{th}}$, and $48^{\text{th}}$ terms are $0.801$. The $49^{\text{th}}$ term is $0.802$. The $50^{\text{th}}$ and $51^{\text{st}}$ terms are both $0.803$. Thus, the median is $\frac{0.803 + 0.803}{2} = 0.803$. The mode is the most frequent value, which is $0.804$ as it was observed four times, provided that no value within the two groups of $45$ measurements repeats more often. The range is computed as $0.858 - 0.759 = 0.099$.
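The median can be located from the frequency counts alone, without knowing the individual values in the two groups of $45$. The sketch below walks the cumulative counts; the `groups` layout and the helper `value_at` are assumptions made for this example, and the two outer groups are represented only by text labels.

```python
# (value or group label, count) in ascending order, taken from the case study.
groups = [
    ("0.759 to 0.800", 45),
    (0.801, 3),
    (0.802, 1),
    (0.803, 2),
    (0.804, 4),
    ("0.805 to 0.858", 45),
]

def value_at(position, groups):
    """Return the value (or group label) at a 1-based rank in the sorted data."""
    seen = 0
    for value, count in groups:
        seen += count
        if position <= seen:
            return value
    raise ValueError("position exceeds the number of observations")

n = sum(count for _, count in groups)

# n is even, so the median averages the terms at positions n/2 and n/2 + 1.
lower, upper = value_at(n // 2, groups), value_at(n // 2 + 1, groups)
median = (lower + upper) / 2

mean = 81.170 / n
data_range = 0.858 - 0.759

print(n, lower, upper, median, round(mean, 4), round(data_range, 3))
```

This works only because both middle positions fall among the individually listed values; if either had landed in a labeled group, the median could not be determined from the summary.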