Statistics Chapter 2: Frequency Distributions and Construction of Tables

Categorization of Variables: Discrete vs. Continuous

Discrete Variables
* Definition: A discrete variable consists of separate, indivisible categories. No intermediate values can exist between two neighboring categories.
* Example (Siblings): A person may have $2$ siblings or $3$ siblings. It is impossible to have $2.1$ or $2.4$ siblings; the variable changes in whole, separate steps.
* Rule of Thumb: Discrete variables are typically things you count ("how many"), such as the number of classes taken at a college.
* Nominal Variables as Discrete: Even though nominal variables are non-numeric labels, they are still discrete.
  * Example (Occupation): On a survey with $40$ options (including "Other"), a respondent must choose one box. They cannot give an answer halfway between "Plumber" and "Electrician."
  * Example (College Major): A student is typically classified under one major (Psychology, Business, Sociology). Even if they take classes in multiple fields, the administrative system requires them to pick one distinct category.
  * Example (Gender): Even in surveys with options such as "Non-binary," "Prefer not to say," or "Other," these remain discrete categories because the respondent must select a specific box.
Continuous Variables
* Definition: A continuous variable has an infinite number of possible values between any two observed values; it is divisible into an infinite number of fractional parts.
* Rule of Thumb: Continuous variables are typically things you measure ("how much").
* Example (Height): A person does not jump from $50$ to $51$ inches instantly; they grow through every possible fraction of an inch in between.
* Example (Weight): Weight changes gradually. No one weighs exactly $170$ pounds and then suddenly weighs $171$; they pass through an infinite number of intermediate weights on the way to the next pound.
* Example (Distance): Driving a mile means passing through an infinite number of infinitesimally small distances before reaching the next mile marker.

The Principle of Infinity in Continuous Variables

Theoretical Impossibility of Exact Matches
* In theory, two individuals can never have exactly the same score on a truly continuous variable measured with infinite precision.
* Example (Height Tie-Breaking): Two friends might both be roughly $70$ inches tall, yet a more precise ruler might show one is $70.15$ inches and the other $70.158$ inches. Because there are infinitely many decimal places, you can always measure further (to the millionths, trillionths, and so on) to break a tie.
* Practical Implication: Because no instrument measures with infinite precision, we round each measurement to a chosen unit, such as the nearest whole inch or the nearest tenth of an inch.

Real Limits and Rounding for Intervals

Real Limits
* Real limits define the rounding boundaries for continuous variables that are grouped into intervals.
* Apparent Limits: The values shown in a table (e.g., $64$ to $65$).
* Lower Real Limit: The lowest value that rounds up into the category. It is found by subtracting $0.5$ (half of one measurement unit) from the lower apparent limit.
* Upper Real Limit: Found by adding $0.5$ to the upper apparent limit. Strictly, the category contains every value up to, but not including, this boundary, because a value exactly at the boundary rounds up into the next category.
* Example: For an interval of $64$ to $65$ inches:
  * The lower real limit is $63.5$ inches.
  * The upper real limit is $65.5$ inches; the category includes values as close to $65.5$ as you like (e.g., $65.4999$), but not $65.5$ itself.
* Constraint: The same value cannot belong to two categories. Although $65.5$ is the boundary between the $64$ to $65$ interval and the next one ($66$ to $67$), a score of exactly $65.5$ rounds up to $66$ and is counted only in the higher interval.
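The real-limit rules can be sketched in a few lines of Python. This is a minimal illustration, assuming measurements are recorded to the nearest whole unit (so the real limits sit $0.5$ beyond each apparent limit) and that a value exactly on a boundary rounds up, so each interval is treated as half-open. The function names `real_limits` and `in_interval` are hypothetical helpers, not part of any library.

```python
def real_limits(lower_apparent, upper_apparent, unit=1.0):
    """Return (lower real limit, upper real limit) for an interval.

    Assumes scores are recorded to the nearest `unit` (1.0 = whole inches),
    so each real limit lies half a unit beyond the apparent limit.
    """
    half = unit / 2
    return lower_apparent - half, upper_apparent + half


def in_interval(value, lower_apparent, upper_apparent, unit=1.0):
    """True if an exact measurement falls into the interval.

    The interval is treated as half-open, [lower real, upper real), so a
    value exactly on the upper boundary belongs only to the next interval.
    """
    low, high = real_limits(lower_apparent, upper_apparent, unit)
    return low <= value < high


print(real_limits(64, 65))         # (63.5, 65.5)
print(in_interval(63.5, 64, 65))   # True: rounds up to 64
print(in_interval(65.49, 64, 65))  # True: rounds down to 65
print(in_interval(65.5, 64, 65))   # False: rounds up to 66 (next interval)
```

The half-open comparison (`low <= value < high`) is what enforces the constraint that one value cannot sit in two neighboring intervals.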

Frequency Distributions: Definitions and Purpose

Definition
* A frequency distribution is an organized table or graph showing the number of individuals located in each category on the scale of measurement.
* It is a tool of descriptive statistics, used to summarize and organize a sample so it can be easily understood. It is not a method of inferential statistics (drawing conclusions about a population from a sample).
Two Essential Components
* Every frequency distribution (table, pie chart, histogram) must provide:
  1. The set of categories that make up the measurement scale (the possible scores, denoted $x$).
  2. The number of individuals in each category (the frequency, denoted $f$).

Frequency Distribution Tables

Structure
* A frequency distribution table includes at least two columns: $x$ (scores) and $f$ (frequency).
* Scores are usually listed from highest to lowest, though lowest to highest is also acceptable.
* Example Data Set: Quiz scores $1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5$.
* Table Representation:

x | f
--|--
5 | 1
4 | 2
3 | 4
2 | 5
1 | 2
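Building the table from raw scores amounts to counting how often each score occurs. The sketch below uses Python's `collections.Counter` on the quiz data. It assumes whole-number scores and, as a design choice of this sketch, lists every integer between the lowest and highest score so that a score nobody earned would still appear with $f = 0$. The function name `frequency_table` is hypothetical.

```python
from collections import Counter

# Quiz scores from the example data set
scores = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]


def frequency_table(data):
    """Return (x, f) pairs from the highest score to the lowest.

    Assumes whole-number scores: every integer between min(data) and
    max(data) gets a row, so a score nobody earned appears with f = 0.
    """
    counts = Counter(data)  # Counter returns 0 for scores that never occur
    return [(x, counts[x]) for x in range(max(data), min(data) - 1, -1)]


for x, f in frequency_table(scores):
    print(f"x = {x}, f = {f}")
```

Running it prints the same five rows as the table above, from $x = 5, f = 1$ down to $x = 1, f = 2$.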
Determining Sample Size ($n$)
* To find the total number of scores ($n$) from a table, add up the frequency column:
  $n = \sum f$
* In the example above: $1 + 2 + 4 + 5 + 2 = 14$, so $n = 14$.
Calculating the Sum of Scores ($\sum x$)
* Do NOT add the $x$ column directly; that column lists only the possible scores, not the actual data.
* To find $\sum x$, multiply each score by its frequency ($x \times f$) and then sum those products.
* Example Calculation:
  * $5 \times 1 = 5$
  * $4 \times 2 = 8$
  * $3 \times 4 = 12$
  * $2 \times 5 = 10$
  * $1 \times 2 = 2$
  * $\sum x = 5 + 8 + 12 + 10 + 2 = 37$
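Both table calculations can be checked with a short Python sketch. The table below is typed in by hand from the quiz example, and the variable names are illustrative. The last line also shows the mistake to avoid: adding the $x$ column directly gives $15$, not $37$.

```python
# Frequency table from the quiz example: (score x, frequency f)
table = [(5, 1), (4, 2), (3, 4), (2, 5), (1, 2)]

n = sum(f for _, f in table)          # n = sum of the f column
sum_x = sum(x * f for x, f in table)  # sum of x * f for each row
wrong = sum(x for x, _ in table)      # adding the x column directly (incorrect)

print(n, sum_x, wrong)  # 14 37 15
```

As a cross-check, adding the $14$ raw quiz scores one by one also gives $37$.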

Advanced Table Columns: Proportion and Percentage

Proportion ($p$)
* Definition: The fraction of the total group associated with a specific score.
* Formula: $p = \frac{f}{n}$
* Value Range: Proportions must fall between $0$ and $1$.
* Example: If $f = 2$ and $n = 14$, then $p = \frac{2}{14} \approx 0.14$.
* The proportion column should sum to exactly $1.00$, though rounding each proportion can make the total come out slightly off (e.g., anywhere from $0.98$ to $1.02$).
Percentage ($\%$)
* Definition: The proportion expressed as a value out of $100$.
* Formula: $P = p \times 100$
* Conversion: Move the decimal point two places to the right.
* Example: A proportion of $0.14$ becomes $14\%$.
* The percentage column should sum to $100\%$ (again, subject to small rounding error).
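The proportion and percentage columns follow directly from $p = \frac{f}{n}$ and $P = p \times 100$. The Python sketch below adds both columns to the quiz table, rounding proportions to two decimals and percentages to whole numbers; those rounding choices are assumptions of the sketch, not rules from the notes. With this particular data set the rounded columns happen to total $1.00$ and $100\%$; other data may drift slightly.

```python
# Frequency table from the quiz example: (score x, frequency f)
table = [(5, 1), (4, 2), (3, 4), (2, 5), (1, 2)]
n = sum(f for _, f in table)

rows = []
for x, f in table:
    p = f / n  # proportion: p = f / n
    # Store x, f, p rounded to 2 decimals, and P = p * 100 rounded to a whole %
    rows.append((x, f, round(p, 2), round(p * 100)))

for x, f, p, pct in rows:
    print(f"x = {x}  f = {f}  p = {p:.2f}  % = {pct}")

print(round(sum(r[2] for r in rows), 2))  # total of the rounded proportions
print(sum(r[3] for r in rows))            # 100
```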

Grouped Frequency Distribution Tables

When to Use
1. When a variable is continuous.
2. When a discrete variable has a wide range of scores (e.g., a test scored from $0$ to $100$ points, creating $101$ possible categories).
Interval Width
* Definition: The number of scores included in each category (categories are also called groups, classes, or intervals).
* Formula (for whole-number scores, using the apparent limits): $\text{Width} = \text{Upper Limit} - \text{Lower Limit} + 1$
* Example: For the interval of scores $0$ to $9$, the width is $9 - 0 + 1 = 10$; the $+1$ is needed so that the score $0$ is counted.
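The width formula can be checked two ways: counting whole-number scores with the $+1$ rule, or subtracting the lower real limit from the upper real limit, which gives the same number. The sketch below assumes whole-number scores (real limits $0.5$ beyond the apparent limits); both function names are hypothetical.

```python
def interval_width(lower_apparent, upper_apparent):
    """Width of an interval of whole-number scores: upper - lower + 1.

    The +1 counts both endpoints (0 through 9 is ten scores).
    """
    return upper_apparent - lower_apparent + 1


def width_from_real_limits(lower_apparent, upper_apparent):
    """Same width computed as upper real limit minus lower real limit."""
    return (upper_apparent + 0.5) - (lower_apparent - 0.5)


print(interval_width(0, 9))          # 10
print(width_from_real_limits(0, 9))  # 10.0
print(len(range(0, 9 + 1)))          # 10 scores: 0, 1, ..., 9
```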
Four Fundamental Rules for Creating Intervals
1. Uniform Width: All intervals must have exactly the same width. You cannot have one interval spanning $50$ points and others spanning $10$ points.
2. Multiple of Width: The lower limit of each interval must be a multiple of the interval width.
   * Example: If the width is $10$, intervals should start at $0, 10, 20, 30$, and so on.
3. Mutual Exclusivity: A score cannot belong to two intervals, so limits must not overlap (e.g., $0-9$ and $10-19$, NOT $0-10$ and $10-20$).
4. Enclosed Intervals: Every interval between the highest and lowest scores must be listed, even if its frequency is $0$.
   * Definition of Enclosed: An interval is enclosed if intervals with non-zero frequencies appear both above and below it.
   * Non-enclosed intervals at the very top or bottom of the table (beyond the range of the actual data) are optional and usually omitted.
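The four rules translate fairly directly into code. The Python sketch below assumes whole-number scores and a chosen width; the function name `grouped_frequency_table` and the example scores are hypothetical, invented only to show an enclosed interval with $f = 0$. Each score is mapped to the lower limit of its interval with integer floor division, which makes every lower limit a multiple of the width and keeps intervals from overlapping.

```python
from collections import Counter


def grouped_frequency_table(data, width):
    """Group whole-number scores into intervals of a fixed `width`.

    - Every interval has the same width (rule 1).
    - Each lower limit is a multiple of the width (rule 2).
    - Intervals run lower..lower+width-1, so they never overlap (rule 3).
    - Every interval between the lowest and highest scores is listed,
      even with f = 0 (rule 4); nothing beyond the data's range is added.
    Returns (lower, upper, f) rows from the highest interval to the lowest.
    """
    # Map each score to the lower limit of its interval
    counts = Counter((score // width) * width for score in data)
    lowest = (min(data) // width) * width
    highest = (max(data) // width) * width
    rows = []
    for lower in range(highest, lowest - 1, -width):
        rows.append((lower, lower + width - 1, counts[lower]))
    return rows


# Hypothetical exam scores, made up for illustration (no scores in the 80s)
exam = [53, 58, 61, 67, 69, 72, 74, 75, 78, 94, 96]
for lower, upper, f in grouped_frequency_table(exam, 10):
    print(f"{lower}-{upper}: f = {f}")
```

Here the $80-89$ interval is printed with $f = 0$ because it is enclosed by the $70-79$ and $90-99$ intervals, while no empty intervals are added below $50$ or above $99$.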