Statistics Chapter 2: Frequency Distributions and Construction of Tables

Categorization of Variables: Discrete vs. Continuous

  • Discrete Variables
    * Definition: A discrete variable consists of separate, indivisible categories. No intermediate values can exist between two adjacent categories.
    * Example (Siblings): A person may have 2 siblings or 3 siblings. It is impossible to have 2.1 or 2.4 siblings; the change occurs in whole, separate steps.
    * Rule of Thumb: Discrete variables are typically things you count (e.g., "how many"). This includes the number of classes taken at a college (JP).
    * Nominal Variables as Discrete: Even though nominal variables are non-numeric labels, they are still discrete.
        * Example (Occupation): On a survey with 40 options (including "Other"), a respondent must choose one box. They cannot provide a value halfway between "Plumber" and "Electrician".
        * Example (College Major): A student is typically classified under one major (Psychology, Business, Sociology). Even if they take classes in multiple fields, the administrative system requires them to pick one distinct category.
        * Example (Gender): Even in surveys with options such as "Non-binary," "Prefer not to say," or "Other," these remain discrete categories because the respondent must select a specific box.

  • Continuous Variables
    * Definition: A continuous variable has an infinite number of possible values between any two observed points. It is divisible into an infinite number of fractional parts.
    * Rule of Thumb: Continuous variables are typically things you measure (e.g., "how much").
    * Example (Height): A person does not jump from 50 to 51 inches instantly. They grow through every possible fraction of an inch in between.
    * Example (Weight): Weight gain occurs gradually. One does not weigh 170 pounds and then suddenly weigh 171; you pass through an infinite number of weights to reach the next pound.
    * Example (Distance): Driving a mile involves passing through an infinite number of infinitesimal distances to reach the next mile marker.

The Principle of Infinity in Continuous Variables

  • Theoretical Impossibility of Exact Matches
    * Technically, it is impossible for two individuals to have exactly the same score on a truly continuous variable if it is measured with infinite precision.
    * Example (Height Tie-Breaking): If two friends are both roughly 70 inches tall, a more precise ruler might show that one is 70.15 inches and the other is 70.158 inches. Because there is an infinite number of decimal places, you can always measure further (to the millionths, trillionths, etc.) to break a tie.
    * Practical Implication: Because we cannot measure to infinite precision, we must round to a chosen unit (e.g., the nearest whole inch).

Real Limits and Rounding for Intervals

  • Real Limits
    * Real limits define the rounding boundaries for continuous variables when they are grouped into intervals.
    * Apparent Limits: The values visible on a table (e.g., 64 to 65).
    * Lower Real Limit: The lowest value that rounds up into the category. It is found by subtracting 0.5 from the lower apparent limit.
    * Upper Real Limit: The highest value that rounds down into the category. It is found by adding 0.5 to the upper apparent limit (more precisely, the category extends up to but not including that value, sometimes written as adding 0.4999...).
    * Example: For an interval of 64 to 65 inches:
        * The lower real limit is 63.5 inches.
        * The upper real limit is 65.5 inches (specifically, as close to 65.5 as possible without reaching it).
    * Constraint: The same value cannot be in two categories. For instance, 65.5 cannot be both the upper limit of one group and the lower limit of the next; it must be assigned to exactly one of them by the rounding rule.
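
The real-limit rules above can be sketched in a few lines of Python. This is only an illustration: the function names `real_limits` and `in_interval` and the `unit` parameter are made up here, and the choice to include the lower real limit while excluding the upper one is an assumed rounding convention (a boundary value such as 65.5 rounds up into the next interval).

```python
def real_limits(lower_apparent, upper_apparent, unit=1.0):
    """Return (lower_real, upper_real) for an interval of rounded scores.

    `unit` is the rounding unit (1.0 for whole inches); half of it is
    subtracted from the lower apparent limit and added to the upper one.
    """
    half = unit / 2
    return lower_apparent - half, upper_apparent + half


def in_interval(value, lower_apparent, upper_apparent, unit=1.0):
    """Check membership using real limits.

    Assumed convention: the lower real limit is included and the upper
    real limit is excluded, so a boundary value lands in one interval only.
    """
    lower_real, upper_real = real_limits(lower_apparent, upper_apparent, unit)
    return lower_real <= value < upper_real


print(real_limits(64, 65))         # (63.5, 65.5)
print(in_interval(65.5, 64, 65))   # False: 65.5 is not in 64-65
print(in_interval(65.5, 66, 67))   # True: it belongs to the next interval
```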

Frequency Distributions: Definitions and Purpose

  • Definition
    * A frequency distribution is an organized tabulation or graph of the number of individuals located in each category on the scale of measurement.
    * It is a form of Descriptive Statistics, used to summarize and organize a sample so it can be easily understood. It is not used for Inferential Statistics (generalizing from a sample to a population).

  • Two Essential Components
    * Every frequency distribution (table, pie chart, histogram) must provide:
        1. The set of categories that make up the measurement scale (Possible Scores, denoted as $x$).
        2. The number of individuals in each category (Frequency, denoted as $f$).

Frequency Distribution Tables

  • Structure
    * Includes at least two columns: $x$ (scores) and $f$ (frequency).
    * Scores in the $x$ column are usually listed from highest value to lowest, though lowest to highest is acceptable.
    * Example Data Set: A quiz with scores: 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5.
    * Table Representation:
        * $x = 5$, $f = 1$
        * $x = 4$, $f = 2$
        * $x = 3$, $f = 4$
        * $x = 2$, $f = 5$
        * $x = 1$, $f = 2$
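
As a rough illustration, the table above can be built from the raw quiz scores with Python's standard `collections.Counter`. The variable names `scores`, `freq`, and `table` are chosen here for the sketch and are not part of the notes.

```python
from collections import Counter

# Quiz scores from the example data set.
scores = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5]

# Count how many times each score occurs.
freq = Counter(scores)

# List every possible score from highest to lowest with its frequency.
table = [(x, freq[x]) for x in range(max(scores), min(scores) - 1, -1)]

for x, f in table:
    print(f"x = {x}, f = {f}")
```

Stepping through every value between the maximum and minimum, rather than only the values that occur, keeps a score with a frequency of 0 in the table.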

  • Determining Sample Size ($n$)
    * To find the total number of scores ($n$) from a table, add up the frequency column: $n = \sum f$
    * In the example above: $1 + 2 + 4 + 5 + 2 = 14$. Thus, $n = 14$.

  • Calculating the Sum of Scores ($\sum x$)
    * Do NOT add the $x$ column directly; that column lists each possible score only once, not the actual data.
    * To find $\sum x$, multiply each score by its frequency ($x \times f$) and then sum those products.
    * Example Calculation:
        * $5 \times 1 = 5$
        * $4 \times 2 = 8$
        * $3 \times 4 = 12$
        * $2 \times 5 = 10$
        * $1 \times 2 = 2$
        * $\sum x = 5 + 8 + 12 + 10 + 2 = 37$
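
The two calculations above ($n = \sum f$ and $\sum x$ from the products $x \times f$) can be checked with a short Python sketch. The list `table` simply restates the example frequency table as (score, frequency) pairs; the variable names are illustrative.

```python
# (x, f) pairs from the example frequency table.
table = [(5, 1), (4, 2), (3, 4), (2, 5), (1, 2)]

# Sample size: add up the frequency column.
n = sum(f for _, f in table)

# Sum of scores: weight each score by its frequency, then add the products.
sum_x = sum(x * f for x, f in table)

# Common mistake: adding the x column directly ignores the frequencies.
naive_sum = sum(x for x, _ in table)

print(n)          # 14
print(sum_x)      # 37
print(naive_sum)  # 15, which is not the sum of the actual scores
```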

Advanced Table Columns: Proportion and Percentage

  • Proportion ($p$)
    * Definition: The fraction of the total group that is associated with a specific score.
    * Formula: $p = \frac{f}{n}$
    * Value Range: Proportions must be between 0 and 1.
    * Example: If $f = 2$ and $n = 14$, then $p = \frac{2}{14} \approx 0.14$.
    * The sum of the proportion column should equal exactly 1.00 (though rounding error may result in values between 0.98 and 1.02).

  • Percentage (%)
    * Definition: The proportion expressed as a value out of 100.
    * Formula: $P = p \times 100$
    * Conversion: Move the decimal point two places to the right.
    * Example: A proportion of 0.14 becomes 14%.
    * The sum of the percentage column should equal 100%.
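
A minimal sketch of adding the proportion and percentage columns to the quiz example, assuming the (score, frequency) pairs from the earlier table. Rounding proportions to two decimals and percentages to whole numbers is a presentation choice made for this sketch.

```python
# (x, f) pairs from the quiz example.
table = [(5, 1), (4, 2), (3, 4), (2, 5), (1, 2)]
n = sum(f for _, f in table)

rows = []
for x, f in table:
    p = f / n                       # proportion: p = f / n
    pct = p * 100                   # percentage: P = p * 100
    rows.append((x, f, round(p, 2), round(pct)))

for x, f, p, pct in rows:
    print(f"x = {x}  f = {f}  p = {p:.2f}  % = {pct}%")

# Column totals; these can drift slightly from 1.00 and 100 after rounding.
total_p = sum(p for _, _, p, _ in rows)
total_pct = sum(pct for _, _, _, pct in rows)
print(f"sum of p = {total_p:.2f}, sum of % = {total_pct}%")
```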

Grouped Frequency Distribution Tables

  • When to Use
    1. When a variable is continuous.
    2. When a discrete variable has a wide range of scores (e.g., a test scored out of 100 points, creating 101 possible categories).

  • Interval Width
    * Definition: The number of scores included in each category (also called groups, classes, or intervals).
    * Formula: $\text{Width} = \text{Upper Limit} - \text{Lower Limit} + 1$
    * Example: In a group of scores from 0 to 9, the width is $9 - 0 + 1 = 10$ (because you must count the 0).
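
The width formula can be sanity-checked in Python. The helper name `interval_width` is made up for this sketch, and it assumes whole-number scores. The last check shows that the same width is the distance between the interval's real limits (here -0.5 and 9.5).

```python
def interval_width(lower_limit, upper_limit):
    """Number of whole-number scores from lower_limit to upper_limit, inclusive."""
    return upper_limit - lower_limit + 1


width = interval_width(0, 9)
print(width)  # 10

# Counting the scores 0, 1, ..., 9 one by one gives the same answer.
assert width == len(range(0, 9 + 1))

# The width also equals the distance between the real limits.
assert width == (9 + 0.5) - (0 - 0.5)
```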

  • Four Fundamental Rules for Creating Intervals
    1. Uniform Width: All intervals must have exactly the same width. You cannot have one group of 50 points and others of 10 points.
    2. Multiple of Width: The lower limit of each interval must be a multiple of the interval width.
        * Example: If the width is 10, intervals should start at 0, 10, 20, 30, etc.
    3. Mutual Exclusivity: A score cannot belong to two intervals. Limits must not overlap (e.g., 0-9 and 10-19, NOT 0-10 and 10-20).
    4. Enclosed Intervals: All intervals between the highest and lowest scores must be listed, even if they have a frequency of 0.
        * Definition of Enclosed: An interval is enclosed if there are intervals with non-zero frequencies both above and below it.
        * Non-enclosed intervals at the absolute top or bottom of the table (beyond the range of actual data) are optional and usually omitted.
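
The four rules can be put together in a short Python sketch. The function name `grouped_frequency_table` and the sample scores are hypothetical, invented only for illustration, and the code assumes non-negative whole-number scores.

```python
def grouped_frequency_table(scores, width):
    """Return [(lower, upper, f), ...] from the highest interval to the lowest."""
    # Rule 2: every interval starts at a multiple of the width.
    lowest_start = (min(scores) // width) * width
    highest_start = (max(scores) // width) * width

    table = []
    # Rule 4: step through every interval in the data range, even if f = 0.
    for lower in range(highest_start, lowest_start - 1, -width):
        # Rules 1 and 3: same width everywhere, and the upper limit stops
        # one below the next interval's lower limit, so nothing overlaps.
        upper = lower + width - 1
        f = sum(1 for s in scores if lower <= s <= upper)
        table.append((lower, upper, f))
    return table


# Hypothetical test scores, made up for this example.
scores = [3, 7, 12, 15, 18, 31, 34, 39]

for lower, upper, f in grouped_frequency_table(scores, width=10):
    print(f"{lower}-{upper}: f = {f}")
```

With these scores the 20-29 interval has a frequency of 0 but is still listed, because it is enclosed by the non-empty intervals 10-19 and 30-39.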