Statistics Chapter 2: Frequency Distributions and Construction of Tables
Categorization of Variables: Discrete vs. Continuous
Discrete Variables
* Definition: A discrete variable consists of separate, indivisible categories. No intermediate values can exist between two adjacent categories.
* Example (Siblings): A person has a whole number of siblings. It is impossible to have a fractional number of siblings; the count changes in whole, separate steps.
* Rule of Thumb: Discrete variables are typically things you count (e.g., "how many"). This includes the number of classes taken at a college.
* Nominal Variables as Discrete: Even though nominal variables are non-numeric labels, they are still discrete.
  * Example (Occupation): On a survey with a fixed set of options (including "Other"), a respondent must choose one box. They cannot provide a value halfway between "Plumber" and "Electrician".
  * Example (College Major): A student is typically classified under one major (Psychology, Business, Sociology). Even if they take classes in multiple fields, the administrative system requires them to pick one distinct category.
  * Example (Gender): Even in surveys with options such as "Non-binary," "Prefer not to say," or "Other," these remain discrete categories because the respondent must select a specific box.
Continuous Variables
* Definition: A continuous variable has an infinite number of possible values between any two observed points. It is divisible into an infinite number of fractional parts.
* Rule of Thumb: Continuous variables are typically things you measure (e.g., "how much").
* Example (Height): A person does not jump from one whole inch to the next instantly. They grow through every possible fraction of an inch in between.
* Example (Weight): Weight gain occurs gradually. One does not weigh a given number of pounds and then suddenly weigh one pound more; you pass through an infinite number of weights to reach the next pound.
* Example (Distance): Driving a mile involves passing through an infinite number of infinitesimal distances to reach the next mile marker.
The Principle of Infinity in Continuous Variables
- Theoretical Impossibility of Exact Matches
  * Technically, it is impossible for two individuals to have exactly the same score on a truly continuous variable if it is measured with infinite precision.
  * Example (Height Tie-Breaking): If two friends are the same height to the nearest inch, a more precise ruler might show that they differ by a fraction of an inch. Because there is an infinite number of decimal places, you can always measure further (to the millionths, trillionths, etc.) to break a tie.
  * Practical Implication: Because we cannot measure with infinite precision, we must round to the nearest whole number or decimal place (e.g., the nearest whole inch).
Real Limits and Rounding for Intervals
- Real Limits
  * Real limits define the rounding boundaries for continuous variables when they are grouped into intervals.
  * Apparent Limits: The lowest and highest values of an interval as they appear in the table.
  * Lower Real Limit: The lowest value that rounds up into the category. It is found by subtracting half of the measurement unit (0.5 when scores are recorded as whole numbers) from the lower apparent limit.
  * Upper Real Limit: The highest value that rounds down into the category. It is found by adding 0.5 to the upper apparent limit (or, more accurately, 0.4999... repeating).
  * Example: For an interval with apparent limits $a$ to $b$ inches, measured to the nearest whole inch:
    * The lower real limit is $a - 0.5$ inches.
    * The upper real limit is $b + 0.5$ inches (specifically, as close to $b + 0.5$ as possible without reaching it).
  * Constraint: The same value cannot be in two categories. For instance, $b + 0.5$ cannot be both the upper limit of one group and the lower limit of the next; at the boundary, one group must round up while the other rounds down.
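The real-limit rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the `unit` parameter, and the 60-64 and 65-69 inch intervals are hypothetical and do not come from the notes. The sketch assumes that a boundary value belongs to the higher interval (lower real limit included, upper real limit excluded), which is one common way to satisfy the constraint that no value falls in two categories.

```python
def real_limits(lower_apparent, upper_apparent, unit=1.0):
    """Return (lower_real, upper_real) for an interval.

    `unit` is the precision of measurement (1.0 for whole numbers);
    the real limits lie half a unit beyond the apparent limits.
    """
    half = unit / 2
    return lower_apparent - half, upper_apparent + half


def in_interval(value, lower_apparent, upper_apparent, unit=1.0):
    """True if `value` lies in [lower_real, upper_real)."""
    lower_real, upper_real = real_limits(lower_apparent, upper_apparent, unit)
    # Lower real limit included, upper real limit excluded (an assumption),
    # so a boundary value belongs to exactly one interval.
    return lower_real <= value < upper_real


# Hypothetical intervals of 60-64 and 65-69 inches
print(real_limits(60, 64))        # (59.5, 64.5)
print(in_interval(64.5, 60, 64))  # False
print(in_interval(64.5, 65, 69))  # True
```

With this convention, 64.5 is counted only in the 65-69 interval, so the two neighboring intervals never share a value.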
Frequency Distributions: Definitions and Purpose
Definition
* A frequency distribution is an organized tabulation or graph of the number of individuals located in each category on the scale of measurement.
* It is a form of Descriptive Statistics, used to summarize and organize a sample so it can be easily understood. It is not used for Inferential Statistics (generalizing from a sample to a population).
Two Essential Components
* Every frequency distribution (table, pie chart, histogram) must provide:
  1. The set of categories that make up the measurement scale (Possible Scores, denoted as $X$).
  2. The number of individuals in each category (Frequency, denoted as $f$).
Frequency Distribution Tables
Structure
* Includes at least two columns: $X$ (scores) and $f$ (frequency).
* Scores are usually listed from highest value to lowest, though lowest to highest is acceptable.
* Example: For a set of quiz scores, the table lists each possible score once in the $X$ column, with the number of times that score occurs in the $f$ column.
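As a minimal sketch of how such a table can be built from raw data, the Python snippet below counts each score and lists the scores from highest to lowest. The quiz scores are hypothetical (they are not the data set from the notes), and `frequency_table` is an illustrative name. The sketch assumes whole-number scores and lists every possible score between the highest and lowest, including any with a frequency of 0.

```python
from collections import Counter

# Hypothetical quiz scores (illustrative only)
scores = [5, 4, 4, 3, 5, 4, 1, 3, 4, 5]


def frequency_table(data):
    """Return (X, f) rows from the highest score to the lowest."""
    counts = Counter(data)
    # Walk every whole-number score from max down to min,
    # reporting f = 0 for scores that never occur.
    return [(x, counts.get(x, 0)) for x in range(max(data), min(data) - 1, -1)]


table = frequency_table(scores)
print(" X   f")
for x, f in table:
    print(f"{x:2d}  {f:2d}")
```

Here the score 2 never occurs, so it appears in the table with a frequency of 0 rather than being skipped.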
Determining Sample Size ($n$)
* To find the total number of scores ($n$) from a table, add up the frequency column: $n = \sum f$
* In the quiz example, summing the $f$ column gives the number of quiz scores in the sample.
Calculating the Sum of Scores ($\sum X$)
* Do NOT add the $X$ column directly; that column only lists the possible scores, not the actual data.
* To find $\sum X$, multiply each score by its frequency ($fX$) and then sum those products: $\sum X = \sum fX$
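The two calculations, $n = \sum f$ and $\sum X = \sum fX$, can be checked with a short Python sketch. The table below is hypothetical (it is not the quiz data from the notes), and the variable names are illustrative.

```python
# Hypothetical frequency table: score X -> frequency f
table = {5: 3, 4: 4, 3: 2, 2: 0, 1: 1}

n = sum(table.values())                       # n = sum of f
sum_x = sum(x * f for x, f in table.items())  # sum of X = sum of f * X
column_total = sum(table.keys())              # adding the X column only: NOT the sum of scores

print(n, sum_x, column_total)  # 10 38 15
```

Adding the $X$ column directly gives 15, which ignores how often each score occurs; weighting each score by its frequency gives the correct total of 38 for the 10 scores.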
Advanced Table Columns: Proportion and Percentage
Proportion ($p$)
* Definition: The fraction of the total group that is associated with a specific score.
* Formula: $p = \frac{f}{n}$
* Value Range: Proportions must be between 0 and 1.
* Example: A score that occurs $f$ times in a sample of $n$ scores accounts for the fraction $f/n$ of the sample.
* The sum of the proportion column should equal exactly 1 (though rounding error may produce a total slightly above or below 1).
Percentage (%)
* Definition: The proportion expressed as a value out of 100.
* Formula: $\% = p \times 100 = \frac{f}{n} \times 100$
* Conversion: Move the decimal point of the proportion two places to the right.
* The sum of the percentage column should equal 100%.
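A short Python sketch of the proportion and percentage columns follows. The frequency table is hypothetical and the variable names are illustrative. The column totals are compared with `math.isclose` rather than `==`, since floating-point arithmetic can introduce the same kind of small rounding error mentioned above.

```python
import math

# Hypothetical frequency table: score X -> frequency f
table = {5: 3, 4: 4, 3: 2, 2: 0, 1: 1}
n = sum(table.values())

rows = []
for x, f in table.items():
    p = f / n            # proportion: p = f / n
    pct = 100 * f / n    # percentage: (f / n) * 100
    rows.append((x, f, p, pct))

print(" X   f     p      %")
for x, f, p, pct in rows:
    print(f"{x:2d}  {f:2d}  {p:.2f}  {pct:5.1f}")

total_p = sum(p for _, _, p, _ in rows)
total_pct = sum(pct for _, _, _, pct in rows)
# Totals should be 1 and 100, up to rounding error.
print(math.isclose(total_p, 1.0), math.isclose(total_pct, 100.0))
```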
Grouped Frequency Distribution Tables
When to Use
1. When a variable is continuous.
2. When a discrete variable has a wide range of scores (e.g., a test scored out of many points, which creates too many possible categories to list individually).
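Interval Width
* Definition: The number of scores included in each category (categories are also called groups, classes, or intervals).
* Formula (whole-number scores): $\text{width} = \text{upper apparent limit} - \text{lower apparent limit} + 1$, which equals the upper real limit minus the lower real limit.
* Example: For any interval of whole-number scores, the width is one more than the difference between its apparent limits, because the lowest score in the interval must also be counted.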
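The counting rule for interval width can be illustrated with a small Python sketch. The function name and the 10-19 interval are hypothetical, and the sketch assumes whole-number scores (a measurement unit of 1).

```python
def interval_width(lower_apparent, upper_apparent, unit=1):
    """Number of scores in an interval, counting both apparent limits."""
    return upper_apparent - lower_apparent + unit


# Hypothetical interval 10-19: the scores 10, 11, ..., 19
width = interval_width(10, 19)
print(width)                      # 10
print(len(range(10, 19 + 1)))     # 10, counted directly
print(19.5 - 9.5)                 # 10.0, from the real limits
```

Subtracting the limits alone would give 9, which misses the lowest score in the interval.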
Four Fundamental Rules for Creating Intervals
1. Uniform Width: All intervals must have exactly the same width. You cannot have one interval that spans more scores than the others.
2. Multiple of Width: The lower limit of each interval must be a multiple of the interval width.
   * Example: If the width is $w$, intervals should start at $0, w, 2w, 3w$, etc.
3. Mutual Exclusivity: A score cannot belong to two intervals. Limits must not overlap: the upper limit of one interval must fall below the lower limit of the next, not equal it.
4. Enclosed Intervals: All intervals between the highest and lowest scores must be listed, even if they have a frequency of 0.
   * Definition of Enclosed: An interval is enclosed if there are intervals with non-zero frequencies both above and below it.
   * Non-enclosed intervals at the absolute top or bottom of the table (beyond the range of the actual data) are optional and usually omitted.
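The four rules can be combined into a small Python sketch that builds a grouped frequency table. This is one possible approach under stated assumptions, not a prescribed procedure: the exam scores are hypothetical, `grouped_frequency_table` is an illustrative name, and the scores are assumed to be non-negative whole numbers.

```python
from collections import Counter


def grouped_frequency_table(scores, width):
    """Return (lower, upper, f) rows, highest interval first."""
    # Rule 2: every lower limit is a multiple of the width.
    bottom = (min(scores) // width) * width
    top = (max(scores) // width) * width
    # Rule 3: each score maps to exactly one interval (keyed by lower limit).
    counts = Counter((s // width) * width for s in scores)
    # Rules 1 and 4: uniform width, and every interval between the top
    # and bottom is listed, even when its frequency is 0.
    return [(lo, lo + width - 1, counts.get(lo, 0))
            for lo in range(top, bottom - 1, -width)]


# Hypothetical exam scores (illustrative only)
scores = [53, 57, 61, 64, 66, 68, 85, 88, 91, 94, 97]
rows = grouped_frequency_table(scores, width=10)
for lo, hi, f in rows:
    print(f"{lo}-{hi}: {f}")
```

In this example the 70-79 interval contains no scores, but it is enclosed by non-empty intervals above and below, so it is listed with a frequency of 0.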