Variables, Scales of Measurement, and Implications for Analysis
Mutually Exclusive and Collectively Exhaustive (MECE)
Two key concepts from probability theory: mutually exclusive and collectively exhaustive.
Mutually exclusive: two outcomes cannot occur at the same time. If A occurs, B cannot, and vice versa. Formally, P(A ∩ B) = 0.
Collectively exhaustive: the list of outcomes covers every possible outcome. For a partition {A, B} of the sample space, P(A ∪ B) = 1.
Coin toss example: outcomes Heads (H) or Tails (T) are mutually exclusive and collectively exhaustive. You know the result must be either H or T, with no third outcome.
Implication for variables: any variable you encounter is either qualitative (categorical) or quantitative (numeric). These two types are MECE: no variable is both, and together they cover every way a variable can be classified.
Probability notation reminder:
If A and B are disjoint (mutually exclusive), then P(A ∪ B) = P(A) + P(B).
For a fair coin, P(H) = P(T) = 1/2 and P(H ∪ T) = 1.
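A minimal sketch of the coin example, assuming a fair coin and modelling events as Python sets of outcomes. The helper names (prob, is_mutually_exclusive, is_collectively_exhaustive) are hypothetical and only for illustration. It checks both MECE properties and confirms the addition rule for disjoint events.

```python
from fractions import Fraction

# Sample space for one toss of a coin.
SAMPLE_SPACE = {"H", "T"}

# Fair-coin assumption: each outcome has probability 1/2 (exact fractions, no rounding).
PROB = {outcome: Fraction(1, 2) for outcome in SAMPLE_SPACE}


def prob(event: set) -> Fraction:
    """Probability of an event, i.e. a subset of the sample space."""
    return sum((PROB[o] for o in event), Fraction(0))


def is_mutually_exclusive(a: set, b: set) -> bool:
    # Disjoint events share no outcomes: A ∩ B = ∅.
    return not (a & b)


def is_collectively_exhaustive(*events: set) -> bool:
    # Together the events cover every outcome in the sample space.
    return set().union(*events) == SAMPLE_SPACE


heads, tails = {"H"}, {"T"}
print(is_mutually_exclusive(heads, tails))       # True
print(is_collectively_exhaustive(heads, tails))  # True
print(prob(heads | tails))                       # 1
print(prob(heads) + prob(tails))                 # 1 (addition rule for disjoint events)
```

Using Fraction keeps P(H) + P(T) exactly equal to 1, so the check does not depend on floating-point tolerance.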
Two Basic Types of Variables: Categorical vs Numeric
A variable is an attribute that varies across individuals in a dataset.
Two broad types:
Categorical (qualitative) variables: provide categories or groups; observations fall into subgroups.
Numeric (quantitative) variables: provide numeric values; observations are numbers and often support arithmetic.
Examples and intuition:
Categorical: smoking status (smoker vs non-smoker), gender (male vs female), color names, country of residence, phone numbers (despite looking numeric, they function as categories).
Numeric: age, income, number of children, height, weight, price per unit, production quantity.
Special note on phone numbers:
Although they look numeric, phone numbers are typically treated as a nominal categorical variable because arithmetic on them is meaningless (the average of two phone numbers is not a useful quantity). Each person may have a unique value, which creates many groups but no meaningful magnitude relation.
Example dataset:
A classroom of 40 students: favorite color, X, is a categorical variable that divides the students into subgroups such as blue, red, green, and yellow.
Age is an example of a numeric variable (ages 18, 19, etc.).
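To show how the two variable types are summarized differently, here is a small sketch using a few hypothetical student records (invented values, not the 40-student class described above). The categorical variable is summarized by counting observations per subgroup; the numeric variable supports arithmetic such as the mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for a handful of students (illustrative values only).
students = [
    {"favorite_color": "blue", "age": 18},
    {"favorite_color": "red", "age": 19},
    {"favorite_color": "blue", "age": 18},
    {"favorite_color": "green", "age": 20},
    {"favorite_color": "yellow", "age": 19},
]

# Categorical variable: count how many observations fall into each subgroup.
color_counts = Counter(s["favorite_color"] for s in students)

# Numeric variable: arithmetic summaries such as the mean are meaningful.
average_age = mean(s["age"] for s in students)

print(color_counts["blue"])  # 2
print(average_age)           # 18.8
```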
Why this distinction matters:
Different statistical methods and visualizations apply to categorical vs numeric variables.
Some methods require a numeric outcome; applying them to a categorical outcome can yield misleading results.
Categorical Subtypes: Nominal vs Ordinal
Categorical variables split the sample into subgroups (categories).
Nominal variables:
Categories have no intrinsic order or ranking.
Examples: colors (red, green, blue, yellow), countries (Canada, Brazil, Algeria).
Operations: you can test equality (are two observations in the same category?) but you cannot order categories or compare magnitudes.
In the color example, any ordering (blue, green, red, yellow) is arbitrary and not meaningful for analysis.
Ordinal variables:
Categories have a meaningful order or ranking.
Examples: education level (less than high school, high school, some college, bachelor's, master's, PhD).
You can order categories and compare their ranks (e.g., PhD > Master's > Bachelor's > Some College > High School > Less than High School).
Arithmetic operations like addition/subtraction are not typically meaningful between ordinal categories, but comparison (greater/less than) is meaningful.
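A minimal sketch of ordinal comparison, assuming the education levels are stored as labels in a list ordered from lowest to highest. The integer positions (RANK) are only a hypothetical convenience for sorting and comparing; subtracting them would not be meaningful, consistent with the point above. The median is computed as the lower middle category of the sorted sample.

```python
# Education levels from lowest to highest; list position encodes the ranking only.
EDUCATION_LEVELS = [
    "Less than High School",
    "High School",
    "Some College",
    "Bachelor's",
    "Master's",
    "PhD",
]
# Position of each level (hypothetical helper). Order matters; the gaps between positions do not.
RANK = {level: i for i, level in enumerate(EDUCATION_LEVELS)}


def is_higher(a: str, b: str) -> bool:
    """Ordinal comparison: is level a ranked above level b?"""
    return RANK[a] > RANK[b]


def median_level(levels: list[str]) -> str:
    """Median category of an ordinal sample (lower median when the count is even)."""
    ordered = sorted(levels, key=lambda level: RANK[level])
    return ordered[(len(ordered) - 1) // 2]


print(is_higher("PhD", "Master's"))  # True
sample = ["High School", "PhD", "Bachelor's", "Some College", "Master's"]
print(median_level(sample))          # Bachelor's
```

A nominal variable such as favorite color would support only the equality check (a == b), so there would be nothing to sort by.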
Illustrative contrasts:
Nominal: favorite color with four categories (red, green, blue, yellow). You cannot say one color is larger or more than another.
Ordinal: education level with six ordered categories. You can say PhD is higher than Master's, which is higher than Bachelor's, etc.
Frequently cited nominal examples:
Countries (Canada, Brazil, Algeria) where the alphabetical order is irrelevant; no meaningful magnitude relationship.
The takeaway: Nominal vs ordinal is about whether there is a meaningful order among categories. Only ordinal provides an inherent ranking; nominal does not.
Numeric Subtypes: Interval vs Ratio
Numeric variables provide numbers and allow arithmetic; they fall into two subtypes:
Interval variables: numeric differences are meaningful, but there is no true zero representing the absence of the attribute.
Ratio variables: numeric values with a true zero that represents the absence of the attribute; allows meaningful interpretation of ratios.
True zero concept:
True zero means the value zero indicates absence of the measured attribute. If true zero exists, the variable is on a ratio scale and allows multiplication/division.
If zero is merely a placeholder or an arbitrary point (not absence), the variable is interval and ratios are not meaningful.
Key rule:
If a variable has a true zero, it is a ratio variable.
If a variable does not have a true zero, it is an interval variable.
Examples of ratio variables (have a true zero):
Age: 0 years means the absence of age; 40 years is twice as old as 20 years. Ratio operations such as multiplication and division are meaningful.
Price (per gallon): $0 per gallon would mean the absence of a price; $4 is twice $2, so ratios like 4/2 = 2 are interpretable.
Quantity produced: 0 units produced means no production; 500 is five times 100.
Weight, height, etc. (nonnegative values, with a meaningful zero).
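One way to see why ratios are meaningful on a ratio scale: changing units only rescales the values, and zero still means "none", so the ratio does not change. This sketch uses the price and quantity figures from the examples above; the dollars-to-cents conversion is an illustrative assumption.

```python
from fractions import Fraction

# Prices per gallon in dollars (exact fractions avoid rounding noise).
high, low = Fraction(4), Fraction(2)


def to_cents(dollars: Fraction) -> Fraction:
    # A pure rescaling: $0 stays 0 cents, so the true zero is preserved.
    return dollars * 100


print(high / low)                      # 2
print(to_cents(high) / to_cents(low))  # 2 (ratio unchanged by the change of units)
print(high - low)                      # 2 (differences are meaningful too, here in dollars)

# Quantity produced: 500 units vs 100 units.
print(Fraction(500) / Fraction(100))   # 5
```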
Examples of interval variables (no true zero):
Temperature in Fahrenheit or Celsius: 0 degrees does not mean the absence of temperature; you can add/subtract differences (e.g., 90 - 30 = 60), but ratios are not interpretable (e.g., 90 is not twice as hot as 45).
Temperature scale examples:
Fahrenheit: 90°F vs 30°F difference is 60 degrees; ratio like 90/30 is not meaningful.
Celsius: 50°C vs 25°C difference is 25 degrees; ratio 50/25 is not meaningful.
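The Celsius example can be checked directly. The sketch below converts both temperatures with the standard formula F = C × 9/5 + 32. Because the conversion shifts the zero point as well as rescaling, the apparent ratio 50/25 = 2 does not survive the change of scale, while the difference simply rescales (25 Celsius degrees become 45 Fahrenheit degrees).

```python
from fractions import Fraction


def celsius_to_fahrenheit(c: Fraction) -> Fraction:
    # Standard conversion: rescale by 9/5, then shift the zero point by 32.
    return c * Fraction(9, 5) + 32


hot_c, mild_c = Fraction(50), Fraction(25)
hot_f, mild_f = celsius_to_fahrenheit(hot_c), celsius_to_fahrenheit(mild_c)

print(hot_f, mild_f)   # 122 77
print(hot_c / mild_c)  # 2
print(hot_f / mild_f)  # 122/77 (about 1.58), so the "ratio" depends on the chosen scale
print(hot_c - mild_c)  # 25 Celsius degrees
print(hot_f - mild_f)  # 45 Fahrenheit degrees, which is 25 * 9/5
```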
Practical note on negative values:
Interval scales can have negative values (e.g., temperatures below 0°F or 0°C), whereas ratio scales typically do not have negative values because negative quantities of things like weight or count are not meaningful.
Summary of the four scales (mutually exclusive and collectively exhaustive):
Nominal (categorical, no order)
Ordinal (categorical, with order)
Interval (numeric, no true zero)
Ratio (numeric, with true zero)
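The two defining tests can be combined into a single decision procedure. The sketch below is a hypothetical helper in which the caller answers the questions from these notes (is it numeric, are the categories ordered, does zero mean absence). Every combination of answers maps to exactly one label, which mirrors the MECE property of the four scales.

```python
def scale_of_measurement(is_numeric: bool, has_order: bool = False,
                         has_true_zero: bool = False) -> str:
    """Classify a variable into exactly one of the four scales (hypothetical helper)."""
    if not is_numeric:
        # Categorical branch: the only question is whether the categories are ranked.
        return "ordinal" if has_order else "nominal"
    # Numeric branch: the only question is whether zero means absence of the attribute.
    return "ratio" if has_true_zero else "interval"


print(scale_of_measurement(is_numeric=False))                     # nominal (favorite color)
print(scale_of_measurement(is_numeric=False, has_order=True))     # ordinal (education level)
print(scale_of_measurement(is_numeric=True))                      # interval (temperature in °F)
print(scale_of_measurement(is_numeric=True, has_true_zero=True))  # ratio (age, price)
```

The has_order flag is ignored for numeric variables and has_true_zero for categorical ones, because the notes apply each test only within its own branch.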
Intuition on complexity:
As you move from nominal to ratio, the set of meaningful operations grows (more informative scales).
Putting It All Together: Why It Matters for Analysis
This taxonomy (scale of measurement) guides which statistical methods and graphs are appropriate.
Categorical variables:
Graphs: pie charts and bar charts (with counts or proportions).
Some specialized charts for categorical data.
Numeric variables:
Graphs: histograms (for distribution), scatter plots (relationships), box plots, etc.
Bar charts can sometimes display numeric data, but under different rules; for showing a distribution, histograms are typically preferred.
Dependent/Outcome variable considerations:
Many advanced methods (ANOVA, linear regression, t-tests) require the dependent (outcome) variable to be numeric (a ratio or interval scale).
Running a regression with a categorical outcome often yields output that is invalid or uninterpretable; some software will not warn you, which can lead to biased or meaningless conclusions.
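The silent-failure problem can be seen even without fitting a model. In this sketch (hypothetical responses, hypothetical codings) the same nominal data is coded as numbers in two equally arbitrary ways. Averaging the codes runs without any warning, but the two results disagree, so neither says anything about the data. A regression on such codes depends on the arbitrary labels in the same way.

```python
from statistics import mean

# Hypothetical favorite-color responses (nominal data).
colors = ["blue", "red", "blue", "green", "yellow", "red"]

# Two equally arbitrary numeric codings of the same categories.
coding_a = {"blue": 1, "red": 2, "green": 3, "yellow": 4}
coding_b = {"yellow": 1, "green": 2, "red": 3, "blue": 4}

# Python computes both means without complaint...
mean_a = mean(coding_a[c] for c in colors)
mean_b = mean(coding_b[c] for c in colors)

# ...but they differ, because the result reflects only the chosen labels.
print(mean_a, mean_b)
```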
Practical implications:
Before applying a statistical method, identify the variable type to ensure the method is appropriate.
Choose visualizations that respect the scale (e.g., histograms for numeric data; bar/pie charts for categorical data).
Understand that some measures (means, standard deviations) are only meaningful for interval/ratio data; for ordinal data, medians and nonparametric approaches are often more appropriate.
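The practical rules above can be collected into a small lookup, sketched here under the assumption that each scale keeps every operation of the scales below it and adds one more, as described in the intuition on complexity. The graph and summary choices follow these notes; "mode" for nominal data is a standard choice added for completeness and is not stated in the notes. All names are hypothetical.

```python
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operation newly made meaningful at each scale (each scale also keeps the earlier ones).
NEW_OPERATION = {
    "nominal": "equality (same category?)",
    "ordinal": "ordering (greater/less than)",
    "interval": "differences (addition/subtraction)",
    "ratio": "ratios (multiplication/division)",
}

# (suggested graph, typical summary); "mode" for nominal is an added standard choice.
SUMMARY = {
    "nominal": ("bar or pie chart", "mode"),
    "ordinal": ("bar or pie chart", "median, nonparametric methods"),
    "interval": ("histogram, box plot, scatter plot", "mean and standard deviation"),
    "ratio": ("histogram, box plot, scatter plot", "mean and standard deviation"),
}


def meaningful_operations(scale: str) -> list[str]:
    """All operations meaningful for a scale: its own plus those of every lower scale."""
    upto = SCALES.index(scale) + 1
    return [NEW_OPERATION[s] for s in SCALES[:upto]]


for scale in SCALES:
    graph, summary = SUMMARY[scale]
    print(f"{scale:8} | graph: {graph:33} | summary: {summary}")
    print("         operations:", "; ".join(meaningful_operations(scale)))
```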
Quick Reference and Formulas
Mutually exclusive and collectively exhaustive (MECE) in probability:
A ∩ B = ∅ (mutually exclusive)
A ∪ B covers all outcomes (collectively exhaustive), so P(A ∪ B) = 1 for the full space.
Arithmetic mean: x̄ = (x₁ + x₂ + ... + xₙ) / n.
Example: the average of ages 18 and 19 is (18 + 19) / 2 = 18.5.
Ordinal example: highest education levels arranged in a natural order (e.g., PhD > Master's > Bachelor's > Some College > High School > Less than High School).
Nominal example: color categories (red, green, blue, yellow) have no natural order.
Interval vs Ratio decision rule (true zero):
If the zero value represents the absence of the attribute, the variable is ratio (allows meaningful ratios).
If the zero value is arbitrary and does not denote absence, the variable is interval (ratios are not meaningful).
Examples of ratio variables: age, price, quantity produced, weight, height.
Examples of interval variables: temperature scales (°F, °C) where zero is not the absence of temperature.
Key Takeaways for Exam Preparation
Always classify a variable as one of the four scales: Nominal, Ordinal, Interval, Ratio.
Remember the defining tests:
Nominal vs Ordinal: presence of meaningful order (ordinal) vs none (nominal).
Interval vs Ratio: true zero presence (ratio) vs absence of true zero (interval).
Use the right graph for the right data type:
Categorical: pie charts, bar charts.
Numeric: histograms for distributions, scatter plots for relationships between variables.
Be cautious when applying statistical methods: many methods require numeric dependent variables; misclassifying data can lead to biased or invalid conclusions.
Summary of Examples Used
Qualitative/categorical: color preferences (red, green, blue, yellow); smoking status (smoker vs non-smoker); gender; country of residence.
Nominal example with countries: Canada, Brazil, Algeria (counts across 30 observations in the example).
Ordinal example: education levels (less than high school, high school, some college, bachelor's, master's, PhD) with an illustrative distribution summing to 100.
Numeric examples (interval vs ratio):
Age (ratio): 20, 19, 18, 21; ratio interpretation (e.g., 40 is twice 20).
Price per gallon (ratio): $2, $3.50, $4.00, $2.25; differences ($3.50 - $2.00 = $1.50) and ratios ($4.00 / $2.00 = 2) are both meaningful.
Quantity produced (ratio): 500 vs 100; ratio 5.
Temperature (interval): 90°F, 30°F, 45°F, 37°F; differences meaningful, ratios not.
Practical connection: the choice of variable type informs the appropriate method and visualization, impacting the validity of inferences.
Final Note
The taxonomy of variables (nominal, ordinal, interval, ratio) is foundational for statistics education. It determines what operations are meaningful, what graphs to use, and which statistical techniques are appropriate. Understanding true zero and the distinction between interval and ratio is especially critical for correct interpretation of numbers and for avoiding misuses such as interpreting temperature as a ratio scale.