Lecture Notes on ANOVA, Variables, Sampling, and Scales
Core concepts in psychological statistics (from the lecture)
Purpose of standardized data analysis
- One-way ANOVA provides a standardized way to analyze data and communicate results; it also helps detect when data are inappropriate or the methods are wrong.
- It’s a tool for consistency and clarity in reporting findings.
Key terms: variables, data, dataset, datum
- Variable: any factor that can vary (must have at least two levels).
- Data: measurements/observations of a variable.
- Datum (singular): a single observation, often denoted by $x$.
- Dataset: collection of all data points for a study.
- Example variables used in class: age, self-esteem, etc.; age can be measured on different scales (e.g., exact age in years vs age categories).
Scales of measurement (4 types)
- Nominal: categories with no intrinsic order; e.g., eye color (blue, brown, hazel).
- Ordinal: categories with a meaningful order, but not equal intervals; e.g., race placement (1st, 2nd, 3rd).
- Interval: equal intervals between values, but no absolute zero; e.g., temperature in degrees Celsius/Fahrenheit (no true zero).
- Ratio: equal intervals and an absolute zero; e.g., exam scores, height, weight.
- Continuous vs discrete
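- Continuous: values can fall anywhere along a range, including decimals (e.g., a bank balance such as 550.46); usually measured on interval or ratio scales.
- Discrete: values come in whole units and cannot be fractional (e.g., number of children in a household). This is why census reports of an average number of kids per household, which is usually a fraction, can be misleading.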
- Practical note: the scale limits which statistics you can compute (e.g., a mean is meaningful for interval or ratio data, but not for nominal categories such as eye color).
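To make the practical note concrete, here is a minimal Python sketch that reports only the measures of central tendency a scale supports. The mapping in `ALLOWED` follows a common textbook rule of thumb (mode for nominal, median added for ordinal, mean added for interval and ratio); treat that mapping, the function name `describe`, and the example values as hypothetical illustrations rather than material from the lecture.

```python
from statistics import mean, median, mode

# Assumed rule of thumb (not from the lecture slides): which central-tendency
# measures are meaningful on each scale of measurement.
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}
FUNCS = {"mode": mode, "median": median, "mean": mean}


def describe(values, scale):
    """Return only the statistics that suit the given scale (hypothetical helper)."""
    return {name: FUNCS[name](values) for name in sorted(ALLOWED[scale])}


print(describe(["blue", "brown", "brown", "hazel"], "nominal"))  # eye color
print(describe([1, 2, 2, 3, 4], "ordinal"))                      # race placements
print(describe([72, 88, 88, 95], "ratio"))                       # exam scores
```

Asking for a mean of eye colors simply never happens here, because `mean` is not in the nominal entry.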
Data representations and notation
- Population parameter (Greek): e.g., population mean $\mu$.
- Sample statistic (English): e.g., sample mean $\bar{x}$, or M in APA style.
- Population vs sample
- Population: the entire group of interest.
- Sample: a subset from the population used to infer about the population.
- Sampling error: the discrepancy between a sample statistic and the population parameter; even a representative sample will differ from the population.
- Notation relationships
- Population mean: $\mu$
- Sample mean: $\bar{x}$ or M
- Population size: N
- Sample size: n
- Why formulas differ: formulas that describe a population use parameters (e.g., $\mu$, N); formulas applied to samples use statistics (e.g., $\bar{x}$, n).
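A short simulation can show sampling error directly. This sketch assumes a made-up population of 1,000 normally distributed exam scores (mean 75, SD 10, chosen arbitrarily); it computes the parameter $\mu$ once, then draws random samples of size n and reports how far each sample mean $\bar{x}$ lands from $\mu$.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so reruns give the same numbers

# Hypothetical population of N = 1000 exam scores (made-up, not class data).
population = [random.gauss(75, 10) for _ in range(1000)]
mu = mean(population)  # population parameter

for n in (10, 100):
    sample = random.sample(population, n)  # random subset of size n, no repeats
    x_bar = mean(sample)                   # sample statistic
    print(f"n = {n:3d}: sample mean = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```

Different seeds give different errors, but no sample mean is guaranteed to equal $\mu$; larger samples tend to land closer.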
Descriptive vs inferential statistics
- Descriptive statistics: describe the data you have (e.g., mean age of the class, central tendency, spread).
- Inferential statistics: use a sample to infer about the population; account for sampling error and generalize beyond the observed data.
- In many inferential tests, the denominator estimates sampling variability (the amount of difference you’d expect by chance alone), and the observed effect of the manipulated variable is then judged against that benchmark.
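One-way ANOVA, named in the title of these notes, is one such test: its F ratio divides the variability between group means (the observed effect) by the variability within groups (an estimate of chance differences). The Python sketch below computes F by hand for two hypothetical sleep-condition groups; the function name `one_way_f` and the scores are illustrative assumptions, and real analyses would normally use software such as SPSS.

```python
from statistics import mean


def one_way_f(*groups):
    """Return F = MS_between / MS_within for a one-way ANOVA (hypothetical helper)."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    k, n_total = len(groups), len(scores)
    group_means = [mean(g) for g in groups]

    # Numerator: spread of the group means around the grand mean (the observed effect).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Denominator: spread of scores around their own group mean (chance variability).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)      # df_between = k - 1
    ms_within = ss_within / (n_total - k)  # df_within = N - k
    return ms_between / ms_within


# Hypothetical exam scores for the 0-hour and 9.25-hour sleep groups (made-up numbers).
no_sleep = [60, 65, 70]
full_sleep = [75, 80, 85]
print(one_way_f(no_sleep, full_sleep))  # 13.5
```

Here the between-groups estimate is 13.5 times the within-groups estimate; judging whether that is unlikely under chance alone requires the F distribution (with 1 and 4 degrees of freedom), which this sketch does not compute.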
Study designs: experimental vs correlational
- Experimental design
- Manipulates an independent variable (IV) and controls conditions/levels.
- Random assignment is key to achieving comparable groups and supporting causal inferences.
- Supports cause-and-effect claims (A causes B) more directly than other designs, though not perfectly, because errors such as confounds and sampling error remain possible.
- Correlational design
- Examines relationships between variables without manipulation of the IV.
- No random assignment, no manipulation; can reveal associations but not causation.
- Quasi-experimental designs
- Resemble experiments in comparing groups or conditions, but lack random assignment (e.g., using existing age groups as the groups).
Independent variable, levels, and dependent variable (with the sleep example)
- Independent variable (IV): what the researcher manipulates or controls (e.g., sleep amount).
- Levels/conditions: the different amounts or categories of the IV (e.g., 0 hours vs 9.25 hours).
- Dependent variable (DV): the outcome measured (e.g., exam score).
- Sleep study example
- IV: amount of sleep; two levels: 0 hours and 9.25 hours (two groups).
- DV: score on the exam.
- If you convert to a correlational design, you’d record actual sleep hours and exam scores without assigning sleep duration.
- Random assignment
- In experiments, participants are randomly assigned to IV level groups to control for preexisting differences.
- In correlational designs, there is no random assignment and no manipulation of sleep length.
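Random assignment for the sleep example can be sketched in a few lines of Python: shuffle the participant list, then deal participants into the IV levels in turn, so each person is equally likely to land in either condition and the groups stay equal in size. The participant IDs, the helper name `randomly_assign`, and the seed are hypothetical.

```python
import random


def randomly_assign(participants, levels, seed=None):
    """Shuffle participants, then deal them round-robin into the IV levels."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # chance, not the researcher, decides the order
    groups = {level: [] for level in levels}
    for i, person in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(person)
    return groups


ids = [f"P{i:02d}" for i in range(1, 11)]  # ten hypothetical participant IDs
groups = randomly_assign(ids, ["0 hours", "9.25 hours"], seed=42)
for level, members in groups.items():
    print(level, members)
```

In the correlational version there is no such step: each person's own sleep hours are simply recorded.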
Population, sample, parameters, and statistics
- Population of interest: the entire group you want to learn about.
- Sample: a subset drawn from the population; used to infer about the population.
- Population parameter (Greek): e.g., $\mu$ for the population mean.
- Sample statistic (English): e.g., $\bar{x}$ or M for the sample mean.
- Sampling error: the unavoidable difference between the sample statistic and the population parameter due to not observing the entire population.
- Why this matters for inference: you compare observed differences to what you’d expect from sampling error to determine if an effect is likely real.
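One way to see this logic in action is a permutation test, a technique swapped in here for illustration rather than taken from the lecture. It repeatedly shuffles the group labels to approximate the differences that chance alone would produce, then reports how often those chance differences are at least as large as the observed one. The scores and the function name `permutation_p` are hypothetical.

```python
import random
from statistics import mean


def permutation_p(group_a, group_b, n_shuffles=10_000, seed=0):
    """Proportion of label shuffles whose mean difference is at least the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group membership were pure chance
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles


# Hypothetical exam scores for the 0-hour and 9.25-hour sleep groups.
no_sleep = [58, 62, 65, 70, 66]
full_sleep = [74, 80, 77, 85, 71]
print(permutation_p(no_sleep, full_sleep))
```

A small proportion means chance alone rarely produces a difference this large, which makes a real effect more likely but does not prove it.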
True/false style ideas from the lecture (concept checks)
- Most research studies rely on samples because the population cannot be fully observed; differences between sample and population are not necessarily “systematic errors” but expected due to sampling.
- Not every design has a true independent variable: correlational studies measure variables without manipulating any of them. Some texts still call the predictor an IV-like variable and the outcome the dependent variable, but terminology varies.
- Only experiments can, under ideal conditions, allow causal inferences; correlational/quasi designs cannot definitively prove causation.
Notation for data and what you’ll see in SPSS
- Scores in a study are often denoted by $x$ (or $x_i$ for an individual score).
- The total number of scores (participants) is N (population) or n (sample).
- In many reports, a group mean is denoted $\bar{x}$ or M; a population mean is $\mu$.
- When reporting relationships, you’ll see $x$ and $y$ for the two variables in a correlation.
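For the correlational version of the sleep study, $x$ would be hours slept and $y$ the exam score. This minimal Python sketch computes Pearson's r as the sum of products of deviations (SP) divided by the square root of the product of the two sums of squares (SS); the helper name `pearson_r` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation: SP / sqrt(SS_x * SS_y)."""
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products of deviations
    ss_x = sum((a - mx) ** 2 for a in x)                 # sum of squares for x
    ss_y = sum((b - my) ** 2 for b in y)                 # sum of squares for y
    return sp / sqrt(ss_x * ss_y)


sleep_hours = [4, 5, 6, 7, 8]      # x: hypothetical hours slept
exam_scores = [62, 70, 68, 80, 85]  # y: hypothetical exam scores
print(round(pearson_r(sleep_hours, exam_scores), 3))
```

A strong r here would still show only an association, since sleep was recorded rather than assigned.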
Summation and order of operations (practice question from the lecture)
- Summation notation sums a quantity across all data points, e.g., $\sum_i f(x_i)$, or simply $\sum x$.
- The rule demonstrated: summation is done after operations in parentheses, squaring, and multiplication or division; it is done before any other addition or subtraction.
- Example practice (from the lecture): given a set of scores, determine the order of operations needed to compute an expression such as $\sum x_i^2 + 47$.
- The instructor’s answer: square each score, sum the squared scores, and then add 47, i.e., $\left(\sum_i x_i^2\right) + 47$ (the exact numbers on the slide may vary).
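The order of operations can be checked with a tiny Python sketch. The scores below are hypothetical (not the slide's numbers); the three lines show how moving the parentheses changes the result.

```python
scores = [1, 2, 3]  # hypothetical scores, not the slide's data

# (sum of x^2) + 47: square each score, sum the squares, then add 47 once.
sum_sq_plus_47 = sum(x ** 2 for x in scores) + 47
# (sum of x)^2 + 47: sum the raw scores first, then square the total.
sq_of_sum_plus_47 = sum(scores) ** 2 + 47
# sum of (x^2 + 47): 47 is added to every squared score before summing.
sum_of_each_plus_47 = sum(x ** 2 + 47 for x in scores)

print(sum_sq_plus_47, sq_of_sum_plus_47, sum_of_each_plus_47)  # 61 83 155
```

Only the first line matches the instructor's answer; the other two are common misreadings.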
Practical example: Sleep study and representativeness
- A study with 75 high school boys as the sample.
- Question: Is this a representative sample of the population of interest?
- Answer discussed: Not necessarily representative unless the population of interest is defined as all high school boys and the sample reflects that population; representativeness depends on how the population is defined and how the sample was drawn.
- Connection to research methods coursework: representativeness is a key consideration when generalizing from a sample to a population.
Key takeaways for exam prep
- Remember the four scales of measurement and their implications for statistical analyses.
- Distinguish between population parameters and sample statistics; understand sampling error.
- Know the difference between descriptive and inferential statistics, and when to apply each.
- Be clear on experimental vs correlational designs, and the role of random assignment and quasi-IVs.
- Be able to interpret the concepts of IV, levels, DV, and the idea that replication and proper sampling are essential for valid inferences.
- Practice identifying when a variable is nominal, ordinal, interval, or ratio, and what that means for calculations like means and standard deviations.
- Expect questions about order of operations in summation notation and be able to parse expressions like $\sum x_i^2 + 47$.
Summary of philosophical/practical implications discussed
- In psychology, we typically talk in terms of probability rather than proof; we can say something is more or less likely, not definitively proven.
- Causation is inferred only under strong design controls (e.g., random assignment) and must be interpreted cautiously even then, given possible confounds and sampling error.
- The choice of measurement scale constrains what analyses you can perform and what kinds of conclusions you can draw.
Quick reference points for studying
- Independent variable (IV): manipulated by the researcher; levels/conditions denote different groups.
- Dependent variable (DV): measured outcome.
- Population parameter: $\mu$; sample statistic: $\bar{x}$ or M.
- Big N vs little n: population size vs sample size.
- Four scales: nominal, ordinal, interval, ratio.
- Continuous vs discrete: fractional values possible vs whole-unit counts only.
- Recall: Summation order and the impact of sampling error on inference.
Final reminder from the lecture
- The datasets used in class are simplified for hand calculations; real-world datasets, especially in psychology, can be much larger and require software like SPSS for analysis.