Lecture Notes on ANOVA, Variables, Sampling, and Scales

Core concepts in psychological statistics (from the lecture)

  • Purpose of standardized data analysis

    • One-way ANOVA provides a standardized way to analyze data and communicate results; helps detect when data are inappropriate or methods are wrong.
    • It’s a tool for consistency and clarity in reporting findings.
  • Key terms: variables, data, dataset, datum

    • Variable: any factor that can vary (must have at least two levels).
    • Data: measurements/observations of a variable.
    • Datum (singular): a single observation, often denoted by $x$.
    • Dataset: collection of all data points for a study.
    • Example variables used in class: age, self-esteem, etc.; age can be treated as different kinds of data depending on the scale of measurement used to record it.
  • Scales of measurement (4 types)

    • Nominal: categories with no intrinsic order; e.g., eye color (blue, brown, hazel).
    • Ordinal: categories with a meaningful order, but not equal intervals; e.g., race placement (1st, 2nd, 3rd).
    • Interval: equal intervals between values, but no absolute zero; e.g., temperature in degrees Celsius/Fahrenheit (no true zero).
    • Ratio: equal intervals and an absolute zero; e.g., exam scores, height, weight.
    • Continuous vs discrete
    • Continuous: interval or ratio data; can have decimals (e.g., bank balance $550.46).
    • Discrete: cannot take fractional values (e.g., number of kids in a household; a census-style average number of kids per household can be misleading, because no single household has a fraction of a child).
    • Practical note: the scale limits which statistics you can compute (e.g., a mean is only meaningful for interval or ratio data).
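The practical note above can be sketched as a small lookup. This is a hypothetical illustration, not something shown in the lecture: the table `PERMISSIBLE_STATS` and the helper `can_compute` are made-up names, and the statistic lists follow the common textbook convention rather than a hard rule.

```python
# Hypothetical lookup table (not from the lecture slides): which summary
# statistics are conventionally treated as meaningful at each scale.
PERMISSIBLE_STATS = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean", "standard deviation"},
    "ratio": {"mode", "median", "mean", "standard deviation", "ratio comparison"},
}


def can_compute(scale: str, statistic: str) -> bool:
    """Return True if the statistic is conventionally meaningful for the scale."""
    return statistic in PERMISSIBLE_STATS[scale.lower()]


eye_color_mean_ok = can_compute("nominal", "mean")  # eye color is nominal
height_mean_ok = can_compute("ratio", "mean")       # height is ratio
print(eye_color_mean_ok, height_mean_ok)  # False True
```

Each scale inherits the statistics of the scales before it, which is why ratio data support everything in the table.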
  • Data representations and notation

    • Population parameter (Greek): e.g., population mean $\mu$.
    • Sample statistic (English): e.g., sample mean $\bar{x}$, or $M$ in APA style.
    • Population vs sample
    • Population: the entire group of interest.
    • Sample: a subset from the population used to infer about the population.
    • Sampling error: the discrepancy between a sample statistic and the population parameter; even a representative sample will differ from the population.
    • Notation relationships
    • Population mean: $\mu$
    • Sample mean: $\bar{x}$ or $M$
    • Population size: $N$
    • Sample size: $n$
    • Why formulas differ between descriptive and inferential stats: formulas for population descriptions use parameters; formulas for samples use statistics.
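To make the notation concrete, here is a minimal simulation sketch of sampling error. Everything numeric in it is an assumption for illustration (a made-up population of 1000 exam scores, a sample of 25, a fixed random seed); none of it comes from the lecture data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 1000 exam scores (made-up numbers).
population = [random.gauss(75, 10) for _ in range(1000)]
N = len(population)
mu = statistics.mean(population)  # population parameter

# Draw a simple random sample of n = 25 without replacement.
sample = random.sample(population, 25)
n = len(sample)
M = statistics.mean(sample)  # sample statistic

# Sampling error: the gap between the sample statistic and the parameter.
sampling_error = M - mu
print(f"N={N}, n={n}, mu={mu:.2f}, M={M:.2f}, sampling error={sampling_error:.2f}")
```

Changing the seed gives a different sample and a different sampling error, even though the population and its mean stay the same.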
  • Descriptive vs inferential statistics

    • Descriptive statistics: describe the data you have (e.g., mean age of the class, central tendency, spread).
    • Inferential statistics: use a sample to infer about the population; account for sampling error and generalize beyond the observed data.
    • The denominator of many test statistics estimates sampling variability (the amount of difference you’d expect by chance); the observed effect of the manipulated variable is then evaluated relative to that amount.
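As a hedged illustration of that last point (these are standard textbook forms, not formulas taken from the lecture slides), many test statistics share the shape

$$\text{test statistic} = \frac{\text{observed difference}}{\text{difference expected by chance}}$$

For instance, an independent-samples $t$ divides the difference between two group means by its estimated standard error, and the one-way ANOVA $F$ divides between-group variability by within-group variability:

$$t = \frac{M_1 - M_2}{s_{M_1 - M_2}}, \qquad F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$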
  • Study designs: experimental vs correlational

    • Experimental design
    • Manipulates an independent variable (IV) and controls conditions/levels.
    • Random assignment is key to achieving comparable groups and supporting causal inferences.
    • Comes closest to supporting a cause-effect claim (A causes B), though the inference is never perfect because of possible confounds and sampling error.
    • Correlational design
    • Examines relationships between variables without manipulation of the IV.
    • No random assignment, no manipulation; can reveal associations but not causation.
    • Quasi-experimental designs
    • Compare groups as an experiment would, but lack random assignment because group membership is preexisting (e.g., comparing age groups).
  • Independent variable, levels, and dependent variable (with the sleep example)

    • Independent variable (IV): what the researcher manipulates or controls (e.g., sleep amount).
    • Levels/conditions: the different amounts or categories of the IV (e.g., 0 hours vs 9.25 hours).
    • Dependent variable (DV): the outcome measured (e.g., exam score).
    • Sleep study example
    • IV: amount of sleep; two levels: 0 hours and 9.25 hours (two groups).
    • DV: score on the exam.
    • If you convert to a correlational design, you’d record actual sleep hours and exam scores without assigning sleep duration.
    • Random assignment
    • In experiments, participants are randomly assigned to IV level groups to control for preexisting differences.
    • In correlational designs, there is no random assignment and no manipulation of sleep length.
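Random assignment for the sleep example could be sketched as follows. This is a minimal illustration under stated assumptions: the helper `randomly_assign`, the 20 participant IDs, and the seed are all hypothetical, and only the two IV levels (0 hours and 9.25 hours) come from the notes.

```python
import random


def randomly_assign(participants, levels, seed=None):
    """Shuffle participants, then deal them round-robin into one group per IV level."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, person in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(person)
    return groups


participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 made-up participant IDs
groups = randomly_assign(participants, levels=["0 hours", "9.25 hours"], seed=42)
for level, members in groups.items():
    print(level, len(members), members)
```

Because chance alone decides who lands in which group, preexisting differences (such as typical sleep habits) should balance out across groups on average. In the correlational version there would be no such step: each participant’s actual sleep hours would simply be recorded.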
  • Population, sample, parameters, and statistics

    • Population of interest: the entire group you want to learn about.
    • Sample: a subset drawn from the population; used to infer about the population.
    • Population parameter (Greek): e.g., $\mu$ for population mean.
    • Sample statistic (English): e.g., $\bar{x}$ or $M$ for sample mean.
    • Sampling error: the unavoidable difference between the sample statistic and the population parameter due to not observing the entire population.
    • Why this matters for inference: you compare observed differences to what you’d expect from sampling error to determine if an effect is likely real.
  • True/false style ideas from the lecture (concept checks)

    • Most research studies rely on samples because the population cannot be fully observed; differences between sample and population are not necessarily “systematic errors” but expected due to sampling.
    • Not every research design has a true (manipulated) independent variable; in correlational work, the predictor plays an IV-like role and the outcome plays the role of the dependent variable, though terminology varies.
    • Only experiments can, under ideal conditions, allow causal inferences; correlational/quasi designs cannot definitively prove causation.
  • Notation for data and what you’ll see in SPSS

    • Scores in a study are often denoted by $x$ (or $x_i$ for individual scores).
    • The total number of scores (participants) is $N$ (population) or $n$ (sample).
    • In many reports, group means are denoted as $\bar{x}$ or $M$; a population mean is $\mu$.
    • When reporting relationships, you’ll see $x$ and $y$ for the two variables in a correlation.
  • Summation and order of operations (practice question from the lecture)

    • Summation notation is used to sum across data points, e.g., $\sum_i f(x_i)$.
    • The rule demonstrated: Summation is done after parentheses, squaring, multiplication, or division; it’s done before other addition or subtraction.
    • Example practice (from the lecture): given a set of scores, determine the order of operations needed to compute an expression such as $\sum x_i^2 + 47$.
    • The approach given in class: square each score, sum the squared scores, then add 47 once at the end, i.e., compute $\left(\sum_i x_i^2\right) + 47$ (the exact numbers from the slide may vary).
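The order-of-operations rule can be checked with a short sketch. The three scores below are made up for illustration; only the "+ 47" comes from the notes.

```python
scores = [1, 2, 3]  # made-up scores for illustration

# sum(x^2) + 47: square each score, sum the squares, then add 47 once.
sum_of_squares_plus_47 = sum(x ** 2 for x in scores) + 47  # (1 + 4 + 9) + 47 = 61

# (sum x)^2 + 47: summing first and then squaring gives a different quantity.
squared_sum_plus_47 = sum(scores) ** 2 + 47                # 6^2 + 47 = 83

# sum(x^2 + 47): putting 47 inside the summation adds it once per score.
sum_of_each_plus_47 = sum(x ** 2 + 47 for x in scores)     # 14 + 3 * 47 = 155

print(sum_of_squares_plus_47, squared_sum_plus_47, sum_of_each_plus_47)  # 61 83 155
```

The first line matches the rule stated above: squaring happens before the summation, and the lone addition happens after it.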
  • Practical example: Sleep study and representativeness

    • A study with 75 high school boys as the sample.
    • Question: Is this a representative sample of the population of interest?
    • Answer discussed: Not necessarily representative unless the population of interest is defined as all high school boys and the sample reflects that population; representativeness depends on how the population is defined and how the sample was drawn.
    • Emphasis on methods coursework: representativeness is a key consideration when generalizing from sample to population.
  • Key takeaways for exam prep

    • Remember the four scales of measurement and their implications for statistical analyses.
    • Distinguish between population parameters and sample statistics; understand sampling error.
    • Know the difference between descriptive and inferential statistics, and when to apply each.
    • Be clear on experimental vs correlational designs, and the role of random assignment and quasi-IVs.
    • Be able to interpret the concepts of IV, levels, DV, and the idea that replication and proper sampling are essential for valid inferences.
    • Practice identifying when a variable is nominal, ordinal, interval, or ratio, and what that means for calculations like means and standard deviations.
    • Expect questions about order of operations in summation notation and be able to parse expressions like $\sum (x_i^2) + 47$.
  • Summary of philosophical/practical implications discussed

    • In psychology, we typically talk in terms of probability rather than proof; we can say something is more or less likely, not definitively proven.
    • Causation is inferred only under strong design controls (e.g., random assignment) and must be interpreted cautiously even then, given possible confounds and sampling error.
    • The choice of measurement scale constrains what analyses you can perform and what kinds of conclusions you can draw.
  • Quick reference points for studying

    • Independent variable (IV): manipulated by the researcher; levels/conditions denote different groups.
    • Dependent variable (DV): measured outcome.
    • Population parameter: $\mu$; Sample statistic: $\bar{x}$ or $M$.
    • Big N vs little n: population size vs sample size.
    • Four scales: nominal, ordinal, interval, ratio.
    • Continuous vs discrete: decimals allowed vs whole-number counts only.
    • Recall: Summation order and the impact of sampling error on inference.
  • Final reminder from the lecture

    • The datasets used in class are simplified for hand calculations; real-world datasets, especially in psychology, can be much larger and require software like SPSS for analysis.