Statistics in Psychology — Comprehensive Study Notes (Unit 1)

Population vs. Sample

  • Population: all people of interest for a study. Example: all college students in the United States or all students at Saint Philip's College (SPC).

  • Sample: a subset of the population that is actually studied. It must represent the population well for results to generalize.

  • Why we use samples: realistic studies cannot test every member of a population (budgets, time, practicality).

  • Three basic criteria for a good sample:

    • Representativeness: includes the diversity of the population (e.g., genders, races/ethnicities, ages, SES).

    • Randomness: every member of the population has an equal chance of being selected; avoid handpicking participants, which introduces bias.

    • Adequate size: large enough for results to generalize; very small samples are easily skewed by outliers.

  • Example: a sample of US college students in which 97 of 100 participants are female would not represent a population that includes multiple genders; representativeness is key for generalization.
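The randomness criterion can be sketched with Python's standard `random` module. This is a minimal illustration of simple random sampling, assuming a hypothetical sampling frame of 1,000 student ID numbers; a real study would draw from an actual roster.

```python
import random

# Hypothetical sampling frame: ID numbers 1..1000 standing in for a student roster
population_ids = list(range(1, 1001))

# Seeded generator so the illustration is reproducible
rng = random.Random(2024)

# Simple random sample of 100 IDs, drawn without replacement:
# every ID has the same chance of being selected, so no one is handpicked
sample_ids = rng.sample(population_ids, k=100)

print(len(sample_ids), sorted(sample_ids)[:5])
```

Random selection guards against handpicking bias, but it does not by itself guarantee representativeness or adequate size; those two criteria still have to be checked.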

Data, Variables, and Measurements

  • Data: information gathered from observations or measurements in a study.

  • Variable: any characteristic that differs between individuals; can be manipulated (independent) or measured (dependent).

  • Operational definition: how a variable is defined for measurement; converts abstract concepts (e.g., hunger, aggression) into measurable quantities.

  • Data set: the collection of measurements or observations from a study.

  • Scores vs. data set: each individual measurement or observation is a score (datum); the entire collection is the data set.

  • Relationship to methods: how variables are defined and measured determines which statistical methods can be applied.

Descriptive vs. Inferential Statistics

  • Descriptive statistics: summarize, organize, and describe data (e.g., mean, median, mode, range, standard deviation).

  • Inferential statistics: allow us to draw generalizations about populations from samples; address sampling error and uncertainty.

  • Core goal: extend findings from a sample to a population with quantified uncertainty.

  • Core problem: sampling error—the discrepancy between a sample statistic and the population parameter.

  • Key concepts to learn later (as mentioned in lecture): normal distribution, z scores, correlation vs. causation, significance, data sets, ordinal vs. nominal variables, etc.

Population Parameters vs. Sample Statistics

  • Population parameter: a numerical value that describes a population (e.g., mean μ, standard deviation σ).

  • Sample statistic: a numerical value calculated from a sample (e.g., sample mean x̄, sample standard deviation s).

  • Relationship: sample statistics are used to estimate population parameters.

  • Why this matters: because we cannot measure every member of a population, we rely on sample statistics to infer population values.

  • Example: if the average income of all investment bankers in NYC is μ, but we can only sample 500 bankers and obtain a sample mean x̄, we use x̄ to estimate μ.

  • Inference from samples relies on representativeness and size to minimize error.
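The parameter/statistic distinction maps directly onto code. The sketch below simulates a hypothetical population of 10,000 scores (normally distributed with mean 100 and SD 15, an assumption chosen only for illustration), computes its parameters, and then estimates them from a random sample of 50. Note that `statistics.pstdev` divides by N (population SD, σ) while `statistics.stdev` divides by n - 1 (sample SD, s).

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 10,000 simulated scores (not real data)
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Population parameters: computed from every member
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # divides by N

# Sample statistics: computed from a random sample of n = 50
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # divides by n - 1

sampling_error = x_bar - mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```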

Sampling Error and Estimation

  • Sampling error: the natural discrepancy between a sample statistic and the population parameter it estimates.

  • No sample is perfect; different samples yield different statistics.

  • We reduce sampling error with well-designed samples (representative and adequately large) and quantify the error that remains with statistical methods (hypothesis testing, confidence intervals, etc.).

  • Standard error of the mean (illustrative):

    • Population level (σ known): SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}

    • Sample level (when σ is unknown): SE_{\bar{X}} = \frac{s}{\sqrt{n}}

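  • Repeated sampling idea: if we drew many random samples and computed the same statistic for each, those statistics would form a distribution (the sampling distribution); its spread, the standard error, tells us how far a single sample statistic is likely to fall from the population parameter.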

  • Concept of power and sample size: larger samples generally reduce sampling error and increase the likelihood of detecting true effects (statistical power).
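The repeated-sampling idea can be simulated directly. This sketch assumes a hypothetical normal population with mean 100 and SD 15, draws 2,000 random samples at each of two sample sizes, and compares the spread of the sample means with σ/√n; `sample_means` is a helper name introduced only for this illustration.

```python
import math
import random
import statistics

def sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from a normal population with
    mean mu and SD sigma, and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: mean 100, SD 15 (illustrative assumption)
MU, SIGMA = 100, 15
results = {}
for n in (25, 100):
    means = sample_means(MU, SIGMA, n, reps=2000)
    empirical_se = statistics.stdev(means)  # spread of the simulated sampling distribution
    theoretical_se = SIGMA / math.sqrt(n)   # sigma / sqrt(n)
    results[n] = (empirical_se, theoretical_se)
    print(f"n={n:3d}  empirical SE={empirical_se:.2f}  sigma/sqrt(n)={theoretical_se:.2f}")
```

Quadrupling the sample size from 25 to 100 halves the standard error (from 3.0 to 1.5 in theory), which is one concrete sense in which larger samples reduce sampling error.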

Descriptive Statistics in Practice

  • Common descriptive measures:

    • Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

    • Median: middle value when data are ordered (the average of the two middle values when n is even)

    • Mode: most frequent value

    • Range: R = x_{\max} - x_{\min}

    • Standard deviation: s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}

  • Use cases: summarize thousands or millions of scores; identify outliers; observe distribution patterns.

  • Why descriptive stats alone are not enough: to answer questions about relationships and effects, we need inferential methods.

  • Preview of future topics: normal distribution, z-scores, variables, correlation vs. causation, and significance.
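The descriptive measures listed in this section are all available in Python's standard `statistics` module. The sketch below applies them to a small hypothetical data set and cross-checks the standard deviation against the n - 1 formula.

```python
import math
import statistics

scores = [2, 4, 4, 5, 7, 8]  # small hypothetical data set

mean = statistics.mean(scores)           # 5
median = statistics.median(scores)       # 4.5: average of the two middle values (4 and 5)
mode = statistics.mode(scores)           # 4: the most frequent value
value_range = max(scores) - min(scores)  # 6: x_max - x_min
sd = statistics.stdev(scores)            # about 2.19; sample SD, divides by n - 1

# The same SD computed directly from the formula, as a cross-check
sd_by_formula = math.sqrt(sum((x - mean) ** 2 for x in scores) / (len(scores) - 1))

print(mean, median, mode, value_range, round(sd, 2), round(sd_by_formula, 2))
```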

Inferential Statistics: Making Inferences from Samples

  • Inferential statistics: methods that let us study samples and generalize to populations.

  • Core problem they address: sampling error and how to quantify confidence in population estimates.

  • Key questions addressed: Is an observed difference due to the manipulation or to chance? How confident can we be in our inference?

  • Hypothesis testing and confidence intervals are tools to address these questions (to be covered in depth later).

Correlational Studies vs. Experimental Studies

  • Correlational studies:

    • Observe two variables as they occur naturally; no manipulation.

    • Example: wake-up time and GPA; a scatter plot may reveal a pattern, but it cannot establish causation.

    • Limitation: correlation does not imply causation; many extraneous variables (confounds) can explain associations.

    • Example extraneous variables: caffeine use, breakfast, sleep quality, mental health, work schedules.

  • Extraneous/Confounding variables:

    • Extraneous variables are unintended factors that could influence results.

    • When an extraneous variable changes systematically along with the variables being studied, it is called a confound (confounding variable).

    • Example: caffeine use, rather than wake-up time itself, might be what drives higher alertness (and better grades).

  • Experimental studies:

    • Involve manipulation of one variable (independent variable) to observe effects on another (dependent variable).

    • Offer a path to causal inferences when well controlled.

    • Key features: manipulation, control, random assignment, and careful standardization.

  • Why experiments help with causality:

    • By holding all other factors constant and changing only the IV, differences in the DV can be more confidently attributed to the manipulation.

    • However, good experimental design is essential to rule out remaining extraneous variables.
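To make the correlational case concrete, the sketch below computes Pearson's correlation coefficient r for wake-up time and GPA. The seven data points are invented for illustration. The helper `pearson_r` is written out from the definition; on Python 3.10+ `statistics.correlation` computes the same quantity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of their SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: wake-up time (hours after midnight) and GPA for 7 students
wake_up = [6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 10.0]
gpa = [3.6, 3.5, 3.4, 3.3, 3.1, 2.9, 2.7]

r = pearson_r(wake_up, gpa)
print(f"r = {r:.2f}")
```

A strongly negative r says only that later risers in this made-up sample tend to have lower GPAs; it cannot show that waking later causes lower grades, because confounds such as sleep quality or work schedules were never controlled.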

Independent vs. Dependent Variables, and Related Terms

  • Independent variable (IV): the variable that is deliberately changed/manipulated (predictor).

  • Dependent variable (DV): the variable that is measured (outcome).

  • In a simple two-group study: IV defines group membership; DV is the measured score.

  • Terminology:

    • Predictor variable: another term for the independent variable.

    • Outcome variable: another term for the dependent variable.

  • Experimental vs control groups:

    • Experimental group: receives the treatment/manipulation.

    • Control group: baseline group used for comparison.

  • Example (video game study):

    • IV: type of video game (violent vs. nonviolent).

    • DV: number of aggressive behaviors observed.

    • Control group: nonviolent game group; Experimental group: violent game group.

Experimental Design and Control of Extraneous Variables

  • Manipulation: intentionally changing the IV to observe effect on DV.

  • Control: keeping other variables constant to isolate the IV’s effect.

  • Possible controls in the video game study:

    • Same room, same TV size, same volume, same duration of play, similar age range, and a comparable mix of genders and ethnicities in both groups.

  • Participant variables vs. environmental variables:

    • Participant variables: individual differences like age, gender, intelligence, biases.

    • Environmental variables: conditions of the experimental setup (time of day, room temperature, equipment quality).

  • The goal: minimize differences that could confound the interpretation of whether the IV caused changes in the DV.

  • Ethical concern example: poorly designed or fraudulent studies (e.g., the retracted study claiming that vaccines cause autism) can cause real harm; replication and ethical conduct are critical.
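Random assignment, one of the key features of experiments, can be sketched as a shuffle followed by a split. The participant labels and group names below are hypothetical and follow the video game example; letting chance decide who ends up in each condition spreads participant variables (age, gender, intelligence) across the groups rather than leaving the split to the researcher.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle a copy of the participant list and split it into two groups
    whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "experimental (violent game)": shuffled[:half],
        "control (nonviolent game)": shuffled[half:],
    }

# Hypothetical participant IDs P01..P20
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=7)
for name, members in groups.items():
    print(name, members)
```

Random assignment differs from random sampling: random selection decides who is in the study, while random assignment decides which condition each participant receives.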

Data Types and Measurement Scales

  • Discrete vs. Continuous variables:

    • Discrete: distinct categories with no intermediate values (e.g., dice roll results 1–6; number of children).

    • Continuous: an infinite number of possible values between any two observed values (e.g., weight, height).

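    • Rounding/boundaries: continuous scores are usually rounded for reporting (e.g., 149.7 pounds recorded as 150 pounds), so a reported score of 150 actually stands for an interval from 149.5 to 150.5; the boundaries of that interval are called real limits.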

  • Nominal scales (categorical):

    • Categories with names but no intrinsic order or magnitude (e.g., major in college: math, English, psychology).

  • Ordinal scales:

    • Categories arranged in a meaningful order, without necessarily equal intervals between them (e.g., bronze, silver, gold; small, medium, large).

  • Interval vs. Ratio scales:

    • Interval scales: ordered with equal intervals, but zero is arbitrary (e.g., Fahrenheit temperature; 0°F is not absence of temperature).

    • Ratio scales: have a true zero representing absence (e.g., weight in pounds; zero means no weight).

    • Important distinction: ratio scales allow meaningful statements about ratios (e.g., twice as heavy), whereas interval scales do not permit such ratio interpretations.

  • Practical note: understanding scale type determines which statistical methods are appropriate.
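The interval-versus-ratio distinction can be checked numerically. In the sketch below, the same two temperatures give a different "ratio" in Fahrenheit and Celsius because both scales have arbitrary zeros, while a ratio of weights survives a change of unit. A small `real_limits` helper (a hypothetical name) also shows the interval a rounded continuous score stands for, assuming scores are rounded to the nearest whole unit.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the ratio depends on the unit, so "twice as hot" is not meaningful
ratio_f = 80 / 40                                                # 2.0
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)  # about 6.0

# Ratio scale: a true zero means the ratio is the same in any unit
LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms
ratio_lb = 150 / 75                            # 2.0
ratio_kg = (150 * LB_TO_KG) / (75 * LB_TO_KG)  # 2.0

def real_limits(score, unit=1.0):
    """Lower and upper boundaries of the interval a rounded score represents."""
    return (score - unit / 2, score + unit / 2)

print(ratio_f, round(ratio_c, 2), ratio_lb, ratio_kg)
print(real_limits(150))  # (149.5, 150.5)
```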

Operational Definitions and Measurement Decisions

  • Variables like hunger or aggression require clear definitions to be measurable.

  • Example (rats): define hunger as days without food or observed behavior toward food; define aggression as a specific observable behavior (squeaks, bites, etc.).

  • Transform qualitative concepts into quantitative data to enable analysis.
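Once a behavior is operationally defined, turning observations into a number is mechanical. The sketch below assumes a hypothetical coding scheme in which an observer logs one code per behavior during a session and aggression is defined as the count of bites and squeaks; the codes and the definition are illustrative, not a standard.

```python
# Hypothetical operational definition: these observed codes count as aggression
AGGRESSIVE_BEHAVIORS = {"bite", "squeak"}

def aggression_score(observations):
    """Number of logged behaviors that meet the operational definition of aggression."""
    return sum(1 for behavior in observations if behavior in AGGRESSIVE_BEHAVIORS)

# Hypothetical observation log for one rat during one session
session_log = ["bite", "squeak", "groom", "bite", "eat", "squeak", "explore"]

print(aggression_score(session_log))  # 4
```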

Practical Takeaways and Real-World Relevance

  • Why statistics matter in psychology:

    • Organize, summarize, and interpret data from experiments, correlational studies, and observations.

    • Enable decisions about whether observed differences are likely due to chance or reflect real effects.

    • Provide a framework to discuss reliability, validity, and generalizability of findings.

  • Real-world relevance:

    • Describing groups (descriptive) vs. explaining/estimating population parameters (inferential).

    • Understanding sampling error helps interpret how confident we should be in study conclusions.

    • Ethical considerations and rigorous design prevent misleading claims and promote scientific integrity.

Quick References to Notation and Formulas

  • Descriptive statistics:

    • Mean: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

    • Range: R = x_{\max} - x_{\min}

    • Standard deviation: s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}

  • Z-scores (standardization):

    • Population: z = \frac{X - \mu}{\sigma}

    • Sample: z = \frac{X - \bar{x}}{s}

  • Standard error of the mean: SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\quad\text{or}\quad SE_{\bar{X}} = \frac{s}{\sqrt{n}}

  • Parameter vs. statistic notation:

    • Population parameter: \mu, \sigma^2, …

    • Sample statistic: \bar{x}, s^2, …

  • Hypothesis testing framework (conceptual):

    • Null hypothesis: H_0

    • Alternative hypothesis: H_1 or H_a

    • Significance and p-values: the p-value is the probability of obtaining data at least as extreme as those observed, assuming H_0 is true; small p-values suggest rejecting H_0 in favor of H_1 (thresholds like 0.05 are common, though context matters).

  • Two-sample comparison (illustrative t-test):

    • t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}}

    • Pooled standard deviation (two-sample case): s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

  • Confidence intervals (for means; simple form):

    • \bar{x} \pm t^* \frac{s}{\sqrt{n}} (where t^* is the critical value from the t-distribution with n - 1 degrees of freedom for the chosen confidence level).
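The reference formulas above translate almost line for line into Python. This is a minimal sketch using only the standard library; the scores for the violent-game and nonviolent-game groups are invented, and the critical value t* = 2.776 (two-tailed 95%, df = 4) is supplied by hand because the standard library has no t-distribution table.

```python
import math
import statistics

def z_score(x, mean, sd):
    """Standardize a raw score: z = (X - mean) / sd."""
    return (x - mean) / sd

def pooled_sd(a, b):
    """Pooled SD s_p for two independent samples (sample variances use n - 1)."""
    n1, n2 = len(a), len(b)
    s1_sq, s2_sq = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))

def two_sample_t(a, b):
    """Independent-samples t: (mean_a - mean_b) / (s_p * sqrt(1/n1 + 1/n2))."""
    sp = pooled_sd(a, b)
    return (statistics.mean(a) - statistics.mean(b)) / (sp * math.sqrt(1 / len(a) + 1 / len(b)))

def mean_ci(data, t_crit):
    """Confidence interval for a mean: x_bar +/- t* * s / sqrt(n).
    t_crit must match the desired confidence level and df = n - 1."""
    m = statistics.mean(data)
    half_width = t_crit * statistics.stdev(data) / math.sqrt(len(data))
    return (m - half_width, m + half_width)

# Hypothetical aggression counts for the two groups of the video game example
violent = [5, 7, 6, 8, 9]
nonviolent = [3, 4, 5, 4, 4]

print(z_score(130, 100, 15))                        # 2.0
print(round(two_sample_t(violent, nonviolent), 3))  # 3.873
print(mean_ci(violent, t_crit=2.776))               # approximately (5.04, 8.96)
```

With df = n_1 + n_2 - 2 = 8, a t of about 3.87 would exceed the usual two-tailed .05 critical value of 2.306; how that comparison turns into a formal decision belongs to the hypothesis-testing material covered later.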

Classroom Notes on Emphasis and Ethics

  • Expect to learn about normal distribution, z-scores, and distinguishing between descriptive and inferential statistics in upcoming units.

  • Emphasis on building vocabulary to communicate about data (population vs. sample, parameter vs. statistic, independent vs. dependent variables, extraneous variables, experimental vs. correlational designs).

  • Ethical considerations: use of data, replication, and avoiding fabrication or manipulation of results; the vaccine-autism example illustrates the consequences of poor design and unethical practices.

  • The instructor plans to provide additional lecture videos and worked examples to reinforce concepts and help with homework; live attendance remains important for discussion and practice.

Summary of Key Ideas to Remember

  • Statistics in psychology is a toolkit to organize, summarize, and interpret data from empirical studies.

  • Descriptive statistics describe data; inferential statistics generalize findings to populations and assess the role of sampling error.

  • Population vs. sample concepts guide how we estimate population parameters using sample statistics.

  • Variables can be discrete or continuous and measured on nominal, ordinal, interval, or ratio scales; operational definitions convert abstract concepts into measurable data.

  • Correlation shows associations but cannot prove causation; experiments use manipulation and control to infer causal relationships, but require careful design to rule out confounds.

  • Extraneous/confounding variables threaten internal validity; good experimental design seeks to control for these.

  • Trustworthy estimates depend on adequate sample size (precision) and representative sampling (generalizability); ethics and reproducibility are essential to credible science.