Variability, Z-Tests, and T-Tests

# Introduction to Variability and Statistical Thinking

- Course Context: Practical sessions in Weeks 2 and 3 refresh skills in using standard deviation ($SD$) and standard error ($SE$) to conduct $z$-tests and $t$-tests. These tests are foundational for the more complex statistical procedures in later modules (Topics 2 and 3).
- The Necessity of Variability Knowledge: Understanding why we need to measure variability is essential before beginning statistical testing.

# Standard Deviation (SD): Foundation of Statistics

- Definition: Standard deviation is a number that tells us by how much, on average, scores in a set differ from the mean.
- Key Functions of SD:
  - It quantifies the amount of variability, "error," or "noise" in a dataset.
  - It indicates how much any single score is expected to differ from the dataset mean through chance/noise alone.
  - It provides a benchmark for judging whether an observed difference between a single score and the group mean is meaningful or just due to chance.
- Interpretation:
  - A low SD indicates very little variability; the mean is a good representation of the data.
  - A high SD indicates "noisy" or "spread out" data in which scores differ widely from one another.

# Detailed Steps for Calculating Standard Deviation

1. Deviation Calculation: Work out the deviation of each score from the mean (subtract the mean from the score).
   - Formula: $\text{Deviation} = \text{Score} - \text{Mean}$
2. Squaring Deviations: Square each deviation. This makes every value non-negative; otherwise the deviations would always sum to zero.
3. Sum of Squares (SS): Add the squared deviations together. The result is known as the Sum of Squares.
4. Variance Calculation:
   - Step 4a (Population): If you have all scores from an entire population, divide the $SS$ by the total number of scores ($N$). This is the population variance.
   - Step 4b (Sample): If the data come from a representative sample, divide the $SS$ by the number of scores minus one ($N - 1$). This is the sample variance.
5. Square Root: Take the square root of the variance to obtain the standard deviation ($SD$). This undoes the squaring from step 2.

# SD for a Sample of Scores and Degrees of Freedom

- The Complication: A sample is a subset of a population and will typically show slightly less variability than the population. Calculating the sample SD using $N$ would therefore tend to underestimate population variability.
- The Adjustment: To compensate, statisticians divide the $SS$ by $N - 1$ instead of $N$. This makes the computed $SD$ slightly larger, providing a better estimate of the true population variability.
- Degrees of Freedom ($df$): The value $N - 1$ is formally known as the degrees of freedom.

# Application Example: Impulsivity Test

- Dataset: $4, 1, 6, 4, 5$
- Calculation Breakdown:
  - Mean: $(4 + 1 + 6 + 4 + 5) / 5 = 4$
  - Deviations from Mean: $0, -3, 2, 0, 1$
  - Squared Deviations: $0, 9, 4, 0, 1$
  - Sum of Squares (SS): $14$
  - Variance (Sample): $14 / (5 - 1) = 3.5$
  - Standard Deviation (SD): $SD = \sqrt{3.5} = 1.87$
- Interpretation: The impulsivity scores in the sample differ from the mean by an average of $1.87$ points. This amount reflects "error variability" or "noise" (chance effects, experimental error, individual differences).
- Practice Sets:
  - Set 1: $12, 14, 11, 10, 13$ (Ans: $SD = 1.58$)
  - Set 2: $3, 5, 4, 1, 6$ (Ans: $SD = 1.92$)
  - Set 3: $54, 61, 53, 57, 50$ (Ans: $SD = 4.18$)

# Z-Scores and Standardization

- Definition: A $z$-score tells us how far any particular score is from the mean, relative to the variability in the sample ($SD$). It is the ratio of the difference from the mean to the $SD$.
- Purpose: We standardise scores to make them comparable and to use standardised tables to determine whether a score is significantly different from others.
- Z-score Formula: $z = \frac{\text{score} - \text{mean}}{\text{SD}}$
- Probability ($p$-value): $z$-score tables give the probability ($p$) of obtaining a score at least that far from the mean by chance.
- Significance threshold: If $p < .05$, the score is significantly different from the mean (unlikely to be due to chance).

# One-Tailed vs. Two-Tailed Tests

- One-Tailed Test: Measures the probability of a score falling into one specific side (tail) of the distribution.
  - Example: A $z$-score of $1.26$ has a one-tailed $p = .1038$.
- Two-Tailed Test: Measures the probability of a score falling into either tail (above or below the mean).
  - The $p$-value for a two-tailed test is double that of a one-tailed test.
  - Example: For $z = 1.26$, the two-tailed $p = .2076$. This makes significance harder to achieve than in a one-tailed test.

# Standard Error (SE) vs. Standard Deviation (SD)

- SD (Standard Deviation): Tells us how much, on average, scores in a set differ from the mean of that set. It measures random error variability within a set.
- SE (Standard Error): Tells us how much, on average, sample means ($M$) of a specific size differ from the mean of the larger population ($\mu$). It measures error variability when comparing a sample to a population.
- SE Formula: $SE = \sqrt{\frac{\text{Variance}}{N}}$ or, equivalently, $SE = \frac{SD}{\sqrt{N}}$
- Relationships: $SE$ grows with the variability of the scores and shrinks as sample size ($N$) grows. Specifically, it is proportional to $SD$ (the square root of the variance) and inversely proportional to $\sqrt{N}$, so increasing $N$ decreases $SE$.

# The Z-Test for a Sample of Scores

- Purpose: Used to determine if a sample mean is significantly different from a population mean.
- Formula: $z = \frac{\text{sample mean} - \text{population mean}}{SE}$
- Critical Values:
  - Two-tailed critical value: $z = 1.96$ (corresponds to $p = .05$).
  - One-tailed critical value: $z = 1.64$ (corresponds to $p = .05$; more precisely $1.645$).

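
The SD steps, the $z$-score formula, the one- and two-tailed probabilities, and the $z$-test described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the function names are illustrative, and the $z$-test inputs at the end ($M = 105$, $\mu = 100$, $SD = 15$, $N = 25$) are hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist


def sample_sd(scores):
    """Sample SD via the five steps: deviations, squares, SS, SS / (N - 1), square root."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [score - mean for score in scores]   # step 1
    squared = [d ** 2 for d in deviations]            # step 2
    ss = sum(squared)                                 # step 3
    variance = ss / (n - 1)                           # step 4b (sample)
    return math.sqrt(variance)                        # step 5


def z_score(score, mean, sd):
    """Distance of one score from the mean, in SD units."""
    return (score - mean) / sd


def tail_probabilities(z):
    """One- and two-tailed p for a z value, from the standard normal curve."""
    one_tailed = 1 - NormalDist().cdf(abs(z))
    return one_tailed, 2 * one_tailed


def z_test(sample_mean, population_mean, population_sd, n):
    """z for a sample mean: (M - mu) / SE, with SE = SD / sqrt(N)."""
    se = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / se


impulsivity = [4, 1, 6, 4, 5]            # dataset from the impulsivity example
sd = sample_sd(impulsivity)
print(f"SD = {sd:.2f}")                   # SD = 1.87

# z-score of the highest impulsivity score (6) relative to this sample
print(f"z for a score of 6 = {z_score(6, 4, sd):.2f}")

one_p, two_p = tail_probabilities(1.26)
print(f"z = 1.26: one-tailed p = {one_p:.4f}, two-tailed p = {two_p:.4f}")

# Hypothetical z-test inputs (not from the notes): M = 105, mu = 100, SD = 15, N = 25
z = z_test(105, 100, 15, 25)
print(f"z-test: z = {z:.2f}, exceeds 1.96: {abs(z) > 1.96}")
```

Because the code doubles the unrounded one-tailed probability, the two-tailed value it prints may differ in the last decimal place from a figure obtained by doubling a rounded table entry.
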
# Logic of Statistical Difference Tests

- Null Hypothesis ($H_0$): Always states that any obtained difference is due solely to error variability/chance.
- General Formula: $\text{Statistical Value} = \frac{\text{Obtained Difference}}{\text{Difference expected due to error variability}}$
- Interpretation: The resulting value (e.g., $z$, $t$, $F$) expresses how many times greater the obtained difference is than the difference expected from error alone under the null hypothesis.
- Significance Level: We accept $p < .05$ as significant. This means accepting a 5% chance of a Type I error (rejecting the null hypothesis when it is actually true).

# Introduction to T-Tests

- Why T-Tests: $z$-tests require knowing the population variance. In real-life research we rarely have this information and must use sample variances as estimates, which necessitates the move to $t$-tests.
- Related T-Test: Used when data come from the same participants (within-subjects) or matched pairs.
- Unrelated T-Test: Used when data come from two separate groups of participants (between-subjects).

# Related T-Test Calculation

- Step 1: Find difference scores ($D$) for each participant (e.g., $\text{Score A} - \text{Score B}$).
- Step 2: Calculate the mean of the difference scores ($\bar{D}$).
- Step 3: Calculate the variance of the difference scores ($Var_{Diff}$) using $SS / (N - 1)$.
- Step 4: Calculate the Standard Error ($SE$): $SE = \sqrt{\frac{Var_{Diff}}{N}}$
- Step 5: Calculate the $t$-statistic: $t = \frac{\bar{D}}{SE}$
- Degrees of Freedom: $df = N - 1$ (where $N$ is the number of pairs).

# Unrelated T-Test Calculation

- Step 1: Calculate the means ($M_A, M_B$) for the two separate groups.
- Step 2: Calculate the variances ($Var_A, Var_B$) for each group using $SS / (N - 1)$, with $N$ the size of that group.
- Step 3: Calculate the Pooled Standard Error ($SE$): $SE = \sqrt{\frac{Var_A}{N_A} + \frac{Var_B}{N_B}}$ (this form matches the pooled-variance $SE$ when the two groups are the same size).
- Step 4: Calculate the Observed Difference: $M_A - M_B$.
- Step 5: Calculate the $t$-statistic: $t = \frac{M_A - M_B}{SE}$
- Degrees of Freedom: $df = (N_A - 1) + (N_B - 1)$

# Statistical Power and Experimental Design

- Statistical Power: Reflects the sensitivity of a test, that is, its ability to detect an effect when the null hypothesis is untrue.
- Related Designs: Generally more powerful than unrelated designs because they remove individual differences from the error term. In related designs, participants act as their own controls.
- Standard Error Comparison: $SE$ for unrelated tests reflects individual differences plus random error; $SE$ for related tests reflects only random error.
- Drawbacks of Related Designs: Susceptible to carryover effects, fatigue, practice effects, and participants guessing the hypothesis. Order effects of this kind are managed through counterbalancing.
- Advantages of Unrelated Designs: Not subject to carryover effects; higher degrees of freedom.

# Reporting T-Test Results

- What to Report: Include the test type, means ($M$), standard deviations ($SD$), degrees of freedom ($df$), the $t$-value, and the $p$-value.
- Unrelated Example: "…revealed no significant difference… (M = 65, SD = 13.02) compared to the group… (M = 71.60, SD = 11.19), t(8) = 0.86, p > .05."
- Related Example: "…revealed that money spent when not hungry (M = 6.05, SD = 1.30) was significantly less than when hungry (M = 7.16, SD = 1.12), t(4) = 4.39, p < .05."

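
The related and unrelated $t$-test calculation steps described earlier can be sketched as two short Python functions. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names and the two score lists are hypothetical (they do not come from the course data), and the same numbers are analysed both ways purely to show how the choice of design changes $SE$ and $t$. A real study would use one design or the other. The critical values $2.776$ ($df = 4$) and $2.306$ ($df = 8$) are the two-tailed values quoted in these notes.

```python
import math


def sample_variance(values):
    """Sample variance: SS / (N - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def related_t(scores_a, scores_b):
    """Related t-test: t = mean difference / SE of the difference scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]   # difference scores D
    n = len(diffs)                                        # number of pairs
    mean_diff = sum(diffs) / n
    se = math.sqrt(sample_variance(diffs) / n)
    return mean_diff / se, n - 1                          # (t, df)


def unrelated_t(group_a, group_b):
    """Unrelated t-test: t = (M_A - M_B) / sqrt(Var_A / N_A + Var_B / N_B)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    se = math.sqrt(sample_variance(group_a) / n_a + sample_variance(group_b) / n_b)
    return (mean_a - mean_b) / se, (n_a - 1) + (n_b - 1)  # (t, df)


# Hypothetical scores, invented for this illustration only
condition_a = [7, 5, 8, 6, 9]
condition_b = [5, 4, 6, 5, 6]

t_rel, df_rel = related_t(condition_a, condition_b)
t_unrel, df_unrel = unrelated_t(condition_a, condition_b)

# Two-tailed critical values quoted in the notes for df = 4 and df = 8
print(f"related:   t({df_rel}) = {t_rel:.2f}, significant: {abs(t_rel) > 2.776}")
print(f"unrelated: t({df_unrel}) = {t_unrel:.2f}, significant: {abs(t_unrel) > 2.306}")
```

With these invented scores, the related analysis gives a larger $t$ than the unrelated analysis, because the variance of the difference scores excludes the differences between individuals.
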
# Worked Solutions for Practice Set 1

- Related T-test Results:
  - Mean Difference score = $2.00$
  - $SS_{diff}$ = $2.00$
  - $Var_{diff}$ = $0.50$
  - $SE$ = $0.32$
  - t-value = $6.25$
  - df = $4$
  - Critical t-value ($t_{crit}$) = $2.776$
- Unrelated T-test Results:
  - Mean A = $5.00$; Mean B = $3.00$ (Difference = $2.00$)
  - $Var_A$ = $4.00$; $Var_B$ = $2.50$
  - Pooled SE = $1.14$
  - t-value = $1.75$
  - df = $8$
  - Critical t-value ($t_{crit}$) = $2.306$

# Worked Solutions for Practice Set 2

- Related T-test Results:
  - Mean Difference score = $2.80$
  - $SS_{diff}$ = $10.80$
  - $Var_{diff}$ = $2.70$
  - $SE$ = $0.73$
  - t-value = $3.84$
  - df = $4$
  - Critical t-value ($t_{crit}$) = $2.776$
- Unrelated T-test Results:
  - Mean A = $15.80$; Mean B = $13.00$ (Difference = $2.80$)
  - $Var_A$ = $6.20$; $Var_B$ = $2.50$
  - Pooled SE = $1.319$
  - t-value = $2.123$
  - df = $8$
  - Critical t-value ($t_{crit}$) = $2.306$
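
As a cross-check, the standard errors and $t$-values in the worked solutions can be recomputed from the listed means and variances alone. The sketch below assumes $N = 5$ per condition or group. That figure is inferred from the reported degrees of freedom ($df = 4$ and $df = 8$), not stated in the solutions. The function names are illustrative.

```python
import math

N = 5  # assumption: inferred from df = 4 (related) and df = 8 (unrelated)


def related_from_summary(mean_diff, var_diff, n):
    """Return (SE, t) for a related t-test from summary statistics."""
    se = math.sqrt(var_diff / n)
    return se, mean_diff / se


def unrelated_from_summary(mean_a, mean_b, var_a, var_b, n_a, n_b):
    """Return (SE, t) for an unrelated t-test from summary statistics."""
    se = math.sqrt(var_a / n_a + var_b / n_b)
    return se, (mean_a - mean_b) / se


# Summary values copied from the worked solutions
results = {
    "Set 1 related": related_from_summary(2.00, 0.50, N),
    "Set 1 unrelated": unrelated_from_summary(5.00, 3.00, 4.00, 2.50, N, N),
    "Set 2 related": related_from_summary(2.80, 2.70, N),
    "Set 2 unrelated": unrelated_from_summary(15.80, 13.00, 6.20, 2.50, N, N),
}

for label, (se, t) in results.items():
    print(f"{label}: SE = {se:.3f}, t = {t:.3f}")
```

The code divides by the unrounded $SE$, so the related $t$-values it prints (about $6.32$ and $3.81$) differ slightly from the listed $6.25$ and $3.84$, which correspond to dividing by $SE$ rounded to two decimals. The comparison with $t_{crit}$ comes out the same either way.
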