Dependent Means t Test - Study Notes
Overview
Topic: Dependent Means t Test (paired/within-subjects design) for a one-group setup with two assessments
Core idea: test whether the mean difference between two related measurements differs from zero
Key concepts covered:
Introduction and use of difference scores
Research design for within-subjects designs (two assessments per participant)
How to compute and interpret difference scores
Hypotheses and models (restricted vs full) and their degrees of freedom
Model comparison using an F statistic
Numeric example illustrating computation and interpretation
Practical relevance
Reduces error variance by accounting for person-to-person differences
Applicable to pre/post designs, longitudinal designs with two time points, and designs with two conditions for all participants
Data structure considerations
Longitudinal data vs. two-condition data; each row represents a participant
Two assessments are separate variables; compute a single difference score per participant
In software, request descriptives to identify which mean is higher
Difference scores can be defined as POST − PRE (positive = increase, negative = decrease)
Important caution
If you compute the difference in the opposite direction, interpretation can be confusing
For two time points, the difference score approach aligns with the standard null hypothesis of no average change
Introduction and Difference Scores
Difference scores are used for NHST in within-subject designs
Definition: A difference score is the difference in the dependent variable between the two assessments for each participant
Example: Post-test knowledge − Pre-test knowledge
Each participant has a difference score, which can be averaged across the sample to test the population mean difference
The difference approach can also be used to compute the difference between the means of two related variables
Practical representation (example table concept):
ID, X1, X2, Difference
Example rows (from slide):
1: 4, 9, 5
2: 6, 7, 1
3: 7, 7, 0
4: 3, 8, 5
5: 4, 9, 5
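The table above can be reproduced with a few lines of Python; this is just an illustrative sketch (the variable names are mine, not from the slides), with the difference defined as X2 − X1 to match the table:

```python
x1 = [4, 6, 7, 3, 4]  # first assessment (X1)
x2 = [9, 7, 7, 8, 9]  # second assessment (X2)

# One difference score per participant: X2 - X1
diffs = [b - a for a, b in zip(x1, x2)]
print(diffs)  # [5, 1, 0, 5, 5]
```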
Research Design and Data Structure
Within-subjects designs involve two assessments per participant
Research questions focus on differences between the two assessments
Are the scores different from each other on average?
Data structure details:
Each row corresponds to a participant (participant-level data)
The two assessments are stored as separate variables (columns)
The analysis proceeds on the derived difference score for each participant
Special note on data handling:
If longitudinal, compute the difference as POST − PRE
Positive difference indicates an increase over time; negative indicates a decrease
Use descriptive statistics to confirm which mean is higher in output
Hypotheses and Models
Null hypothesis (restricted model): the mean difference score μ_D equals 0
Statistical form: Y_Di = 0 + e_i
Degrees of freedom under restricted model: N (since no parameter estimated beyond the fixed value 0)
Alternative hypothesis (full model): the mean difference score μ_D is not 0 (two-tailed)
Statistical form: Y_Di = μ_D + e_i
μ_D is unknown and is estimated from the sample (the sample mean difference score)
OLS perspective
The best estimate of the population mean difference score is the sample mean difference score:
This estimate minimizes residuals under the full model
Notation recap
Yi represents the observed difference for participant i
Ŷ_Ri represents the predicted value under the restricted model (0)
Ŷ_Fi represents the predicted value under the full model (μ̂_D)
Key relationship
The mean of the difference scores (the average of the Di values) equals the OLS estimate of μ_D
Interpretation
If μ_D is significantly different from 0, there is evidence that the two assessments yield different average scores
OLS Estimate of μ_D and Degrees of Freedom
OLS estimate of μ_D:
μ̂_D = D̄, where D_i = X_{1i} − X_{2i} (or the chosen order consistent with POST − PRE)
Intuition
The sample mean difference score is the best single-number summary of the population mean difference under ordinary least squares
Degrees of freedom (df)
Restricted model: df_R = N
Full model: df_F = N − 1
Practical implication
Moving from restricted to full model introduces one estimated parameter (μ_D)
This changes the residual sum of squares and the df used in F testing
Model Comparisons
Approach
Compare the restricted model to the full model using a formal F-test
Steps:
1) Compute the sum of squared residuals for the restricted model (SS_R)
2) Compute the sum of squared residuals for the full model (SS_F)
3) Compute the F statistic from these sums of squares and their degrees of freedom
F statistic formula (typical two-model comparison)
F = [(SS_R − SS_F) / (df_R − df_F)] / (SS_F / df_F)
Decision rule
Compare the observed F to the F distribution with (df_R − df_F) and df_F degrees of freedom
A small p-value (p ≤ α) leads to rejecting the null hypothesis of μ_D = 0
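The model-comparison F can be sketched as a small helper function; this is my own illustrative code, not from the notes, and the numbers plugged in below are the ones that appear in the numeric example later on:

```python
def model_comparison_f(ss_r, ss_f, df_r, df_f):
    """F statistic for comparing a restricted model to a full model.

    ss_r, ss_f: residual sums of squares of the two models
    df_r, df_f: their degrees of freedom
    """
    return ((ss_r - ss_f) / (df_r - df_f)) / (ss_f / df_f)

# Example values used later in these notes: SS_R = 38, SS_F = 30, N = 8
f_stat = model_comparison_f(38, 30, 8, 7)
print(round(f_stat, 2))  # 1.87
```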
Numeric Example
Example scenario
Research question: Do children find child-related marital conflict more distressing than non-child-related conflict?
Design: 8 children exposed to two vignettes (child-related vs non-child-related)
Procedure: All 8 children experience both conditions; after each vignette, children rate negative feelings (combined measure)
Data structure: two ratings per participant (Sleepover and Vacation) with a computed Difference Score
Example data (Difference-based representation)
Participants and scores (Negative Feelings): Sleepover vs Vacation; Difference = Sleepover − Vacation
Data:
1: Sleepover 6, Vacation 7, Difference −1
2: Sleepover 6, Vacation 4, Difference 2
3: Sleepover 8, Vacation 6, Difference 2
4: Sleepover 8, Vacation 3, Difference 5
5: Sleepover 6, Vacation 5, Difference 1
6: Sleepover 5, Vacation 6, Difference −1
7: Sleepover 4, Vacation 5, Difference −1
8: Sleepover 5, Vacation 4, Difference 1
Means (from slide):
Mean Sleepover = 6
Mean Vacation = 5
Mean Difference = 1
Hypotheses and models for the example
Null hypothesis: Child ratings of negative feelings are the same for sleepover and vacation conflicts
Mean difference μ_D = 0
Restricted model: Y_Di = 0 + e_i
Degrees of freedom: df_R = N = 8
Alternative hypothesis: Child ratings differ between sleepover and vacation conflicts
Mean difference μ_D ≠ 0
Full model: Y_Di = μ_D + e_i
Estimated mean difference: μ̂_D = 1
Degrees of freedom: df_F = N − 1 = 7
Computing residuals and sums of squares (illustrative values)
From the example data:
SSE under restricted model (SS_R) = 38
SSE under full model (SS_F) = 30
Note: The table in the slides lists residuals and related quantities, but the key values needed for the F-test are SS_R and SS_F
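These sums of squares can be reproduced directly from the eight difference scores; here is a minimal Python sketch (variable names are my own):

```python
diffs = [-1, 2, 2, 5, 1, -1, -1, 1]  # Sleepover - Vacation, one per child
d_bar = sum(diffs) / len(diffs)       # OLS estimate of mu_D (the sample mean)

# Restricted model predicts 0 for everyone; full model predicts d_bar
ss_r = sum(d**2 for d in diffs)
ss_f = sum((d - d_bar)**2 for d in diffs)
print(d_bar, ss_r, ss_f)  # 1.0 38 30.0
```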
Computing F
F statistic calculation (as shown on slides):
F = [(SS_R − SS_F) / (df_R − df_F)] / (SS_F / df_F)
Substituting:
F = [(38 − 30) / (8 − 7)] / (30 / 7) = 8 / (30/7) = 56/30 ≈ 1.87
Result
Observed F ≈ 1.87 with degrees of freedom F(1, 7)
p-value: p > .05
Conclusion
Do not reject the null hypothesis at α = .05
Interpretation: Children are not distressed differently by child-related versus non-child-related marital conflict in this sample
Additional note from slides
The slide states "Result: Reject the Null" followed by p > .05, which is contradictory; given F(1, 7) = 1.87 and p > .05, the correct conclusion is that there is no significant difference (do not reject the null)
In practice, software will provide the exact p-value for the F-test
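Putting the pieces together, the sketch below computes F for this example and the upper-tail p-value; it assumes scipy is installed for the F-distribution:

```python
from scipy import stats

ss_r, ss_f, df_r, df_f = 38, 30, 8, 7
f_stat = ((ss_r - ss_f) / (df_r - df_f)) / (ss_f / df_f)
p_value = stats.f.sf(f_stat, df_r - df_f, df_f)  # upper tail of F(1, 7)
print(round(f_stat, 2), p_value > 0.05)  # 1.87 True -> do not reject the null
```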
Equations and Key Formulas (summary)
Difference score for participant i:
D_i = X_{1i} − X_{2i} (or POST − PRE as defined in your study)
Null hypothesis for μ_D:
H0: μ_D = 0
Alternative hypothesis (two-tailed):
HA: μ_D ≠ 0
Restricted model (null):
Y_Di = 0 + e_i
Full model (alternative):
Y_Di = μ_D + e_i
OLS estimate of μ_D:
μ̂_D = D̄
Degrees of freedom:
Restricted: df_R = N
Full: df_F = N - 1
Model comparison (F statistic):
F = [(SS_R − SS_F) / (df_R − df_F)] / (SS_F / df_F)
Example numbers (from the numeric example):
SS_R = 38, SS_F = 30, df_R = 8, df_F = 7
F = [(38 − 30) / (8 − 7)] / (30 / 7) = 56/30 ≈ 1.87
Conclusion: p > .05; not significant at α = .05
Overview
Hey there! Let's talk about something really useful: the Dependent Means t Test. Ever wonder if there's a real difference when you measure the same group twice? That's exactly what this test helps us figure out!
It's all about testing whether the average difference between two related measurements is truly different from zero. Think of it like comparing 'before' and 'after' scores for the same people.
What are we covering?
How 'difference scores' are key to this whole process.
The research setup for 'within-subjects' designs (that's when each person gets two assessments).
How to calculate and understand these difference scores.
The 'hypotheses' (our statistical questions) and the models we use to test them.
How we compare these models using an F statistic.
We'll even walk through a step-by-step example!
Why is this important?
It's super smart because it helps reduce 'error' by focusing on changes within each person, rather than differences between people.
You'll see this in 'pre/post' studies, in long-term studies with two measurements, or even when everyone tries two different conditions.
Quick heads-up about your data:
Whether your data is 'longitudinal' (over time) or 'two-conditions' (different scenarios), each row in your spreadsheet represents one participant.
Each participant will have two separate scores (e.g., 'Pre' and 'Post'). Your job is to compute a single difference score for each person.
When defining your difference scores, it's often helpful to do POST − PRE. This way:
A positive difference means an increase (e.g., skills went up).
A negative difference means a decrease (e.g., feelings went down).
Always check your descriptive statistics to see which mean is higher, so your interpretation is clear!
Important caution:
If you calculate the difference in the opposite direction (e.g., PRE − POST), your interpretation of positive/negative gets flipped, which can be confusing.
For studies with two time points, defining your difference as POST − PRE perfectly matches our standard idea that the null hypothesis is 'no average change'.
Introduction and Difference Scores
So, when do we use these 'difference scores'? Well, imagine you want to see how much something changes within the same person. Instead of comparing two separate groups, you're looking at pre- and post-measurements from the same individuals. That's where difference scores come in handy!
Exactly what is a difference score? It's simply the change in your dependent variable between the two assessments for each participant.
For example: If you measured Post-test knowledge and Pre-test knowledge, the difference score would be: Post-test knowledge − Pre-test knowledge.
Each participant ends up with their own unique difference score. We then average all these individual difference scores across your entire sample to see if the population's average difference is something other than zero.
This 'difference approach' is powerful because it boils down the question of two related variables' means into a single, straightforward score.
Let's visualize it! Here's a common way you might see the data:
ID, X1 (Measurement 1), X2 (Measurement 2), Difference (X2 − X1)
Look at these examples:
1: 4, 9, 5
2: 6, 7, 1
3: 7, 7, 0
4: 3, 8, 5
5: 4, 9, 5
See how 'Participant 1' went from 4 to 9? Their difference score is 5. 'Participant 3' stayed the same (7 to 7), so their difference is 0.
Research Design and Data Structure
What kind of studies benefit from this? It's perfect for 'within-subjects' designs – think 'pre-post' studies. We're asking: 'Did something change for these specific people on average?'
How is the data typically arranged?
Each row in your dataset represents one participant. This is called 'participant-level data'.
The two assessments you collected are stored in separate columns (variables).
The magic happens when you compute a new variable: the difference score for each participant. All your analysis will then focus on this new difference score variable.
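Here's a minimal pandas sketch of that layout, assuming pandas is available (the column names 'pre', 'post', and 'diff' are hypothetical):

```python
import pandas as pd

# Participant-level data: one row per participant,
# with the two assessments stored as separate columns.
data = pd.DataFrame({
    "id":   [1, 2, 3, 4, 5],
    "pre":  [4, 6, 7, 3, 4],
    "post": [9, 7, 7, 8, 9],
})

# Compute one difference score per participant (POST - PRE);
# this new column is the variable the analysis runs on.
data["diff"] = data["post"] - data["pre"]
print(data["diff"].tolist())  # [5, 1, 0, 5, 5]
```

Running `data.describe()` on the same frame gives the descriptive statistics used to check which mean is higher.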
A special note on how to handle that data:
If you're looking at changes over time (longitudinal data), it's standard to calculate the difference as POST − PRE.
If you get a positive difference, it means scores increased over time. A negative difference means they decreased.
Always double-check your initial descriptive statistics to clearly see which mean (Pre or Post) was higher to confirm your difference score interpretation.
Hypotheses and Models
Okay, let's get into the 'what if' scenarios – our hypotheses! These are the questions we're trying to answer statistically.
First, our Null Hypothesis (H0), also known as the 'restricted model', is like saying: 'There's no average change at all!' Or, more formally: the true mean difference score (μ_D) in the population is exactly 0.
Statistically, it looks like this: Y_Di = 0 + e_i (meaning each observed difference is just random error around a true difference of zero).
For this model, our degrees of freedom are N, because we're not estimating any parameters; we're just assuming the mean difference is fixed at 0.
Then, we have the Alternative Hypothesis (HA), or the 'full model'. This is the exciting one, saying: 'There *is* an average change!' Or, the mean difference score (μ_D) is *not* 0 (this is usually a two-tailed test, meaning it could be higher or lower).
Its statistical form is: Y_Di = μ_D + e_i (meaning each difference comes from a true population mean difference, plus some error).
Here, μ_D is an unknown true value that we estimate from our sample data. Our best guess for μ_D is our sample's average difference score.
From an Ordinary Least Squares (OLS) perspective:
The best estimate of that true population mean difference score (μ_D) is simply the *sample mean difference score* (let's call it D̄).
This estimate is fantastic because it's the one that minimizes the 'residuals' (the leftover errors) in our full model.
Let's quickly recap the notation:
Y_i is the observed difference score for participant i.
Ŷ_Ri is the predicted value under the restricted model (which is always 0).
Ŷ_Fi is the predicted value under the full model (which is our estimated mean difference, μ̂_D).
Key takeaway:
The average of all your individual difference scores (D̄) is exactly what we use as the OLS estimate for the population mean difference (μ̂_D).
What's the big picture for interpretation?
If our estimated mean difference μ̂_D turns out to be statistically 'significant' (different enough from 0), then we have good evidence that those two assessments yielded different average scores.
OLS Estimate of μ_D and Degrees of Freedom
How do we actually estimate this mysterious population mean difference (μ_D) from our data?
Our OLS (Ordinary Least Squares) estimate of μ_D is simply the sample mean difference: μ̂_D = D̄, where D_i = X_{1i} − X_{2i} (or POST − PRE, depending on how you defined your difference scores).
It makes intuitive sense, right? The average difference we see in our sample is our best guess for the average difference in the larger population.
Now, about 'degrees of freedom' (df) – these are important for our statistical tests:
For the Restricted Model: df_R = N (where N is your number of participants).
For the Full Model: df_F = N - 1
What's the practical implication here? When we move from the restricted model (where we assume no difference) to the full model (where we estimate the difference), we're introducing one estimated parameter (μ̂_D). This change affects how we calculate the total 'leftover' variance (sum of squared residuals) and the degrees of freedom we use in our F-tests.
Model Comparisons
How do we decide if our 'average change' is statistically significant? We use a powerful technique: comparing our two models (restricted vs. full) with a formal F-test!
Here are the steps:
First, calculate the total 'sum of squared residuals' for our restricted model (SS_R). This is how much error we have when assuming *no* difference.
Next, calculate the sum of squared residuals for our full model (SS_F). This is how much error we have *after* estimating the mean difference.
Finally, we plug these sums of squares and their degrees of freedom into our F-statistic formula.
Here's the general F-statistic formula for comparing two models:
F = [(SS_R − SS_F) / (df_R − df_F)] / (SS_F / df_F)
How do we make our decision?
We compare the F-value we just calculated to an F-distribution. This distribution is defined by two values: (df_R − df_F) and df_F.
If the 'p-value' (the probability of seeing a result this extreme if the null hypothesis were true) is small (typically p ≤ α, like p ≤ .05), then we 'reject the null hypothesis.' This means we have evidence that the true mean difference (μ_D) is *not* 0.
Numeric Example
Example scenario
Let's apply this to a real-world question:
Research question: Do children feel more distressed by marital conflict that involves children (like arguments about a sleepover) compared to conflict that doesn't involve children (like arguments about a vacation)?
Design: We have 8 children, and each child is exposed to both kinds of scenarios (vignettes: one child-related, one non-child-related).
Procedure: After each vignette, the children rate their negative feelings (we'll use a combined measure). So, each child gives us two ratings!
Data structure: Each child will have two ratings (we'll call them 'Sleepover' and 'Vacation'), and then we'll compute a 'Difference Score' for each child.
Example data (Difference-based representation)
Here are the participants and their 'Negative Feelings' scores (Sleepover vs Vacation), with Difference = Sleepover − Vacation:
Data:
1: Sleepover 6, Vacation 7, Difference −1
2: Sleepover 6, Vacation 4, Difference 2
3: Sleepover 8, Vacation 6, Difference 2
4: Sleepover 8, Vacation 3, Difference 5
5: Sleepover 6, Vacation 5, Difference 1
6: Sleepover 5, Vacation 6, Difference −1
7: Sleepover 4, Vacation 5, Difference −1
8: Sleepover 5, Vacation 4, Difference 1
Let's look at the average scores from this data:
Mean Sleepover = 6
Mean Vacation = 5
Mean Difference = 1 (This is our D̄!)
Hypotheses and models for the example
Based on our example, here are our statistical questions:
Null hypothesis (H0): We're proposing that children's ratings of negative feelings are *the same* for sleepover-related and vacation-related conflicts. In other words, the true mean difference (μ_D) equals 0.
Our restricted model: Y_Di = 0 + e_i
Our degrees of freedom for this model: df_R = N = 8
Alternative hypothesis (HA): We're suggesting that children's ratings *do differ* between sleepover and vacation conflicts. So, the true mean difference (μ_D) is *not* 0.
Our full model: Y_Di = μ_D + e_i
Our best guess for the mean difference from this sample: μ̂_D = 1
Our degrees of freedom for this model: df_F = N - 1 = 7
Computing residuals and sums of squares (illustrative values)
After all the number crunching (which software usually does for you!), let's say we find these values from our example data:
The sum of squared errors under the restricted model (SS_R) = 38
The sum of squared errors under the full model (SS_F) = 30
(The original table would show individual residuals, but these SS values are what we need for the F-test).
Computing F
Now, let's plug those values into our F-statistic formula:
F = [(SS_R − SS_F) / (df_R − df_F)] / (SS_F / df_F)
Substituting our numbers:
F = [(38 − 30) / (8 − 7)] / (30 / 7) = 8 / (30/7) = 56/30 ≈ 1.87
What's the result?
Our calculated F-value is approximately 1.87. We compare this to an F-distribution with (1, 7) degrees of freedom.
When we look up the p-value for F(1, 7) = 1.87, we find that p > .05
So, what's our conclusion?
We do not reject the null hypothesis at an alpha level of .05.
Interpretation: Based on this sample, there's no statistically significant evidence to say that children are distressed differently by child-related versus non-child-related marital conflict. It looks like the average difference of 1 could just be due to chance.
(A quick note: the original slide said 'Reject the Null' followed by p > .05, but those two statements are contradictory; p > .05 means we do *not* reject the null. In real-world software, you'd get the exact p-value, which makes this decision clear!)
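As a sanity check, the same data can be run through a standard paired t-test (assuming scipy is installed); for a two-model comparison like this one, the squared t statistic equals the F statistic:

```python
from scipy import stats

sleepover = [6, 6, 8, 8, 6, 5, 4, 5]  # negative-feelings ratings, vignette 1
vacation  = [7, 4, 6, 3, 5, 6, 5, 4]  # negative-feelings ratings, vignette 2

t_stat, p_value = stats.ttest_rel(sleepover, vacation)
print(round(t_stat**2, 2))  # 1.87 -- t squared matches the F computed above
print(p_value > 0.05)       # True -- same decision: do not reject the null
```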
Equations and Key Formulas (summary)
To wrap it up, here are the essential formulas we discussed:
Difference score for participant i:
D_i = X_{1i} − X_{2i} (or POST − PRE as you choose to define it)
Null hypothesis for the population mean difference (μ_D):
H0: μ_D = 0
Alternative hypothesis (two-tailed):
HA: μ_D ≠ 0
Restricted Model (Null):
Y_Di = 0 + e_i
Full Model (Alternative):
Y_Di = μ_D + e_i
OLS estimate of the population mean difference (μ_D):
μ̂_D = D̄
Degrees of Freedom:
Restricted: df_R = N
Full: df_F = N - 1
Model Comparison (F statistic):
F = [(SS_R − SS_F) / (df_R − df_F)] / (SS_F / df_F)
Example Numbers (for our distress study):
SS_R = 38, SS_F = 30, df_R = 8, df_F = 7
F = [(38 − 30) / (8 − 7)] / (30 / 7) = 56/30 ≈ 1.87
Conclusion: p > .05; not significant at α = .05
Hope this helps make the Dependent Means t-Test clearer and more engaging!
(A quick note: If you saw this on a slide, it might have initially said 'Reject the Null' followed by p > .05 but the explanation clarifies that p > .05 means not rejecting the null. In real-world software, you'd get the exact p-value, helping you make this clear decision!)
Equations and Key Formulas (summary)
To wrap it up, here are the essential formulas we discussed:
Difference score for participant i:
Di = X{1i} - X ext{_}{2i} (or POST -\text{PRE} as you choose to define it)
Null hypothesis for Population Mean Difference (\mu*D):
H0: \muD = 0
Alternative hypothesis (two-tailed):
HA: \muD \ne 0
Restricted Model (Null):
Y{Di} = 0 + ei
Full Model (Alternative):
Y{Di} = \muD + e*i
OLS estimate of Population Mean Difference (\mu*D):
\hat{\mu}_D = \overline{D}
Degrees of Freedom:
Restricted: df_R = N
Full: df_F = N - 1
Model Comparison (F statistic):
F =\frac{\frac{SSR - SSF}{dfR - dfF} }{ \frac{SSF}{dfF} }
Example Numbers (for our distress study):
SSR = 38, SSF = 30, dfR = 8, dfF = 7
F =\frac{ (38 - 30) / (8 - 7) }{ 30 / 7 } =\frac{8}{30/7} \approx 1.87
Conclusion: p > .05; not significant at \alpha = .05
Hope this helps make the Dependent Means t-Test clearer and more engaging!
Overview
Hey there! Let's talk about something really useful: the Dependent Means t Test. Ever wonder if there's a real difference when you measure the same group twice? That's exactly what this test helps us figure out!
It's all about testing whether the average difference between two related measurements is truly different from zero. Think of it like comparing 'before' and 'after' scores for the same people.
What are we covering?
How 'difference scores' are key to this whole process.
The research setup for 'within-subjects' designs (that's when each person gets two assessments).
How to calculate and understand these difference scores.
The 'hypotheses' (our statistical questions) and the models we use to test them.
How we compare these models using an F statistic.
We'll even walk through a step-by-step example!
Why is this important?
It's super smart because it helps reduce 'error' by focusing on changes within each person, rather than differences between people.
You'll see this in 'pre/post' studies, in long-term studies with two measurements, or even when everyone tries two different conditions.
Quick heads-up about your data:
Whether your data is 'longitudinal' (over time) or 'two-conditions' (different scenarios), each row in your spreadsheet represents one participant.
Each participant will have two separate scores (e.g., 'Pre' and 'Post'). Your job is to compute a single difference score for each person.
When defining your difference scores, it's often helpful to compute POST - PRE. This way:
A positive difference means an increase (e.g., skills went up).
A negative difference means a decrease (e.g., feelings went down).
Always check your descriptive statistics to see which mean is higher, so your interpretation is clear!
Important caution:
If you calculate the difference in the opposite direction (e.g., PRE - POST), your interpretation of positive/negative flips, which can be confusing.
For studies with two time points, defining your difference as POST - PRE matches the standard null hypothesis of 'no average change'.
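As a quick sketch of that caution (using made-up pre/post scores, not data from these notes), computing the difference in both directions shows how every sign flips:

```python
# Hypothetical pre/post scores for three participants (not from the notes).
pre = [3, 5, 4]
post = [6, 7, 4]

# POST - PRE: positive values mean an increase over time.
diff_post_minus_pre = [po - pr for pr, po in zip(pre, post)]
# PRE - POST: same magnitudes, every sign flipped.
diff_pre_minus_post = [pr - po for pr, po in zip(pre, post)]

print(diff_post_minus_pre)  # [3, 2, 0]
print(diff_pre_minus_post)  # [-3, -2, 0]
```

Either direction works statistically; the point is to pick one and interpret the signs accordingly.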
Introduction and Difference Scores
So, when do we use these 'difference scores'? Well, imagine you want to see how much something changes within the same person. Instead of comparing two separate groups, you're looking at pre- and post-measurements from the same individuals. That's where difference scores come in handy!
Exactly what is a difference score? It's simply the change in your dependent variable between the two assessments for each participant.
For example: If you measured Post-test knowledge and Pre-test knowledge, the difference score would be: Post-test knowledge - Pre-test knowledge.
Each participant ends up with their own unique difference score. We then average all these individual difference scores across your entire sample to see if the population's average difference is something other than zero.
This 'difference approach' is powerful because it boils down the question of two related variables' means into a single, straightforward score.
Let's visualize it! Here's a common way you might see the data:
ID, X1 (Measurement 1), X2 (Measurement 2), Difference (X2 - X1)
Look at these examples:
1: 4, 9, 5
2: 6, 7, 1
3: 7, 7, 0
4: 3, 8, 5
5: 4, 9, 5
See how 'Participant 1' went from 4 to 9? Their difference score is 5. 'Participant 3' stayed the same (7 to 7), so their difference is 0.
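The difference-score computation for that table can be sketched in a few lines of Python (IDs and scores taken from the example rows above):

```python
# Example rows from the table above: (ID, X1, X2).
rows = [(1, 4, 9), (2, 6, 7), (3, 7, 7), (4, 3, 8), (5, 4, 9)]

# One difference score per participant, defined as X2 - X1.
differences = {pid: x2 - x1 for pid, x1, x2 in rows}
print(differences)  # {1: 5, 2: 1, 3: 0, 4: 5, 5: 5}
```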
Research Design and Data Structure
What kind of studies benefit from this? It's perfect for 'within-subjects' designs – think 'pre-post' studies. We're asking: 'Did something change for these specific people on average?'
How is the data typically arranged?
Each row in your dataset represents one participant. This is called 'participant-level data'.
The two assessments you collected are stored in separate columns (variables).
The magic happens when you compute a new variable: the difference score for each participant. All your analysis will then focus on this new difference score variable.
A special note on how to handle that data:
If you're looking at changes over time (longitudinal data), it's standard to calculate the difference as POST - PRE.
If you get a positive difference, it means scores increased over time. A negative difference means they decreased.
Always double-check your initial descriptive statistics to clearly see which mean (Pre or Post) was higher to confirm your difference score interpretation.
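Here's a minimal Python sketch of this data setup, reusing the earlier example table's X1/X2 columns as hypothetical 'pre' and 'post' variables; checking the descriptives first shows which mean is higher:

```python
from statistics import mean

# One row per participant; the two assessments are separate variables.
# (Scores reused from the earlier example table, treated as hypothetical pre/post.)
pre = [4, 6, 7, 3, 4]
post = [9, 7, 7, 8, 9]

# Compute a single difference score per participant: POST - PRE.
diff = [po - pr for pr, po in zip(pre, post)]

# Check the descriptives first so the sign of the mean difference is interpretable.
print(mean(pre), mean(post), mean(diff))
```

Note that the mean of the difference scores always equals the difference of the two means, so the descriptives and the difference variable should tell the same story.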
Hypotheses and Models
Okay, let's get into the 'what if' scenarios – our hypotheses! These are the questions we're trying to answer statistically.
First, our Null Hypothesis (H_0), also known as the 'restricted model', is like saying: 'There's no average change at all!' Or, more formally: the true mean difference score (\mu_D) in the population is exactly 0.
Statistically, it looks like this: Y_{Di} = 0 + e_i (meaning each observed difference is just random error around a true difference of zero).
For this model, our degrees of freedom are N, because we're not estimating any parameters; we're just assuming the mean difference is fixed at 0.
Then, we have the Alternative Hypothesis (H_A), or the 'full model'. This is the exciting one, saying: 'There *is* an average change!' Or, the mean difference score (\mu_D) is *not* 0 (this is usually a two-tailed test, meaning it could be higher or lower).
Its statistical form is: Y_{Di} = \mu_D + e_i (meaning each difference comes from a true population mean difference, plus some error).
Here, \mu_D is an unknown true value that we estimate from our sample data. Our best guess for \mu_D is our sample's average difference score.
From an Ordinary Least Squares (OLS) perspective:
The best estimate of that true population mean difference score (\mu_D) is simply the *sample mean difference score* (let's call it \overline{D}).
This estimate is fantastic because it's the one that minimizes the 'residuals' (the leftover errors) in our full model.
Let's quickly recap the notation:
Y_{Di} is the observed difference score for participant i.
\hat{Y}_{Ri} is the predicted value under the restricted model (which is always 0).
\hat{Y}_{Fi} is the predicted value under the full model (which is our estimated mean difference, \hat{\mu}_D).
Key takeaway:
The average of all your individual difference scores (\overline{D}) is exactly what we use as the OLS estimate for the population mean difference (\hat{\mu}_D).
What's the big picture for interpretation?
If our estimated mean difference \hat{\mu}_D turns out to be statistically 'significant' (different enough from 0), then we have good evidence that those two assessments yielded different average scores.
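If you want to convince yourself numerically that the sample mean is the constant that minimizes the sum of squared residuals, here's a small Python sketch with a made-up set of difference scores (the scores and candidate values are illustrative, not from the notes):

```python
from statistics import mean

# Made-up difference scores for four participants (illustrative only).
d = [2, -1, 3, 0]

def ss(c):
    """Sum of squared residuals if every difference is predicted by the constant c."""
    return sum((di - c) ** 2 for di in d)

d_bar = mean(d)  # sample mean difference = 1

# The sample mean gives the smallest sum of squares among these candidates.
candidates = [-1, 0, 0.5, d_bar, 1.5, 2]
best = min(candidates, key=ss)
print(d_bar, best, ss(d_bar), ss(0))
```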
OLS Estimate of \mu_D and Degrees of Freedom
How do we actually estimate this mysterious population mean difference (\mu_D) from our data?
Our OLS (Ordinary Least Squares) estimate of \mu_D is simply the sample mean difference: \hat{\mu}_D = \overline{D}, where D_i = X_{2i} - X_{1i} (POST - PRE, or whichever direction you chose when defining your difference scores).
It makes intuitive sense, right? The average difference we see in our sample is our best guess for the average difference in the larger population.
Now, about 'degrees of freedom' (df) – these are important for our statistical tests:
For the Restricted Model: df_R = N (where N is your number of participants).
For the Full Model: df_F = N - 1
What's the practical implication here? When we move from the restricted model (where we assume no difference) to the full model (where we estimate the difference), we're introducing one estimated parameter (\hat{\mu}_D). This change affects how we calculate the total 'leftover' variance (sum of squares residuals) and the degrees of freedom we use in our F-tests.
Model Comparisons
How do we decide if our 'average change' is statistically significant? We use a powerful technique: comparing our two models (restricted vs. full) with a formal F-test!
Here are the steps:
First, calculate the total 'sum of squared residuals' for our restricted model (SS_R). This is how much error we have when assuming *no* difference.
Next, calculate the sum of squared residuals for our full model (SS_F). This is how much error we have *after* estimating the mean difference.
Finally, we plug these sums of squares and their degrees of freedom into our F-statistic formula.
Here's the general F-statistic formula for comparing two models:
F = \frac{(SS_R - SS_F)/(df_R - df_F)}{SS_F/df_F}
How do we make our decision?
We compare the F-value we just calculated to an F-distribution. This distribution is defined by two values: (df_R - df_F) and df_F.
If the 'p-value' (the probability of seeing a result this extreme if the null hypothesis were true) is small (typically p \le \alpha, like p \le .05), then we 'reject the null hypothesis.' This means we have evidence that the true mean difference (\mu_D) is *not* 0.
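Those steps can be sketched as a small Python helper (the function name model_comparison_f is just an illustrative choice):

```python
def model_comparison_f(ss_r, ss_f, df_r, df_f):
    """General model-comparison F: ((SS_R - SS_F)/(df_R - df_F)) / (SS_F/df_F)."""
    return ((ss_r - ss_f) / (df_r - df_f)) / (ss_f / df_f)

# Sums of squares from the distress example in the next section:
print(round(model_comparison_f(38, 30, 8, 7), 2))  # 1.87
```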
Numeric Example
Example scenario
Let's apply this to a real-world question:
Research question: Do children feel more distressed by marital conflict that involves children (like arguments about a sleepover) compared to conflict that doesn't involve children (like arguments about a vacation)?
Design: We have 8 children, and each child is exposed to both kinds of scenarios (vignettes: one child-related, one non-child-related).
Procedure: After each vignette, the children rate their negative feelings (we'll use a combined measure). So, each child gives us two ratings!
Data structure: Each child will have two ratings (we'll call them 'Sleepover' and 'Vacation'), and then we'll compute a 'Difference Score' for each child.
Example data (Difference-based representation)
Here are the participants and their 'Negative Feelings' scores (Sleepover vs. Vacation). We'll define Difference = Sleepover - Vacation:
Data:
1: Sleepover 6, Vacation 7, Difference -1
2: Sleepover 6, Vacation 4, Difference 2
3: Sleepover 8, Vacation 6, Difference 2
4: Sleepover 8, Vacation 3, Difference 5
5: Sleepover 6, Vacation 5, Difference 1
6: Sleepover 5, Vacation 6, Difference -1
7: Sleepover 4, Vacation 5, Difference -1
8: Sleepover 5, Vacation 4, Difference 1
Let's look at the average scores from this data:
Mean Sleepover = 6
Mean Vacation = 5
Mean Difference = 1 (This is our \overline{D}!)
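You can verify these descriptives in Python with the scores listed above:

```python
from statistics import mean

# Negative-feelings ratings for the eight children, as listed above.
sleepover = [6, 6, 8, 8, 6, 5, 4, 5]
vacation = [7, 4, 6, 3, 5, 6, 5, 4]
diff = [s - v for s, v in zip(sleepover, vacation)]

print(mean(sleepover), mean(vacation), mean(diff))  # 6 5 1
```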
Hypotheses and models for the example
Based on our example, here are our statistical questions:
Null hypothesis (H_0): We're proposing that children's ratings of negative feelings are *the same* for sleepover-related and vacation-related conflicts. In other words, the true mean difference (\mu_D) equals 0.
Our restricted model: Y_{Di} = 0 + e_i
Our degrees of freedom for this model: df_R = N = 8
Alternative hypothesis (H_A): We're suggesting that children's ratings *do differ* between sleepover and vacation conflicts. So, the true mean difference (\mu_D) is *not* 0.
Our full model: Y_{Di} = \mu_D + e_i
Our best guess for the mean difference from this sample: \hat{\mu}_D = 1
Our degrees of freedom for this model: df_F = N - 1 = 7
Computing residuals and sums of squares (illustrative values)
After all the number crunching (which software usually does for you!), let's say we find these values from our example data:
The sum of squared errors under the restricted model (SS_R) = 38
The sum of squared errors under the full model (SS_F) = 30
(The original table would show individual residuals, but these SS values are what we need for the F-test).
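Here's a short Python check that reproduces those sums of squares from the eight difference scores (the restricted model predicts 0 for everyone; the full model predicts the sample mean of 1):

```python
# The eight difference scores from the table above.
diff = [-1, 2, 2, 5, 1, -1, -1, 1]
d_bar = sum(diff) / len(diff)  # 1.0

# Restricted model predicts 0 for everyone; full model predicts d_bar.
ss_r = sum(d ** 2 for d in diff)            # residuals around 0
ss_f = sum((d - d_bar) ** 2 for d in diff)  # residuals around the sample mean

print(ss_r, ss_f)  # 38 30.0
```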
Computing F
Now, let's plug those values into our F-statistic formula:
F = \frac{(SS_R - SS_F)/(df_R - df_F)}{SS_F/df_F}
Substituting our numbers:
F = \frac{(38 - 30)/(8 - 7)}{30/7} = \frac{8}{30/7} = \frac{56}{30} \approx 1.87
What's the result?
Our calculated F-value is approximately 1.87. We compare this to an F-distribution with 1 and 7 degrees of freedom, F(1, 7).
When we look up the p-value for F(1, 7) = 1.87, we find that p > .05
So, what's our conclusion?
We do not reject the null hypothesis at an alpha level of .05.
Interpretation: Based on this sample, there's no statistically significant evidence to say that children are distressed differently by child-related versus non-child-related marital conflict. It looks like the average difference of 1 could just be due to chance.
(A quick note: a slide might show 'Reject the Null' right next to p > .05, but that's inconsistent; p > .05 means we do *not* reject the null. In real-world software, you'd get the exact p-value, helping you make this clear decision!)
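Putting the decision step in code (the critical value of about 5.59 for F(1, 7) at alpha = .05 comes from standard F tables, not from these notes):

```python
ss_r, ss_f, df_r, df_f = 38, 30, 8, 7

f_stat = ((ss_r - ss_f) / (df_r - df_f)) / (ss_f / df_f)

# The tabled critical value for F(1, 7) at alpha = .05 is about 5.59,
# so an F around 1.87 falls well short of significance.
print(round(f_stat, 2), f_stat < 5.59)  # 1.87 True
```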
Equations and Key Formulas (summary)
To wrap it up, here are the essential formulas we discussed:
Difference score for participant i:
D_i = X_{2i} - X_{1i} (or POST - PRE, as you choose to define it)
Null hypothesis for the population mean difference (\mu_D):
H_0: \mu_D = 0
Alternative hypothesis (two-tailed):
H_A: \mu_D \ne 0
Restricted Model (Null):
Y_{Di} = 0 + e_i
Full Model (Alternative):
Y_{Di} = \mu_D + e_i
OLS estimate of the population mean difference (\mu_D):
\hat{\mu}_D = \overline{D}
Degrees of Freedom:
Restricted: df_R = N
Full: df_F = N - 1
Model Comparison (F statistic):
F = \frac{(SS_R - SS_F)/(df_R - df_F)}{SS_F/df_F}
Example Numbers (for our distress study):
SS_R = 38, SS_F = 30, df_R = 8, df_F = 7
F = \frac{(38 - 30)/(8 - 7)}{30/7} = \frac{8}{30/7} \approx 1.87
Conclusion: p > .05; not significant at \alpha = .05
Hope this helps make the Dependent Means t-Test clearer and more engaging!