PSYB07H3 - Final Exam Prompts

1. Why do you divide by the expected frequencies or probabilities in chi-squared tests?

In a chi-squared test, dividing by the expected frequencies standardizes the differences between observed and expected values. This accounts for the size of the expected frequencies, ensuring the test statistic isn't biased by large or small expected counts. Without this division, differences in categories with higher expected frequencies would disproportionately influence the test statistic.
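
As a minimal illustration of this point (hypothetical counts; assumes numpy and scipy are available), the same raw difference of 10 counts contributes far more to the statistic when the expected frequency is small:

import numpy as np
from scipy import stats

# Hypothetical observed and expected counts: both cells are off by 10 counts,
# but the second cell has a much larger expected frequency.
observed = np.array([30, 490])
expected = np.array([20, 500])

# Each cell's contribution to chi-squared is (O - E)^2 / E
contributions = (observed - expected) ** 2 / expected
print(contributions)        # 5.0 and 0.2 -- same raw difference, very different weight

# scipy computes the full statistic the same way
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2_stat)            # about 5.2 = 5.0 + 0.2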


2. Why do the critical values for a chi-squared distribution get larger as the degrees of freedom increase?

Chi-squared critical values grow with degrees of freedom because the chi-squared distribution becomes more spread out as the number of categories increases. Higher degrees of freedom mean there are more independent comparisons, so the threshold for significance must increase to maintain the same significance level (e.g., α = 0.05).

In contrast, for the t and F distributions, critical values decrease as sample size (and thus degrees of freedom) increases, because these distributions approach the normal distribution and less sampling variability is expected.
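
A quick check of both patterns (a sketch assuming scipy is available; the specific df values are arbitrary):

from scipy import stats

# Chi-squared critical values (alpha = 0.05) grow as df increases ...
for df in (1, 5, 10, 30):
    print(df, round(stats.chi2.ppf(0.95, df), 2))
# 1 -> 3.84, 5 -> 11.07, 10 -> 18.31, 30 -> 43.77

# ... while two-tailed t critical values shrink toward the normal value of 1.96
for df in (5, 10, 30, 1000):
    print(df, round(stats.t.ppf(0.975, df), 2))
# 5 -> 2.57, 10 -> 2.23, 30 -> 2.04, 1000 -> 1.96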


3. How do outliers affect the results of a t-test, chi-squared test, and correlation/regression analysis?

  • t-Test: Outliers can inflate the standard deviation, reducing statistical power and potentially masking true differences. Alternatively, they can create false significance if they heavily skew the mean.

  • Chi-Squared Test: Outliers are less relevant since chi-squared tests rely on categorical data and frequency counts, but extreme discrepancies in observed vs. expected values may distort the test statistic.

  • Correlation/Regression: Outliers can strongly affect the slope and correlation coefficient, exaggerating or masking relationships. Residuals will show these deviations (see the sketch after this list).
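
A minimal sketch of the correlation/regression point, using made-up data with one extreme point added (assumes numpy and scipy):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = np.arange(20, dtype=float)
y = 2 * x + rng.normal(0, 2, size=20)        # roughly linear, r close to 1

r_clean, _ = stats.pearsonr(x, y)
slope_clean = stats.linregress(x, y).slope

# Add a single extreme outlier and recompute
x_out = np.append(x, 19.0)
y_out = np.append(y, -100.0)
r_out, _ = stats.pearsonr(x_out, y_out)
slope_out = stats.linregress(x_out, y_out).slope

print(round(r_clean, 2), round(r_out, 2))        # r drops noticeably
print(round(slope_clean, 2), round(slope_out, 2))  # slope is pulled toward the outlier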


4. Explain how overgeneralization can affect your predicted values in a regression.

Overgeneralization occurs when a regression model is used to predict values beyond the range of the observed data (extrapolation). The model assumes the same linear relationship holds outside the data range, which can lead to inaccurate predictions if the actual relationship changes or becomes non-linear.
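
A short sketch of how extrapolation can go wrong, using made-up data that look nearly linear inside the observed range but actually follow a curve (assumes numpy):

import numpy as np

# Hypothetical data: the true relationship is quadratic,
# but over the observed range (0 to 5) it looks close to linear.
x_obs = np.linspace(0, 5, 30)
y_obs = 0.5 * x_obs ** 2 + 2 * x_obs

slope, intercept = np.polyfit(x_obs, y_obs, 1)   # fit a straight line anyway

for x_new in (4.0, 10.0, 20.0):                  # 4 is inside the range; 10 and 20 are not
    predicted = slope * x_new + intercept
    actual = 0.5 * x_new ** 2 + 2 * x_new
    print(x_new, round(predicted, 1), round(actual, 1))
# Predictions are close inside the observed range and increasingly wrong outside it.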


5. Why do we need to test for linearity in correlation and regression analysis? Explain and discuss the concept of residuals in your answer.

Testing for linearity ensures that the assumption of a linear relationship between variables is valid. If the relationship is non-linear, the correlation coefficient (Pearson's r) or regression model may misrepresent the data.

  • Residuals: These are the differences between observed and predicted values. Examining residual plots helps identify non-linearity, as non-random patterns in residuals indicate violations of the linearity assumption (illustrated in the sketch below).
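
A minimal sketch of a residual check on made-up, clearly non-linear data (assumes numpy and scipy):

import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 40)
y = x ** 2                      # clearly non-linear relationship

fit = stats.linregress(x, y)    # force a straight-line fit anyway
residuals = y - (fit.slope * x + fit.intercept)

# The residuals show a systematic pattern (positive at the ends, negative in the
# middle) rather than random scatter -- a sign the linearity assumption is violated.
print(np.round(residuals[[0, 20, 39]], 2))
print(round(fit.rvalue, 2))     # r is near 0 even though x strongly predicts y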


6. What are the similarities and/or differences between Pearson’s r and Cohen’s d?

  • Similarities: Both measure effect sizes, providing standardized metrics to quantify the strength of relationships or differences.

  • Differences:

    • Pearson’s r: Measures the strength and direction of a linear relationship between two continuous variables (−1 ≤ r ≤ 1).

    • Cohen’s d: Measures the standardized mean difference between two groups, focusing on magnitude rather than relationship (both statistics are computed in the sketch after this list).
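
A small sketch computing each on hypothetical data (assumes numpy and scipy; the pooled-SD formula used for Cohen's d here is one common equal-n variant):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Pearson's r: two continuous variables measured on the same cases
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)
r, _ = stats.pearsonr(x, y)

# Cohen's d: standardized difference between two group means
group_a = rng.normal(loc=10, scale=2, size=50)
group_b = rng.normal(loc=11, scale=2, size=50)
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
d = (group_b.mean() - group_a.mean()) / pooled_sd

print(round(r, 2), round(d, 2))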


7. Explain the similarities and/or differences between the chi-squared, t, and F distributions.

  • Similarities: All three are sampling distributions used in hypothesis testing and depend on degrees of freedom.

  • Differences:

    • Chi-squared: Used for categorical data, always positive, and asymmetric.

    • t: Symmetric and used for comparing means in small samples.

    • F: Asymmetric, used in variance analysis, and calculated as the ratio of two variances (see the simulation sketch below).
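
A quick simulation-based sketch of these shape differences (assumes numpy and scipy; the degrees of freedom chosen are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

chi2_draws = stats.chi2.rvs(df=4, size=10_000, random_state=rng)
t_draws = stats.t.rvs(df=10, size=10_000, random_state=rng)
f_draws = stats.f.rvs(dfn=3, dfd=20, size=10_000, random_state=rng)

print(chi2_draws.min() >= 0, f_draws.min() >= 0)   # chi-squared and F are never negative
print(round(t_draws.mean(), 2))                    # t is centred near 0 and symmetric
print(round(stats.skew(chi2_draws), 2),            # chi-squared and F are right-skewed,
      round(stats.skew(f_draws), 2),               # while t has skew near 0
      round(stats.skew(t_draws), 2))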


8. What is the difference between an ANOVA and an independent samples t-test?

  • Independent samples t-test: Compares the means of two groups.

  • ANOVA: Compares the means of three or more groups. While the t-test is limited to two groups, ANOVA generalizes to multiple groups by analyzing variance (see the sketch below).
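
A minimal sketch of the relationship on hypothetical data (assumes scipy): with exactly two groups the two tests agree, and F = t².

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(10, 2, size=25)
g2 = rng.normal(12, 2, size=25)
g3 = rng.normal(11, 2, size=25)

# Two groups: either test works, F equals t squared, and the p-values match
t_stat, t_p = stats.ttest_ind(g1, g2)
f_stat, f_p = stats.f_oneway(g1, g2)
print(round(t_stat ** 2, 3), round(f_stat, 3), round(t_p, 4), round(f_p, 4))

# Three or more groups: ANOVA handles them in a single omnibus test
print(stats.f_oneway(g1, g2, g3))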


9. Explain why F = 1 in an ANOVA when the null hypothesis is true.

Under the null hypothesis, the between-group variance (systematic variance) and the within-group variance (random error) both estimate the same population variance. Since F = between-group variance / within-group variance, the F ratio is expected to be approximately 1.
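
A small simulation sketch (hypothetical data; assumes numpy and scipy): when every group is drawn from the same population, the average F across many replications is close to 1.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
f_values = []
for _ in range(5_000):
    # Null hypothesis true: all three groups come from the same population
    groups = [rng.normal(loc=50, scale=10, size=20) for _ in range(3)]
    f_stat, _ = stats.f_oneway(*groups)
    f_values.append(f_stat)

print(round(np.mean(f_values), 2))   # close to 1 when the null hypothesis is true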


10. Why do we need to test for homogeneity of variances when conducting an ANOVA?

Homogeneity of variances ensures that the groups being compared have similar variability. Violations can lead to misleading F ratios, as the test assumes equal variance to partition variance correctly between and within groups.
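
Homogeneity of variances is often checked with Levene's test; a minimal sketch with hypothetical groups (assumes numpy and scipy):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
equal_spread = [rng.normal(0, 1, 30) for _ in range(3)]
unequal_spread = [rng.normal(0, s, 30) for s in (1, 1, 4)]

print(stats.levene(*equal_spread))    # typically a large p-value: no evidence the variances differ
print(stats.levene(*unequal_spread))  # small p-value: the homogeneity assumption is doubtful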


11. Describe the two ways in which you estimate the population variance in an ANOVA.

  1. Between-group variance: Based on the variability of group means relative to the overall mean. It is an unbiased estimate of the population variance only when the null hypothesis is true; when the null is false, it is inflated by treatment effects.

  2. Within-group variance: Based on the variability of individual scores within each group. It remains unbiased whether or not the null hypothesis is true (both estimates are computed in the sketch below).
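
A sketch of both estimates computed by hand on hypothetical data (assumes numpy; equal group sizes for simplicity); together they reproduce the F ratio:

import numpy as np

rng = np.random.default_rng(9)
groups = [rng.normal(loc=m, scale=5, size=15) for m in (50, 52, 55)]
k, n = len(groups), len(groups[0])

grand_mean = np.mean(np.concatenate(groups))
group_means = np.array([g.mean() for g in groups])

# 1. Between-group estimate: variability of the group means around the grand mean
ms_between = n * np.sum((group_means - grand_mean) ** 2) / (k - 1)

# 2. Within-group estimate: pooled variability of scores around their own group mean
ms_within = np.mean([g.var(ddof=1) for g in groups])

print(round(ms_between, 2), round(ms_within, 2), round(ms_between / ms_within, 2))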


12. Are ANOVAs one-tailed or two-tailed tests?

ANOVAs are inherently two-tailed because they test for any difference among group means, regardless of direction. The test statistic only considers variance, not the sign of differences, so the rejection region lies entirely in the upper tail of the F distribution even though the hypothesis itself is non-directional.


13. Explain the difference between parametric and non-parametric tests. When is it appropriate versus inappropriate to use each type of test?

  • Parametric tests: Assume normal distribution and specific conditions (e.g., homogeneity of variances, interval/ratio data). Appropriate when assumptions are met.

  • Non-parametric tests: Do not assume normality and are suitable for ordinal data or when assumptions of parametric tests are violated (see the sketch below).
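
A minimal sketch contrasting one parametric test with a non-parametric counterpart on hypothetical, heavily skewed data (assumes numpy and scipy). The pairing shown here, an independent-samples t-test versus the Mann-Whitney U test, is just one common example:

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Heavily skewed (non-normal) data in small samples: a situation where the
# t-test's normality assumption is questionable.
group_a = rng.exponential(scale=1.0, size=12)
group_b = rng.exponential(scale=2.5, size=12)

print(stats.ttest_ind(group_a, group_b))      # parametric: compares means, assumes normality
print(stats.mannwhitneyu(group_a, group_b))   # non-parametric: rank-based, no normality assumption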
