Statistical Testing

A significant result is one where there is a low probability that chance factors were responsible for any observed difference, correlation, or association in the variables tested

If our test is significant, we reject the null hypothesis in favour of the alternative hypothesis

If our test is not significant, we retain (fail to reject) the null hypothesis rather than strictly accepting it. A null hypothesis is a statement of no effect

In Psychology, we typically use p < 0.05 (as it strikes a balance between the risks of a Type I and a Type II error), but a stricter p < 0.01 is used for tests where errors could cause harm, such as trials of a new drug

Type 1 error example:

  • A Type I error happens when you get false positive results: you conclude that the drug intervention improved symptoms when it actually didn't. These improvements could have arisen from other random factors or measurement errors.

Type 2 error example:

  • A Type II error happens when you get false negative results: you conclude that the drug intervention didn't improve symptoms when it actually did. Your study may have missed key indicators of improvements or attributed any improvements to other factors instead.

Type 1 error:

  • A Type I error means rejecting the null hypothesis when it's actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors.

    The risk of committing this error is the significance level (alpha or α) you choose. That's a value you set at the beginning of your study and compare against the statistical probability of obtaining your results (the p value).

  • The significance level is usually set at 0.05 or 5%. This means that your results have a 5% or lower chance of occurring if the null hypothesis is actually true (the simulation sketched after this list illustrates this).

  • If the p-value of your test is lower than the significance level, it means your results are statistically significant and consistent with the alternative hypothesis. If your p value is higher than the significance level, then your results are considered statistically non-significant.
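
A quick way to see alpha behaving as the long-run Type I error rate is by simulation. This is a minimal sketch, assuming Python with NumPy and SciPy; the group sizes and distributions are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_simulations = 10_000
false_positives = 0

for _ in range(n_simulations):
    # Both groups come from the SAME distribution, so the null
    # hypothesis (no difference) is true by construction.
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)
    group_b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(group_a, group_b)
    if p < alpha:  # a "significant" result despite no real effect
        false_positives += 1

# Prints roughly 0.05: about 5% of tests are false positives.
print(false_positives / n_simulations)
```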

Type 2 error:

  • A Type II error means not rejecting the null hypothesis when it's actually false. This is not quite the same as "accepting" the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.

  • Instead, a Type II error means failing to conclude there was an effect when there actually was. In reality, your study may not have had enough statistical power to detect an effect of a certain size.

  • Power is the extent to which a test can correctly detect a real effect when there is one. A power level of 80% or higher is usually considered acceptable.

    The risk of a Type II error is inversely related to the statistical power of a study. The higher the statistical power, the lower the probability of making a Type II error.
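
Power can be estimated the same way: simulate data where a real effect exists and count how often the test detects it. A companion sketch under the same assumptions (the effect size and group size below are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.05
n_simulations = 10_000
effect_size = 0.5   # a true difference of half a standard deviation
n_per_group = 64    # sample size per group

detected = 0
for _ in range(n_simulations):
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treatment = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(treatment, control)
    if p < alpha:
        detected += 1

power = detected / n_simulations  # probability of detecting the real effect
beta = 1 - power                  # probability of a Type II error
print(f"power = {power:.2f}, Type II error rate = {beta:.2f}")  # power near 0.80
```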

What is a p-value?

  • A p-value, or probability value, is a number describing how likely it is that your data would have occurred under the null hypothesis of your statistical test.

Does a p-value tell you whether your alternative hypothesis is true?

  • No. The p-value only tells you how likely the data you have observed is to have occurred under the null hypothesis.

  • If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.
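
In practice, a p value is something a statistical test computes from your data. A minimal sketch using SciPy's independent-samples t-test; the ratings below are invented purely for illustration:

```python
from scipy import stats

# Hypothetical happiness ratings for two groups (illustrative values only)
control      = [3, 4, 2, 5, 4, 3, 4, 3, 5, 4]
experimental = [5, 4, 6, 5, 7, 4, 6, 5, 6, 5]

t_stat, p_value = stats.ttest_ind(experimental, control)

# The p value: how likely a difference at least this extreme would be
# if the null hypothesis (no difference) were true.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis (statistically significant).")
else:
    print("Fail to reject the null hypothesis (not significant).")
```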

What is statistical significance?

  • Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test.

  • Significance is usually denoted by a p-value, or probability value.

  • Statistical significance is arbitrary: it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that data at least as extreme as yours would occur less than 5% of the time if the null hypothesis were true.

  • When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

What is a significance level?

  • The significance level, or alpha (α), is a value that the researcher sets in advance as the threshold for statistical significance. It is the maximum risk of making a false positive conclusion (Type I error) that you are willing to accept.

  • In a hypothesis test, the p value is compared to the significance level to decide whether to reject the null hypothesis.

    • If the p value is higher than the significance level, the null hypothesis is not refuted, and the results are not statistically significant.

    • If the p value is lower than the significance level, the results are interpreted as refuting the null hypothesis and reported as statistically significant.

  • Usually, the significance level is set to 0.05 or 5%. That means your results must have a 5% or lower chance of occurring under the null hypothesis to be considered statistically significant.

  • The significance level can be lowered for a more conservative test. That means an effect has to be larger to be considered statistically significant.

  • The significance level may also be set higher for significance testing in non-academic marketing or business contexts. This makes the study less rigorous and increases the probability of finding a statistically significant result.

  • As best practice, you should set a significance level before you begin your study. Otherwise, you can easily manipulate your results to match your research predictions.

  • It’s important to note that hypothesis testing can only show you whether or not to reject the null hypothesis in favour of the alternative hypothesis. It can never "prove" the null hypothesis, because the lack of a statistically significant effect doesn't mean that absolutely no effect exists.

Example of statistical decision-making:

  • Through your hypothesis test, you obtain a p value of .029. Since this p value is lower than your significance level of 0.05, you consider your results statistically significant and reject the null hypothesis.

  • That means the difference in happiness levels between the groups is unlikely to be due to chance and can be attributed to the experimental manipulation.

When reporting statistical significance, include relevant descriptive statistics about your data (e.g., means and standard deviations) as well as the test statistic and p value.

Reporting statistical significance:

  • Consistent with the alternative hypothesis, the experimental group (M = 4.67, SD = 2.14) reported significantly more happiness than the control group (M = 3.81, SD = 1.92), t(108) = 2.22, p = .029.
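
As a sketch, a small helper function (hypothetical, not part of any standard library) that assembles an APA-style sentence like the one above from the descriptive and test statistics:

```python
def report_t_test(m_exp, sd_exp, m_ctrl, sd_ctrl, df, t_stat, p_value):
    """Format an APA-style report line for an independent-samples t-test.

    Assumes the result is significant, since the wording says "significantly".
    """
    p_text = f"{p_value:.3f}".lstrip("0")  # APA style drops the leading zero
    return (f"The experimental group (M = {m_exp:.2f}, SD = {sd_exp:.2f}) "
            f"reported significantly more happiness than the control group "
            f"(M = {m_ctrl:.2f}, SD = {sd_ctrl:.2f}), "
            f"t({df}) = {t_stat:.2f}, p = {p_text}.")

# Statistics from the example above
print(report_t_test(4.67, 2.14, 3.81, 1.92, 108, 2.22, 0.029))
```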

Problems with relying on statistical significance:

  • There are various critiques of the concept of statistical significance and how it is used in research.

  • Researchers classify results as statistically significant or non-significant using a conventional threshold that lacks any theoretical or practical basis. This means that even a tiny 0.001 decrease in a p value can convert a research finding from statistically non-significant to significant with almost no real change in the effect.

  • On its own, statistical significance may also be misleading because it's affected by sample size.

  • In extremely large samples, you're more likely to obtain statistically significant results, even if the effect is actually small or negligible in the real world, as the sketch after this list demonstrates. This means that small effects are often exaggerated if they meet the significance threshold, while interesting results are ignored when they fall short of meeting the threshold.

  • The strong emphasis on statistical significance has led to a serious publication bias and replication crisis in the social sciences and medicine over the last few decades. Results are usually only published in academic journals if they show statistically significant results, but statistically significant results often can't be reproduced in high-quality replication studies.

  • As a result, many scientists call for retiring statistical significance as a decision-making tool in favour of more nuanced approaches to interpreting results.

  • That's why APA guidelines advise reporting not only p values but also effect sizes and confidence intervals wherever possible to show the real-world implications of a research outcome.
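
A sketch of both problems at once (assuming NumPy and SciPy; the sample size and tiny effect below are made up): in a huge sample, a negligible true effect still produces a very small p value, while the effect size and confidence interval reveal how unimportant it is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000          # an extremely large sample per group
tiny_effect = 0.02   # a true difference of 2% of a standard deviation

a = rng.normal(0.0, 1.0, n)
b = rng.normal(tiny_effect, 1.0, n)

t_stat, p_value = stats.ttest_ind(b, a)

# Effect size (Cohen's d) and an approximate 95% CI for the mean difference
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
low, high = diff - 1.96 * se, diff + 1.96 * se  # normal approximation

print(f"p = {p_value:.2g}")            # "significant", e.g. p < .001
print(f"Cohen's d = {d:.3f}")          # ~0.02: practically negligible
print(f"95% CI for the difference: [{low:.4f}, {high:.4f}]")
```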

Other types of significance in research:

  • Aside from statistical significance, clinical significance and practical significance are also important research outcomes.

  • Practical significance shows you whether the research outcome is important enough to be meaningful in the real world. It is indicated by the effect size of the study.

  • Clinical significance is relevant for intervention and treatment studies. A treatment is considered clinically significant when it tangibly or substantially improves the lives of patients.

Practical significance:

  • To report practical significance, you calculate the effect size of your statistically significant finding of higher happiness ratings in the experimental group.

  • From those group statistics, Cohen's d is approximately 0.42, indicating a small-to-medium effect size (the sketch below shows the calculation).
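
A sketch of how that effect size follows from the reported group statistics (the pooled-SD formula below assumes equal group sizes):

```python
import math

def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d from summary statistics, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

# Group statistics from the reporting example above
print(f"Cohen's d = {cohens_d(4.67, 2.14, 3.81, 1.92):.2f}")  # ≈ 0.42
```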

Error in statistical decision-making:

  • Using hypothesis testing, you can make decisions about whether your data support or refute your research predictions with null and alternative hypotheses.

  • Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population—this is the null hypothesis. It's always paired with an alternative hypothesis, which is your research prediction of an actual difference between groups or a true relationship between variables.
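
For example, the hypotheses for the happiness study above could be written as:

  • H0: μ(experimental) = μ(control), i.e. the manipulation has no effect on happiness.

  • H1: μ(experimental) ≠ μ(control), i.e. the manipulation changes happiness levels.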
