Inferential Statistics Notes
Inferential Statistics Overview
Objectives
Provide a comprehensive overview of Week 5 in Statistics, emphasizing the significance of inferential statistics in interpretation and decision-making.
Detailed understanding of the Chi-square test as a vital tool for analyzing categorical data and its applications in various fields.
In-depth interpretation of continuous variables with a focus on distinctions in means, highlighting practical relevance in research and clinical trials.
Clarification on T-tests, guiding how to calculate p-values effectively, crucial for hypothesis testing in research scenarios.
Recap of Week 5 – Risk Statistics
Data Levels:
Nominal & Ordinal: These categories are useful for classifying data points where the order doesn't inherently matter (nominal) versus where order is significant (ordinal). These are primarily used in survey data and demographic studies.
Interval & Ratio: These continuous data types allow for a more sophisticated analysis where not only the order is important but also the exact differences between values. This classification is essential for applications in natural sciences and quantitative research.
Statistical Techniques: The methods employed vary significantly depending on the data type, which must be matched to the correct statistical test to ensure valid conclusions.
Measuring Risk: Measuring risk is a common practice in epidemiology and social science research when analyzing associations between categorical variables.
Commonly represented as Relative Risk (RR) or Odds Ratio (OR), these metrics offer insights into the likelihood of an event occurring in one group versus another.
The point of no difference is RR/OR = 1.0, which indicates no increased risk associated with the exposure.
Statistical Significance: It is crucial to determine the significance of associations in research findings. Statistical methods such as p-value calculation, usually set at p < 0.05, and the examination of confidence intervals play essential roles in this evaluation.
Confidence Intervals
Confidence intervals offer a range in which we can estimate the true population parameters. Understanding the limitations of sample data is critical in interpreting statistical results accurately.
Standard Confidence Interval: The most commonly accepted level is 95%, representing a 5% risk of incorrectly rejecting the null hypothesis. Example: A CI of (0.75 – 2.11) implies that we are 95% confident the true value lies within this range.
Precision: The width of the interval reflects the precision of the estimate; narrower intervals suggest more precise estimates, often achieved through larger sample sizes which improve the reliability of the estimates.
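The effect of sample size on precision can be sketched numerically. Below is a minimal example using a Wald 95% confidence interval for a sample proportion; the observed proportion (0.30) and the two sample sizes are hypothetical, chosen only to show that a larger sample yields a narrower interval:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Wald 95% confidence interval for a sample proportion (z = 1.96)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error shrinks as n grows
    return (p_hat - z * se, p_hat + z * se)

# Hypothetical data: the same observed proportion at two sample sizes.
lo50, hi50 = proportion_ci(0.30, 50)
lo500, hi500 = proportion_ci(0.30, 500)
width50, width500 = hi50 - lo50, hi500 - lo500  # larger n -> narrower CI
```

With ten times the observations, the interval around the same point estimate is markedly narrower, which is exactly the precision gain described above.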
Risk Ratios Interpretation
Interpretations based on CI:
If the confidence interval includes 1.0, the results are not statistically significant (p > 0.05). For example, if OR/RR = 1.5 with a 95% CI of (0.83-2.16), the interval spans 1.0 and we fail to reject the null hypothesis.
Conversely, if the confidence interval does not include 1.0, the results are deemed statistically significant (p < 0.05). Examples include OR/RR = 1.5 with 95% CI of (1.2-1.6) indicating a positive association or OR/RR = 0.5 with 95% CI of (0.2-0.9) indicating a protective factor.
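These interpretation rules can be sketched in code. The example below computes an odds ratio from a 2x2 table and a 95% Wald confidence interval on the log odds scale, then checks whether the interval excludes 1.0; the cell counts are hypothetical, not taken from the source data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a 95% Wald CI from a 2x2 table:
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of log(OR)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

# Hypothetical 2x2 counts:
or_, lo, hi = odds_ratio_ci(40, 60, 20, 80)
significant = not (lo <= 1.0 <= hi)  # CI excluding 1.0 -> p < 0.05
```

Here the whole interval lies above 1.0, so the association would be judged statistically significant, matching the rule stated above.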
Chi-Square Test
Purpose: The Chi-Square test serves to identify whether any statistically significant differences exist between observed frequencies and expected frequencies as proposed under the null hypothesis.
Conditions: In order to apply the Chi-Square test, certain assumptions must be met:
A minimum of 30 total observations should be present to ensure the validity of the test statistics.
Each category within the contingency table must have a count of at least 5 to meet the expected frequency assumption.
Expected Values Calculation: The formula used to determine expected values is:
Expected Value = \frac{(Column Total \times Row Total)}{Grand Total}
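Applied cell by cell, this formula can be sketched for a whole contingency table; the counts below are illustrative only:

```python
def expected_values(table):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x2 observed counts; every marginal total is 50, grand total 100,
# so each expected cell is (50 * 50) / 100 = 25.
expected = expected_values([[20, 30], [30, 20]])
```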
Chi-Square Statistic Calculation
The formula to calculate the chi-square statistic is given by:
\chi^2 = \sum{\frac{(Observed - Expected)^2}{Expected}}
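The summation above translates directly into code. This sketch sums (Observed - Expected)^2 / Expected over every cell of a table; the observed and expected counts are hypothetical:

```python
def chi_square_stat(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2x2 observed counts vs. expected counts under independence.
# Each cell contributes (5**2) / 25 = 1, so the statistic is 4.0.
stat = chi_square_stat([[30, 20], [20, 30]], [[25, 25], [25, 25]])
```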
Example Calculation
Using observed values for alcohol consumption versus smoking data, the chi-square statistic was computed as \chi^2 = 1860.4, which indicates a significant association.
Interpretation
A higher chi-square value suggests a substantial likelihood of a significant difference between groups when assessed against critical values derived from chi-square distribution tables based on the degrees of freedom.
Degrees of Freedom
The formula for calculating degrees of freedom in a contingency table is:
df = (\text{rows} - 1) \times (\text{columns} - 1)
For example, a 2x2 contingency table yields df = (2-1)(2-1) = 1, which determines the critical value used in statistical testing.
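Putting the degrees of freedom together with the critical-value comparison gives a simple decision sketch. For df = 1 at the 0.05 level, the chi-square critical value is about 3.84; the chi-square statistic of 4.0 used below is hypothetical:

```python
def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

df = degrees_of_freedom(2, 2)  # a 2x2 table gives df = 1

# For df = 1 at alpha = 0.05, the chi-square critical value is about 3.84,
# so a statistic above 3.84 is significant at p < 0.05.
CRITICAL_VALUE_DF1 = 3.84
reject_null = 4.0 > CRITICAL_VALUE_DF1  # hypothetical statistic of 4.0
```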
T-tests for Continuous Data
Purpose: T-tests are pivotal when evaluating the differences between two group means, commonly applied in clinical trials or experimental studies.
Mean Difference: The mean difference serves as a standard statistic, measuring the absolute differences attributed to intervention effects.
If the mean difference equals 0, there is no difference between the outcomes of the comparison groups; for continuous data this is the point of no difference, analogous to RR/OR = 1.0 for risk statistics.
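A pooled two-sample t statistic can be sketched in plain Python. This assumes equal variances (Student's t-test rather than Welch's), and the sample values are hypothetical; the resulting t would be compared against a critical value from a t table for df = n1 + n2 - 2:

```python
import math

def two_sample_t(x, y):
    """Student's t statistic for the difference between two group means
    (pooled variance, assuming equal variances)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variance of x
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)  # sample variance of y
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))

# Identical hypothetical groups: mean difference 0, so t = 0.
t_same = two_sample_t([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```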
ANOVA (Analysis of Variance)
ANOVA is employed for comparing means across more than two groups, providing a more comprehensive comparison than multiple t-tests.
Methods: t-tests are used for comparisons between two groups; the F-test (ANOVA) is used for comparisons involving three or more groups.
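The one-way ANOVA F statistic compares variability between group means to variability within groups. A minimal sketch, using hypothetical group data:

```python
def one_way_anova_f(*groups):
    """F statistic for one-way ANOVA:
    (between-group mean square) / (within-group mean square)."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical groups with clearly separated means give a large F;
# identical groups give F = 0.
f_stat = one_way_anova_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```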
Example Interpretation
For example, a t-test on health complaints within a geriatric patient cohort produces a t-value and a corresponding p-value, which together determine whether the null hypothesis can be rejected.
Significant outcomes from these tests facilitate informed decisions regarding demographic health policies and treatment outcomes.
P-Values and Confidence Intervals
In analyzing continuous data, p-values and confidence intervals are essential for discerning statistical significance. If the confidence interval includes zero, the result is not significant (we fail to reject the null hypothesis).
Conversely, if the confidence interval does not include zero, this serves as evidence of significance (leading to rejection of the null hypothesis).
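This decision rule is simple to encode. The sketch below takes any confidence interval and a null value: 0 for mean differences, 1.0 for ratios such as RR/OR; the intervals shown are hypothetical:

```python
def ci_significant(lower, upper, null_value=0.0):
    """True when the CI excludes the null value,
    implying p < 0.05 at the 95% confidence level."""
    return not (lower <= null_value <= upper)

ci_significant(-0.4, 1.8)                    # mean-diff CI including 0 -> not significant
ci_significant(0.3, 1.8)                     # mean-diff CI excluding 0 -> significant
ci_significant(0.83, 2.16, null_value=1.0)   # ratio CI including 1.0 -> not significant
```

Using the same function for both continuous and categorical results underlines that only the null value changes: zero for a mean difference, 1.0 for a risk ratio or odds ratio.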