Lecture 10 - February 05

Continuous Data and Statistical Tests

  • Continuous data refers to quantitative data that can take an infinite number of values within a range, such as height, weight, or temperature.

  • The mean and median are measures of central tendency for continuous data; categorical data are instead summarized with counts and proportions.

Decision Criteria for Parametric vs Nonparametric Tests

  • Parametric Tests: Generally more powerful than nonparametric tests, but they require certain assumptions.

    • Normal Distribution: The data are assumed to follow a bell-shaped (normal) distribution.

    • Equal Variances: The groups being compared are assumed to have comparable variability.

Evaluating Data for Normal Distribution

  1. Visual Inspection: Use standard deviation and spread of data (eyeballing it).

  2. QQ Plot (Quantile-Quantile Plot):

    • Visually compares the quantiles of the sample with the quantiles of a theoretical normal distribution.

    • Data points that lie along a straight line indicate a normal distribution.

    • If data points curve away from the line at either end, the data are not normally distributed.

  3. Shapiro-Wilk Test:

    • Statistical test assessing the normality of data.

    • Null hypothesis: Data is normally distributed.

    • Decision rule: If p < 0.05, the null hypothesis is rejected and the data are considered non-normally distributed.

  4. Histogram: A bar chart of the data distribution, often with a normal curve superimposed for comparison.
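The idea behind the QQ plot can be sketched numerically. The example below is a minimal illustration using only the Python standard library, not a replacement for a real plot or for the Shapiro-Wilk test. The data values, the function names `qq_points` and `pearson_r`, and the plotting-position rule `(i - 0.5) / n` are all assumptions made for the example. It pairs each sorted observation with the matching normal quantile and measures how close those pairs are to a straight line.

```python
from statistics import NormalDist, mean

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    n = len(sample)
    observed = sorted(sample)
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

def pearson_r(xs, ys):
    """Pearson correlation, used here to measure how straight the QQ line is."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical values, made up for illustration only.
roughly_symmetric = [48, 50, 51, 52, 53, 54, 55, 56, 58, 60]
right_skewed = [1, 1, 2, 2, 3, 3, 4, 6, 12, 40]

r_symmetric = pearson_r(*qq_points(roughly_symmetric))
r_skewed = pearson_r(*qq_points(right_skewed))

# A value close to 1 means the points fall near a straight line.
print("symmetric sample:", round(r_symmetric, 3))
print("skewed sample:", round(r_skewed, 3))
```

The skewed sample gives a noticeably lower value because its largest observations pull away from the line at the upper end, which is the pattern described for the QQ plot above.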

Transforming Data for Normality

  • Researchers may apply transformations (e.g., a log transformation, with results reported as geometric means) to deal with non-normal distributions.

  • This is common in nutrition studies, for example of iron and B12 levels among different populations.
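The link between the log transformation and the geometric mean can be shown in a few lines. This is a small sketch with made-up values (the list `values` is hypothetical and not taken from any study): the mean is taken on the log scale and then converted back, which gives the geometric mean.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical right-skewed values (e.g., a biomarker concentration); made up for illustration.
values = [12, 15, 18, 22, 30, 45, 80, 160]

log_values = [math.log(v) for v in values]        # natural-log transform
back_transformed = math.exp(mean(log_values))     # mean on the log scale, converted back

arithmetic = mean(values)                         # pulled upward by the large values
geometric = geometric_mean(values)                # same quantity as back_transformed

print("arithmetic mean:", round(arithmetic, 1))
print("back-transformed log mean:", round(back_transformed, 1))
print("geometric mean:", round(geometric, 1))
```

The geometric mean sits below the arithmetic mean for skewed positive data, so it is often a better description of a typical value in that situation.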

Criteria for Statistical Tests Selection

  • Type of Statistics: Either:

    • Descriptive (summarizing the sample)

    • Inferential (drawing conclusions about a population)

  • Type of Data: Continuous (interval and ratio scales) vs Categorical (qualitative categories; nominal and ordinal scales).

  • Participant Groups: Can be classified as:

    • Independent Participants

    • Paired Participants

  • Correlation of Measurements: Examine independence through study design.

  • Assumptions for Parametric Tests: Ensure data meets normality and equal variance criteria to proceed with parametric testing.

Example: Independent Samples T-Test

  • Compares means of two groups (e.g. vitamin D levels between students at different universities).

  • Null Hypothesis: No difference in means between the groups.

  • Types of Sample Data: Continuous.

    • Should be independently collected.
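As a rough sketch of what the independent samples t-test computes, the example below uses only the Python standard library. The two groups are made-up numbers, and the function names (`pooled_t_test`, `t_pdf`, `two_sided_p`) are hypothetical. It assumes equal variances (the pooled, Student version of the test) and obtains the p-value by numerically integrating the t density, which is a simplification of what statistical packages do.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [-|t|, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = 2 * upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(-upper, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-upper + i * h, df)
    return max(0.0, 1 - total * h / 3)

def pooled_t_test(a, b):
    """Student's independent-samples t-test, equal variances assumed."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t, df)

# Hypothetical measurements for two independent groups (illustration only).
group_a = [41, 55, 48, 60, 52, 47]
group_b = [62, 71, 58, 66, 75, 69]

t_stat, df, p_value = pooled_t_test(group_a, group_b)
print("t =", round(t_stat, 2), "df =", df, "p =", round(p_value, 4))
```

A p-value below 0.05 would lead to rejecting the null hypothesis of no difference in means. In practice a statistics package would normally be used, for example `scipy.stats.ttest_ind` in SciPy.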

Statistical Output References

  • Example findings might display:

    • Mean of Mount: 52; CI: 45-58

    • Mean of Acadia: 75; CI: 67-83

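    • p-value = 0.03: Below 0.05, so the difference is statistically significant.

    • If confidence intervals do not overlap, means are statistically different. The reverse does not hold: overlapping intervals do not rule out a significant difference.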

Nonparametric Tests

  • Apply when parametric assumptions of normality or equal variances fail.

  • The Mann-Whitney U Test is used for two independent groups with skewed data distributions.
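The Mann-Whitney U Test works on ranks rather than raw values, which is why skewed data are less of a problem. The sketch below is a simplified illustration with hypothetical function names (`average_ranks`, `mann_whitney_u`) and made-up data. It uses the normal approximation for the p-value with no tie or continuity correction, which is crude for very small samples.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for group a, approximate two-sided p-value)."""
    na, nb = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))   # rank both groups together
    rank_sum_a = sum(ranks[:na])
    u_a = rank_sum_a - na * (na + 1) / 2
    mu = na * nb / 2                           # expected U under the null hypothesis
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)
    z = (u_a - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_a, p

# Hypothetical skewed measurements for two independent groups (illustration only).
group_a = [2, 3, 3, 4, 5, 7, 30]
group_b = [6, 8, 9, 11, 14, 20, 95]

u_stat, p_approx = mann_whitney_u(group_a, group_b)
print("U =", u_stat, "approximate p =", round(p_approx, 3))
```

Because only the ordering of the values matters, the extreme observations (30 and 95) have no more influence than any other value ranked near the top.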

Chi-Square Tests

  • Used to compare categorical data: examines observed vs expected frequencies to identify differences across populations, such as in nutritional status.

  • p-values derived from chi-square analysis determine statistical significance.
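The observed vs expected comparison can be worked through for a 2x2 table. This is a minimal sketch with made-up counts and a hypothetical function name (`chi_square_2x2`). Expected counts come from the row and column totals, and the p-value uses the closed form that applies with 1 degree of freedom. No continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts (illustration only): rows are two populations,
# columns are "deficient" and "not deficient".
observed = [[20, 10],
            [10, 20]]

chi2_stat, chi2_p = chi_square_2x2(observed)
print("chi-square =", round(chi2_stat, 2), "p =", round(chi2_p, 4))
```

The larger the gap between observed and expected counts, the larger the statistic and the smaller the p-value.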

General Principles of Statistical Tests

  • ANOVA (Analysis of Variance): Used when comparing means across three or more groups. Because it is a single overall test, it keeps the type I error rate at the chosen alpha level, avoiding the inflated risk that comes from running multiple pairwise comparisons.
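Two pieces of arithmetic sit behind this point and can be sketched briefly. The first function shows how the chance of at least one false positive grows with the number of comparisons, under the simplifying assumption that the comparisons are independent. The second computes the ANOVA F statistic as the ratio of between-group to within-group variability. Function names and data are hypothetical, and the p-value step is left out because the F distribution is not in the standard library.

```python
from statistics import mean

def familywise_error(alpha, n_comparisons):
    """Chance of at least one false positive across independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_comparisons

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three groups need three pairwise comparisons (A-B, A-C, B-C).
inflated = familywise_error(0.05, 3)
print("familywise error with 3 tests:", round(inflated, 3))

# Hypothetical measurements for three groups (illustration only).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print("F =", f_stat, "df =", (df_b, df_w))
```

A large F means the group means differ by more than the within-group spread would suggest, but it does not say which groups differ.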

Follow-up Analysis with Post Hoc Tests

  • Post hoc tests (e.g., Tukey’s) are required to pinpoint which specific groups differ after a significant ANOVA result.

Correlation and Regression Analysis

  • Correlation Coefficient (r): Quantifies the strength and direction of the linear relationship between two variables, from -1 (perfect negative) to +1 (perfect positive), with 0 meaning no correlation.

  • Regression Models:

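    • Y = mX + b: A statistical model that predicts the outcome Y from the independent variable X, where m is the slope (the change in Y per one-unit change in X) and b is the intercept.

    • R-squared indicates the proportion of variance in the outcome explained by the independent variable in the model (ranging from 0 to 1, often reported as a percentage).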
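The relationship between the slope, the intercept, r, and R-squared can be seen in a short least-squares sketch. The function name `simple_linear_regression` and the data are hypothetical. For a model with a single independent variable, R-squared is simply r multiplied by itself, which is what the code uses.

```python
from statistics import mean

def simple_linear_regression(xs, ys):
    """Least-squares fit of Y = mX + b; returns (m, b, r, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    m = sxy / sxx                    # slope
    b = my - m * mx                  # intercept
    r = sxy / (sxx * syy) ** 0.5     # correlation coefficient, between -1 and +1
    return m, b, r, r * r            # r squared lies between 0 and 1

# Hypothetical data with a negative relationship (illustration only).
x = [1, 2, 3, 4, 5, 6]
y = [12, 11, 9, 8, 5, 4]

slope, intercept, r, r_squared = simple_linear_regression(x, y)
print("slope =", round(slope, 2), "intercept =", round(intercept, 2))
print("r =", round(r, 3), "R-squared =", round(r_squared, 3))
```

Here r is negative because Y falls as X rises, while R-squared is positive, since it only describes how much of the variation is explained.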

Important Distinction

  • Correlation (r) vs R-squared: These are commonly confused. r ranges from -1 to +1 and gives the direction and strength of the relationship; R-squared ranges from 0 to 1 and gives the proportion of variance explained.

  • Nonparametric correlation (Spearman’s rho) is applicable when data do not meet parametric assumptions.
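Spearman’s rho can be understood as the ordinary correlation coefficient applied to ranks instead of raw values. The sketch below illustrates that idea with made-up data and hypothetical function names. Ties receive average ranks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    sorted_vals = sorted(values)
    return [sorted_vals.index(v) + (sorted_vals.count(v) + 1) / 2 for v in values]

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: Y always increases with X, but not in a straight line.
x_vals = [1, 2, 3, 4, 5]
y_vals = [1, 4, 9, 16, 100]

pearson_value = pearson_r(x_vals, y_vals)
spearman_value = spearman_rho(x_vals, y_vals)
print("Pearson r =", round(pearson_value, 3))
print("Spearman rho =", round(spearman_value, 3))
```

The extreme value lowers Pearson's r, while Spearman's rho stays at its maximum because the ordering of the two variables agrees perfectly.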

Practical Data Interpretation and Application

  • Statistical findings should always include context: p-values, confidence intervals, descriptive statistics that elucidate data significance and practical relevance.

  • Correlational results should not be given causal interpretations; state clearly the limits of what they indicate (i.e., association does not imply causation).