Lecture 10 - February 05
Continuous Data and Statistical Tests
Continuous data refers to quantitative data that can take an infinite number of values within a range, such as height, weight, or temperatures.
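The median is a measure of central tendency used for continuous (or ordinal) data, not nominal categorical data; it is preferred over the mean when continuous data are skewed, because it is less affected by extreme values.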
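A quick illustration, using hypothetical values, of why the median suits skewed continuous data. A single extreme value pulls the mean far more than the median. This uses only Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical measurements with one extreme value (right skew)
values = [2, 3, 3, 4, 50]

print(mean(values))    # 12.4, pulled upward by the extreme value
print(median(values))  # 3, the middle value is unaffected
```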
Decision Criteria for Parametric vs Nonparametric Tests
Parametric Tests: Generally more powerful, but require certain assumptions:
Normal Distribution: The data should follow an approximately bell-shaped (normal) distribution.
Equal Variances: The groups being compared should have comparable variability.
Evaluating Tests for Normal Distribution
Visual Inspection: Look at the spread of the data relative to the mean and standard deviation (eyeballing it).
QQ Plot (Quantile-Quantile Plot):
Compares two probability distributions visually (here, the sample's quantiles against those expected from a normal distribution).
Data points that lie along a straight line indicate a normal distribution.
If the points curve away from the line at either end (the tails), the data aren't normally distributed.
Shapiro-Wilk Test:
Statistical test assessing the normality of data.
Null hypothesis: Data is normally distributed.
Decision rule: If p < 0.05, reject the null hypothesis; the data are considered non-normally distributed.
Histogram: A bar chart of the data's distribution, often with a normal curve superimposed for comparison.
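The QQ plot idea can be sketched numerically: sort the data and pair each value with the quantile a normal distribution (fitted with the sample mean and SD) predicts at that position. A plotting library would draw these pairs. Here, a hypothetical `straightness` helper summarizes how close they are to a straight line using their correlation. The (i - 0.5)/n plotting position is one common convention among several, and the data are hypothetical. This is not the Shapiro-Wilk test. In practice, a library routine such as SciPy's `scipy.stats.shapiro` is typically used for that.

```python
import math
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pair each sorted observation with the quantile a fitted normal distribution predicts."""
    n = len(data)
    ordered = sorted(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Plotting position (i - 0.5) / n is one common convention (an assumption; others exist)
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def straightness(points):
    """Hypothetical helper: Pearson correlation of the QQ points (near 1 = close to a straight line)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

even = [1, 2, 3, 4, 5]               # hypothetical, roughly symmetric
skewed = [1, 1, 1, 2, 2, 3, 10, 30]  # hypothetical, right-skewed

print(round(straightness(qq_points(even)), 3))
print(round(straightness(qq_points(skewed)), 3))
```

The skewed sample gives a noticeably lower value because its largest points bend away from the line in the upper tail.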
Transforming Data for Normality
Researchers may apply transformations (e.g., a log transformation, with results back-transformed and reported as geometric means) to deal with non-normal distributions.
This is common in nutrition studies, e.g., iron and vitamin B12 levels across different populations.
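A minimal sketch of a log transformation and the geometric mean that results from back-transforming the mean of the logs. The values are hypothetical and must all be positive. The result is cross-checked against the standard library's `statistics.geometric_mean`.

```python
import math
import statistics

# Hypothetical right-skewed measurements (all values must be > 0 to take logs)
values = [1.0, 10.0, 100.0]

logged = [math.log(v) for v in values]        # log transformation
geo_mean = math.exp(statistics.mean(logged))  # back-transform the mean of the logs

print(statistics.mean(values))  # arithmetic mean: 37.0
print(geo_mean)                 # geometric mean: about 10
```

For right-skewed data, the geometric mean sits much closer to the bulk of the values than the arithmetic mean does.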
Criteria for Statistical Tests Selection
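Purpose of the Statistics: Either:
Descriptive (summarizing the sample)
Inferential (drawing conclusions about a population from the sample)
Type of Data: Continuous (interval and ratio scales) vs Categorical (qualitative categories, i.e., nominal or ordinal).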
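Participant Groups: Can be classified as:
Independent Participants (different people in each group)
Paired Participants (the same people measured twice, or matched pairs)
Correlation of Measurements: Use the study design to determine whether measurements are independent or related.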
Assumptions for Parametric Tests: Ensure data meets normality and equal variance criteria to proceed with parametric testing.
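The selection criteria above can be encoded as a small decision function. `suggest_test` is a hypothetical helper, not a standard API. It covers only the tests named in this lecture and deliberately refuses combinations (such as paired designs) that the lecture does not assign a test to.

```python
def suggest_test(data_type, n_groups, independent, assumptions_met):
    """Hypothetical helper mapping the lecture's criteria to a test name."""
    if data_type == "categorical":
        return "chi-square test"
    if data_type != "continuous":
        raise ValueError("data_type must be 'continuous' or 'categorical'")
    if not independent:
        raise NotImplementedError("paired designs need paired tests (not covered here)")
    if n_groups == 2:
        # Parametric if normality and equal variances hold, otherwise nonparametric
        return "independent samples t-test" if assumptions_met else "Mann-Whitney U test"
    if n_groups >= 3 and assumptions_met:
        return "one-way ANOVA followed by post hoc tests"
    raise NotImplementedError("combination not covered in this sketch")

print(suggest_test("continuous", 2, True, False))  # Mann-Whitney U test
```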
Example: Independent Samples T-Test
Compares the means of two independent groups (e.g., vitamin D levels between students at different universities).
Null Hypothesis: No difference in means between the groups.
Type of Data: Continuous.
The two samples must be collected independently (no participant appears in both groups).
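A standard-library sketch of the independent samples t-test in its pooled-variance (Student's) form, which assumes equal variances. The two-sided p-value comes from numerically integrating the t density with Simpson's rule. This is an implementation choice for illustration only. A library routine such as SciPy's `scipy.stats.ttest_ind` would normally be used instead. The data are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (area from 0 to |t|), area via Simpson's rule (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)

def independent_t_test(a, b):
    """Pooled-variance t-test for two independent samples (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t, df)

t, df, p = independent_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # hypothetical data
print(t, df, round(p, 3))
```

Here t = -2 with 8 degrees of freedom, so the p-value lands between 0.05 and 0.10, which is not significant at alpha = 0.05.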
Statistical Output References
Example findings might display:
Mean of Mount: 52; CI: 45-58
Mean of Acadia: 75; CI: 67-83
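p = 0.03 (< 0.05) indicates a statistically significant difference between the means.
If the confidence intervals (typically 95%) do not overlap, as here (45-58 vs 67-83), the means are statistically different; the reverse does not hold, because overlapping intervals do not by themselves prove the means are equal.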
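A minimal sketch of computing a confidence interval for a mean and checking whether two intervals overlap. It uses a large-sample normal critical value (about 1.96 for 95%). This is an assumption for simplicity, since small samples would normally use a t critical value. The overlap check is applied to the example intervals above.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(values, confidence=0.95):
    """Large-sample (normal approximation) CI for the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width

def intervals_overlap(ci_a, ci_b):
    """True if the two (low, high) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

print(intervals_overlap((45, 58), (67, 83)))  # False: Mount vs Acadia intervals
```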
Nonparametric Tests
Apply when the parametric assumptions of normality or equal variances are not met.
Mann-Whitney U Test: The nonparametric alternative to the independent samples t-test, used for independent groups with skewed data distributions.
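A sketch of the Mann-Whitney U statistic, which works on ranks rather than raw values. Tied values share the average of their ranks. The p-value uses a plain normal approximation with no tie or continuity correction, an assumption that is only reasonable for moderately large samples. The data are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])             # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Normal approximation (no tie or continuity correction) for a two-sided p-value
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u1, u2) - mu) / sigma
    p = 2 * NormalDist().cdf(z)
    return u1, u2, min(1.0, p)

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # hypothetical, fully separated groups
```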
Chi-Square Tests
Compare categorical data; examine observed vs expected frequencies to identify differences across populations (e.g., in nutritional status).
p-values derived from chi-square analysis determine statistical significance.
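A sketch of the chi-square test for a 2x2 table. Expected counts are (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected. With 1 degree of freedom the upper-tail probability equals erfc(sqrt(chi2 / 2)), so only 2x2 tables are handled. No Yates continuity correction is applied. The counts are hypothetical.

```python
import math

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # 1 for a 2x2 table
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, df, p

# Hypothetical counts: rows = populations, columns = adequate / inadequate status
chi2, df, p = chi_square_2x2([[10, 20], [20, 10]])
print(round(chi2, 3), df, round(p, 4))
```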
General Principles of Statistical Tests
ANOVA (Analysis of Variance): Used when comparing means across three or more groups; testing all groups in a single analysis keeps the overall alpha at the chosen level (e.g., 0.05), whereas running multiple pairwise t-tests would inflate the type I error risk.
Follow-up Analysis with Post Hoc Tests
Post hoc tests (e.g., Tukey’s) are required after a significant ANOVA result to pinpoint which specific groups differ.
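A sketch of the one-way ANOVA F statistic. It splits the total variability into a between-group part and a within-group part and compares their mean squares. Only F and its degrees of freedom are computed. Turning F into a p-value, and running post hoc tests such as Tukey's, would normally be left to a statistics library. The data are hypothetical.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: how far each value is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # hypothetical groups
```

A large F means the group means differ more than the within-group noise would suggest. It does not say which groups differ, which is why a post hoc test follows.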
Correlation and Regression Analysis
Correlation Coefficient (r) quantifies the strength and direction of a linear relationship between two variables, from -1 (perfect negative) to +1 (perfect positive), with 0 meaning no linear correlation.
Regression Models:
Y = mX + b: Statistical model predicting the outcome (Y) from changes in the independent variable (X), where m is the slope and b is the intercept.
R-squared indicates the proportion of variance in the outcome explained by the independent variable(s) in the model (ranging from 0 to 1, often reported as a percentage).
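A least-squares sketch of Y = mX + b that also computes r and R-squared. It shows that, for simple linear regression with one predictor, R-squared equals r squared, while r keeps the sign of the relationship. The data are hypothetical.

```python
import math

def simple_regression(x, y):
    """Least-squares slope m, intercept b, correlation r, and R-squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    m = sxy / sxx                      # slope
    b = my - m * mx                    # intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation, -1 to +1
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy       # proportion of variance explained, 0 to 1
    return m, b, r, r_squared

m, b, r, r_squared = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(m, b, round(r, 4), round(r_squared, 4))
```

Here r is about 0.77 while R-squared is 0.6, meaning the model explains 60% of the variance in Y.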
Important Distinction
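Correlation (r) vs R-squared: These are commonly confused. r ranges from -1 to +1 and describes the strength and direction of a relationship; R-squared ranges from 0 to 1 and describes the proportion of variance explained (in simple linear regression, R-squared = r²).
Nonparametric correlation (Spearman’s rho) is used when data do not meet parametric assumptions.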
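A sketch of Spearman's rho computed as Pearson's r applied to the ranks of each variable, with ties given average ranks. Because it uses ranks, it detects any monotonic relationship, even a curved one that Pearson's r understates. The data are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical, monotonic but curved
print(round(pearson(x, y), 3), round(spearman_rho(x, y), 3))
```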
Practical Data Interpretation and Application
Statistical findings should always be reported with context: p-values, confidence intervals, and descriptive statistics that convey both statistical significance and practical relevance.
Correlational results should be cautious about causal implications and clearly state the limits of what they show (i.e., association does not imply causation).