Numerical (Quantitative)
Continuous: Can take any value within a range (e.g., height, weight).
Discrete: Can only take distinct values (e.g., number of students in a class).
Categorical
Nominal (regular): No inherent order among categories (e.g., colors).
Ordinal: Meaningful order but no consistent difference between categories (e.g., Likert scale responses).
Associated (Dependent) Variables: Knowing the value of one variable gives information about the other.
Independent Variables: Knowing the value of one variable gives no information about the other.
Scatterplots & Correlations: Used to analyze positive, negative, or no associations between variables.
Population: Entire group of interest (described by parameters).
Sample: Subset of the population used for the study (described by statistics).
Non-response Bias: When many sampled individuals do not respond and those who respond differ systematically from those who do not.
Voluntary Response Bias: When individuals choose whether to take part; those with strong opinions are more likely to respond.
Convenience Sampling: Sampling individuals that are easily accessible.
Simple Random Sample: Every individual has an equal chance of selection, and every possible sample of a given size is equally likely.
Stratified Sampling: Population divided into subgroups (strata) with random samples from each.
Cluster Sampling: Randomly selecting entire groups (clusters) rather than individuals.
Multistage Sampling: Sampling in stages using different methods.
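The sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey design: the population, the class-year strata, and the dorm clusters are all made up for the example.

```python
import random

# Hypothetical population: 12 students, each tagged with a class year
# (used as the stratum) and a dorm (used as the cluster).
population = [
    {"id": i, "year": ["first", "second", "third"][i % 3], "dorm": "ABCD"[i % 4]}
    for i in range(12)
]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simple random sample: every set of 4 individuals is equally likely.
srs = rng.sample(population, 4)

# Stratified sample: draw 2 individuals at random from each class year.
stratified = []
for year in ["first", "second", "third"]:
    stratum = [p for p in population if p["year"] == year]
    stratified.extend(rng.sample(stratum, 2))

# Cluster sample: pick 2 dorms at random and keep everyone in them.
chosen_dorms = rng.sample(["A", "B", "C", "D"], 2)
cluster = [p for p in population if p["dorm"] in chosen_dorms]
```

Note the difference in what is randomized: the stratified sample draws individuals from every subgroup, while the cluster sample draws whole groups and keeps all of their members.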
Control: Compare treatment and control groups.
Randomize: Randomly assign subjects to treatments.
Replicate: Use enough subjects to estimate the effect reliably, or repeat the whole study to confirm the result.
Block: Group subjects by known variables that affect the response, then randomize within each group.
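Blocking and randomization can be combined as in the sketch below, which assumes a hypothetical set of 12 subjects and a made-up "risk" variable used for blocking. Within each block, half of the subjects are randomly assigned to treatment and half to control.

```python
import random

# Hypothetical subjects: (name, risk level). Risk is the blocking variable.
subjects = [("s%d" % i, "high" if i < 6 else "low") for i in range(12)]

rng = random.Random(1)  # fixed seed so the run is repeatable
assignment = {}
for block in ["high", "low"]:
    members = [name for name, risk in subjects if risk == block]
    rng.shuffle(members)          # randomize the order within the block
    half = len(members) // 2
    for name in members[:half]:
        assignment[name] = "treatment"
    for name in members[half:]:
        assignment[name] = "control"
```

Because assignment happens inside each block, both groups end up with the same mix of high-risk and low-risk subjects.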
Placebo Effect: Psychological response to a non-active treatment.
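Blinding: Reduces bias by concealing treatment assignment from subjects (single-blind) or from both subjects and researchers (double-blind).
Institutional Review Board (IRB): Reviews proposed studies to protect the rights and welfare of study subjects.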
Informed Consent: Participants must be fully informed and provide consent.
Confidentiality vs. Anonymity:
Confidentiality: Identities are known to the researchers but kept separate from the data and not disclosed.
Anonymity: Identities are not collected.
Distribution of a Variable: Describes the values a variable takes and how often it takes each.
Graphical Tools:
Categorical Variables: Pie charts, bar graphs.
Quantitative Variables: Histograms, stemplots, boxplots.
Frequency Distribution Table: Shows counts & percentages.
Bar Plot: Represents frequencies or proportions; can be stacked/side-by-side.
Contingency Table: Summarizes data for two categorical variables.
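A contingency table is a count of each pair of categories. A minimal sketch using `collections.Counter`, with made-up survey records (class year and a yes/no answer):

```python
from collections import Counter

# Hypothetical records: (class year, prefers online lectures?)
records = [
    ("first", "yes"), ("first", "no"), ("first", "yes"),
    ("second", "no"), ("second", "no"), ("second", "yes"),
]

counts = Counter(records)                 # counts each (row, column) pair
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Nested dict: table[row][column] -> count
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
for r in rows:
    print(r, table[r], "row total:", sum(table[r].values()))
```

Dividing each cell by its row total would give row proportions, which is what a stacked or side-by-side bar plot of the same data displays.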
Measures of Center:
Mean: Average of the dataset.
Median: The middle value of the ranked data (the average of the two middle values when the count is even).
Measures of Spread:
Variance: Average squared deviation from the mean (the sample variance divides by n - 1 rather than n).
Standard Deviation: Square root of variance.
IQR (Interquartile Range): Q3 - Q1.
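These measures can be computed with Python's standard `statistics` module. The data below are made up, and the `"inclusive"` quartile method is only one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up sample

mean = statistics.mean(data)       # 6
median = statistics.median(data)   # 5
var = statistics.variance(data)    # sample variance (divides by n - 1): 10
sd = statistics.stdev(data)        # square root of the sample variance

# Quartiles; "inclusive" is an assumed convention for this example.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                      # 8.0 - 4.0 = 4.0
```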
Stemplots & Histograms:
Show the shape of the distribution: where values are concentrated and where they are sparse.
Shape Descriptions: Unimodal, bimodal, symmetric, skewed.
Bin Width: Affects visualization of data distribution.
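To see how bin width changes the picture, the sketch below counts the same made-up data with two different widths. The helper `bin_counts` is hypothetical, written only for this illustration.

```python
from collections import Counter

data = [1, 2, 2, 3, 7, 8, 8, 9, 9]  # made-up values forming two clusters

def bin_counts(values, width):
    """Count values per bin; each bin is labeled by its left edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

narrow = bin_counts(data, 2)    # {0: 1, 2: 3, 6: 1, 8: 4}
wide = bin_counts(data, 10)     # {0: 9}
```

With width 2 the gap between the two clusters is visible (no values fall in the bin starting at 4), so the distribution looks bimodal. With width 10 everything lands in one bin and the shape is hidden.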
Box Plots (Box-and-Whisker Plots):
Visualize the median, quartiles, IQR, and outliers.
Useful for comparing multiple groups.
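The numbers behind a box plot can be sketched as follows. The data are made up, and the 1.5 x IQR rule for flagging outliers is a common convention assumed here, not something fixed by the definition of a box plot.

```python
import statistics

data = [3, 5, 6, 7, 8, 9, 10, 12, 30]  # made-up sample; 30 looks unusual

# The box spans q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Here the quartiles are 6, 8, and 10, the fences are 0 and 16, and only the value 30 is flagged.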
Robust Statistics:
For skewed distributions or data with outliers: Median & IQR are better measures than mean & standard deviation because extreme values affect them far less.
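A quick way to see robustness is to add one extreme value and compare how far each measure moves. The numbers are made up for the illustration.

```python
import statistics

base = [20, 22, 23, 25, 26]
with_outlier = base + [200]   # one extreme value added

mean_shift = statistics.mean(with_outlier) - statistics.mean(base)
median_shift = statistics.median(with_outlier) - statistics.median(base)

print("mean moves by", round(mean_shift, 2))   # roughly 29.47
print("median moves by", median_shift)         # 23 -> 24
```

The single outlier drags the mean from 23.2 to about 52.7, while the median moves only from 23 to 24.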
Scatterplots: Visualize relationships between two numerical variables.
Correlation Coefficient (r):
Measures the strength and direction of a linear relationship; ranges from -1 to +1, with values near 0 indicating little or no linear association.
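The coefficient can be computed directly from deviations about the means. The function `pearson_r` and the data below are hypothetical, written as a minimal sketch.

```python
import math

x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values

def pearson_r(xs, ys):
    """Correlation coefficient from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(x, y)                         # about 0.77: positive association
r_flipped = pearson_r(x, [-v for v in y])   # same strength, opposite sign
```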
Least Squares Regression Line:
Minimizes the sum of squared residuals; used for prediction.
Residuals:
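Observed value minus predicted value (positive when the line underestimates the observation).
Residual Plots: Plot residuals against the explanatory variable; a patternless scatter around 0 suggests a linear model is reasonable.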
R² (Coefficient of Determination):
Proportion of variability in the response variable explained by the model; for a line with one explanatory variable it equals r squared.
Slope & Intercept:
Slope: Expected change in response variable per unit change in explanatory variable.
Intercept: Expected value of the response when the explanatory variable = 0 (meaningful only if 0 is within or near the data range).
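The regression quantities above fit together as in this sketch, which uses made-up data and the standard closed-form formulas for a line with one explanatory variable.

```python
x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)

slope = sxy / sxx                 # change in y per unit change in x
intercept = my - slope * mx       # predicted y when x = 0

predicted = [intercept + slope * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]   # observed - predicted

ss_res = sum(e ** 2 for e in residuals)     # what least squares minimizes
ss_tot = sum((b - my) ** 2 for b in y)      # total variability in y
r_squared = 1 - ss_res / ss_tot
```

For these data the slope is 0.6, the intercept is 2.2, and R² is 0.6, so the line explains 60% of the variability in y. The residuals of a least squares line sum to zero (up to rounding).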
Prediction & Extrapolation:
Predicting within the observed data range is generally reliable when the linear model fits well.
Extrapolation (beyond the data range) is often unreliable.
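One simple safeguard is to flag any prediction made outside the observed range of the explanatory variable. The function `predict`, the fitted coefficients, and the range below are all hypothetical values chosen for illustration.

```python
def predict(x_new, slope, intercept, x_min, x_max):
    """Return (prediction, is_extrapolation) for a fitted line."""
    y_hat = intercept + slope * x_new
    is_extrapolation = not (x_min <= x_new <= x_max)
    return y_hat, is_extrapolation

# Hypothetical fitted line and observed x range (1 to 5).
slope, intercept = 0.6, 2.2

inside = predict(3, slope, intercept, 1, 5)     # within the data range
outside = predict(50, slope, intercept, 1, 5)   # far beyond the data range
```

The line still returns a number for x = 50, but nothing in the data supports the assumption that the linear pattern continues that far.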