Statistics
The science of collecting, organizing, and interpreting data.
Individuals
The objects on which data are collected (e.g., students, states, hospitals).
Variables
Characteristics recorded about individuals.
Quantitative Variables
Numeric values with meaningful operations (e.g., height, weight).
Categorical Variables
Groups or categories (e.g., gender, college type).
Identifier Variables
Unique values assigned to individuals (e.g., ID numbers).
Bar Charts & Pie Charts
Graphs that represent categorical data.
Histograms
Graphs that display quantitative data distributions.
Boxplots
Graphs of the five-number summary, used to compare distributions and flag outliers.
Dotplots & Density Plots
Graphs that show the shape of a quantitative distribution: a dotplot draws one dot per value, a density plot draws a smoothed curve.
Mean (x̄)
Sum of all values divided by the number of values.
Median (m)
The middle value when the data are ordered (the average of the two middle values when the count is even).
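A quick check of the mean and median definitions, sketched in Python with its standard `statistics` module. The sample values are made up for illustration.

```python
from statistics import mean, median

values = [2, 4, 4, 5, 10]  # made-up sample, odd count

x_bar = mean(values)   # sum of all values divided by the number of values
m = median(values)     # middle value of the sorted data

# With an even count, the median is the average of the two middle values.
m_even = median([1, 3, 5, 7])

print(x_bar, m, m_even)
```

The large value 10 pulls the mean above the median, which is typical of right-skewed data.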
Range
Difference between the largest and smallest values.
Interquartile Range (IQR)
Difference between Q3 (75th percentile) and Q1 (25th percentile).
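Range and IQR can be computed as below, sketched in Python. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" interpolation method of `statistics.quantiles`, and the data are made up.

```python
from statistics import quantiles

values = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # made-up sample

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, _, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```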
Standard Deviation (S)
Measures variation around the mean.
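As a sketch of what "variation around the mean" means, the Python below computes the sample standard deviation by hand and compares it with `statistics.stdev`. It assumes the sample version (dividing by n - 1), which is what the symbol s usually denotes; the data are made up.

```python
from math import sqrt
from statistics import mean, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

x_bar = mean(values)

# Sum of squared deviations from the mean, divided by n - 1, then square-rooted.
squared_devs = sum((x - x_bar) ** 2 for x in values)
s_manual = sqrt(squared_devs / (len(values) - 1))

# The library function uses the same n - 1 definition.
s = stdev(values)

print(s_manual, s)
```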
Z-Score
Measure of how far a value is from the mean in standard deviations.
68-95-99.7 Rule
In a Normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
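The three percentages in the rule can be recovered from the standard Normal curve. A minimal Python sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a Normal distribution within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

The rule rounds these shares to 68, 95, and 99.7 percent.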
Explanatory Variable
The variable suspected to influence another.
Response Variable
The variable that is measured as an outcome.
Simpson’s Paradox
When a relationship between two variables that holds within each group reverses once the groups are combined, because of a lurking variable.
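A small numeric illustration of the reversal, sketched in Python. The hospitals, severity groups, and counts are entirely hypothetical and chosen only to show the effect; case severity plays the role of the lurking variable.

```python
# (successes, total) per hospital and case severity; hypothetical counts.
counts = {
    "A": {"mild": (90, 100), "severe": (300, 500)},
    "B": {"mild": (425, 500), "severe": (50, 100)},
}

def rate(successes, total):
    return successes / total

# Success rate within each severity group.
within = {
    severity: {h: rate(*counts[h][severity]) for h in counts}
    for severity in ("mild", "severe")
}

# Success rate after combining the severity groups.
overall = {
    h: rate(
        sum(s for s, _ in counts[h].values()),
        sum(t for _, t in counts[h].values()),
    )
    for h in counts
}

print(within)
print(overall)
```

Hospital A has the higher success rate in both severity groups, yet hospital B looks better overall because A handles mostly severe cases.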
Proportion
A fraction representing part of a whole (e.g., 0.25 or 1/4).
Percent
A proportion multiplied by 100 (e.g., 0.25 = 25%).
Nominal Variables
Categories without a meaningful order (e.g., colors, names).
Ordinal Variables
Categories with a meaningful order but no consistent difference (e.g., ranking).
Natural Variables
Ordered with meaningful differences (e.g., temperature, income).
Standardizing (Z-score)
z = (x − μ) / σ (Tells how many SDs a value is from the mean).
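The standardizing formula, z = (x − μ) / σ, as a minimal Python sketch. The function name and the example numbers (mean 100, SD 15) are illustrative, not taken from the deck.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Illustrative values: mean 100, standard deviation 15.
above = z_score(130, mu=100, sigma=15)  # two SDs above the mean
below = z_score(85, mu=100, sigma=15)   # one SD below the mean

print(above, below)
```

A positive z-score means the value is above the mean, a negative one means it is below.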
Shifting
Adding or subtracting a constant changes the mean (and other measures of center) but not the spread.
Scaling
Multiplying or dividing by a constant changes both center and spread.
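The effect of shifting and scaling on center and spread can be checked numerically. A Python sketch with made-up data and arbitrary constants (add 5, multiply by 3):

```python
from statistics import mean, stdev

values = [10, 12, 14, 16, 18]  # made-up sample

shifted = [x + 5 for x in values]  # shifting: add a constant
scaled = [x * 3 for x in values]   # scaling: multiply by a constant

# Shifting moves the mean by the constant and leaves the SD unchanged.
print(mean(values), mean(shifted), stdev(values), stdev(shifted))

# Scaling multiplies both the mean and the SD by the constant.
print(mean(scaled), stdev(scaled))
```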
Correlation (R-Value)
Measures the strength and direction of a linear relationship between two quantitative variables; ranges from −1 to 1.
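One way to compute r by hand, sketched in Python. The function name and the data are illustrative; the formula divides the sum of cross-deviations by the square root of the product of the two sums of squared deviations.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Pearson correlation r between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # perfect increasing line
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # perfect decreasing line

print(r_up, r_down)
```

Values near 1 or −1 indicate a strong linear pattern; values near 0 indicate a weak one.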
Bimodal Distribution
A distribution with two peaks.
Skewed Distribution
Distribution where one tail is longer than the other.
Symmetric Distribution
A distribution where the left and right sides are mirror images.
Five-Number Summary
Min, Q1, Median, Q3, Max that summarize a dataset.
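A five-number summary can be assembled as below, sketched in Python. The helper name is hypothetical, and the quartiles assume the "inclusive" method of `statistics.quantiles`; other software may use a slightly different quartile rule.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))

# Made-up sample, deliberately unsorted; quantiles() sorts internally.
summary = five_number_summary([9, 1, 17, 5, 13, 3, 11, 7, 15])

print(summary)
```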
Outliers
Data points significantly higher or lower than the rest.
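The deck does not say how far is "significantly" far. One common convention, assumed here rather than taken from the deck, flags values more than 1.5 × IQR below Q1 or above Q3. A Python sketch with a hypothetical helper and made-up data:

```python
from statistics import quantiles

def find_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

flagged = find_outliers([1, 3, 5, 7, 9, 11, 13, 15, 60])  # made-up sample

print(flagged)
```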
Frequency Table
Table listing the number of times categories occur.
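A frequency table for one categorical variable, sketched in Python with `collections.Counter`. The color data are made up; the last column shows each count as a percent of the total.

```python
from collections import Counter

colors = ["red", "blue", "red", "green", "blue", "red"]  # made-up data

freq = Counter(colors)  # category -> number of occurrences
n = len(colors)

# Print categories from most to least frequent, with count and percent.
for category, count in freq.most_common():
    print(f"{category:<6} {count:>3} {count / n:.0%}")
```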
Contingency Table
Table showing the counts for each combination of two categorical variables, used to examine their relationship.
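A contingency table counts each combination of two categorical variables. A minimal Python sketch; the variables (college type and whether a student graduated) and the records are hypothetical.

```python
from collections import Counter

# Each record is (college_type, graduated); hypothetical data.
records = [
    ("public", "yes"), ("public", "no"), ("public", "yes"),
    ("private", "yes"), ("private", "yes"), ("private", "no"),
    ("public", "no"),
]

table = Counter(records)  # (row category, column category) -> count
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# One printed line per row category, with counts in column order.
print("type", cols)
for r in rows:
    print(r, [table[(r, c)] for c in cols])
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.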
favstats()
An R function (mosaic package) that reports summary statistics: min, Q1, median, Q3, max, mean, sd, n, and missing.
histogram()
An R function that generates a histogram to visualize distributions.
tally()
An R function creating frequency tables for categorical data.
bwplot()
An R function generating boxplots to compare distributions.