I PRAY to STATS – Core Vocabulary


Description and Tags

Key statistical vocabulary extracted from the 'I PRAY to STATS' lecture notes, covering basic descriptive statistics, sampling methods, experimental design, common fallacies, graphical displays, regression, probability distributions, confidence intervals, hypothesis testing, and useful combinatorial principles.


101 Terms


Mean (Arithmetic)

Sum of all measurements divided by the number of observations; symbol μ for a population and x̄ for a sample.


Median

The middle value that splits an ordered dataset into two equal halves.


Mode

The most frequently occurring value in a dataset.


Interquartile Range (IQR)

Difference between the third quartile (Q3) and the first quartile (Q1); measures the spread of the middle 50 % of the data. IQR = Q3 – Q1.


Range

Largest value minus smallest value in a dataset; a crude measure of variability.


Standard Deviation (σ or s)

A measure of the amount of variation a variable has about its mean. σ for a population. s for a sample. For a sample standard deviation, you must divide by n -1.



Variance

Square of the standard deviation; variances add when independent random variables are summed.
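To check the n versus n – 1 distinction by hand, here is a minimal Python sketch using the standard library's `statistics` module. The data values are made up for illustration and are not from the lecture notes.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements; the mean is 5

# Population formulas divide the sum of squared deviations (32) by n = 8.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)      # square root of the population variance

# Sample formulas divide by n - 1 = 7 instead.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(pop_var, pop_sd, samp_var, samp_sd)
```

Because the sample versions divide by a smaller number, they always come out slightly larger than the population versions for the same data.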


Outlier

A data point that differs markedly from the overall pattern of the data.


IQR Rule

A point is an outlier if it is < Q1 – 1.5·IQR or > Q3 + 1.5·IQR.
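The IQR rule above can be sketched in a few lines of Python. Quartile conventions differ between textbooks and software; the `method="inclusive"` choice below is an assumption, and the sample values are made up.

```python
import statistics

def iqr_outliers(data):
    """Return (q1, q3, iqr, outliers) using the 1.5 * IQR rule."""
    # quantiles(..., n=4) returns the three quartile cut points.
    # "inclusive" is one of several quartile conventions (an assumption here).
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return q1, q3, iqr, outliers

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up values
q1, q3, iqr, outliers = iqr_outliers(sample)
print(q1, q3, iqr, outliers)
```

With these values only 50 falls outside the fences, so it is the one point flagged.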


Robust Estimator

Statistic that is little-affected by outliers (e.g., median, IQR).


Unbiased Estimator

Statistic whose expected value equals the true population parameter (e.g., sample mean for μ).


Sampling

Selecting a subset of individuals from a population to estimate characteristics of the whole.


Simple Random Sample (SRS)

Sample of size n chosen so that every possible subset of n individuals has an equal chance of being selected.


Stratified Sampling

Population divided into homogeneous strata (groups with similar characteristics), and an SRS is taken within each stratum.


Cluster Sampling

Population split into clusters (naturally occurring groups); some clusters randomly chosen and all members studied.


Systematic Sampling

Selecting every k-th element after a random start in an ordered list.
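To compare the probability sampling methods defined so far, here is a rough Python sketch of simple random, stratified, and systematic sampling. The population of unit IDs, the two strata, and the interval `k` are all hypothetical.

```python
import random

rng = random.Random(0)          # fixed seed so the sketch is repeatable
population = list(range(100))   # hypothetical unit IDs 0..99

# Simple random sample: every subset of size 10 is equally likely.
srs = rng.sample(population, 10)

# Stratified: split into strata, then take an SRS within each stratum.
strata = {"low": population[:50], "high": population[50:]}  # hypothetical strata
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

# Systematic: random start, then every k-th unit in the ordered list.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

print(srs, stratified, systematic, sep="\n")
```

Cluster sampling would instead pick whole groups at random and keep every member of the chosen groups.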


Multistage Sampling

Successive sampling of clusters within clusters until reaching the final sampling units.


Probability-Proportional-to-Size Sampling

Selection probability of each unit (often a cluster or subgroup) is proportional to its size.


Line-Intercept Sampling

Elements are included if they are intersected by a pre-chosen line segment (transect). Used to measure features that cross the line; common in vegetation surveys.


Panel Sampling

Same sampled individuals are surveyed repeatedly over time.


Nonprobability Sampling

Sampling procedure where some population members have unknown or zero chance of selection.


Bias (Statistical)

Systematic tendency that skews results away from the true value.


Sampling Bias

Sample not representative because selection probabilities differ across population members.


Non-response Bias

People who respond to a poll or survey differ meaningfully from non-respondents, distorting results. Also known as participation bias.


Undercoverage Bias

Part of the population is systematically excluded from the sampling frame.


Self-selection Bias

Individuals decide themselves to participate, often those with strong opinions.


Convenience Sampling

Sample drawn from units that are easiest to access.


Voluntary Response Sampling

Participants volunteer to respond; prone to extreme opinions.


Quota Sampling

Like stratified sampling, but members are NOT randomly selected to fill each quota. Instead, researchers recruit people with specific characteristics until they reach a predetermined number for each category, which can lead to bias.


Snowball Sampling

Existing participants recruit future participants, building a sample via referrals. This can cause bias through participants being more likely to refer people with similar opinions.


Experimental Factor

Variable that is deliberately manipulated by the researcher in an experiment.


Treatment

Specific combination of conditions applied to experimental units.


Block

A group of similar experimental units or observations that are grouped together to reduce variability in an experiment or analysis.


Experimental Unit

Smallest entity to which a treatment is independently applied; the thing being experimented on.


Level (of a factor)

Specific setting, value, or category of an experimental factor (independent variable) that is being tested.


Simpson’s Paradox

Trend that appears in several groups but disappears or reverses when the groups are combined.
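A small numeric illustration can make the reversal concrete. The counts below are illustrative only (they are not from the lecture notes): treatment A does better within each subgroup, yet B does better once the subgroups are pooled.

```python
# Illustrative (successes, trials) counts for two treatments in two subgroups.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Within each subgroup, A has the higher success rate.
for name, arms in groups.items():
    print(name, round(rate(*arms["A"]), 3), round(rate(*arms["B"]), 3))

# Pool the subgroups by summing successes and trials for each treatment.
pooled = {
    arm: (
        sum(groups[g][arm][0] for g in groups),
        sum(groups[g][arm][1] for g in groups),
    )
    for arm in ("A", "B")
}

# After pooling, B has the higher overall rate: the trend reverses.
print("pooled", round(rate(*pooled["A"]), 3), round(rate(*pooled["B"]), 3))
```

The reversal happens because A was applied mostly to the harder "large" cases, so subgroup membership acts as a lurking variable.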


Gambler’s Fallacy

Belief that deviations from expected behavior must be corrected in the short run (e.g., that tails is "due" after a run of heads). Also known as the Monte Carlo fallacy.


Hot-Hand Fallacy

Assuming a run of successes makes further success more likely.


Base-Rate Fallacy

Ignoring relevant statistical information (base rates) when evaluating specific evidence.


Will Rogers Phenomenon

Moving an observation from one group to another increases both groups’ averages. Also known as the Okie Paradox.
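A tiny worked example shows how both averages can rise at once. The two groups below are made up; the key is that the moved value sits below its old group's mean but above its new group's mean.

```python
from statistics import mean

low = [1, 2, 3]     # made-up group with mean 2
high = [4, 8, 9]    # made-up group with mean 7

before = (mean(low), mean(high))

# Move 4: it is below the mean of `high` but above the mean of `low`.
moved = 4
new_high = [x for x in high if x != moved]
new_low = low + [moved]

after = (mean(new_low), mean(new_high))
print(before, after)
```

No individual value changed, yet both group means went up, which is why reclassification alone can make outcomes look better.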


Berkson’s Paradox

Selection effect in which two unrelated variables appear to be correlated (usually negatively) because the sample was selected based on those variables.


Nominal Data

Categorical data with no intrinsic ordering (e.g., eye color).


Ordinal Data

Categorical data with a meaningful order but unequal intervals (e.g., class rank).


Interval Data

Numeric data with equal intervals but no true zero (e.g., temperature °C).


Ratio Data

Numeric data with equal intervals and a true zero (e.g., weight).


Pie Chart

Circular chart where slice areas show category proportions.


Bar Chart

Rectangular bars represent categorical frequencies or values.


Mosaic Plot

Tile plot showing joint distribution of two (or more) categorical variables.


Scatter Plot

Graph of paired quantitative data; used to study relationships.


Histogram

Bar graph of binned numerical data frequencies.


Box Plot

Displays median, quartiles and potential outliers of numerical data.


Dot Plot

Dots along a number line show individual data points; good for small n.


Stem-and-Leaf Plot

Splits numbers into stems and leaves to display shape and raw data.


Line Graph

Connects data points with lines to show trends over time or sequence.


Normal Q–Q Plot

Plots data quantiles against theoretical normal quantiles to assess normality.


Residual Plot

Graph of residuals versus predicted values; checks model fit assumptions.


Ogive

Cumulative frequency curve of numerical data.


Least-Squares Regression Line (LSRL)

Line that minimizes the sum of squared vertical residuals.


Pearson Correlation Coefficient (r)

Measures strength and direction of linear relationship (−1 to +1).


Coefficient of Determination (r²)

Proportion of variance in y explained by x via the model.


Covariance

Average product of deviations of two variables; sign indicates relationship direction.


Residual (e)

Observed value minus predicted value (y – ŷ).


High Leverage Point

Observation with an extreme x-value relative to others.


Influential Point

Observation that markedly changes regression slope or intercept if removed.


Extrapolation

Predicting beyond the range of observed x; often unreliable.


Interpolation

Predicting within the range of observed x; usually reliable.


Homoscedasticity

Residual variance is constant across levels of the predictor.


Heteroscedasticity

Residual variance changes with the predictor; fan-shape pattern.


Sum of Squares Total (SST)

Total variability in y: Σ(yi – ȳ)².


Sum of Squares Regression (SSR)

Explained variability: Σ(ŷi – ȳ)².


Sum of Squares Error (SSE)

Unexplained variability: Σ(yi – ŷi)².
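The regression quantities above fit together, and a short Python sketch can verify the identity SST = SSR + SSE along with r² = SSR/SST. The x and y values are made up, and the covariance here uses the sample convention of dividing by n – 1, which is an assumption about the intended definition.

```python
import math

x = [1, 2, 3, 4, 5]   # made-up predictor values
y = [2, 4, 5, 4, 5]   # made-up responses

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squared deviations and cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept, then fitted values.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

# Pearson correlation and sample covariance (divides by n - 1).
r = sxy / math.sqrt(sxx * syy)
cov_xy = sxy / (n - 1)

# Sums of squares for a least-squares line with an intercept.
sst = syy
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
r_squared = ssr / sst

print(slope, intercept, r, r_squared, sst, ssr, sse)
```

For this toy data the fitted line is ŷ = 2.2 + 0.6x, and 60 % of the variability in y is explained by the line.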


Probability Distribution

Function that assigns probabilities to all possible outcomes of a random variable.


Normal Distribution

Bell-shaped, symmetric continuous distribution described by μ and σ.


Empirical Rule (68-95-99.7)

For normal data, ~68 % within 1 σ, 95 % within 2 σ, 99.7 % within 3 σ.


z-Score

Standardized value: (x – μ)/σ; counts SDs from the mean.
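The z-score formula and the empirical rule can be checked with the standard library's `NormalDist`. This is a minimal sketch; the score of 130 with μ = 100 and σ = 15 is a hypothetical example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

z = z_score(130, mu=100, sigma=15)   # hypothetical score, 2 SDs above the mean
coverage = [prob_within(k) for k in (1, 2, 3)]

print(z, [round(c, 4) for c in coverage])
```

The three coverage values come out close to 0.68, 0.95, and 0.997, which is where the rule's name comes from.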


Percentile

Value below which a specified percentage of observations fall.


Student’s t-Distribution

Symmetric, bell-shaped distribution with heavier tails than the normal; used when σ is unknown and estimated by s, especially for small n (df = n – 1 for a one-sample mean).


t-Score

Standardized statistic that uses the sample standard deviation s, t = (x̄ – μ)/(s/√n), and is compared against a t-distribution.


Sampling Distribution

Probability distribution of a statistic over all possible samples of a fixed size.


Central Limit Theorem (CLT)

Sampling distribution of the mean approaches normal as n increases, regardless of population shape (provided the population variance is finite).


Law of Large Numbers

Sample mean converges to the population mean as sample size grows.
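Both the Central Limit Theorem and the Law of Large Numbers can be seen in a quick simulation. The sketch below assumes a skewed Exponential(1) population (mean 1, SD 1); the seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Exponential(1) is strongly right-skewed; its mean and SD are both 1.
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n, reps = 30, 2000
means = [sample_mean(n) for _ in range(reps)]

# CLT: the sample means cluster around 1 with spread near 1 / sqrt(n).
center = statistics.fmean(means)
spread = statistics.stdev(means)

# Law of large numbers: one very large sample has a mean close to 1.
big_mean = sample_mean(100_000)

print(round(center, 3), round(spread, 3), round(big_mean, 3))
```

A histogram of `means` would look roughly bell-shaped even though the underlying population is far from normal.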


Uniform Distribution

All outcomes in an interval are equally likely.


Binomial Distribution

Counts number of successes in n independent Bernoulli trials with probability p.


Geometric Distribution

Counts trials needed to get the first success in repeated Bernoulli trials.


Chi-Square Distribution

Distribution of the sum of k squared independent standard normal variables; its parameter is the degrees of freedom, df = k.


Negative Binomial Distribution

Number of failures before r successes occur in Bernoulli trials.


Hypergeometric Distribution

Success count in draws without replacement from a finite population.


Poisson Distribution

Models number of events in a fixed interval given constant mean rate λ.
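Several of the discrete distributions above have short closed-form probability functions. The sketch below writes out the standard formulas for the binomial, geometric (trials-until-first-success version), and Poisson cases; the function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P(X = k) events when the mean count per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

print(binomial_pmf(3, 10, 0.5))   # 120 / 1024
print(geometric_pmf(3, 0.5))      # two failures, then a success
print(poisson_pmf(0, 2.0))        # exp(-2)
```

Summing `binomial_pmf(k, n, p)` over k = 0..n gives 1, a handy sanity check on any pmf.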


Confidence Interval

Interval estimate that likely contains the population parameter at a stated confidence level.


Confidence Level

Long-run proportion of CIs that capture the true parameter (e.g., 95 %).


Critical Value (z* or t*)

Cutoff on the reference distribution that matches the desired confidence level.


Margin of Error

Half-width of a confidence interval; (critical value) × (standard error).
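Putting critical value, standard error, and margin of error together, here is a minimal sketch of a z-interval for a mean. It assumes σ is known (so z* rather than t* applies), and the inputs are made up.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when sigma is treated as known."""
    # Critical value z*: leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sigma / math.sqrt(n)
    margin = z_star * standard_error
    return x_bar - margin, x_bar + margin, margin

low, high, margin = z_interval(x_bar=50, sigma=10, n=100)  # made-up inputs
print(round(low, 2), round(high, 2), round(margin, 2))
```

With σ unknown, the same structure applies but z* is replaced by t* with n – 1 degrees of freedom and σ by s.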


Null Hypothesis (H₀)

Default claim that there is no effect or difference.


Alternative Hypothesis (Hₐ)

Claim of an effect or difference that we seek evidence for.


p-Value

Probability of observing a result at least as extreme as the sample, assuming H₀ is true.


Significance Level (α)

Threshold probability for rejecting H₀ (commonly 0.05).
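The p-value and significance level combine into a decision rule: reject H₀ when p < α. The sketch below shows this for a two-sided one-sample z-test, assuming σ is known; the sample figures are made up.

```python
import math
from statistics import NormalDist

def two_sided_z_test(x_bar, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 vs. Ha: mu != mu0, sigma known."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Two-sided: count both tails at least as extreme as the observed z.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05
z, p_value = two_sided_z_test(x_bar=52, mu0=50, sigma=10, n=100)  # made-up inputs
reject_h0 = p_value < alpha

print(z, round(p_value, 4), reject_h0)
```

Here z = 2 gives a p-value a little under 0.05, so H₀ is rejected at the 5 % level but would not be at the 1 % level.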


Type I Error

Rejecting a true null hypothesis; false positive; probability = α.


Type II Error

Failing to reject a false null hypothesis; false negative.


Power (1 – β)

Probability of correctly rejecting a false null hypothesis.


Bayes’ Theorem

P(A | B) = P(B | A) · P(A) / P(B); updates probabilities with new evidence.
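A worked example shows why base rates matter when applying the formula. The screening-test numbers below are hypothetical: 1 % prevalence, 95 % sensitivity, and a 5 % false-positive rate.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # P(B): total probability of a positive test, with or without the condition.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical screening test applied to a population with 1 % prevalence.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with an accurate test, only about 16 % of positives actually have the condition, because the condition is rare to begin with.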