States that a treatment has had an effect or caused a change in the population.
Bias
Describes a study that systematically favors certain outcomes.
Binomial Distribution
Distribution of the probabilities of X successes out of n trials.
Denoted as B(n, p) where p is the probability of a single success.
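As a minimal sketch of the B(n, p) definition, the probability of exactly k successes can be computed from the standard binomial formula. The function name `binomial_pmf` and the values n = 10, p = 0.5 are illustrative assumptions, not part of the glossary.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values only: 10 trials, success probability 0.5.
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over all possible counts of successes sum to 1.
total = sum(probs)
```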
Blind
Experiment where subjects do not know which treatment they are receiving.
Blocking
Statistical design that creates groups that are similar in some way, then randomizes treatments within each block.
Central Limit Theorem
States that when a simple random sample (SRS) of sufficiently large size n is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
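A small simulation can make the theorem concrete. The sketch below assumes a skewed exponential population with mean 1 and standard deviation 1, a sample size of 40, and 5000 repeated samples; all of these are illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40            # sample size (assumed for illustration)
num_samples = 5000

# Population: exponential with mean 1 and standard deviation 1 (right-skewed).
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)   # should be near the population mean, 1
spread = statistics.stdev(sample_means)   # should be near sigma / sqrt(n)
predicted_spread = 1.0 / n**0.5
```

Comparing `spread` with `predicted_spread` shows how closely the simulated sampling distribution matches the standard deviation the theorem predicts.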
Chi-Square Distributions
Family of skewed-right distributions defined by their degrees of freedom and taking only positive values.
Shape changes with the degrees of freedom.
Chi-Square Goodness-of-Fit Test
Used to determine if a population has a certain hypothesized distribution.
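The test statistic sums (observed - expected)^2 / expected over all categories. The sketch below uses hypothetical counts from 120 rolls of a die hypothesized to be fair; the data are made up for illustration.

```python
# Hypothetical observed counts for faces 1 through 6 (120 rolls in total).
observed = [18, 22, 20, 25, 15, 20]

# Under the hypothesized fair-die distribution, every face is equally likely.
expected = [sum(observed) / len(observed)] * len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1
```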
Chi-Square Test for Homogeneity
Determines if every category in the population has the same distribution.
Chi-Square Test for Independence
Assesses if there is a relationship between two categorical variables; also known as Chi-Square Test for Association.
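For a two-way table, each expected count under independence is (row total × column total) / grand total. The sketch below applies this to a hypothetical 2×2 table; the counts are invented for illustration.

```python
# Hypothetical 2x2 table of observed counts (rows: group, columns: response).
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a two-way table: (rows - 1)(columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
```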
Coefficient of Determination
Indicates the proportion of the variation in the response variable that is explained by its linear relationship with the explanatory variable, symbolized as r^2.
Complement of an Event
Set of all outcomes in the sample space that are not part of the event.
Conditional Probability
Probability of an event occurring given that another specific event has already occurred.
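Conditional probability is commonly computed as P(A | B) = P(A and B) / P(B). The sketch below enumerates two fair dice, with A = "the sum is 8" and B = "the first die is even"; the events and helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on a single outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_even(o):
    return o[0] % 2 == 0

# P(A | B) = P(A and B) / P(B)
p_both = prob(lambda o: sum_is_8(o) and first_is_even(o))
p_cond = p_both / prob(first_is_even)
```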
Confidence Interval
Interval estimate of a parameter calculated using a sample from that population.
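One common case is a large-sample z-interval for a proportion. The sketch below assumes a hypothetical sample proportion of 0.56 from n = 400 and a 95% confidence level; it is one procedure among several, not the only way to build an interval.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.56        # hypothetical sample proportion
n = 400             # hypothetical sample size
confidence = 0.95

# Critical value z* that leaves (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * standard_error

interval = (p_hat - margin, p_hat + margin)
```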
Confidence Level
Proportion of confidence intervals that would capture the true parameter if many intervals were calculated from repeated samples of the same size.
Confounding Variable
A variable that could affect the result of a statistical test but has not been controlled for.
Continuous Random Variable
Takes on all values in an interval of numbers.
Control Group
Group of subjects who receive either a placebo or no treatment during an experiment.
Correlation
Measures direction and strength of a linear relationship between two quantitative variables, symbolized as r.
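A minimal sketch of computing r from its definition, using a small made-up data set. Squaring the result gives r^2, the coefficient of determination.

```python
from math import sqrt

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / sqrt(s_xx * s_yy)   # correlation, always between -1 and 1
r_squared = r**2               # coefficient of determination
```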
Critical Value
Value (z-score, t-score, or $\chi^2$ value) used in hypothesis testing to determine whether the null hypothesis should be rejected.
Cumulative Distribution Function
Function that gives the probability that a random variable X takes a value less than or equal to x; for a discrete variable, the sum of the probabilities of all values up to and including x.
Degrees of Freedom
Value used to determine significance for a t-test or a Chi-Square test, measured generally as n-1; for two-way tables, (r-1)(c-1).
Dependent Trials
Trials whose probability is affected by the outcomes of previous trials.
Dependent Variable
Also referred to as Response Variable.
Density Curve
Represents a distribution, always on or above the horizontal axis with total area of exactly 1 underneath.
Discrete Random Variable
Random variable with countable outcomes.
Disjoint Events
Events that cannot occur simultaneously; also known as Mutually Exclusive Events.
Distribution
A list of the values a variable takes on and their frequencies.
Double Blind
Experiment where neither the subjects nor the researchers know which treatment each subject receives.
Empirical Rule
Also known as the 68-95-99.7 rule, used to approximate the proportion of data that falls within 1, 2, or 3 standard deviations of the mean in any normal distribution.
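The three percentages can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of the distribution within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
```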
Expected Value
Synonymous with Mean.
Experimental Units
Individuals on which an experiment is conducted.
Explanatory Variable
Explains observed outcomes in a statistical study; also known as Independent Variable.
Exploratory Data Analysis
Uses graphs and numerical summaries to characterize variables in a data set and their relationships.
Factor
Any explanatory variable in an experiment.
Five Number Summary
Describes data using the minimum, first quartile, median, third quartile, and maximum points.
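A short sketch of the summary for a made-up data set. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, which may not match every course's hand calculation.

```python
from statistics import quantiles

# Hypothetical data set.
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]

# Cut points at the 25th, 50th, and 75th percentiles.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q1, median, q3, max(data))
```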
Geometric Distribution
Distribution of probabilities regarding the number of trials until the first successful outcome.
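For independent trials with success probability p, the probability that the first success occurs on trial k is (1 - p)^(k - 1) p. The function name and the value p = 0.2 below are illustrative assumptions.

```python
def geometric_pmf(k, p):
    """Return P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.2  # illustrative success probability

# Probability that the first success comes on the third trial.
p_third = geometric_pmf(3, p)

# Probability that the first success comes within the first 5 trials.
p_within_5 = sum(geometric_pmf(k, p) for k in range(1, 6))
```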
Hypothesis Test
Inference type used to determine the feasibility of an assumed population parameter; also referred to as Significance Test.
Independent Trials
Trials whose probabilities are not influenced by prior outcomes.
Independent Variable
Also referred to as Explanatory Variable.
Individuals
People or objects described in a data set.
Inference
Process of drawing conclusions about a population based on sample data.
Influential Point
Point whose removal markedly changes the regression equation.
Interquartile Range (IQR)
Difference between the third and first quartiles of a data set.
Law of Large Numbers
As the number of observations increases, the mean of those observations gets closer and closer to the population mean.
Least Squares Regression Line
Regression line minimizing the sum of squared vertical distances of points to the line.
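A minimal sketch of fitting the line y = a + bx using the standard least-squares formulas, with made-up data. It also computes the residuals (observed minus predicted), which sum to zero for a least-squares fit.

```python
# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
    (a - mean_x) ** 2 for a in x
)
# The least-squares line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]
```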
Level
A specific value or setting of a factor in an experiment.
Matched Pairs
Statistical design comparing two treatments, often with one sample receiving each treatment over different time periods.
Mean
The average of a data set; also known as Expected Value.
Median
The middle value where 50% of data is above and 50% is below.
Mutually Exclusive Events
Events that cannot occur together; synonymous with Disjoint Events.
Nonresponse
Type of bias when individuals chosen for a sample cannot be contacted or choose not to participate.
Normal Distribution
A symmetric, bell-shaped distribution where approximately 68% of data lies within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean.
Null Hypothesis
States that a treatment has no effect or that the population has not changed.
Observation
A single point from a data set.
Outlier
An individual observation that deviates significantly from the overall pattern of the data, commonly defined as any value more than 1.5 × IQR below Q1 or above Q3.
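The 1.5 × IQR rule can be sketched as follows. The data are made up, and the quartiles assume the "inclusive" method of `statistics.quantiles`; other quartile conventions can move the fences slightly.

```python
from statistics import quantiles

# Hypothetical data set with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in data if v < lower_fence or v > upper_fence]
```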
P-value
Probability of obtaining an outcome at least as extreme as the one actually observed, assuming the null hypothesis is true.
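As an illustration, the sketch below computes a two-sided P-value for a one-sample z-test. It assumes a known population standard deviation and hypothetical values (null mean 50, sample mean 52, sigma 10, n = 100).

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 50       # mean under the null hypothesis (hypothetical)
x_bar = 52      # observed sample mean (hypothetical)
sigma = 10      # population standard deviation, assumed known
n = 100

# Test statistic: how many standard deviations x_bar lies from mu_0.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-sided P-value: probability of a result at least this extreme
# in either direction, assuming the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```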
Parameter
A number describing a population.
Percentile
Indicates what percent of a data set falls below a given observation.
Placebo
A treatment with no expected effect, designed to appear the same as the actual treatment.
Pooled Procedures
Combining separate samples into one sample for analysis, applicable only if variances are equal.
Population
The entire group of individuals from which information is sought.
Power of a Hypothesis Test
Probability of rejecting the null hypothesis when it is false, calculated as 1 minus the probability of a Type II error.
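Power depends on the specific alternative considered. The sketch below assumes a one-sided z-test of H0: mean = 50 against a true mean of 53, with sigma = 10, n = 25, and a 0.05 significance level; all values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 50       # mean under the null hypothesis (hypothetical)
mu_alt = 53     # specific alternative mean (hypothetical)
sigma = 10      # population standard deviation, assumed known
n = 25
alpha = 0.05    # significance level

se = sigma / sqrt(n)

# Reject H0 when the sample mean exceeds this cutoff (one-sided, upper tail).
cutoff = mu_0 + NormalDist().inv_cdf(1 - alpha) * se

# Type II error: failing to reject H0 when the true mean is mu_alt.
beta = NormalDist(mu_alt, se).cdf(cutoff)
power = 1 - beta
```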
Probability
Proportion of times an outcome occurs over a large number of trials.
Probability Distribution Function
Assigns probabilities for each possible value of a discrete random variable X.
Proportion
Percentage of a data set that falls within a given category.
Qualitative Variable
Takes on non-numeric descriptions.
Quantitative Variable
Takes on numeric values.
Quartiles
Observations falling at the 25th, 50th, and 75th percentiles of a data set.
Range
Difference between maximum and minimum values of a data set.
Random
Individual outcomes are uncertain but follow a pattern over time.
Random Variable
Variable whose value is a numeric outcome of a random phenomenon.
Randomization
Using probability laws for sample selection and treatment assignments in experiments.
Regression Line
Describes how a response variable changes as the explanatory variable changes.
Residual
Difference between observed and predicted values of a response variable.
Response Variable
Measures the outcome of a statistical study; also known as the Dependent Variable.
Robustness
Measures how much the P-value is affected if the conditions of the hypothesis test are not met.
Sample
A part of the population used for information gathering.
Sample Space
List of all possible outcomes for a random event.
Sampling Distribution
Distribution of values taken by a statistic across all possible samples of the same size from the same population.
Sampling Frame
List from which a sample is chosen, ideally consisting of the entire population.
Significance Level
The threshold, denoted $\alpha$, below which a P-value is considered statistically significant.
Significance Test
Another term for Hypothesis Test.
Simple Random Sample (SRS)
A sample of size n chosen so that every possible group of n individuals in the population has an equal chance of being selected.
Simulation
Method for collecting data using laws of probability to represent all possible outcomes of an experiment.
Skewed
Description of a distribution whose histogram extends farther on one side of the mean; skewed in the direction of the tail.
Standard Deviation
Square root of variance; commonly measures the spread of a data set.
Standard Error
Estimate of the standard deviation of a sampling distribution, computed from sample data; describes how far a statistic typically falls from the parameter it estimates.
Standard Normal Distribution
A normal distribution with a mean of 0 and standard deviation of 1.
Standardized Score
Another term for z-Score.
Statistic
A number describing a sample.
Statistically Significant
An effect unlikely to occur by chance alone.
Stratified Random Sample
Sample chosen by defining subsets within the population, then taking an SRS from each subset.
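A minimal sketch of the procedure, assuming a hypothetical population split into two strata by class year and a fixed number of individuals drawn from each. `random.sample` draws without replacement, which serves as the SRS within each stratum.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population, already divided into strata.
population = {
    "freshman": [f"F{i}" for i in range(100)],
    "senior": [f"S{i}" for i in range(50)],
}

per_stratum = 5  # assumed sample size per stratum

# Take an SRS (without replacement) from each stratum.
stratified_sample = {
    name: random.sample(members, per_stratum)
    for name, members in population.items()
}
```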
Subjects
Also referred to as Experimental Units.
Symmetric
Describes a distribution whose histogram is a mirror image on either side of its center.
t-Distributions
Family of symmetric, bell-shaped distributions with a larger standard deviation than the standard normal distribution, defined by degrees of freedom.
Treatment
A specific condition applied to an experimental unit or subject.
Treatment Group
Group of subjects receiving an actual treatment in an experiment.
Type I Error
Rejecting the null hypothesis when it is true; its probability equals the significance level of the test.
Type II Error
Not rejecting the null hypothesis when it is false; probability must be calculated for a specific alternative.
Unbiased Statistic
Statistic whose sampling distribution has a mean equal to the true value of the parameter it estimates.
Undercoverage
Bias occurring when certain population groups are omitted from the selection process.
Variability
Describes the spread of a data set.
Variable
Any characteristic of an individual.
Variance
Average of the squares of deviations from the mean, used as a measure of spread.
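The definition above divides by the number of observations, which is the population variance. The sample variance divides by n - 1 instead. The sketch below contrasts the two on made-up data using the standard library.

```python
from statistics import pstdev, pvariance, variance

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)    # sum of squared deviations divided by n
sample_var = variance(data)  # sum of squared deviations divided by n - 1
pop_sd = pstdev(data)        # square root of the population variance
```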
Voluntary Response Sample
Sample that consists only of people who opt to participate, generally a poor method for data collection.
z-Score
Indicates how many standard deviations an observation lies above or below the mean; also called Standardized Score.
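A z-score is computed as (observation - mean) / standard deviation. The sketch below assumes a hypothetical normal distribution with mean 100 and standard deviation 15, and also converts the z-score to a percentile.

```python
from statistics import NormalDist

mean = 100   # hypothetical mean
sd = 15      # hypothetical standard deviation
x = 130      # observation to standardize

z = (x - mean) / sd

# Proportion of a normal distribution falling below this observation.
percentile = NormalDist().cdf(z)
```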