AP Statistics Important Vocabulary Terms

Alternative Hypothesis

States that a treatment has had an effect or caused a change in the population.

Bias

Describes a study that systematically favors certain outcomes.

Binomial Distribution

Distribution of the probabilities of X successes out of n independent trials, each with the same probability of success.
Denoted as B(n, p), where p is the probability of success on a single trial.
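
As a minimal sketch, the probability of exactly k successes can be computed from the standard binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The function name `binomial_pmf` and the coin-flip values are illustrative assumptions, not part of the original list.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 fair coin flips, exactly 3 heads.
prob_three_heads = binomial_pmf(3, 10, 0.5)

# The probabilities for k = 0, 1, ..., n should add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```

Here `prob_three_heads` equals 120/1024, since there are C(10, 3) = 120 ways to place the three heads.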

Blind

Experiment where subjects do not know which treatment they are receiving.

Blocking

Statistical design that creates groups that are similar in some way, then randomizes treatments within each block.

Central Limit Theorem

States that when a simple random sample (SRS) of size n is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean will be approximately normally distributed, provided n is sufficiently large, with a mean of $\mu$ and a standard deviation of $\frac{\sigma}{\sqrt{n}}$.
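
The theorem can be checked informally with a small simulation. The sketch below assumes a uniform population on [0, 1] (mean 0.5, standard deviation sqrt(1/12)); the sample size, number of samples, and seed are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 30              # size of each sample (assumed value)
num_samples = 2000  # number of samples drawn (assumed value)

# Assumed population: uniform on [0, 1].
mu = 0.5
sigma = (1 / 12) ** 0.5

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = sigma / n ** 0.5  # standard deviation the theorem predicts
```

With these settings, `mean_of_means` should land close to `mu` and `sd_of_means` close to `predicted_sd`, even though the population itself is not normal.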

Chi-Square Distributions

Family of skewed-right distributions defined by their degrees of freedom and taking only positive values.
Shape changes with the degrees of freedom, becoming less skewed as they increase.

Chi-Square Goodness-of-Fit Test

Used to determine if a population has a certain hypothesized distribution.
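
A minimal sketch of the test statistic, using the usual formula: the sum over categories of (observed - expected)^2 / expected. The die-roll counts are made up for illustration, and the function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: a die rolled 60 times; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # degrees of freedom: number of categories minus 1
```

The statistic is then compared with a Chi-Square distribution with `df` degrees of freedom to obtain a P-value.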

Chi-Square Test for Homogeneity

Determines if several populations or groups have the same distribution of a categorical variable.

Chi-Square Test for Independence

Assesses if there is a relationship between two categorical variables; also known as Chi-Square Test for Association.

Coefficient of Determination

Indicates the proportion of the variation in the response variable that is explained by its linear relationship with the explanatory variable, symbolized as r^2.

Complement of an Event

Set of all outcomes in the sample space that are not part of the event.

Conditional Probability

Probability of an event occurring given that another specific event has already occurred.
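
As a small worked sketch, the standard formula P(A | B) = P(A and B) / P(B) can be applied to one roll of a fair die. The events chosen here (A: the roll is even, B: the roll is greater than 3) are illustrative assumptions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

event_a = {x for x in sample_space if x % 2 == 0}  # roll is even
event_b = {x for x in sample_space if x > 3}       # roll is greater than 3

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = probability(event_a & event_b) / probability(event_b)
```

Of the three outcomes in B (4, 5, 6), two are even, so `p_a_given_b` is 2/3.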

Confidence Interval

Interval estimate of a parameter calculated using a sample from that population.
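
One common case is the one-sample z-interval for a proportion: the sample proportion plus or minus a critical value times the standard error. The sketch below assumes 60 successes in a sample of 100 and a 95% confidence level; these numbers and variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 60, 100   # hypothetical sample results
confidence_level = 0.95  # assumed confidence level

p_hat = successes / n
# Critical value z* that leaves (1 - confidence_level) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error

lower, upper = p_hat - margin_of_error, p_hat + margin_of_error
```

With these values the interval runs from about 0.504 to about 0.696.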

Confidence Level

Proportion of confidence intervals that would capture the true parameter if many intervals were calculated from samples of the same size.

Confounding Variable

A variable that could affect the result of a statistical test but has not been controlled for.

Continuous Random Variable

Takes on all values in an interval of numbers.

Control Group

Group of subjects who receive either a placebo or no treatment during an experiment.

Correlation

Measures direction and strength of a linear relationship between two quantitative variables, symbolized as r.
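
A minimal sketch of computing r from paired data, using the usual sums of products of deviations from the means. The data values and the function name `correlation` are made up for illustration.

```python
from math import sqrt
from statistics import fmean

def correlation(xs, ys):
    """Correlation r between two quantitative variables."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2  # coefficient of determination
```

For this data r is about 0.77, a fairly strong positive linear relationship, and r^2 is 0.6.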

Critical Value

Value (z-score, t-score, or $\chi^2$ value) used in hypothesis testing to determine if the null hypothesis should be rejected.

Cumulative Distribution Function

Function that gives the probability that a random variable X takes a value less than or equal to x, found by summing the probabilities of all possible values up to and including x.

Degrees of Freedom

Value used to determine significance for a t-test or a Chi-Square test, measured generally as n-1; for two-way tables, (r-1)(c-1).

Dependent Trials

Trials whose probability is affected by the outcomes of previous trials.

Dependent Variable

Also referred to as Response Variable.

Density Curve

Represents a distribution, always on or above the horizontal axis with total area of exactly 1 underneath.

Discrete Random Variable

Random variable with countable outcomes.

Disjoint Events

Events that cannot occur simultaneously; also known as Mutually Exclusive Events.

Distribution

A list of the values a variable takes on and their frequencies.

Double Blind

Experiment where neither the subjects nor the researchers know which treatment each subject receives.

Empirical Rule

Also known as the 68-95-99.7 rule, used for approximating the percentage of data that falls within 1, 2, or 3 standard deviations of the mean in any normal distribution.
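
The three percentages can be recovered from the standard normal distribution. This sketch uses Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}
```

The three areas come out to roughly 0.68, 0.95, and 0.997, which is where the rule gets its name.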

Expected Value

Synonymous with Mean.

Experimental Units

Individuals on which an experiment is conducted.

Explanatory Variable

Explains observed outcomes in a statistical study; also known as Independent Variable.

Exploratory Data Analysis

Uses graphs and numerical summaries to characterize variables in a data set and their relationships.

Factor

Any explanatory variable in an experiment.

Five Number Summary

Describes data using the minimum, first quartile, median, third quartile, and maximum points.

Geometric Distribution

Distribution of probabilities regarding the number of trials until the first successful outcome.
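
A short sketch of the standard formula P(X = k) = (1 - p)^(k - 1) p, the probability that the first success occurs on trial k. The die-rolling example and the function name are assumptions chosen for illustration.

```python
def geometric_pmf(k: int, p: float) -> float:
    """Probability that the first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p

# Hypothetical example: rolling a fair die until the first six appears.
# Probability that the first six shows up on the third roll.
prob_first_six_on_third = geometric_pmf(3, 1 / 6)
```

The result is (5/6)^2 × (1/6) = 25/216: two failures followed by one success.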

Hypothesis Test

Inference type used to assess the plausibility of an assumed population parameter; also referred to as Significance Test.

Independent Trials

Trials whose probabilities are not influenced by prior outcomes.

Independent Variable

Also referred to as Explanatory Variable.

Individuals

People or objects described in a data set.

Inference

Process of drawing conclusions about a population based on sample data.

Influential Point

Point whose removal markedly changes the regression equation.

Interquartile Range (IQR)

Difference between the third and first quartiles of a data set.

Law of Large Numbers

As the number of observations increases, the mean of those observations gets closer and closer to the population mean.

Least Squares Regression Line

Regression line minimizing the sum of squared vertical distances of points to the line.
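
As a minimal sketch, the slope and intercept can be computed with the standard least-squares formulas: slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and intercept = ȳ - slope × x̄. The data and the function name below are hypothetical.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)

# Residual = observed value minus the value predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```

For this data the line is ŷ = 2.2 + 0.6x, and the residuals sum to zero (up to rounding), as they always do for a least squares line.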

Level

Specific value or setting of a factor in an experiment.

Matched Pairs

Statistical design comparing two treatments, often with one sample receiving each treatment over different time periods.

Mean

The average of a data set; also known as Expected Value.

Median

The middle value where 50% of data is above and 50% is below.

Mutually Exclusive Events

Events that cannot occur together; synonymous with Disjoint Events.

Nonresponse

Type of bias when individuals chosen for a sample cannot be contacted or choose not to participate.

Normal Distribution

A symmetric, bell-shaped distribution where approximately 68% of data lies within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean.

Null Hypothesis

States that a treatment has no effect or that the population has not changed.

Observation

A single point from a data set.

Outlier

An individual observation that deviates significantly from the overall pattern of data, commonly defined as any value more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3.
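
The 1.5 × IQR rule can be sketched as follows. Quartile conventions vary between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of observations is odd. The data set and function names are made up.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd (assumed quartile convention).
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

def find_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data set
summary = five_number_summary(data)
flagged = find_outliers(data)
```

Here Q1 = 4.5 and Q3 = 9.5, so the IQR is 5 and the fences sit at -3 and 17; only the value 30 is flagged.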

P-value

Probability of obtaining an outcome as extreme as or more extreme than the one actually observed, assuming the null hypothesis is true.
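
As one hedged illustration, here is a two-sided P-value for a one-sample z-test of a mean with a known population standard deviation. All of the numbers (hypothesized mean 100, standard deviation 15, sample of 36 with mean 105) are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 100         # mean under the null hypothesis (assumed)
sigma = 15         # known population standard deviation (assumed)
n = 36             # sample size (assumed)
sample_mean = 105  # observed sample mean (assumed)

# Test statistic: how many standard deviations the sample mean
# lies from the hypothesized mean.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# Two-sided P-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

Here z = 2.0 and the P-value is about 0.046, so the result would be statistically significant at the 0.05 level but not at the 0.01 level.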

Parameter

A number describing a population.

Percentile

Indicates what percent of a data set falls below a given observation.

Placebo

A treatment with no expected effect, designed to appear the same as the actual treatment.

Pooled Procedures

Combining separate samples into one sample for analysis, applicable only if variances are equal.

Population

The entire group of individuals from which information is sought.

Power of a Hypothesis Test

Probability of rejecting the null hypothesis when it is false, calculated as 1 minus the probability of a Type II error.

Probability

Proportion of times an outcome occurs over a large number of trials.

Probability Distribution Function

Assigns probabilities for each possible value of a discrete random variable X.

Proportion

Fraction of a data set that falls within a given category, often expressed as a decimal or percentage.

Qualitative Variable

Takes on non-numeric descriptions.

Quantitative Variable

Takes on numeric values.

Quartiles

Observations falling at the 25th, 50th, and 75th percentiles of a data set.

Range

Difference between maximum and minimum values of a data set.

Random

Individual outcomes are uncertain but follow a pattern over time.

Random Variable

Variable whose value is a numeric outcome of a random phenomenon.

Randomization

Using probability laws for sample selection and treatment assignments in experiments.

Regression Line

Describes how a response variable changes as the explanatory variable changes.

Residual

Difference between observed and predicted values of a response variable.

Response Variable

Measures the outcome of a statistical study; also known as the Dependent Variable.

Robustness

Measures how much the P-value is affected if the conditions of the hypothesis test are not met.

Sample

A part of the population used for information gathering.

Sample Space

List of all possible outcomes for a random event.

Sampling Distribution

Distribution of values taken by a statistic across all possible samples of the same size from the same population.

Sampling Frame

List from which a sample is chosen, ideally consisting of the entire population.

Significance Level

The threshold point for determining statistical significance.

Significance Test

Another term for Hypothesis Test.

Simple Random Sample (SRS)

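A sample chosen so that every possible group of n individuals from the population has an equal chance of being selected, which also gives every population member an equal chance of being chosen.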

Simulation

Method for collecting data using laws of probability to represent all possible outcomes of an experiment.

Skewed

Description of a distribution whose histogram extends farther on one side of the mean; skewed in the direction of the tail.

Standard Deviation

Square root of variance; commonly measures the spread of a data set.

Standard Error

Estimated standard deviation of a sampling distribution, calculated from sample data; describes how far a statistic typically falls from the parameter it estimates.

Standard Normal Distribution

A normal distribution with a mean of 0 and standard deviation of 1.

Standardized Score

Another term for z-Score.

Statistic

A number describing a sample.

Statistically Significant

An effect unlikely to occur by chance alone.

Stratified Random Sample

Sample chosen by defining subsets within the population, then taking an SRS from each subset.

Subjects

Also referred to as Experimental Units.

Symmetric

Describes a distribution whose histogram is a mirror image on either side of its center.

t-Distributions

Family of symmetric, bell-shaped distributions with a larger standard deviation than the standard normal distribution, defined by degrees of freedom.

Treatment

A specific condition applied to an experimental unit or subject.

Treatment Group

Group of subjects receiving an actual treatment in an experiment.

Type I Error

Rejecting the null hypothesis when it is true; its probability is the significance level of the test.

Type II Error

Not rejecting the null hypothesis when it is false; probability must be calculated for a specific alternative.

Unbiased Statistic

Statistic whose sampling distribution has a mean equal to the true value of the parameter it estimates.

Undercoverage

Bias occurring when certain population groups are omitted from the selection process.

Variability

Describes the spread of a data set.

Variable

Any characteristic of an individual.

Variance

Average of the squares of deviations from the mean, used as a measure of spread.
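
A brief sketch using Python's `statistics` module. Note the distinction it makes: `pvariance` divides by n (a literal average of squared deviations, appropriate for a whole population), while `variance` divides by n - 1 (the usual formula for a sample). The data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set, mean = 5

# Population versions: divide the sum of squared deviations by n.
population_variance = statistics.pvariance(data)
population_sd = statistics.pstdev(data)

# Sample version: divide by n - 1 instead.
sample_variance = statistics.variance(data)
```

The squared deviations from the mean sum to 32, so the population variance is 32/8 = 4 (standard deviation 2), while the sample variance is 32/7.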

Voluntary Response Sample

Sample that consists only of people who opt to participate, generally a poor method for data collection.

z-Score

Indicates how many standard deviations an observation lies above or below the mean; also called Standardized Score.
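
The calculation is z = (observation - mean) / standard deviation. In this small sketch the function name and the example values (mean 100, standard deviation 15) are assumptions for illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: a distribution with mean 100 and standard deviation 15.
z_high = z_score(130, 100, 15)  # observation above the mean
z_low = z_score(85, 100, 15)    # observation below the mean
```

An observation of 130 lies 2 standard deviations above the mean, and 85 lies 1 standard deviation below it.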