AP Statistics Important Vocabulary Terms

Alternative Hypothesis

  • States that a treatment has had an effect or caused a change in the population.

Bias

  • Describes a study that systematically favors certain outcomes.

Binomial Distribution

  • Probability distribution of the number of successes X in n independent trials, each with the same probability of success.
  • Denoted as B(n, p), where p is the probability of success on a single trial.
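
A minimal Python sketch (standard library only) of the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), plus its cumulative version. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical, chosen for illustration.

```python
from math import comb

# Hypothetical helper: P(X = k) for X ~ B(n, p).
def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical helper: P(X <= k), the running sum of the pmf from 0 to k.
def binomial_cdf(k: int, n: int, p: float) -> float:
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Example: exactly 2 heads, and at most 2 heads, in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))  # 0.375
print(binomial_cdf(2, 4, 0.5))  # 0.6875
```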

Blind

  • Experiment where subjects do not know which treatment they are receiving.

Blocking

  • Statistical design that creates groups that are similar in some way, then randomizes treatments within each block.

Central Limit Theorem

  • States that when a simple random sample (SRS) of size n is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean is approximately normal for sufficiently large n, with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
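
A simulation sketch of the theorem in Python, assuming a strongly right-skewed Exponential(rate 1) population (so μ = σ = 1), samples of size n = 30, and 5000 repetitions; these choices and the fixed seed are illustrative assumptions. The mean and standard deviation of the simulated sample means should land near μ and σ/√n, and about 95% of them should fall within 1.96 such standard deviations of μ.

```python
import math
import random
import statistics

# Assumed population: Exponential with rate 1, so mu = sigma = 1.
mu, sigma = 1.0, 1.0
n, reps = 30, 5000
rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Draw many SRSs of size n and record each sample mean.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = sigma / math.sqrt(n)  # standard deviation predicted by the CLT
print(round(statistics.fmean(means), 3))                # close to mu = 1
print(round(statistics.stdev(means), 3), round(se, 3))  # both close to 0.183

# If the sample means are roughly normal, about 95% fall within 1.96 * se of mu.
within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(round(within, 3))
```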

Chi-Square Distributions

  • Family of right-skewed distributions that take only non-negative values, each defined by its degrees of freedom.
  • Shape depends on the degrees of freedom; the distribution becomes less skewed as the degrees of freedom increase.

Chi-Square Goodness-of-Fit Test

  • Used to determine if a population has a certain hypothesized distribution.
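
A minimal Python sketch of the goodness-of-fit statistic, the sum of (O - E)^2 / E over all categories, with degrees of freedom equal to the number of categories minus 1. The helper name and the die-roll counts are hypothetical; the P-value itself would then come from a chi-square table or statistical software.

```python
# Hypothetical helper: chi-square statistic, sum of (O - E)^2 / E.
def chi_square_statistic(observed, expected):
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die claimed to be fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [60 / 6] * 6          # 10 per face under the null hypothesis
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1           # number of categories minus 1
print(round(stat, 3), df)        # statistic ≈ 1.0 with 5 degrees of freedom
```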

Chi-Square Test for Homogeneity

  • Determines whether the distribution of a categorical variable is the same across several populations or treatment groups.

Chi-Square Test for Independence

  • Assesses if there is a relationship between two categorical variables; also known as Chi-Square Test for Association.
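
For a two-way table, each expected count is (row total × column total) / grand total, and the degrees of freedom are (r - 1)(c - 1). A minimal Python sketch of that computation; the helper names and the 2×2 table of counts are hypothetical.

```python
# Hypothetical helper: expected count for each cell under independence.
def expected_counts(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Hypothetical helper: chi-square statistic and degrees of freedom for a two-way table.
def chi_square_independence(table):
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

table = [[10, 20],
         [30, 40]]
print(expected_counts(table))          # [[12.0, 18.0], [28.0, 42.0]]
print(chi_square_independence(table))  # statistic ≈ 0.794, df = 1
```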

Coefficient of Determination

  • Proportion of the variation in the response variable that is explained by the least-squares regression on the explanatory variable; symbolized as r^2.

Complement of an Event

  • Set of all outcomes in the sample space that are not part of a given event A; P(not A) = 1 - P(A).

Conditional Probability

  • Probability of an event occurring given that another specific event has already occurred.
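
Conditional probability is computed as P(A | B) = P(A and B) / P(B). A minimal Python sketch using exact fractions and a standard 52-card deck (12 face cards, 4 of them kings); the helper name is hypothetical.

```python
from fractions import Fraction

# Hypothetical helper: P(A | B) = P(A and B) / P(B).
def conditional_probability(p_a_and_b, p_b):
    return p_a_and_b / p_b

p_face = Fraction(12, 52)           # jacks, queens, kings
p_king_and_face = Fraction(4, 52)   # every king is a face card
p_king_given_face = conditional_probability(p_king_and_face, p_face)
print(p_king_given_face)            # 1/3
```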

Confidence Interval

  • Interval estimate of a parameter calculated using a sample from that population.
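
As one concrete case, a one-proportion z interval has the form estimate ± (critical value) × (standard error). A minimal Python sketch, assuming a random sample large enough for the normal approximation (at least 10 successes and 10 failures); the helper name and the 60-out-of-100 data are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical helper: p-hat ± z* * sqrt(p-hat (1 - p-hat) / n).
def one_proportion_z_interval(successes, n, confidence=0.95):
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)                  # standard error
    margin = z_star * se
    return p_hat - margin, p_hat + margin

low, high = one_proportion_z_interval(60, 100)
print(round(low, 3), round(high, 3))  # 0.504 0.696
```

A lower confidence level gives a smaller z* and therefore a narrower interval.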

Confidence Level

  • Proportion of confidence intervals that would capture the true parameter if many intervals were calculated from repeated samples of the same size.

Confounding Variable

  • A variable whose effect on the response cannot be separated from the effect of the explanatory variable because it has not been controlled for.

Continuous Random Variable

  • Takes on all values in an interval of numbers.

Control Group

  • Group of subjects who receive either a placebo or no treatment during an experiment.

Correlation

  • Measures direction and strength of a linear relationship between two quantitative variables, symbolized as r.
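
One standard formula is r = (1 / (n - 1)) Σ z_x z_y, the average product of standardized scores using sample standard deviations. A minimal Python sketch; the helper name and the five data points are hypothetical.

```python
import statistics

# Hypothetical helper: r as the (n - 1)-average of products of z-scores.
def correlation(xs, ys):
    n = len(xs)
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys))
    return total / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4), round(r ** 2, 4))  # 0.7746 0.6
```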

Critical Value

  • Value (z-score, t-score, or chi-square (χ²) value) used in hypothesis testing to determine if the null hypothesis should be rejected.

Cumulative Distribution Function

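  • Function giving, for each value x, the probability that a random variable X is less than or equal to x, written P(X ≤ x); for a discrete X, it is the running sum of the probabilities of all values up to x.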

Degrees of Freedom

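  • Value that determines which t or chi-square distribution is used; n - 1 for one-sample t procedures, (number of categories) - 1 for a chi-square goodness-of-fit test, and (r - 1)(c - 1) for a two-way table with r rows and c columns.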

Dependent Trials

  • Trials whose probability is affected by the outcomes of previous trials.

Dependent Variable

  • Also referred to as Response Variable.

Density Curve

  • Curve that describes a distribution; it always lies on or above the horizontal axis and has a total area of exactly 1 underneath it.

Discrete Random Variable

  • Random variable with countable outcomes.

Disjoint Events

  • Events that cannot occur simultaneously; also known as Mutually Exclusive Events.

Distribution

  • A list of the values a variable takes on and their frequencies.

Double Blind

  • Experiment where neither the subjects nor the researchers know which treatment each subject receives.

Empirical Rule

  • Also known as the 68-95-99.7 rule: in any normal distribution, approximately 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean, respectively.
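
The three percentages can be checked against the exact standard normal areas. A short Python sketch using the standard library's `statistics.NormalDist`; the dictionary name `within` is an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(k, round(share, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```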

Expected Value

  • The mean of a random variable, found by summing each possible value times its probability; synonymous with Mean.
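
For a discrete random variable, E(X) = Σ x · P(X = x). A minimal Python sketch with exact fractions for one roll of a fair die; the helper name `expected_value` is hypothetical.

```python
from fractions import Fraction

# Hypothetical helper: E(X) = sum of x * P(X = x) over all values x.
def expected_value(distribution):
    return sum(x * p for x, p in distribution.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))  # 7/2
```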

Experimental Units

  • Individuals on which an experiment is conducted.

Explanatory Variable

  • Explains observed outcomes in a statistical study; also known as Independent Variable.

Exploratory Data Analysis

  • Uses graphs and numerical summaries to characterize variables in a data set and their relationships.

Factor

  • Any explanatory variable in an experiment.

Five Number Summary

  • Describes data using the minimum, first quartile, median, third quartile, and maximum.
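
A minimal Python sketch of the five-number summary. It assumes the "median of each half" quartile convention (the overall median is excluded from both halves when n is odd); other software may use different quartile conventions and give slightly different Q1 and Q3. The helper name is hypothetical, and the data need at least two values.

```python
import statistics

# Hypothetical helper: (min, Q1, median, Q3, max) using medians of the lower and upper halves.
def five_number_summary(data):
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]                # below the median
    upper = values[half + (n % 2):]      # above the median (skip it when n is odd)
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5, 7.5, 9)
```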

Geometric Distribution

  • Distribution of probabilities regarding the number of trials until the first successful outcome.
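
If each independent trial succeeds with probability p, the first success occurs on trial k with probability (1 - p)^(k - 1) p, and the mean number of trials is 1/p. A minimal Python sketch; the helper name is hypothetical.

```python
# Hypothetical helper: P(first success on trial k), for k = 1, 2, 3, ...
def geometric_pmf(k: int, p: float) -> float:
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1 - p) ** (k - 1) * p

print(geometric_pmf(3, 0.5))  # 0.125
print(1 / 0.5)                # mean number of trials until first success: 2.0
```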

Hypothesis Test

  • Inference method used to assess the evidence against an assumed value of a population parameter; also referred to as Significance Test.

Independent Trials

  • Trials whose probabilities are not influenced by prior outcomes.

Independent Variable

  • Also referred to as Explanatory Variable.

Individuals

  • People or objects described in a data set.

Inference

  • Process of drawing conclusions about a population based on sample data.

Influential Point

  • Point whose removal markedly changes the regression equation.

Interquartile Range (IQR)

  • Difference between the third and first quartiles of a data set.

Law of Large Numbers

  • As the number of randomly drawn observations increases, the mean of those observations gets closer and closer to the population mean.
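
A simulation sketch of the law in Python: the running mean of fair-die rolls drifts toward the population mean 3.5 as the number of rolls grows. The seed and the checkpoints are arbitrary illustrative choices.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
total = 0
running_means = {}
for i in range(1, 100_001):
    total += rng.randint(1, 6)  # one roll of a fair die
    if i in (10, 100, 1_000, 10_000, 100_000):
        running_means[i] = total / i

for count, mean in running_means.items():
    print(count, round(mean, 3))  # approaches 3.5 as count grows
```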

Least Squares Regression Line

  • Regression line minimizing the sum of squared vertical distances of points to the line.
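
A minimal Python sketch of the least-squares line y-hat = a + b x, with slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and intercept a = ȳ - b x̄. It also computes the residuals (observed minus predicted) and r^2 = 1 - SSE/SST. The helper name and the data are hypothetical.

```python
import statistics

# Hypothetical helper: returns (intercept a, slope b) for y-hat = a + b x.
def least_squares_line(xs, ys):
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]  # observed - predicted
sse = sum(e ** 2 for e in residuals)                        # unexplained variation
sst = sum((yi - statistics.fmean(y)) ** 2 for yi in y)      # total variation
r_squared = 1 - sse / sst

print(round(a, 2), round(b, 2))          # 2.2 0.6
print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(r_squared, 2))               # 0.6
```

The residuals of a least-squares line always sum to zero (up to rounding).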

Level

  • A specific value or category of a factor in an experiment.

Matched Pairs

  • Design comparing two treatments in which similar subjects are paired and each member of a pair receives a different treatment, or in which each subject receives both treatments in random order.

Mean

  • The average of a data set; also known as Expected Value.

Median

  • The middle value where 50% of data is above and 50% is below.

Mutually Exclusive Events

  • Events that cannot occur together; synonymous with Disjoint Events.

Nonresponse

  • Type of bias when individuals chosen for a sample cannot be contacted or choose not to participate.

Normal Distribution

  • A symmetric, bell-shaped distribution where approximately 68% of data lies within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean.

Null Hypothesis

  • States that a treatment has no effect or that the population has not changed.

Observation

  • A single point from a data set.

Outlier

  • An individual observation that falls outside the overall pattern of the data; commonly identified by the 1.5 × IQR rule as any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
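
A minimal Python sketch of the 1.5 × IQR rule. It assumes the "median of each half" quartile convention; the helper names and the data are hypothetical.

```python
import statistics

# Hypothetical helper: Q1 and Q3 as medians of the lower and upper halves.
def quartiles(data):
    values = sorted(data)
    half = len(values) // 2
    lower, upper = values[:half], values[half + len(values) % 2:]
    return statistics.median(lower), statistics.median(upper)

# Hypothetical helper: values beyond the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR.
def outliers_by_iqr(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

print(outliers_by_iqr([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # [30]
```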

P-value

  • Probability, assuming the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one actually observed.
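
A minimal Python sketch of a P-value for a one-proportion z test, assuming the normal approximation is appropriate. The helper name, the `alternative` argument values, and the 60-out-of-100 data are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical helper: z statistic and P-value for H0: p = p0.
def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)  # SD of p-hat when H0 is true
    z = (p_hat - p0) / sd
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

z, p = one_proportion_z_test(60, 100, 0.5)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```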

Parameter

  • A number describing a population.

Percentile

  • Indicates what percent of a data set falls below a given observation.

Placebo

  • A treatment with no expected effect, designed to appear the same as the actual treatment.

Pooled Procedures

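  • Combining data from separate samples to estimate a common parameter; appropriate only when that parameter is assumed equal across the populations (for example, equal proportions under the null hypothesis of a two-proportion z-test, or equal variances in a pooled t-test).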

Population

  • The entire group of individuals from which information is sought.

Power of a Hypothesis Test

  • Probability of rejecting the null hypothesis when it is false, calculated as 1 minus the probability of a Type II error.
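
A minimal Python sketch of power for the one-sided test H0: p = p0 versus Ha: p > p0, using the normal approximation. It finds the boundary value of p-hat beyond which H0 is rejected, then computes the chance of exceeding it when the true proportion is p_true. The helper name and the numbers (p0 = 0.5, p_true = 0.6, n = 100, α = 0.05) are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical helper: power of the upper-tailed one-proportion z test.
def power_one_proportion_upper(p0, p_true, n, alpha=0.05):
    z_star = NormalDist().inv_cdf(1 - alpha)
    # Boundary value of p-hat beyond which H0 is rejected.
    cutoff = p0 + z_star * math.sqrt(p0 * (1 - p0) / n)
    # Probability that p-hat exceeds the cutoff when the true proportion is p_true.
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    return 1 - NormalDist(p_true, sd_true).cdf(cutoff)

power = power_one_proportion_upper(0.5, 0.6, 100)
print(round(power, 2))      # about 0.64
print(round(1 - power, 2))  # probability of a Type II error, about 0.36
```

Increasing the sample size shrinks the standard deviation of p-hat, which raises the power.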

Probability

  • Proportion of times an outcome occurs over a large number of trials.

Probability Distribution Function

  • Assigns probabilities for each possible value of a discrete random variable X.

Proportion

  • Percentage of a data set that falls within a given category.

Qualitative Variable

  • Takes on non-numeric descriptions.

Quantitative Variable

  • Takes on numeric values.

Quartiles

  • Observations falling at the 25th, 50th, and 75th percentiles of a data set.

Range

  • Difference between maximum and minimum values of a data set.

Random

  • Individual outcomes are uncertain but follow a pattern over time.

Random Variable

  • Variable whose value is a numeric outcome of a random phenomenon.

Randomization

  • Using chance to select samples and to assign treatments to experimental units.

Regression Line

  • Describes how a response variable changes as the explanatory variable changes.

Residual

  • Difference between observed and predicted values of a response variable.

Response Variable

  • Measures the outcome of a statistical study; also known as the Dependent Variable.

Robustness

  • A procedure is robust if its results (such as the P-value or confidence level) are not much affected when the conditions for the procedure are not fully met.

Sample

  • A part of the population used for information gathering.

Sample Space

  • List of all possible outcomes for a random event.

Sampling Distribution

  • Distribution of values taken by a statistic across all possible samples of the same size from the same population.

Sampling Frame

  • List from which a sample is chosen, ideally consisting of the entire population.

Significance Level

  • The threshold α, chosen before a test, for deciding statistical significance: a P-value smaller than α leads to rejecting the null hypothesis.

Significance Test

  • Another term for Hypothesis Test.

Simple Random Sample (SRS)

  • A sample of size n chosen so that every possible set of n individuals in the population has an equal chance of being selected.

Simulation

  • Method of imitating a chance process with a model (such as random numbers) that follows the same probabilities, used to estimate how likely different outcomes are.

Skewed

  • Description of a distribution whose histogram extends farther on one side of the center than the other; a distribution is skewed in the direction of its longer tail.

Standard Deviation

  • Square root of the variance; the most common measure of the spread of a data set.

Standard Error

  • Estimate of the standard deviation of a statistic's sampling distribution, calculated from sample data (for a sample mean, s/√n).

Standard Normal Distribution

  • A normal distribution with a mean of 0 and standard deviation of 1.

Standardized Score

  • Another term for z-Score.

Statistic

  • A number describing a sample.

Statistically Significant

  • An effect unlikely to occur by chance alone.

Stratified Random Sample

  • Sample chosen by defining subsets within the population, then taking an SRS from each subset.

Subjects

  • Also referred to as Experimental Units.

Symmetric

  • Describes a distribution whose left and right sides are approximately mirror images of each other about the center.

t-Distributions

  • Family of symmetric, bell-shaped distributions centered at 0 and defined by their degrees of freedom, with more spread and heavier tails than the standard normal distribution; they approach the standard normal as the degrees of freedom increase.

Treatment

  • A specific condition applied to an experimental unit or subject.

Treatment Group

  • Group of subjects receiving an actual treatment in an experiment.

Type I Error

  • Rejecting the null hypothesis when it is true; its probability equals the significance level α of the test.

Type II Error

  • Not rejecting the null hypothesis when it is false; probability must be calculated for a specific alternative.

Unbiased Statistic

  • Statistic whose sampling distribution has a mean equal to the true value of the parameter being estimated.

Undercoverage

  • Bias occurring when certain population groups are omitted from the selection process.

Variability

  • Describes the spread of a data set.

Variable

  • Any characteristic of an individual.

Variance

  • Average of the squared deviations from the mean, used as a measure of spread; for a sample, the sum of squared deviations is divided by n - 1 instead of n.
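
A minimal Python sketch of the variance and standard deviation computed by hand, dividing by n for a population and by n - 1 for a sample, then cross-checked against the standard library's `statistics` functions. The data values are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_deviations) / len(data)    # divide by n
sample_variance = sum(squared_deviations) / (len(data) - 1)  # divide by n - 1

print(population_variance, math.sqrt(population_variance))               # 4.0 2.0
print(round(sample_variance, 4), round(math.sqrt(sample_variance), 4))   # 4.5714 2.1381
print(statistics.pvariance(data), round(statistics.variance(data), 4))   # stdlib cross-check
```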

Voluntary Response Sample

  • Sample that consists only of people who opt to participate, generally a poor method for data collection.

z-Score

  • Indicates how many standard deviations an observation lies above or below the mean; also called Standardized Score.
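
The z-score is z = (x - mean) / standard deviation. A minimal Python sketch with hypothetical test scores (mean 70, standard deviation 10); the helper name is hypothetical.

```python
# Hypothetical helper: standard deviations above (+) or below (-) the mean.
def z_score(x, mean, sd):
    return (x - mean) / sd

print(z_score(85, 70, 10))  # 1.5
print(z_score(62, 70, 10))  # -0.8
```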