AP Statistics Important Vocabulary Terms
Alternative Hypothesis
- States that a treatment has had an effect or caused a change in the population.
Bias
- Describes a study that systematically favors certain outcomes.
Binomial Distribution
- Distribution of the probabilities of X successes out of n trials.
- Denoted as B(n, p) where p is the probability of a single success.
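As a minimal sketch of the B(n, p) notation, the probability of exactly k successes can be computed with the standard binomial formula. The function name `binomial_pmf` and the coin-flip numbers are illustrative assumptions, not part of the original list.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: exactly 3 heads in 10 flips of a fair coin.
prob = binomial_pmf(3, 10, 0.5)
print(round(prob, 4))  # 0.1172
```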
Blind
- Experiment where subjects do not know which treatment they are receiving.
Blocking
- Statistical design that creates groups that are similar in some way, then randomizes treatments within each block.
Central Limit Theorem
- States that when a simple random sample (SRS) of sufficiently large size $n$ is drawn from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean is approximately normal, with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
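The theorem can be checked informally by simulation. The sketch below assumes a uniform population on [0, 1), a sample size of 30, and 5000 repeated samples (all arbitrary choices for illustration). It compares the spread of the sample means with the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the sketch is repeatable

n = 30              # size of each sample (assumed)
num_samples = 5000  # number of repeated samples (assumed)

# Population: uniform on [0, 1), which has mean 0.5 and sd sqrt(1/12).
mu = 0.5
sigma = sqrt(1 / 12)

# Draw many samples and record each sample mean.
sample_means = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = sigma / sqrt(n)

print(mean_of_means, sd_of_means, theoretical_sd)
```

The first value should land near 0.5, and the last two should be close to each other.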
Chi-Square Distributions
- Family of skewed-right distributions defined by their degrees of freedom and taking only positive values.
- Shape changes with the degrees of freedom, becoming less skewed as they increase.
Chi-Square Goodness-of-Fit Test
- Used to determine if a population has a certain hypothesized distribution.
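A short sketch of the goodness-of-fit statistic, which sums (observed - expected)^2 / expected over all categories. The die-roll counts are made up for illustration and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: a die rolled 60 times, hypothesized to be fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi_sq = chi_square_statistic(observed, expected)
df = len(observed) - 1  # number of categories minus 1

print(round(chi_sq, 4), df)  # 1.0 5
```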
Chi-Square Test for Homogeneity
- Determines whether two or more populations share the same distribution of a single categorical variable.
Chi-Square Test for Independence
- Assesses if there is a relationship between two categorical variables; also known as Chi-Square Test for Association.
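For a two-way table, each expected count is (row total × column total) / grand total. The sketch below works through that arithmetic on an invented 2×2 table; the counts are assumptions for illustration only.

```python
# Hypothetical 2x2 table of observed counts (rows and columns are categories).
table = [[20, 30],
         [30, 20]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)
df = (len(table) - 1) * (len(table[0]) - 1)

print(expected, chi_sq, df)  # [[25.0, 25.0], [25.0, 25.0]] 4.0 1
```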
Coefficient of Determination
- Indicates the proportion of variation in the response variable that is explained by its linear relationship with the explanatory variable, symbolized as r^2.
Complement of an Event
- Set of all outcomes in the sample space that are not part of the event.
Conditional Probability
- Probability of an event occurring given that another specific event has already occurred.
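Conditional probability is usually computed as P(A | B) = P(A and B) / P(B). A small sketch, assuming a standard 52-card deck with 12 face cards and 4 kings:

```python
from fractions import Fraction

# Assumed setup: one card drawn from a standard 52-card deck.
p_face = Fraction(12, 52)          # P(B): card is a face card
p_king_and_face = Fraction(4, 52)  # P(A and B): every king is a face card

# P(A | B) = P(A and B) / P(B)
p_king_given_face = p_king_and_face / p_face

print(p_king_given_face)  # 1/3
```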
Confidence Interval
- Interval estimate of a parameter calculated using a sample from that population.
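One common instance is the one-sample z-interval for a proportion, estimate ± critical value × standard error. The sketch below assumes 60 successes in a sample of 100 and a 95% confidence level; both numbers are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: 60 successes out of 100.
successes, n = 60, 100
confidence = 0.95

p_hat = successes / n
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 3), round(upper, 3))  # 0.504 0.696
```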
Confidence Level
- Long-run proportion of confidence intervals that would contain the parameter if many intervals were calculated from samples of the same size.
Confounding Variable
- A variable that could affect the result of a statistical test but has not been controlled for.
Continuous Random Variable
- Takes on all values in an interval of numbers.
Control Group
- Group of subjects who receive either a placebo or no treatment during an experiment.
Correlation
- Measures direction and strength of a linear relationship between two quantitative variables, symbolized as r.
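A minimal sketch of computing r from its definition, using a small invented data set. The helper name `correlation` is hypothetical. Squaring the result gives r^2.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(round(r, 4), round(r ** 2, 4))  # 0.7746 0.6
```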
Critical Value
- Value (z-score, t-score, or $\chi^2$ value) used in hypothesis testing to determine if the null hypothesis should be rejected.
Cumulative Distribution Function
- Function that gives the probability that a random variable X is less than or equal to a given value, found by summing the probabilities of all values up to and including it.
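For a discrete random variable, the cumulative probabilities are running totals of the individual probabilities. A sketch assuming one roll of a fair six-sided die:

```python
from fractions import Fraction
from itertools import accumulate

# Assumed distribution: one roll of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# Running totals give P(X <= x) for each value x.
cdf = dict(zip(values, accumulate(pmf)))

print(cdf[4])  # 2/3
```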
Degrees of Freedom
- Value used to determine significance for a t-test or a Chi-Square test, measured generally as n-1; for two-way tables, (r-1)(c-1).
Dependent Trials
- Trials whose probability is affected by the outcomes of previous trials.
Dependent Variable
- Also referred to as Response Variable.
Density Curve
- Represents a distribution, always on or above the horizontal axis with total area of exactly 1 underneath.
Discrete Random Variable
- Random variable with countable outcomes.
Disjoint Events
- Events that cannot occur simultaneously; also known as Mutually Exclusive Events.
Distribution
- A list of the values a variable takes on and their frequencies.
Double Blind
- Experiment where neither the subjects nor the researchers know which treatment each subject receives.
Empirical Rule
- Also known as the 68-95-99.7 rule, used for approximating data that falls within 1, 2, or 3 standard deviations of the mean in any normal distribution.
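The three percentages can be recovered from the standard normal distribution. This sketch uses Python's `statistics.NormalDist` to compute the area within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between -k and +k standard deviations for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```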
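Expected Value
- Long-run average value of a random variable, found by weighting each possible value by its probability; also known as the Mean.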
Experimental Units
- Individuals on which an experiment is conducted.
Explanatory Variable
- Explains observed outcomes in a statistical study; also known as Independent Variable.
Exploratory Data Analysis
- Uses graphs and numerical summaries to characterize variables in a data set and their relationships.
Factor
- Any explanatory variable in an experiment.
Five Number Summary
- Describes data using the minimum, first quartile, median, third quartile, and maximum points.
Geometric Distribution
- Distribution of probabilities regarding the number of trials until the first successful outcome.
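If each trial succeeds with probability p, the chance that the first success comes on trial k is (1 - p)^(k - 1) × p. A sketch, assuming the "success" is rolling a six on a fair die; `geometric_pmf` is a hypothetical helper name.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

# Hypothetical example: first six appears on the third roll of a fair die.
p_six = Fraction(1, 6)
print(geometric_pmf(3, p_six))  # 25/216
```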
Hypothesis Test
- Inference type used to determine the feasibility of an assumed population parameter; also referred to as Significance Test.
Independent Trials
- Trials whose probabilities are not influenced by prior outcomes.
Independent Variable
- Also referred to as Explanatory Variable.
Individuals
- People or objects described in a data set.
Inference
- Process of drawing conclusions about a population based on sample data.
Influential Point
- Point whose removal markedly changes the regression equation.
Interquartile Range (IQR)
- Difference between the third and first quartiles of a data set.
Law of Large Numbers
- As the number of observations increases, the mean of those observations gets closer and closer to the population mean.
Least Squares Regression Line
- Regression line minimizing the sum of squared vertical distances of points to the line.
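A sketch of the usual least-squares formulas, slope = Sxy / Sxx and intercept = mean of y minus slope × mean of x, on invented data. The helper name `least_squares_line` is hypothetical. The residuals from the fitted line sum to roughly zero.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(round(a, 2), round(b, 2))  # 2.2 0.6
```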
Level
- Numerical value of a factor in an experiment.
Matched Pairs
- Statistical design comparing two treatments using paired subjects, or using the same subjects who receive each treatment at different times.
Mean
- The average of a data set; also known as Expected Value.
Median
- The middle value where 50% of data is above and 50% is below.
Mutually Exclusive Events
- Events that cannot occur together; synonymous with Disjoint Events.
Nonresponse
- Type of bias when individuals chosen for a sample cannot be contacted or choose not to participate.
Normal Distribution
- A symmetric, bell-shaped distribution where approximately 68% of data lies within one standard deviation, 95% within two, and 99.7% within three standard deviations of the mean.
Null Hypothesis
- States that a treatment has no effect or that the population has not changed.
Observation
- A single point from a data set.
Outlier
- An individual observation that deviates significantly from the overall pattern of data, commonly defined as any value more than 1.5 IQR below Q1 or above Q3.
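The 1.5 IQR rule can be sketched as follows. Quartile conventions vary between textbooks and software. This sketch assumes the "median of each half" method (excluding the overall median when the count is odd) and uses invented data, so other tools may report slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # Skip the middle value when the number of observations is odd.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)

# Hypothetical data set.
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]

q1, med, q3 = quartiles(data)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, med, q3, iqr, outliers)  # 3.5 6 8.5 5.0 [30]
```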
P-value
- Probability of obtaining an outcome as extreme or more extreme than the one observed, assuming the null hypothesis is true.
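For a z test, the P-value is a tail area under the standard normal curve. The sketch below assumes a hypothetical test statistic of z = 1.96 and shows both the one-sided (upper tail) and two-sided values.

```python
from statistics import NormalDist

z = 1.96  # hypothetical test statistic

# Upper-tail area: probability of a z-score at least this large under H0.
p_one_sided = 1 - NormalDist().cdf(z)
# Two-sided: equally extreme results in either direction.
p_two_sided = 2 * p_one_sided

print(round(p_one_sided, 4), round(p_two_sided, 4))  # 0.025 0.05
```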
Parameter
- A number describing a population.
Percentile
- Indicates what percent of a data set falls below a given observation.
Placebo
- A treatment with no expected effect, designed to appear the same as the actual treatment.
Pooled Procedures
- Combining separate samples into one sample for analysis, applicable only if variances are equal.
Population
- The entire group of individuals from which information is sought.
Power of a Hypothesis Test
- Probability of rejecting the null hypothesis when it is false, calculated as 1 minus the probability of a Type II error.
Probability
- Proportion of times an outcome occurs over a large number of trials.
Probability Distribution Function
- Assigns probabilities for each possible value of a discrete random variable X.
Proportion
- Fraction of a data set that falls within a given category.
Qualitative Variable
- Takes on non-numeric descriptions.
Quantitative Variable
- Takes on numeric values for which arithmetic such as averaging makes sense.
Quartiles
- Observations falling at the 25th, 50th, and 75th percentiles of a data set.
Range
- Difference between maximum and minimum values of a data set.
Random
- Individual outcomes are uncertain but follow a pattern over time.
Random Variable
- Variable whose value is a numeric outcome of a random phenomenon.
Randomization
- Using probability laws for sample selection and treatment assignments in experiments.
Regression Line
- Describes how a response variable changes as the explanatory variable changes.
Residual
- Difference between observed and predicted values of a response variable.
Response Variable
- Measures the outcome of a statistical study; also known as the Dependent Variable.
Robustness
- Measures how much the P-value is affected if the conditions of the hypothesis test are not met.
Sample
- A part of the population used for information gathering.
Sample Space
- List of all possible outcomes for a random event.
Sampling Distribution
- Distribution of values taken by a statistic across all possible samples of the same size from the same population.
Sampling Frame
- List from which a sample is chosen, ideally consisting of the entire population.
Significance Level
- The threshold point for determining statistical significance.
Significance Test
- Another term for Hypothesis Test.
Simple Random Sample (SRS)
- A sample of size n chosen so that every possible group of n population members has an equal chance of being selected.
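In software, an SRS is typically drawn by sampling without replacement from a numbered list. A sketch using Python's `random.sample`, assuming a hypothetical population of 100 labeled individuals and a sample size of 10:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: individuals labeled 1 through 100.
population = list(range(1, 101))

# Sampling without replacement: each group of 10 is equally likely.
srs = random.sample(population, k=10)

print(sorted(srs))
```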
Simulation
- Method for collecting data using laws of probability to represent all possible outcomes of an experiment.
Skewed
- Description of a distribution whose histogram extends farther on one side of the mean; skewed in the direction of the tail.
Standard Deviation
- Square root of variance; commonly measures the spread of a data set.
Standard Error
- Estimate of the standard deviation of a statistic's sampling distribution, calculated from sample data.
Standard Normal Distribution
- A normal distribution with a mean of 0 and standard deviation of 1.
Standardized Score
- Another term for z-Score.
Statistic
- A number describing a sample.
Statistically Significant
- An effect unlikely to occur by chance alone.
Stratified Random Sample
- Sample chosen by defining subsets within the population, then taking an SRS from each subset.
Subjects
- Also referred to as Experimental Units.
Symmetric
- Describes a distribution whose histogram is a mirror image on either side of its center.
t-Distributions
- Family of symmetric, bell-shaped distributions with a larger standard deviation than the standard normal distribution, defined by degrees of freedom.
Treatment
- A specific condition applied to an experimental unit or subject.
Treatment Group
- Group of subjects receiving an actual treatment in an experiment.
Type I Error
- Rejecting the null hypothesis when it is true; its probability equals the significance level of the test.
Type II Error
- Not rejecting the null hypothesis when it is false; probability must be calculated for a specific alternative.
Unbiased Statistic
- Statistic whose sampling distribution has a mean equal to the true value of the parameter it estimates.
Undercoverage
- Bias occurring when certain population groups are omitted from the selection process.
Variability
- Describes the spread of a data set.
Variable
- Any characteristic of an individual.
Variance
- Average of the squares of deviations from the mean, used as a measure of spread.
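A short worked sketch on invented data. Dividing the sum of squared deviations by n gives the population variance described above. Dividing by n - 1 instead gives the sample variance commonly used for inference.

```python
from math import sqrt

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
squared_devs = [(x - mean) ** 2 for x in data]

population_variance = sum(squared_devs) / n      # divide by n
sample_variance = sum(squared_devs) / (n - 1)    # divide by n - 1

print(population_variance, sqrt(population_variance))  # 4.0 2.0
print(round(sample_variance, 3))                       # 4.571
```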
Voluntary Response Sample
- Sample that consists only of people who opt to participate, generally a poor method for data collection.
z-Score
- Indicates how many standard deviations an observation lies above or below the mean; also called Standardized Score.
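The calculation is z = (observation - mean) / standard deviation. A sketch with an assumed mean of 5 and standard deviation of 2; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical distribution with mean 5 and standard deviation 2.
print(z_score(9, 5, 2))  # 2.0
print(z_score(2, 5, 2))  # -1.5
```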