A complete sequence of AP Statistics vocabulary terms, providing definitions and relevant statistical notation for each concept found in the lecture notes.
Alternative Hypothesis
States that a treatment has had an effect or caused a change in the population
Bias
Describes a study which systematically favors certain outcomes
Binomial Distribution
The distribution of the probabilities of X successes out of n trials, calculated using p as the probability of any single success – B(n,p)
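As a small illustration of the B(n,p) notation, the sketch below computes binomial probabilities directly from the counting formula. The function name `binomial_pmf` and the coin-flip numbers are made up for this example.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X ~ B(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 10 flips of a fair coin, exactly 3 heads.
prob_three_heads = binomial_pmf(3, 10, 0.5)

# The probabilities over every possible count of successes should add to 1.
total_probability = sum(binomial_pmf(k, 10, 0.5) for k in range(11))

print(prob_three_heads)
print(total_probability)
```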
Blind
Describes an experiment in which the subjects do not know which treatment they are getting
Blocking
A statistical design which creates groups that are similar in some way, and then randomizes the treatments within each block
Central Limit Theorem
States that when an SRS of size n is drawn from a population with mean μ and standard deviation σ, the sampling distribution of the sample mean will be approximately normal for large n, with mean μ and standard deviation σ/√n
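One way to see the theorem at work is a quick simulation. The sketch below assumes a hypothetical population (the faces of a fair die), draws many samples of size 30 with replacement, and compares the spread of the sample means to σ/√n. The seed, sample size, and repetition count are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

population = [1, 2, 3, 4, 5, 6]          # hypothetical population: a fair die
mu = statistics.mean(population)          # population mean
sigma = statistics.pstdev(population)     # population standard deviation

n = 30              # size of each sample
repetitions = 5000  # number of samples drawn

# Draw many samples (with replacement) and record each sample mean.
sample_means = [
    statistics.mean(random.choices(population, k=n))
    for _ in range(repetitions)
]

observed_mean = statistics.mean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = sigma / n**0.5   # the value the theorem predicts

print(mu, observed_mean)
print(predicted_sd, observed_sd)
```

With these settings the observed values should land close to the predicted ones, though any single simulation will differ slightly.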
Chi-Square Distributions
A family of skewed-right distributions which take on only positive values and are defined by their degrees of freedom – the specific shape of the Chi-Square Distribution changes as the degrees of freedom change
Chi-Square Goodness-of-Fit Test
Used to determine if a population has a certain hypothesized distribution
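A rough sketch of the arithmetic behind the test statistic, the sum of (observed − expected)²/expected over all categories. The die-roll counts below are invented for illustration, and the hypothesized distribution is assumed to be a fair die.

```python
# Hypothetical data: counts of each face in 60 rolls of a die.
observed = [8, 9, 12, 11, 6, 14]

n_rolls = sum(observed)
# Under the hypothesized (fair die) distribution, each face is expected n/6 times.
expected = [n_rolls / 6] * 6

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for goodness of fit: number of categories minus 1.
degrees_of_freedom = len(observed) - 1

print(chi_square, degrees_of_freedom)
```

The statistic would then be compared to a critical value from the Chi-Square distribution with the computed degrees of freedom.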
Chi-Square Test for Homogeneity
Used to determine if several populations (or treatment groups) have the same distribution of a categorical variable
Chi-Square Test for Independence
Used to determine if there is a relationship between two categorical variables – also known as Chi-Square Test for Association
Coefficient of Determination
Tells what percent of the variation in the response variable can be explained by its linear relationship with the explanatory variable – symbolized as r²
Complement of an Event
The set of all outcomes in the sample space that are not part of the given event
Conditional Probability
The probability of an event occurring if it is known that another specific event has already occurred
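The usual formula is P(A | B) = P(A and B) / P(B). The sketch below applies it to a made-up example with one roll of a fair die, where A is "the roll is even" and B is "the roll is greater than 3". The helper `prob` is a name chosen for this example.

```python
from fractions import Fraction

sample_space = range(1, 7)  # one roll of a fair die

event_a = {x for x in sample_space if x % 2 == 0}  # even roll: {2, 4, 6}
event_b = {x for x in sample_space if x > 3}       # roll above 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# P(A | B) = P(A and B) / P(B)
p_a_given_b = prob(event_a & event_b) / prob(event_b)

print(p_a_given_b)
```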
Confidence Interval
An interval estimate of a parameter calculated using a sample from that population
Confidence Level
The proportion of confidence intervals that would capture the desired parameter if many intervals were calculated from samples of the same size
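A minimal sketch of one common interval, a z-interval for a population mean, assuming the population standard deviation is known. All of the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample results.
sample_mean = 105.0
sigma = 15.0             # population standard deviation, assumed known
n = 36                   # sample size
confidence_level = 0.95

# Critical value z*: cuts off the middle 95% of the standard normal curve.
z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

# Margin of error = critical value times the standard deviation of the sample mean.
margin_of_error = z_star * sigma / sqrt(n)
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(z_star)
print(interval)
```

When σ is not known, a t critical value would normally replace z*; that case is not shown here.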
Confounding Variable
A variable which could affect the result of a statistical test but has not been controlled for
Continuous Random Variable
A random variable which takes on all values in an interval of numbers
Control Group
Any group of subjects who receive either a placebo or no treatment at all during an experiment
Correlation
Measures the direction and strength of the linear relationship between two quantitative variables – symbolized as r
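As a worked illustration, the sketch below computes r by hand for a small made-up data set, then squares it to get the coefficient of determination. The variable names are arbitrary.

```python
from math import sqrt

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of products and squares of the deviations from the means.
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)

r = s_xy / sqrt(s_xx * s_yy)  # correlation
r_squared = r ** 2            # coefficient of determination

print(r, r_squared)
```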
Critical Value
A value (z-score, t-score, or χ² value) used in a hypothesis test to help determine if the null hypothesis should be rejected
Cumulative Distribution Function
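A function which gives the probability that a random variable X is less than or equal to a given value – for a discrete variable, the sum of the probabilities of all possible values up to and including that value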
Degrees of Freedom
A value used to help determine significance for a t-test or a Chi-Square test – measured as n−1 in most cases, or (r−1)(c−1) when dealing with two-way tables
Dependent Trials
Trials whose probability is affected by the outcome of previous trials
Density Curve
A curve used to represent a distribution; always on or above the horizontal axis and has a total area of exactly 1 underneath it
Discrete Random Variable
A random variable with countable outcomes
Mutually Exclusive Events
Events which cannot occur at the same time
Distribution
A list of what values a variable takes on and how often it takes on each one of those values
Double Blind
Describes an experiment in which neither the subjects nor the researcher know which treatment each subject is getting
Empirical Rule
Also known as the 68−95−99.7 rule – is used as an approximation for what percent of the data falls within 1, 2, or 3 standard deviations of the mean in any normal distribution
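The three percentages can be checked against the standard normal curve. This sketch uses Python's built-in `statistics.NormalDist`; the dictionary name `within` is just a label for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in within.items():
    print(k, round(area, 4))
```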
Expected Value/ Mean
The 'average' of a data set, or the long-run average value of a random variable
Experimental Units/Subjects
The individuals on which an experiment is conducted
Explanatory/Independent Variable
Attempts to explain the observed outcomes in a statistical study
Exploratory Data Analysis
Uses graphs and numerical summaries to describe the variables in a data set and the relationships among them
Factor
Any explanatory variable in an experiment
Five Number Summary
A method to describe a data set using the minimum, first quartile, median, third quartile, and maximum points in the data set
Geometric Distribution
A distribution of probabilities of when the first successful outcome occurs in a probability experiment
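A short sketch of the geometric probabilities, using the standard formula P(X = k) = (1 − p)^(k − 1) · p for the first success arriving on trial k. The die-rolling scenario and the function name are assumptions for illustration.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success happens on trial k), with success probability p."""
    return (1 - p) ** (k - 1) * p

# Hypothetical example: rolling a fair die until the first six appears.
p_six = 1 / 6
prob_first_six_on_third_roll = geometric_pmf(3, p_six)

# The mean of a geometric distribution is 1/p.
expected_rolls = 1 / p_six

print(prob_first_six_on_third_roll, expected_rolls)
```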
Hypothesis/Significance Test
A type of inference used to determine the feasibility of an assumed population parameter
Independent Trials
Trials whose probabilities are not affected by the outcome of previous trials
Individuals
People or objects described by a set of data
Inference
The statistical process of drawing conclusions about a population by examining data from a sample
Influential Point
A point which, if removed from the data set, would markedly change the regression equation for that data set
Interquartile Range (IQR)
The difference between the third and first quartiles of a data set
Law of Large Numbers
States that as increased numbers of observations are drawn from any population, the mean of the observations eventually approaches the mean of the population as closely as we would like to estimate it, and remains that close or closer
Least Squares Regression Line
A regression line which makes the sum of the squares of the vertical distances from the data points to the line as small as possible
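The slope and intercept that minimize the sum of squared vertical distances have closed forms: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. The sketch below applies them to a small invented data set and also lists the leftover vertical distances.

```python
# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
intercept = y_bar - slope * x_bar

# Vertical distance from each point to the line (observed minus predicted).
vertical_distances = [b - (intercept + slope * a) for a, b in zip(x, y)]
sum_of_squares = sum(d ** 2 for d in vertical_distances)

print(slope, intercept)
print(sum_of_squares)
```

For a least squares line the vertical distances always add to zero (up to rounding), which makes a handy sanity check.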
Level
A specific value of a factor in an experiment
Matched Pairs
A statistical design which compares two treatments – this is usually done with one sample receiving each treatment over a different time period
Median
The point at which 50% of the data is above and 50% of the data is below
Nonresponse
A type of bias that occurs when an individual chosen for a sample cannot be contacted or chooses not to participate
Normal Distribution
A symmetric, bell-shaped distribution in which approximately 68% of the data lies within one standard deviation of the mean, 95% lies within two standard deviations, and 99.7% lies within three standard deviations – defined by mean and standard deviation
Null Hypothesis
States that either a treatment has had no effect on a population, or that the population has not changed
Observation
Any single point from a data set
Outlier
An individual observation that falls outside the pattern of the data set – often defined as any number that is more than 1.5(IQR) below Q1 or above Q3
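The 1.5(IQR) rule can be sketched in a few lines. This example assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (one common textbook convention; software packages may use slightly different rules). The data set and function name are invented.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For an odd count, leave the middle value out of both halves.
    upper = s[half + 1:] if n % 2 else s[half:]
    return min(s), median(lower), median(s), median(upper), max(s)

# Hypothetical data set with one suspiciously large value.
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]

minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# Fences: anything beyond these is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print((minimum, q1, med, q3, maximum), iqr)
print(lower_fence, upper_fence, outliers)
```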
P-value
The probability that the observed outcome would take on a value as extreme or more extreme than observed if the null hypothesis were true
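To make the idea concrete, here is a sketch of a one-sided P-value for a z-test of a mean, assuming a known population standard deviation. Every number is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical test: null hypothesis says the population mean is 100.
null_mean = 100.0
sigma = 15.0         # population standard deviation, assumed known
n = 36
sample_mean = 105.0  # observed result

# Test statistic: how many standard deviations the sample mean is above the null mean.
z = (sample_mean - null_mean) / (sigma / sqrt(n))

# One-sided P-value: chance of a result at least this large if the null were true.
p_value = 1 - NormalDist().cdf(z)

print(z, p_value)
```

For a two-sided alternative the tail area would be doubled.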
Parameter
A number that describes a population
Percentile
Tells what percent of a data set falls below the given observation
Placebo
A false treatment which should have no effect on an experiment
Pooled Procedures
Occurs when separate samples are combined into a single sample for analysis – done only if population variances are equal
Population
The entire group of individuals that we want information about
Power of a Hypothesis Test
The probability that the test will reject the null hypothesis when the null hypothesis is false – equal to 1−P(Type II error)
Probability
The proportion of times an outcome would occur over a large number of trials
Probability Distribution Function
A function which assigns a probability for each possible value for any discrete random variable X
Proportion
Tells what percent of a data set falls into a given category
Qualitative Variable
A variable which takes on a non-numeric description
Quantitative Variable
A variable which takes on a numeric value
Quartiles
Observations which fall at the 25th, 50th, and 75th percentiles of a data set
Range
The difference between the maximum and minimum values of a data set
Random
When individual outcomes are uncertain, but there is a pattern to the distribution of the outcomes over time
Random Variable
A variable whose value is a numeric outcome of a random phenomenon
Randomization
Using the laws of probability to select members for a sample or assign treatments to samples in experiments
Regression Line
A straight line that describes how a response variable changes as the explanatory variable changes
Residual
The difference between an observed value of a response variable and its predicted value from a regression equation
Response/Dependent Variable
Measures the outcome of a statistical study
Robustness
A measure of how much the P-value of a test is affected if the conditions of the hypothesis test are not met
Sample
A part of the population used to gather information about the entire population
Sample Space
A list of all possible outcomes for a random event
Sampling Distribution
A distribution of values taken by a statistic in all possible samples of the same size from the same population
Sampling Frame
A list from which a sample is chosen – ideally consists of the entire population
Significance Level
The cutoff probability, chosen before the test, below which a P-value is declared statistically significant
Simple Random Sample (SRS)
A sample in which every member and every group of size n has the same probability to be chosen
Simulation
A method for collecting data which uses the laws of probability to represent all possible outcomes of an experiment
Skewed
Describes a distribution whose histogram extends much farther to one side of the mean than the other in the direction of the 'tail'
Standard Deviation
Square root of the variance – used as a common measure of spread for a data set
Standard Error
The standard deviation of a sampling distribution – measures the amount of expected error per standard deviation from the mean
Standard Normal Distribution
A normal distribution with a mean of zero and a standard deviation of one
Statistic
A number that describes a sample
Statistically Significant
An observed effect so far removed from the mean that it would be unlikely to occur by chance alone
Stratified Random Sample
A sample chosen by splitting the population into several well-defined groups, then taking an SRS from each group
Symmetric
Describes a distribution whose histogram has its left and right sides as mirror images of each other
t-Distributions
A family of symmetric, bell-shaped distributions with a standard deviation larger than that of the standard normal distribution – defined by degrees of freedom
Treatment
A specific experimental condition applied to an experimental unit or subject
Treatment Group
A group of subjects who receive an actual treatment during an experiment
Type I Error
When the null hypothesis is rejected but it is in fact true
Type II Error
When the null hypothesis is not rejected but it is in fact false
Unbiased Statistic
A statistic whose sampling distribution has a mean equal to the true value of the population parameter being estimated
Undercoverage
A type of bias that occurs when some groups of a population are left out of the selection process for the sample
Variability
Describes the spread of a data set
Variable
Any characteristic of an individual
Variance
The average of the squares of the deviations of the observations from their mean – used as a measure of spread
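A brief sketch of the calculation on an invented data set. Note the assumption being illustrated: dividing by n gives the population variance, while sample variance divides by n − 1; Python's `statistics` module provides both.

```python
import statistics

# Hypothetical data set.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(data)

# Variance by hand: average of the squared deviations from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]
population_variance = sum(squared_deviations) / len(data)

# Standard deviation is the square root of the variance.
population_sd = population_variance ** 0.5

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(mean, population_variance, population_sd)
print(sample_variance)
```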
Voluntary Response Sample
Consists only of people who choose to participate – a poor method for collecting meaningful data
z-Score
A measure used to tell how many standard deviations above or below the mean an observation lies – also known as a Standardized Score
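The calculation is z = (observation − mean) / standard deviation. The sketch below uses made-up values and, assuming the data are normally distributed, also converts the z-score to a percentile.

```python
from statistics import NormalDist

def z_score(observation: float, mean: float, sd: float) -> float:
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / sd

# Hypothetical example: a score of 130 where the mean is 100 and the SD is 15.
z = z_score(130, 100, 15)

# If the distribution is normal, the z-score maps to a percentile.
percentile = NormalDist().cdf(z)

print(z, round(percentile, 4))
```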