Flashcards covering key vocabulary and concepts from an AP Statistics course, based on provided study guide notes. These flashcards are optimized for vocabulary review.
Statistics
The science and art of collecting, analyzing, and drawing conclusions from data.
Individual
Object described in a set of data.
Variable
Aspect that can take different values for different individuals.
Distribution
Pattern of variation of a variable.
Descriptive Statistics
Describing and summarizing data with graphs and numerical summaries.
Inferential Statistics
Making inferences / drawing conclusions from data.
Nominal Variable
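Categorical variable whose categories have no natural order. *Values could be numbers if they don't measure anything (e.g., the digits of a phone number).
Ordinal Variable
Categorical variable whose categories have a natural order.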
Discrete Variable
Fixed set of possible values with gaps between them, whole numbers or defined intervals, countable or countably infinite.
Continuous Variable
Infinite possibilities, decimals / fractions, any value in an interval on the number line.
Basic Statistic Vocab
Individuals are also known as cases or observational units.
Frequency Table
Shows what values the variable takes & how often it takes each of them.
Two-way table
Summarizes data on relationship between two categorical variables for a group of individuals.
Side-by-side bar graph
Bars showing the distribution of a categorical variable for each value of another categorical variable (grouped side-by-side).
Segmented bar graph
Distribution of a categorical variable as segments of a whole (bars stacked on top of each other & proportional to relative frequencies).
Mosaic plot
A segmented bar graph in which the width of each bar is proportional to the number of individuals in that category.
Association
Knowing the value of one variable helps you predict the value of the other.
Back-to-back stemplot
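Stemplot for quantitative data that's split into two groups: the groups share one stem, with leaves for one group on the left and the other on the right.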
Mean
Average.
Median
Middle value.
Mode
Most common value.
IQR
Interquartile range: Q3 - Q1 (the range of the middle 50% of values).
Standard deviation
Typical distance from mean.
Resistant Measure
Not sensitive to skewness / outliers.
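A minimal sketch of what "resistant" means in practice, using made-up data. The helper name `summarize` and both data lists are assumptions for illustration; note that `statistics.quantiles` uses one of several quartile conventions, so hand-computed quartiles may differ slightly on other data sets.

```python
import statistics

def summarize(values):
    """Return mean, median, standard deviation, and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

# Hypothetical data: the second list swaps the largest value for an outlier.
original = [2, 4, 4, 5, 6, 7, 9]
with_outlier = [2, 4, 4, 5, 6, 7, 90]

before = summarize(original)
after = summarize(with_outlier)

for name in ("mean", "median", "sd", "iqr"):
    print(name, before[name], after[name])
```

In this example the median and IQR stay the same after the outlier is added, while the mean and standard deviation both grow.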
Statistic
A value that describes a characteristic of a sample.
Parameter
A value that describes a characteristic of a population.
Percentile
The pth percentile is the value with p% of observations less than or equal to it.
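The percentile definition can be sketched directly as a count. The function name `percentile_of` and the scores are hypothetical, chosen only to illustrate the "less than or equal to" wording.

```python
def percentile_of(value, data):
    """Percent of observations in data that are less than or equal to value."""
    count = sum(1 for x in data if x <= value)
    return 100 * count / len(data)

# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
p = percentile_of(85, scores)  # 7 of the 10 scores are <= 85
print(p)
```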
Cumulative relative frequency graphs / ogives
Plots a point for the percentile of each value in the distribution, then connects the points with line segments to create the graph.
Standardized scores (z-scores)
How many standard deviations from the mean a value is (& what direction).
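As a quick sketch, a z-score is (value - mean) / standard deviation. The function name and the numbers below are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean (sign gives direction)."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

# Hypothetical test scores with mean 80 and standard deviation 5.
z_above = z_score(90, 80, 5)    # positive: above the mean
z_below = z_score(72.5, 80, 5)  # negative: below the mean
print(z_above, z_below)
```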
Density curve
Simplified model of a distribution of a quantitative variable, always on or above horizontal axis, has an area of exactly 1 underneath it.
Normal distributions
Bell shaped & symmetric & unimodal distribution approximated with a normal curve (density curve).
Extrapolation
Using a regression line to make predictions far outside the interval of x-values used to generate the line (beyond the scope of your data).
Least-squares regression line
Line that minimizes sum of squared residuals.
Residuals
Actual value – predicted value (based on line).
Residual plots
Scatterplot that plots residuals against explanatory variable, determines whether a linear model is appropriate (check for random scatter & no leftover curved pattern).
Standard deviation of residuals (s)
Measures typical residual (distance between predicted & actual).
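Coefficient of determination (r²)
Square of the correlation r; the proportion of the variation in the response variable accounted for by the least-squares regression line. When finding r from r², make sure to consider the direction of the correlation!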
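The regression terms above can be tied together in one small sketch. The data are invented, and the helper `least_squares` is an illustrative name, not a library function; it uses the standard formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual - predicted

sse = sum(e ** 2 for e in residuals)       # sum of squared residuals
y_bar = sum(ys) / len(ys)
sst = sum((y - y_bar) ** 2 for y in ys)    # total variation in y

s = math.sqrt(sse / (len(xs) - 2))         # standard deviation of the residuals
r_squared = 1 - sse / sst                  # coefficient of determination
r = math.copysign(math.sqrt(r_squared), slope)  # r takes the sign of the slope
```

The last line shows why direction matters when recovering r from r²: the square root alone is always nonnegative, so the sign has to come from the slope.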
Influential points
Points that, if removed, substantially change the slope, y-intercept, r, r², or s. *Outliers and high-leverage points are very often influential (but not automatically guaranteed to be).
Transforming to achieve linearity
Applying a function to a quantitative variable (changes the scale of measurement) in order to make the scatterplot more approximately linear (in order to use linear regression methods).
Sampling
Selecting a group of individuals out of a whole population, ideally at random so that the group is representative of the population.
Sampling frame
The list of individuals in the population from which we actually select our sample.
Sample survey
Collects data from the individuals in the sample (to learn about the population).
SRS (Simple Random Sample)
Every group of n individuals has an equal chance of being selected.
Stratified Sample
SRS selected from each stratum. Strata: groups of individuals with similar characteristics assumed to be associated with the variables being measured.
Clustered Sample
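Randomly selecting entire clusters and including every individual in them. Clusters: groups of individuals located near each other, ideally with varied responses within each cluster (hopefully each is representative of the population).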
Systematic Sample
Randomly select starting point & select every kth individual after.
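The four random sampling methods above can be compared side by side in a short sketch. Everything here is hypothetical: the roster of 40 student IDs, the grade-level strata, the homeroom clusters, and the function names.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of n_per_stratum individuals from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every individual in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def systematic_sample(population, k, rng):
    """Pick a random start among the first k individuals, then take every kth."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(1)                 # fixed seed so the sketch is repeatable
students = list(range(1, 41))          # hypothetical roster of 40 student IDs
grades = {"9th": students[:20], "10th": students[20:]}                      # strata
homerooms = {f"room{i}": students[i * 10:(i + 1) * 10] for i in range(4)}   # clusters

srs = simple_random_sample(students, 8, rng)
strat = stratified_sample(grades, 4, rng)
clust = cluster_sample(homerooms, 2, rng)
syst = systematic_sample(students, 5, rng)
```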
Convenience Sampling
Individuals who are easy to reach.
Voluntary Response Sampling
Allows individuals to choose to be in sample.
Bias
A study design shows bias if it is very likely to systematically overestimate or underestimate the value you want to know.
Undercoverage
Certain individuals less likely / cannot be chosen in a sample.
Nonresponse
Individual chosen for sample can’t be contacted / doesn’t participate.
Response Bias
Systematic pattern of inaccurate answers to a survey question.
Observational Studies
Observes individuals & measures variables of interest (does not influence responses).
Experiments
Imposes a treatment on individuals & measures their responses.
Placebo
No active ingredient.
Treatment
Condition imposed on individuals.
Experimental unit
Individual to which a treatment is applied. Subject: a human experimental unit.
Factor
Explanatory var that’s manipulated (may cause change in response var).
Levels
Diff possible values of a factor.
Confounding
When variables are associated so that their effects on a response variable can’t be distinguished from one another.
Control group
Provides a baseline for comparison.
Replication
Use enough experimental units that a difference in the effects of the treatments can be distinguished from chance variation.
Double blind
Neither subjects nor the ppl measuring know the treatment.
Single-blind
Either the subjects or the people measuring the response don't know which treatment was received, but not both.
Completely randomized design
Experimental units assigned to treatments completely at random.
Randomized block design
Random assignment within each block. Block: group of experimental units known to be similar in some way that could affect their response to the treatments.
Matched pairs design
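A type of randomized block design where the blocks are pairs of similar experimental units (or a single unit that receives both treatments).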
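A rough sketch of how random assignment differs between a completely randomized design and a randomized block design. The subject labels, the treatments, and the age-based blocks are all assumptions made for illustration.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

def randomized_block(blocks, treatments, rng):
    """Carry out a separate random assignment inside each block."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(7)                       # fixed seed for repeatability
subjects = [f"s{i}" for i in range(1, 13)]   # 12 hypothetical subjects
treatments = ["drug", "placebo"]

crd = completely_randomized(subjects, treatments, rng)

# Hypothetical blocking variable: age group.
blocks = {"under40": subjects[:6], "over40": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)
```

In the block design each age group contributes equally to both treatments, so age cannot be confounded with the treatment.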
Statistical significance
Observed diff is larger than can be attributed to chance alone.
Statistical inference
Generalizing results to population, assuming sample is representative of population (ensured by random sample).
Sampling variability
Diff random samples (same size, same population) produce diff estimates.
Random process
Generates outcomes purely by chance.
Probability
A number between 0 and 1 that describes the proportion of times an event would occur in a very long series of trials.
Law of large numbers
More trials means proportion approaches true probability (more accurate).
Simulation
Imitates random process such that simulated outcomes are consistent with real-world outcomes.
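The law of large numbers can be illustrated with a small simulation of coin flips. This is only a sketch: the function name, the seed, and the choice of a fair coin (p = 0.5) are assumptions.

```python
import random

def running_proportion(n_trials, p_success, rng):
    """Simulate n_trials independent trials; return the proportion of successes after each one."""
    successes = 0
    proportions = []
    for trial in range(1, n_trials + 1):
        if rng.random() < p_success:
            successes += 1
        proportions.append(successes / trial)
    return proportions

rng = random.Random(42)
props = running_proportion(100_000, 0.5, rng)

# Proportion of heads after 10 flips vs. after 100,000 flips.
early, late = props[9], props[-1]
print(early, late)
```

With few trials the proportion can be far from 0.5; with many trials it settles close to the true probability.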
Probability model
Description of a random process that includes a list of all possible outcomes & the probability for each outcome.
Sample space
List of all outcomes.
Event
Any collection of outcomes from a random process.
Complement
The event that A does not occur, written A^C. P(A^C) = 1 - P(A).
Intersection
The event "A and B", written A ∩ B (both A and B must be true). P(A and B) = P(A ∩ B).
Union
The event "A or B", written A ∪ B (at least one must be true: either A or B, or both). P(A or B) = P(A ∪ B).
Mutually exclusive events
Cannot occur simultaneously (no outcomes in common) (also known as disjoint).
Non-mutually exclusive events
Can occur simultaneously.
Conditional probability
Probability that an event happens given that another event is known to have happened: P(A | B).
Independent events
Knowing whether or not one event has occurred does not change the probability that the other event will happen: P(A | B) = P(A | B^C) = P(A).
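The probability vocabulary above (sample space, event, complement, intersection, union, disjoint, conditional, independent) can be checked by brute force on a small example. The two-dice setting and the events A, B, and C are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

A = {o for o in sample_space if o[0] == 6}      # first die shows 6
B = {o for o in sample_space if sum(o) == 7}    # total is 7
C = {o for o in sample_space if sum(o) == 12}   # total is 12

p_union = prob(A | B)                            # P(A or B)
general_rule = prob(A) + prob(B) - prob(A & B)   # general addition rule
independent_AB = cond_prob(A, B) == prob(A)      # knowing B doesn't change P(A)
independent_AC = cond_prob(A, C) == prob(A)      # a total of 12 forces a 6
disjoint_BC = len(B & C) == 0                    # totals of 7 and 12 can't both occur
complement_A = sample_space - A                  # event "A does not occur"
```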
Random variable
Takes numerical values that describe the outcomes of a random process.
Discrete Random Variable
Fixed set of values with gaps between them. Can be described using probability distributions & histograms (each bar a value).
Continuous Random Variable
Any value in an interval on the number line. Probability distribution: density curve.
Binomial Random Variable
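Number of successes in a fixed number of trials in a binomial setting. Use the acronym BINS to check for a binomial setting: Binary outcomes, Independent trials, Number of trials fixed, Same probability of success on each trial.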
Geometric random variable
Number of trials it takes to get a success in a geometric setting.
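For reference, the standard probability formulas for these two random variables are P(X = k) = C(n, k) · p^k · (1 - p)^(n - k) for a binomial and P(Y = k) = (1 - p)^(k - 1) · p for a geometric. A minimal sketch follows; the die-rolling setting is a made-up example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def geometric_pmf(k, p):
    """P(Y = k): the first success happens on trial k."""
    return (1 - p)**(k - 1) * p

# Hypothetical setting: rolling a fair die, where success = rolling a 6.
p = 1 / 6
p_two_sixes_in_ten = binomial_pmf(2, 10, p)
p_first_six_on_third = geometric_pmf(3, p)

# The binomial probabilities over k = 0..n should add up to 1.
binomial_total = sum(binomial_pmf(k, 10, p) for k in range(11))
```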
Sampling distribution
The distribution of a statistic in all possible samples of the same size from the population.
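Listing all possible samples is rarely practical, so a sampling distribution is often approximated by simulation. The sketch below does this for the sample mean; the uniform population, the sample sizes, and the number of repetitions are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)
population = [rng.uniform(0, 100) for _ in range(10_000)]  # hypothetical population
mu = statistics.mean(population)                           # the parameter

def sample_means(population, n, n_samples, rng):
    """Approximate the sampling distribution of the sample mean with repeated SRSs of size n."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(n_samples)]

means_n5 = sample_means(population, 5, 2_000, rng)
means_n50 = sample_means(population, 50, 2_000, rng)

center_n50 = statistics.mean(means_n50)   # should land near mu
spread_n5 = statistics.stdev(means_n5)    # variability with small samples
spread_n50 = statistics.stdev(means_n50)  # variability with larger samples
print(mu, center_n50, spread_n5, spread_n50)
```

The simulated means center near the population mean, and larger samples give a less variable statistic.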
Unbiased Estimator
The mean of the sampling distribution of the statistic is equal to the true value of the parameter. Analogous to accuracy: check the center of the sampling distribution.
Biased Estimator
The statistic consistently overestimates or underestimates the parameter. Variability is a separate check, analogous to precision: look at the spread of the sampling distribution. Choose an estimator with low bias & low variability.
Confidence interval
An interval of plausible values for an unknown population parameter based on sample data.
Confidence Level
Success rate / capture rate of the method that produces the interval: in repeated random samples, about C% of the resulting intervals capture the true parameter value. Accounts for sampling variability.
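The "capture rate" reading of a confidence level can be sketched with a simulation. This example assumes a Normal population with known standard deviation and uses a 95% z-interval (z* = 1.96); the population values and the function name are hypothetical.

```python
import random
import statistics

def capture_rate(mu, sigma, n, n_intervals, rng, z_star=1.96):
    """Fraction of z-intervals for a mean (sigma known) that capture the true mu."""
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / n_intervals

rng = random.Random(3)
rate = capture_rate(mu=50, sigma=10, n=25, n_intervals=2_000, rng=rng)
print(rate)  # expected to land near 0.95
```

Any single interval either contains the parameter or it doesn't; the confidence level describes how often the method succeeds across many samples.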
Power of a test
Probability that a test will find convincing evidence for Ha when a specific alternative value of the parameter is true (probability that you avoid a type II error).
Chi-square test for goodness of fit
Checks whether a hypothesized distribution of a single categorical variable seems valid for a population.
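The chi-square statistic behind this test is the sum of (observed - expected)² / expected over all categories. A minimal sketch with invented die-roll counts:

```python
def chi_square_statistic(observed, expected_proportions):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die, compared with a fair-die model.
observed = [8, 12, 9, 11, 10, 10]
chi2 = chi_square_statistic(observed, [1 / 6] * 6)
df = len(observed) - 1  # degrees of freedom = number of categories - 1
print(chi2, df)
```

A larger statistic means the observed counts sit further from what the hypothesized distribution predicts.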
Chi-square test for homogeneity
Compares the distribution of a single categorical variable across multiple populations / treatments (multiple independent samples).
Chi-square test for independence & association
Tests for an association between two categorical variables in a single population (one sample).
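For two-way tables (used in both the homogeneity and the independence test), each expected count is (row total × column total) / table total, and the degrees of freedom are (rows - 1)(columns - 1). The table below is a made-up example and the function names are illustrative.

```python
def expected_counts(table):
    """Expected count for each cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_two_way(table):
    """Chi-square statistic and degrees of freedom for a two-way table of counts."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical two-way table: rows = grade level, columns = preferred lunch (pizza, tacos).
table = [[30, 20],
         [20, 30]]
stat, df = chi_square_two_way(table)
print(stat, df)
```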