AP Statistics Exam Flashcards
Statistics
The science of collecting, analyzing, and drawing conclusions from data.
Descriptive Statistics
Methods of organizing and summarizing data.
Inferential Statistics
Making generalizations from a sample to the population.
Population
An entire collection of individuals or objects.
Sample
A subset of the population selected for study.
Variable
Any characteristic whose value may change from one individual or object to another.
Data
Observations on one or more variables.
Categorical Variable
Values that place individuals into categories (qualitative).
Numerical Variable
Measurements or observations of numerical data (quantitative).
Discrete Variable
Listable sets (counts).
Continuous Variable
Any value over an interval of values (measurements).
Univariate
One variable.
Bivariate
Two variables.
Multivariate
Many variables.
Symmetrical Distribution
Data in which both sides are roughly the same shape and size, for example a “bell curve”.
Uniform Distribution
Every class has an equal frequency (count); the shape is “a rectangle”.
Skewed Distribution
One side (tail) is longer than the other side. The skewness is in the direction that the tail points (left or right)
Bimodal Distribution
Two classes (or clusters of classes) have large frequencies and are separated by a lower-frequency class between them. “double hump camel”
Shape (S.O.C.S.)
Overall type (symmetrical, skewed right or left, uniform, or bimodal)
Outliers (S.O.C.S.)
Unusual features: outliers, gaps, clusters, etc.
Center (S.O.C.S.)
Middle of the data (mean, median, and mode)
Spread (S.O.C.S.)
Refers to variability (range, standard deviation, and IQR)
Parameter
A numerical value that describes a population (typically unknown)
Statistic
A value calculated from a sample, typically used to estimate a population parameter.
Median
The middle point of the data (50th percentile) when the data is in numerical order.
Mean
The arithmetic average. μ is for a population (parameter) and x̄ is for a sample (statistic).
Mode
Occurs the most in the data. There can be more than one mode, or no mode at all if all data points occur once.
Variability
Allows statisticians to distinguish between usual and unusual occurrences.
Range
A single value – (Max – Min)
IQR
Interquartile range – (Q3 – Q1)
Standard Deviation
Measures the typical or average deviation of observations from the mean. For the sample standard deviation, the sum of squared deviations is divided by df = n-1 before taking the square root.
Variance
Standard deviation squared
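A minimal sketch of the standard deviation and variance cards above, assuming a small made-up sample (the data and variable names are illustrative only). It divides the sum of squared deviations by df = n-1, and the result can be compared with Python's standard `statistics` module.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, for illustration only

n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations from the mean, divided by df = n - 1
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)

# Sample standard deviation: square root of the sample variance
sample_sd = math.sqrt(sample_var)

print(sample_var, sample_sd)
print(statistics.variance(data), statistics.stdev(data))  # library versions
```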
Resistant
Not affected by outliers.
Trimmed Mean
Remove a chosen % of observations from both the top and the bottom of the ordered data, then take the mean of what remains. This may eliminate outliers.
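One way to sketch a trimmed mean, assuming the percentage is removed from each end of the ordered data. The function name `trimmed_mean` and the data are hypothetical, and textbooks differ on how to round the number of dropped values.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of observations from EACH end."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # how many to drop from each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("percent is too large: nothing left to average")
    return sum(kept) / len(kept)

data = [1, 12, 13, 14, 15, 16, 17, 18, 19, 95]  # made-up data with two extreme values

print(trimmed_mean(data, 0))   # 22.0, the ordinary mean
print(trimmed_mean(data, 10))  # 15.5, after dropping 1 and 95
```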
Z-Score
Is a standardized score. This tells you how many standard deviations from the mean an observation is. Standardizing a normal distribution gives the standard normal curve of z-scores, with μ = 0 & σ = 1.
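As a quick illustration, a z-score is the observation minus the mean, divided by the standard deviation. The function name and the numbers below are made up for the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical test scores with mean 100 and standard deviation 15
print(z_score(130, 100, 15))  # 2.0  (two standard deviations above the mean)
print(z_score(85, 100, 15))   # -1.0 (one standard deviation below the mean)
```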
Normal Curve
Is a bell-shaped and symmetrical curve.
Empirical Rule (68-95-99.7)
On a normal curve, about 68%, 95%, and 99.7% of observations fall within 1σ, 2σ, and 3σ of the center μ, respectively.
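The 68-95-99.7 figures can be checked against the standard normal curve. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal curve lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))  # roughly 0.68, 0.95, and 0.997
```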
Boxplots
Are for medium or large numerical data sets. They do not show the original observations. Always use modified boxplots, where the fences are 1.5 IQRs from the ends of the box (Q1 & Q3). Points outside the fences are considered outliers. Whiskers extend to the smallest & largest observations within the fences.
5-Number Summary
Minimum, Q1 (1st Quartile – 25th Percentile), Median, Q3 (3rd Quartile – 75th Percentile), Maximum
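A sketch of the 5-number summary and the 1.5 × IQR fences, assuming quartiles are found as the medians of the lower and upper halves of the ordered data (one common classroom convention; calculators and software may use slightly different rules). The function names and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values below the median position
    upper = ordered[half + (n % 2):]   # skip the median itself when n is odd
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

def outliers(values):
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [3, 5, 7, 8, 9, 11, 13, 15, 40]  # made-up data

print(five_number_summary(data))  # (3, 6.0, 9, 14.0, 40)
print(outliers(data))             # [40], since the upper fence is 14 + 1.5 * 8 = 26
```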
Sample Space
The collection of all possible outcomes.
Event
Any collection (subset) of outcomes from the sample space.
Complement
All outcomes not in the event.
Union
A or B: all the outcomes in either circle of a Venn diagram, including the overlap.
Intersection
A and B: the outcomes in the overlap of A and B.
Mutually Exclusive (Disjoint)
A and B have no intersection. They cannot happen at the same time.
Independent
Knowing that one event has occurred does not change the probability of the other.
Experimental Probability
Is the number of successes in an experiment divided by the total number of trials in the experiment.
Law of Large Numbers
As an experiment is repeated the experimental probability gets closer and closer to the true (theoretical) probability. The difference between the two probabilities will approach “0”.
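The Law of Large Numbers can be illustrated with a small simulation. This sketch assumes a fair coin (true probability 0.5) and a fixed seed for repeatability; the function name is hypothetical, and exact results depend on the random number generator.

```python
import random

def experimental_probability(n_flips, seed=0):
    """Proportion of heads in n_flips simulated flips of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The gap between experimental and theoretical probability tends to shrink
for n in (10, 100, 1_000, 10_000, 100_000):
    p_hat = experimental_probability(n)
    print(n, p_hat, abs(p_hat - 0.5))
```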
Conditional Probability
The probability of an event given that a certain condition (another event) has occurred: P(A | B) = P(A and B) / P(B).
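A small worked example of conditional probability, assuming one roll of a fair six-sided die with equally likely outcomes. The events A and B are made up for illustration; the calculation is P(A given B) = P(A and B) / P(B).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                        # event: roll is even
B = {4, 5, 6}                        # event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a_given_b = prob(A & B) / prob(B)  # intersection divided by the condition

print(prob(A))       # 1/2
print(p_a_given_b)   # 2/3, so knowing B changes the probability of A
```

Because P(A given B) differs from P(A), these two events are not independent.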
Correlation Coefficient (r)
Is a quantitative assessment of the strength and direction of a linear relationship.
Least Squares Regression Line (LSRL)
Is a line of mathematical best fit. Minimizes the sum of the squared deviations (residuals) from the line. Used with bivariate data.
Residuals (error)
Is the vertical difference of a point from the LSRL (observed y minus predicted ŷ). All residuals sum up to “0”.
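A sketch of fitting an LSRL by hand and checking that the residuals sum to 0, using the standard least-squares formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄. The data set is made up for illustration.

```python
import math

xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

slope = s_xy / s_xx                  # 0.6 for this data
intercept = y_bar - slope * x_bar    # about 2.2 for this data

# Residual = observed y minus predicted y-hat
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(slope, intercept)
print(sum(residuals))  # 0 up to floating-point rounding
```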
Residual Plot
A scatterplot of (x (or ŷ), residual). No pattern indicates that a linear model is appropriate.
Coefficient of Determination (r^2)
Gives the proportion of variation in y (response) that is explained by the linear relationship between x and y. Never use the adjusted r^2.
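To connect r and r^2, this sketch computes the correlation coefficient from its definition and then checks that r^2 equals 1 minus (unexplained variation / total variation) for the least-squares line. The data is made up and the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y

r = s_xy / math.sqrt(s_xx * s_yy)          # correlation coefficient

# Unexplained variation: sum of squared residuals from the least-squares line
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / s_yy                 # proportion of variation explained

print(r, r ** 2, r_squared)  # r is about 0.775; both r^2 values are about 0.6
```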
Extrapolation
Using the LSRL to predict values outside of the range of the original data. Such predictions are unreliable and should be avoided.
Influential Points
Are points that if removed significantly change the LSRL.
Outliers
Are points with large residuals.
Census
A complete count of the population.
Sampling Frame
Is a list of every individual in the population, from which the sample is chosen.
Sampling Design
Refers to the method used to choose a sample.
SRS (Simple Random Sample)
Chosen so that each unit has an equal chance of selection and every set of units of the same size has an equal chance of being selected.
Stratified Sampling
Divide the population into homogeneous groups called strata, then take an SRS from each stratum.
Systematic Sampling
Use a systematic approach (every 50th) after choosing randomly where to begin.
Cluster Sample
Based on location. Select a random location and sample ALL at that location.
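The four sampling designs above can be sketched side by side. Everything here is hypothetical: a population of ID numbers 1 to 100, two strata split at 50, and clusters of 20 consecutive IDs standing in for "locations".

```python
import random

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical unit IDs 1..100

# SRS: every set of 10 units is equally likely to be chosen
srs = rng.sample(population, 10)

# Stratified: split into strata, then take an SRS from each stratum
strata = {"low": population[:50], "high": population[50:]}
stratified = [u for group in strata.values() for u in rng.sample(group, 5)]

# Systematic: random starting point, then every k-th unit
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster: pick one cluster at random and sample ALL units in it
clusters = [population[i:i + 20] for i in range(0, 100, 20)]
cluster_sample = rng.choice(clusters)

print(srs, stratified, systematic, cluster_sample, sep="\n")
```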
Random Digit Table
Each entry is equally likely and each digit is independent of the rest.
Random # Generator
Calculator or computer program
Bias
Systematic error that favors a certain outcome. It has to do with the center of a sampling distribution: if the distribution is centered over the true parameter, the statistic is considered unbiased.
Voluntary Response
People choose themselves to participate.
Convenience Sampling
Ask people who are easy to reach, friendly, or comfortable to ask.
Undercoverage
Some group(s) are left out of the selection process.
Non-response
Someone cannot or does not want to be contacted or participate.
Response Bias
False answers – can be caused by a variety of things
Wording of the Questions
Leading questions.
Observational Study
Observe outcomes without giving a treatment.
Experiment
Actively imposes a treatment on the subjects.
Experimental Unit
Single individual or object that receives a treatment.
Factor
Is the explanatory variable, what is being tested
Level
A specific value for the factor.
Response Variable
What you are measuring with the experiment.
Treatment
Experimental condition applied to each unit.
Control Group
A group used as a baseline for comparing the effectiveness of the treatment(s); it does NOT have to receive a placebo.
Placebo
A treatment with no active ingredients (provides control).
Blinding
A method used so that the subjects are unaware of the treatment (who gets a placebo or the real treatment).
Double Blinding
Neither the subjects nor the evaluators know which treatment is being given.
Control (Principles of Experimental Design)
Keep all extraneous variables (not being tested) constant
Replication (Principles of Experimental Design)
Uses many subjects to quantify the natural variation in the response.
Randomization (Principles of Experimental Design)
Uses chance to assign the subjects to the treatments.
Completely Randomized Design
All units are randomly allocated among all of the treatments.
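A minimal sketch of a completely randomized design, assuming equal-sized treatment groups where possible. The function name, the unit labels, and the treatment names are all hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                      # chance decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# 12 hypothetical subjects, 3 hypothetical treatments (C could be a control)
groups = completely_randomized(range(12), ["A", "B", "C"], seed=1)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

A randomized block design would apply the same function separately within each block.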
Randomized Block Design
Units are grouped into blocks of similar units and then randomly assigned to treatments within each block; this reduces variation.
Matched Pairs Design
Units are matched up by characteristics and then randomly assigned. Once one member of a pair receives a certain treatment, the other member automatically receives the second treatment. OR individuals do both treatments in random order (before/after or pretest/post-test). Assignment is dependent.
Confounding Variables
Are variables whose effect on the response cannot be separated from the effects of the factor being tested. They happen in observational studies. Random assignment to treatments is used so that such variables are balanced across the groups and do NOT confound the results!
Random Variable
A numerical value that depends on the outcome of an experiment.
Discrete Random Variable
A random variable whose possible values are counts (a listable set).
Continuous Random Variable
A random variable whose possible values are measurements (any value over an interval).
Discrete Probability Distributions
Gives values & probabilities associated with each possible x.
Fair Game
A fair game is one in which all pay-ins equal all pay-outs in the long run, so the expected net gain is 0.
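One way to check whether a game is fair is to compute the expected value of the net gain from its discrete probability distribution. The game below is made up: pay $1 to roll a fair die, and receive $6 if a six comes up.

```python
from fractions import Fraction

# Net gain (in dollars) -> probability, for the hypothetical game described above
net_gain = {
    5: Fraction(1, 6),    # win: $6 pay-out minus the $1 pay-in
    -1: Fraction(5, 6),   # lose: the $1 pay-in is gone
}

# Expected value: sum of (value * probability) over all possible values
expected_value = sum(x * p for x, p in net_gain.items())

print(expected_value)  # 0, so this game is fair
```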
Binomial Distributions
Two mutually exclusive outcomes, a fixed number of trials (n), each trial is independent, and the probability (p) of success is the same for all trials.
Random variable (Binomial)
Is the number of successes out of a fixed # of trials. Starts at X = 0 and is finite (X can be 0, 1, ..., n).
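A sketch of the binomial probability formula, P(X = k) = C(n, k) · p^k · (1 - p)^(n - k), using `math.comb` from the standard library. The function name and the coin-flip numbers are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: number of heads in 4 flips of a fair coin
print(binomial_pmf(2, 4, 0.5))                         # 0.375
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))  # 1.0 (X runs from 0 to n)
```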
Geometric Distributions
Two mutually exclusive outcomes, each trial is independent, probability (p) of success is the same for all trials. (NOT a fixed number of trials)
Random Variable (Geometric)
Is the number of the trial on which the FIRST success occurs. Starts at X = 1 and has no upper limit (∞).
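A sketch of the geometric probability formula, P(X = k) = (1 - p)^(k - 1) · p: k - 1 failures followed by the first success. The function name and the numbers are illustrative.

```python
def geometric_pmf(k, p):
    """P(X = k): the first success happens on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

# Hypothetical example: first head on the 3rd flip of a fair coin
print(geometric_pmf(3, 0.5))  # 0.125

# The probabilities over k = 1, 2, 3, ... add up toward 1
print(sum(geometric_pmf(k, 0.5) for k in range(1, 51)))
```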