Simulation
A method used to imitate a real-world statistical situation using random processes
Random process
A process that uses chance to determine outcomes, often with random numbers
Repeated trials
Running a simulation many times to observe the distribution of outcomes
Response variable (simulation)
The outcome recorded for each trial
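The simulation terms above can be tied together with a small sketch. It assumes a hypothetical scenario (counting heads in 10 flips of a fair coin) that is not part of the deck; the function name `simulate_heads` and the seed are illustrative choices only.

```python
import random

def simulate_heads(n_flips, n_trials, seed=0):
    """Repeat a coin-flip simulation and record the response variable per trial."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    results = []
    for _ in range(n_trials):
        # One trial: a random process that flips a fair coin n_flips times.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        results.append(heads)  # response variable: number of heads in this trial
    return results

# Repeated trials: run the simulation many times and tally the outcomes.
counts = simulate_heads(n_flips=10, n_trials=1000)
distribution = {k: counts.count(k) for k in range(11)}
print(distribution)
```

The tally approximates the distribution of outcomes; more trials generally give a smoother picture.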
Sampling variability
The natural differences that occur from sample to sample
Population
The entire group we want information about
Sample
A subset of the population used to draw conclusions
Census
A study that includes every member of the population
Sample size
The number of individuals in a sample
Representative sample
A sample that accurately reflects the population
Simple random sample (SRS)
Every possible sample of the same size has an equal chance of being chosen
Sampling frame
The list of individuals from which a sample is drawn
Stratified random sample
The population is divided into homogeneous groups (strata), then an SRS is taken from each stratum
Cluster sample
The population is divided into clusters, clusters are selected at random, and everyone in the chosen clusters is surveyed
Systematic sample
Selecting every nth individual after a random start
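The four sampling methods above differ mainly in where the randomness enters. The sketch below is one possible way to express them with Python's standard library; the function names and the toy sampling frame are assumptions made for illustration.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every possible sample of size n is equally likely.
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a stratum name to its list of individuals;
    # an SRS is drawn separately inside each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Choose whole clusters at random, then keep everyone in them.
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

def systematic_sample(frame, step, rng):
    # Random start within the first `step` individuals, then every step-th one.
    start = rng.randrange(step)
    return frame[start::step]

rng = random.Random(42)
frame = list(range(100))  # hypothetical sampling frame of 100 ID numbers
srs = simple_random_sample(frame, 10, rng)
strat = stratified_sample({"low": frame[:50], "high": frame[50:]}, 5, rng)
clus = cluster_sample([frame[i:i + 10] for i in range(0, 100, 10)], 2, rng)
syst = systematic_sample(frame, 10, rng)
```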
Pilot survey
A trial run of a survey used to improve the final version
Voluntary response bias
Bias caused when individuals choose to participate
Convenience sampling
Choosing individuals because they are easy to reach
Undercoverage
Some groups in the population are left out or undersampled
Nonresponse bias
Bias from differences between respondents and nonrespondents
Response bias
Bias caused by anything in the survey design, such as question wording, that influences responses
Observational study
Researchers observe but do not impose treatments
Retrospective study
Uses data from past events
Prospective study
Collects data going forward in time
Experiment
A study that imposes treatments to determine cause and effect
Explanatory variable
The variable that explains or causes changes in the response variable
Response variable
The variable that is measured as an outcome
Control group
A group that receives no treatment, a placebo, or a standard treatment, and serves as a baseline for comparison
Randomization
Randomly assigning treatments to reduce bias
Replication
Applying each treatment to enough subjects to reduce the effect of chance variation
Blocking
Grouping similar subjects to reduce variability
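Randomization and blocking can be illustrated with a short sketch. It assumes hypothetical subject IDs, two treatment labels, and blocks named "younger" and "older"; none of these come from the deck.

```python
import random

def randomize(subjects, treatments, rng):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance alone decides the order
    k = len(treatments)
    # Deal the shuffled subjects out to the treatments in turn.
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

def randomize_within_blocks(blocks, treatments, rng):
    """Blocking: randomize separately inside each group of similar subjects."""
    return {name: randomize(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(7)
subjects = list(range(20))  # hypothetical subject IDs
groups = randomize(subjects, ["drug", "placebo"], rng)

blocks = {"younger": subjects[:10], "older": subjects[10:]}
blocked = randomize_within_blocks(blocks, ["drug", "placebo"], rng)
```

With blocking, each treatment group ends up with the same number of younger and older subjects, so the blocking variable cannot account for a difference between treatments.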
Matched pairs design
A design comparing two treatments on the same or similar subjects
Factor
A variable manipulated in an experiment
Levels
The different values of a factor
Treatment
A specific combination of factor levels
Confounding variable
A variable associated with the explanatory variable that also affects the response, so their effects cannot be separated
Lurking variable
A hidden variable that affects both explanatory and response variables
Blinding
Subjects do not know which treatment they receive
Double blinding
Neither subjects nor evaluators know the treatment
Placebo
A fake treatment used for comparison
Placebo effect
A response caused by belief in treatment rather than the treatment itself
Sample space
The set of all possible outcomes
Law of large numbers
As trials increase, empirical probability approaches true probability
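The law of large numbers can be seen in a quick simulation. The sketch below assumes a fair six-sided die and tracks the proportion of sixes at a few checkpoints; the function name, seed, and checkpoint values are illustrative.

```python
import random

def running_proportion(n_rolls, seed=1):
    """Return the empirical proportion of sixes at selected numbers of rolls."""
    rng = random.Random(seed)
    checkpoints = (10, 100, 1000, 10000, 100000)
    sixes = 0
    proportions = {}
    for i in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 6:
            sixes += 1
        if i in checkpoints:
            proportions[i] = sixes / i
    return proportions

proportions = running_proportion(100_000)
for n, prop in proportions.items():
    print(n, round(prop, 4))  # should drift toward 1/6 (about 0.1667) as n grows
```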
Complement rule
P(A) + P(not A) = 1
Mutually exclusive events
Events that cannot happen at the same time
Independent events
The outcome of one event does not change the probability of the other
Conditional probability
The probability of an event given another has occurred
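The probability rules above can be checked by enumerating a small sample space. This sketch assumes two fair dice and uses the standard formula P(A | B) = P(A and B) / P(B); the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false function of an outcome."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def sum_is_8(o):
    return o[0] + o[1] == 8

def first_is_6(o):
    return o[0] == 6

p_a = prob(sum_is_8)                                        # 5/36
p_not_a = prob(lambda o: not sum_is_8(o))                   # 31/36
p_b = prob(first_is_6)                                      # 1/6
p_a_and_b = prob(lambda o: sum_is_8(o) and first_is_6(o))   # 1/36

# Complement rule: P(A) + P(not A) = 1
assert p_a + p_not_a == 1

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b                               # 1/6

# The events are independent only if P(A | B) equals P(A).
independent = (p_a_given_b == p_a)                          # False here
```

Knowing the first die shows 6 raises the chance of a sum of 8 from 5/36 to 1/6, so these two events are not independent.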
Tree diagram
A diagram that shows all possible outcomes and probabilities
Probability model
A list of outcomes and their probabilities
Random variable
A variable whose value depends on chance
Expected value
The long-run average outcome (weighted mean)
Variance
The average squared distance from the mean
Standard deviation
The square root of variance, measures spread
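Expected value, variance, and standard deviation can all be computed directly from a probability model. The sketch below assumes a hypothetical model (number of heads in two fair coin flips) that is not part of the deck.

```python
import math

# Hypothetical probability model: outcome -> probability.
model = {0: 0.25, 1: 0.5, 2: 0.25}
assert abs(sum(model.values()) - 1) < 1e-12  # probabilities must sum to 1

# Expected value: each outcome weighted by its probability.
expected_value = sum(x * p for x, p in model.items())

# Variance: probability-weighted average squared distance from the mean.
variance = sum((x - expected_value) ** 2 * p for x, p in model.items())

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(expected_value, variance, round(sd, 4))
```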
Bernoulli trials
Trials with two outcomes, constant probability, and independence
Geometric distribution
Probability the first success occurs on the nth trial
Binomial distribution
Probability of a certain number of successes in a fixed number of Bernoulli trials
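Both distributions follow directly from the Bernoulli trial assumptions. The sketch below uses the standard formulas, with p as the probability of success on each trial; the function names are illustrative.

```python
from math import comb

def geometric_pmf(n, p):
    """P(first success occurs on trial n): n - 1 failures, then one success."""
    return (1 - p) ** (n - 1) * p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Fair-coin examples (p = 0.5 is an assumed value for illustration).
first_head_on_third_flip = geometric_pmf(3, 0.5)
two_heads_in_four_flips = binomial_pmf(2, 4, 0.5)
```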
10% condition
The sample is less than 10% of the population, so draws made without replacement can be treated as independent
Normal model
A bell-shaped distribution defined by mean and standard deviation
Success/failure condition
np ≥ 10 and nq ≥ 10, where q = 1 − p
z-score
The number of standard deviations a value is from the mean
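The success/failure condition, Normal model, and z-score work together when a binomial count is approximated by a Normal distribution. The sketch below assumes hypothetical values n = 100 and p = 0.5 and relies on `statistics.NormalDist` from the Python standard library (3.8 or later).

```python
import math
from statistics import NormalDist

def success_failure_ok(n, p):
    """Check np >= 10 and nq >= 10, where q = 1 - p."""
    return n * p >= 10 and n * (1 - p) >= 10

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

n, p = 100, 0.5                      # assumed values for illustration
mean = n * p                         # mean of the binomial count
sd = math.sqrt(n * p * (1 - p))      # standard deviation of the binomial count

if success_failure_ok(n, p):
    z = z_score(60, mean, sd)
    prob_below = NormalDist().cdf(z)  # area to the left of z under the standard Normal curve
    print(z, round(prob_below, 4))
```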
Scatterplot
A graph that shows the association between two quantitative variables
Explanatory variable
The variable on the x-axis that explains or predicts
Response variable
The variable on the y-axis that responds or is predicted
Form
The overall shape of a scatterplot (linear or non-linear)
Direction
Whether the association is positive, negative, or unclear
Strength
How closely the points follow a pattern; less scatter means stronger association
Unusual features
Outliers, clusters, or gaps in a scatterplot
Correlation coefficient (r)
A unitless measure, between −1 and 1, of the strength and direction of the linear relationship between two quantitative variables
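One common way to compute r is to average the products of z-scores, dividing by n − 1. The sketch below takes that approach on a small made-up data set; the data and the function name are assumptions.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r as the sum of z-score products over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
r = correlation(x, y)

# r is unitless: rescaling x (for example, changing its units) leaves r unchanged.
r_rescaled = correlation([10 * v for v in x], y)
print(round(r, 4), round(r_rescaled, 4))
```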
Linear condition
The requirement that a scatterplot be linear in order to use correlation
Correlation
A numerical description of linear association that must include r
Association
A relationship between variables that does not require mentioning r
Least squares regression line
The line that minimizes the sum of squared residuals and best fits the data
Regression equation
An equation that models the relationship between explanatory and response variables
Slope (b₁)
The predicted change in y for each one-unit increase in x
Intercept (b₀)
The predicted value of y when x = 0
Residual
The difference between an observed value and a predicted value
Positive residual
Observed value is greater than predicted value
Negative residual
Observed value is less than predicted value
Slope formula
b₁ = r × SD(y) ÷ SD(x)
Intercept formula
b₀ = mean(y) − b₁(mean(x))
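The slope and intercept formulas can be applied step by step, and the residuals then follow from the fitted line. This is a minimal sketch on a small made-up data set; the variable names are illustrative.

```python
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)
r = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x        # slope: b1 = r * SD(y) / SD(x)
b0 = y_bar - b1 * x_bar   # intercept: b0 = mean(y) - b1 * mean(x)

predicted = [b0 + b1 * a for a in x]
# Residual = observed - predicted; positive means the point lies above the line.
residuals = [obs - pred for obs, pred in zip(y, predicted)]

print(round(b0, 3), round(b1, 3))
print([round(e, 3) for e in residuals])
```

For this data the first point has a negative residual (observed 2, predicted about 2.8), and the residuals sum to zero up to rounding, as they do for any least squares line.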
Residual plot
A graph of the residuals versus the explanatory variable (or versus the predicted values)
Good residual plot
Residuals randomly scattered around zero, indicating a linear model is appropriate
Coefficient of determination (R²)
The proportion of variability in y explained by the linear model using x
Interpretation of R²
If R² = 0.80, then 80% of the variation in y is explained by the linear model using x
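R² can be computed as one minus the ratio of leftover (residual) variation to total variation in y. The sketch below fits a least squares line to a small made-up data set and computes R² that way; the data and names are assumptions.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)

# Least squares slope and intercept (equivalent to b1 = r * SD(y) / SD(x)).
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar

sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # variation left in residuals
sst = sum((b - y_bar) ** 2 for b in y)                     # total variation in y

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # share of the variation in y accounted for by the line
```

For a regression with one explanatory variable, this value equals the square of the correlation coefficient r.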
Extrapolation
Using a regression line to predict values outside the range of observed data
Why extrapolation is risky
The linear pattern may not continue beyond the observed range of x, so predictions made there are unreliable
High leverage point
A point with an x-value far from the mean of x
Outlier
A point with a y-value far from its predicted value
Influential point
A point whose removal would substantially change the slope or intercept of the regression line
Non-influential point
A point whose removal would not substantially change the regression line
Grouped data point
A point representing an average of several values, reducing variability
Regression output (constant)
The y-intercept of the regression line
Regression output (coefficient)
The slope associated with the explanatory variable
s (standard deviation of residuals)
The typical size of a residual: roughly how far observed values fall from their predicted values
Interpretation of s
The model’s predictions are typically within s units of the actual values
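The standard deviation of the residuals is usually computed with n − 2 in the denominator for a line with one explanatory variable. The sketch below assumes that convention and a small made-up data set; the names are illustrative.

```python
import math
from statistics import mean

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Least squares line for the data.
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# s = sqrt(sum of squared residuals / (n - 2))
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(round(s, 4))  # typical prediction error, in the units of y
```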