Density Curve
A smooth curve that models the overall shape of a population distribution; it lies on or above the horizontal axis and has a total area of 1 beneath it.
Uniform Density Curve
A density curve with constant height, so it forms a rectangle; area equals length times width (base times height), and the total area is 1.
Area under the Density Curve
The proportion, or percent, of all observations that fall within a range.
Empirical Rule
If data follow an approximately normal distribution, about 68% of the data are within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
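The 68/95/99.7 figures can be checked numerically. A minimal sketch, assuming a standard normal model and using Python's built-in `statistics.NormalDist`; the helper name `proportion_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints proportions close to 0.68, 0.95 and 0.997.
    print(k, round(proportion_within(k), 4))
```

Because z-scores standardize any normal distribution, the same proportions apply whatever the mean and standard deviation are.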
Shape (SoCs)
Describes the overall shape of a distribution (approx. normal, skewed, symmetric, unimodal, bimodal, uniform).
Outliers (SoCs)
Data values that are far away from the rest of the data, identified using a modified boxplot or as any value outside the interval [Q1 - 1.5IQR, Q3 + 1.5IQR].
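A minimal sketch of the 1.5 × IQR rule, flagging values that fall outside [Q1 - 1.5IQR, Q3 + 1.5IQR]. The data are made up, and the quartiles come from `statistics.quantiles` with `method="inclusive"`; textbooks and calculators may compute quartiles slightly differently, so fences can differ a little from hand calculations.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (lower fence, upper fence) for the 1.5 * IQR rule."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(data)
    return [x for x in data if x < low or x > high]

scores = [2, 4, 5, 5, 6, 7, 8, 30]  # made-up data for illustration
print(outlier_fences(scores))
print(find_outliers(scores))
```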
Center (SoCs)
A 'typical' value for the data; can be the median (Q2) or the mean (x̄ for a sample, μ for a population).
Spread (SoCs)
Tells how much the data varies; measured by range, IQR (Q3-Q1), standard deviation (S or σ), or variance (S^2 or σ^2).
Boxplot
Displays the 5-number summary of a dataset (min, Q1, median, Q3, max).
pth Percentile
The data value at or below which p% of the individual data values fall.
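A small sketch of the "less than or equal to" definition: it reports the percent of values at or below a given data value. The function name and the data are hypothetical, and other percentile conventions (for example, "strictly less than") give slightly different answers.

```python
def percentile_of(value, data):
    """Percent of data values that are less than or equal to `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

test_scores = [55, 60, 70, 80, 90]  # made-up data
print(percentile_of(80, test_scores))  # 4 of 5 values are <= 80, so 80.0
```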
Z-score
A standardized score that tells how many standard deviations from the mean a data value is.
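As a quick illustration, the z-score is (value - mean) / standard deviation. The numbers below are made up.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# A score of 85 when the mean is 70 and the standard deviation is 10:
print(z_score(85, 70, 10))  # 1.5 standard deviations above the mean
print(z_score(60, 70, 10))  # -1.0, i.e. 1 standard deviation below the mean
```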
LSRL (Least Squares Regression Line)
The line y-hat = a + bx that makes the sum of the squared residuals as small as possible; it predicts the response variable y-hat based on the explanatory variable x.
Residual
Gives how far away the actual y-value is from the predicted y-value (residual = actual - predicted).
Extrapolation
When the model is used to make predictions for x-values outside the range of x-values in the dataset; such predictions are often unreliable.
Coefficient of determination (r^2)
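The proportion of the variability in y that is accounted for by the LSRL; read as "about r^2 (written as a percent) of the variability in y is accounted for by the LSRL."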
y-intercept (a)
The predicted y-value when x=0.
S (standard deviation of the residuals)
The actual y is typically about S (standard deviation of residuals) away from the value predicted by the LSRL.
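The regression cards above (LSRL, residual, r^2, y-intercept, S) can be tied together in one short computation. This is a sketch on made-up data using the usual least-squares formulas; the function names are hypothetical, and S is computed with n - 2 in the denominator, which is the common convention.

```python
from math import sqrt

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept: predicted y when x = 0
    return a, b

def summarize_fit(xs, ys):
    """Return residuals, r^2, and S (standard deviation of the residuals)."""
    a, b = lsrl(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # actual - predicted
    y_bar = sum(ys) / len(ys)
    ss_resid = sum(e ** 2 for e in residuals)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_resid / ss_total
    s = sqrt(ss_resid / (len(xs) - 2))
    return residuals, r_squared, s

xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values
print(lsrl(xs, ys))
print(summarize_fit(xs, ys))
```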
Linear (Relationships Between Two Numerical variables)
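Judged from the scatterplot or residual plot; a linear model is appropriate when the scatterplot shows a roughly straight-line pattern and the residual plot shows no leftover pattern.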
Strength (Relationships Between Two Numerical variables)
Judged from the scatterplot or the correlation coefficient r, which ranges from -1 to 1; the closer r is to -1 or 1, the stronger the linear relationship.
Direction (Relationships Between Two Numerical variables)
Positive means that as x increases, y tends to increase. Negative means that as x increases, y tends to decrease. The sign of r also indicates direction.
Center (Combining Random Variables)
Mean and all centers.
Spread (Combining Random Variables)
Standard deviation and all spreads.
Observational Study
The researcher does not impose a treatment; thus, no causation can be claimed.
Experimental Study
The researcher imposes a treatment on experimental units.
Explanatory variable
Helps predict or explain the response; in an experiment, it is the treatment (the 'cause').
Response variable
The outcome being measured or the 'effect'.
Confounding variable(s)
Other factors related to both the explanatory and the response variable, so that their effects on the response cannot be separated; they need to be controlled.
Completely randomized design
Randomly assign each experimental unit (EU) a treatment.
Randomized block design
Separate EUs into blocks of similar units, then randomly assign each EU a treatment within each block. With blocks of size 2, similar EUs are matched in pairs and the treatments are randomly assigned within each pair.
Matched pairs design
Each EU receives every treatment, but the order of treatment is randomized.
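The random assignment behind the designs above can be sketched in a few lines. This is an illustrative sketch only: the unit labels, treatment names, and blocking variable are made up, and dealing shuffled units out in turn is just one way to get equal-sized groups.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

def randomized_block(blocks, treatments, rng):
    """Run a separate completely randomized assignment inside each block."""
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
units = [f"EU{i}" for i in range(1, 9)]
print(completely_randomized(units, ["A", "B"], rng))

# Hypothetical blocking variable: age group.
blocks = {"younger": units[:4], "older": units[4:]}
print(randomized_block(blocks, ["A", "B"], rng))
```

For a matched pairs design in which each EU receives every treatment, the same shuffle idea applies to the order of treatments for each unit.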
Population
A whole group of individuals you want to know about.
Sample
A subset of individuals you collect data from.
Parameters
Describe a population.
Statistics
Describe a sample.
Selection Bias/undercoverage
The way the sample is chosen leads to an unrepresentative sample, for example when part of the population is left out (undercoverage).
Response Bias
The way data are collected from the sample leads to inaccurate responses.
Wording Bias
The wording of the survey leads to bias.
Measurement Bias
The tool used to collect data leads to bias.
Nonresponse Bias/Voluntary response bias
Bias that may arise when not every individual in the sample provides data (nonresponse), or when individuals decide for themselves whether to be part of the sample (voluntary response).
Simple Random Sample (SRS)
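Every group of n individuals has an equal chance of being the sample; chosen by drawing names from a hat, using a random digit table, or using randInt() on a calculator.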
Stratified Random Sample
Divide population into strata, then randomly pick some from each stratum.
Cluster Random Sample
Divide population into clusters, then randomly pick whole clusters.
Systematic Random Sample
Start at a random place, then pick every kth individual.
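The four random sampling methods above can be compared side by side in code. A sketch under stated assumptions: the population is just the numbers 1 to 100, the strata and clusters are arbitrary made-up groupings, and all function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of some individuals from every stratum."""
    return [ind for members in strata.values()
            for ind in rng.sample(members, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [ind for name in chosen for ind in clusters[name]]

def systematic_sample(population, k, rng):
    """Start at a random place among the first k, then take every kth individual."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
population = list(range(1, 101))
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"room{i}": population[25 * i:25 * (i + 1)] for i in range(4)}

print(simple_random_sample(population, 10, rng))
print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
print(systematic_sample(population, 10, rng))
```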
Convenience Sample
Pick individuals that are easy to collect data from.
Plan the simulation
Describe how to use a chance device to imitate one repetition of the process.
Simulations
As a special promotion for its 20-ounce bottle of soda, a soft-drink company printed a message on the inside of each bottle cap.
Permutation
The number of ways to arrange r of n distinct items when order matters: n!/(n-r)!.
Combination
The number of ways to choose r of n distinct items when order does not matter: n!/((n-r)! r!).
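Both counting formulas can be evaluated directly from factorials. The sketch below also compares them against Python's built-in `math.perm` and `math.comb` (available in Python 3.8+); the helper names are made up.

```python
from math import comb, factorial, perm

def permutations_count(n, r):
    """Ordered arrangements of r items out of n: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

def combinations_count(n, r):
    """Unordered selections of r items out of n: n! / ((n - r)! * r!)"""
    return factorial(n) // (factorial(n - r) * factorial(r))

print(permutations_count(5, 2))  # 20
print(combinations_count(5, 2))  # 10

# The hand-written formulas agree with the standard library.
print(permutations_count(5, 2) == perm(5, 2), combinations_count(5, 2) == comb(5, 2))
```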
Random variable
A variable that describes the outcome of a chance process.
Discrete RV
Takes on a countable set of values with gaps between them.
Continuous RV
Takes on values from an interval.
Mean (Random variables)
The expected value of X, that is, the long-run average value of X over many repetitions of the chance process.
Standard deviation (Random variables)
Gives the typical distance of the values of X from the mean.
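For a discrete random variable, the mean and standard deviation can be computed from the probability distribution: the mean is the sum of value × probability, and the variance is the sum of (value - mean)^2 × probability. A minimal sketch using a fair six-sided die as an assumed example; the function names are hypothetical.

```python
from math import sqrt

def rv_mean(dist):
    """Expected value: sum of value * probability."""
    return sum(x * p for x, p in dist.items())

def rv_sd(dist):
    """Standard deviation: square root of sum of (value - mean)^2 * probability."""
    mu = rv_mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return sqrt(variance)

# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
print(rv_mean(die))
print(rv_sd(die))
```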
Binomial Random Variable
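Counts the number of successes in a fixed number of trials, where each trial has exactly two possible outcomes (success or failure), each trial's outcome does not affect the next trial, and the probability of success is the same on every trial.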
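A minimal sketch of binomial probabilities, using the standard formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) together with the mean np and standard deviation sqrt(np(1 - p)). The coin-flip numbers are an assumed example, and the function names are made up.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_mean_sd(n, p):
    """Mean and standard deviation of the number of successes."""
    return n * p, sqrt(n * p * (1 - p))

# Assumed example: 10 flips of a fair coin, "success" = heads.
print(binomial_pmf(5, 10, 0.5))   # probability of exactly 5 heads
print(binomial_mean_sd(10, 0.5))  # expected number of heads and its SD
```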
Geometric Random Variables
Counts the number of independent trials, each with the same probability of success, up to and including the first success.
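A short sketch of geometric probabilities, counting trials up to and including the first success: P(X = k) = (1 - p)^(k - 1) p and P(X <= k) = 1 - (1 - p)^k. The value p = 0.25 is an assumed example, and the function names are hypothetical.

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k): k - 1 failures, then one success."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k, p):
    """P(first success occurs on or before trial k)."""
    return 1 - (1 - p) ** k

# Assumed example: success probability 0.25 on each trial.
print(geometric_pmf(3, 0.25))  # two failures then a success
print(geometric_cdf(3, 0.25))  # success within the first three trials
```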