Statistics
the science of collecting, analyzing, and drawing conclusions from data
Descriptive
methods of organizing and summarizing data
Inferential
making generalizations from a sample to the population
Population
an entire collection of individuals or objects
Sample
a subset of the population selected for study
Data
observations on one or more variables
Categorical
describes qualities or group membership; it doesn't make sense to take an average
Numerical
measurements or counts recorded as numbers; it makes sense to take an average
Discrete
listable sets (counts)
Continuous
any value over an interval of values (measurements)
Univariate
one variable
Bivariate
two variables
Multivariate
many variables
Symmetrical
a distribution whose left and right sides have roughly the same shape and size (mean and median are similar)
Uniform
every class has about the same frequency (count); the graph looks like a rectangle
Skewed
one side (tail) is longer than the other side. The skewness is in the direction of the tail (left or right)
Bimodal
the two classes with the largest frequencies are separated by at least one other class; two humps
Parameter
a numerical value that describes a characteristic of a population (typically unknown)
Statistic
a numerical value that describes a characteristic of a sample
Median
the middle point of the data (50th percentile) when the data are in numerical order. If there are two middle values (an even number of observations), average them together
Mean
the arithmetic average (sum of the values divided by the number of values); 𝜇 is for a population (parameter) and 𝑥̅ is for a sample (statistic)
Variability
allows a statistician to distinguish between usual and unusual occurrences
Range
single value: maximum-minimum
IQR
interquartile range: Q3-Q1
Standard deviation
𝜎 for population (parameter); s for sample (statistic) - measures the typical or average deviation of observations from the mean; for a sample, the sum of squared deviations is divided by df = n - 1 before taking the square root
Variance
standard deviation squared
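A short sketch of the center and spread measures above, using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

# Made-up sample, for illustration only
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)            # x-bar: sum of values / n
median = statistics.median(data)        # middle value of the sorted data
data_range = max(data) - min(data)      # maximum - minimum

sample_var = statistics.variance(data)  # divides by df = n - 1
sample_sd = statistics.stdev(data)      # s = square root of the sample variance
pop_var = statistics.pvariance(data)    # divides by n (treats data as the whole population)
pop_sd = statistics.pstdev(data)        # sigma = square root of the population variance

print(mean, median, data_range, sample_var, round(sample_sd, 3), round(pop_sd, 3))
```

The sample versions (`variance`, `stdev`) come out slightly larger than the population versions because they divide by n - 1 instead of n.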
Resistant
not affected by outliers
Non-Resistant
Affected by outliers: Mean, Range, Standard Deviation, Variance (the IQR, like the median, is resistant)
Z-Score
a standardized score. This tells you how many standard deviations an observation is from the mean.
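As a quick illustration, a z-score is (observation - mean) / standard deviation. The function name and the numbers below are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical test score of 85 where the mean is 70 and the SD is 10
z = z_score(85, 70, 10)
print(z)  # positive: above the mean; negative: below the mean
```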
Coefficient of Determination (𝑟²)
a measure that assesses how well a model explains and predicts future outcomes.
Comparison of mean and median
Mound shaped - mean and median are nearly the same value; Skewed right - mean is larger than the median; Skewed left - mean is less than the median; The mean is always pulled in the direction of the skew away from the median.
Standard Normal Curve
The normal curve of z-scores, with mean 0 and standard deviation 1: 𝑁(𝜇, 𝜎) = 𝑁(0,1)
Normal Curve
Symmetrical density curve that follows the empirical rule.
Assess Normality
Use graphs: dotplots, boxplots, histograms, or normal probability plot.
Empirical Rule (68-95-99.7)
Gives the approximate percentage of observations within 1, 2, and 3 standard deviations (𝜎) of the center (𝜇) of a normal curve.
68% of Observations
Fall within 1 𝜎 of 𝜇.
95% of Observations
Fall within 2 𝜎 of 𝜇.
99.7% of Observations
Fall within 3 𝜎 of 𝜇.
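The 68-95-99.7 percentages can be checked against the standard normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal curve N(0, 1)

def proportion_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The three areas round to about 68%, 95%, and 99.7%, which is where the rule gets its name.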
Boxplots
For medium or large numerical data. It does not contain original observations.
Modified Boxplots
Boxplots that display outliers separately; the outlier cutoffs are 1.5 IQRs beyond the ends of the box (Q1 - 1.5 IQR and Q3 + 1.5 IQR).
Outliers
Points more extreme than the cutoffs are considered outliers.
5-Number Summary
Minimum, Q1 (1st quartile), Median, Q3 (3rd quartile), maximum.
Correlation Coefficient (r)
A quantitative assessment of strength and direction of a linear relationship.
Population Parameter
Uses (𝜌) for population parameter.
Correlation Values
r ranges from -1 to 1. Rough guide: r = 0 - no correlation; |r| < 0.5 - weak; 0.5 ≤ |r| < 0.8 - moderate; |r| ≥ 0.8 - strong. The sign gives the direction.
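One way to compute r by hand is to average the products of z-scores, dividing by n - 1. The sketch below does that with the standard library; the data points are made up and the function name is hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation r = sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up responses

r = correlation(xs, ys)
print(round(r, 4))
```

For these points r is positive and falls in the range the card above labels moderate.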
Least Squares Regression Line (LSRL)
Minimizes the sum of the squared residuals on a scatterplot.
Residuals
Difference between observed and predicted responses.
Residual Plot
Indicates a good model if (1) there is no discernible pattern and (2) the points are spread about evenly above and below the zero-residual line.
Coefficient of Determination (𝑟²)
Gives proportion of variation in responses that is explained by the relationship of x and y.
Slope (b)
For each one-unit increase in x, the predicted response increases (or decreases) by about b.
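A minimal sketch tying together the LSRL, residuals, and r², assuming the line is written as predicted y = a + b·x. The data are made up and the function name `lsrl` is hypothetical.

```python
import statistics

def lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b * x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]   # made-up explanatory values
ys = [2, 4, 5, 4, 5]   # made-up responses

a, b = lsrl(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed - predicted

sse = sum(e ** 2 for e in residuals)                    # what the LSRL minimizes
sst = sum((y - statistics.mean(ys)) ** 2 for y in ys)   # total variation in y
r_squared = 1 - sse / sst                               # proportion of variation explained

print(round(a, 3), round(b, 3), round(r_squared, 3))
```

The residuals of a least squares line always sum to zero (up to rounding), which is why a residual plot is centered on the zero line.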
Extrapolation
Using the LSRL to predict responses outside the scope (interval) of the observed explanatory values; such predictions are unreliable and should be avoided.
Influential Points
Points that, if removed, significantly change the LSRL.
Outliers (in context)
Points that have large residuals and do not follow the trend of the bivariate data.
Census
A complete count of the population.
Sampling Frame
A list of everyone in the population.
Sampling Design
Refers to the method used to choose a sample.
Simple Random Sample (SRS)
Every individual has the same chance of being chosen and every group of size n has the same chance of being chosen.
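A sketch of drawing an SRS with `random.sample`, which selects without replacement so that every group of size n is equally likely. The population of ID numbers and the seed are made up for illustration.

```python
import random

rng = random.Random(42)            # fixed seed only so the example is repeatable
population = list(range(1, 101))   # hypothetical sampling frame: IDs 1 to 100

srs = rng.sample(population, 10)   # SRS of size n = 10, no repeats
print(sorted(srs))
```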
Stratified Sampling
Divide the population into homogeneous groups called strata, then take an SRS from each stratum.
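A sketch of stratified sampling: an SRS is taken separately inside every stratum. The strata, the ID labels, and the function name are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum individuals from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: students grouped by class year
strata = {
    "freshmen": [f"F{i}" for i in range(1, 41)],
    "sophomores": [f"S{i}" for i in range(1, 31)],
    "juniors": [f"J{i}" for i in range(1, 21)],
}

chosen = stratified_sample(strata, 5, seed=1)
print(chosen)
```

Every stratum is guaranteed to appear in the sample, which is what separates this from a single SRS of the whole population.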
Advantages of Stratified Sampling
Can be more precise than an SRS; cost is reduced if the strata already exist.
Disadvantages of Stratified Sampling
Difficult to divide into groups, more complex formulas, must know population
Cluster Sampling
Based on location; select a random location and sample ALL at that location.
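A sketch of cluster sampling: whole clusters are chosen at random and everyone in a chosen cluster is sampled. The classroom names, members, and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep ALL members of each."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in picked for member in clusters[name]]
    return picked, sample

# Hypothetical clusters: students grouped by classroom (location)
clusters = {
    "Room 101": ["Ana", "Ben", "Cy"],
    "Room 102": ["Dee", "Eli"],
    "Room 103": ["Fay", "Gus", "Hal", "Ivy"],
    "Room 104": ["Jo", "Kai", "Lou"],
}

picked, sample = cluster_sample(clusters, 2, seed=3)
print(picked, sample)
```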
Advantages of Cluster Sampling
Cost is reduced, it is unbiased, and you don't need a list of the whole population.
Disadvantages of Cluster Sampling
May not be representative of population and has complex formulas.
Random Digit Table
Each entry is equally likely and each digit is independent of the rest.
Random Number Generator
Calculator or computer program; RandInt(lower, upper).
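In Python, the closest equivalent to the calculator's RandInt(lower, upper) is `random.randint`, which also includes both endpoints. The seed below is arbitrary and only makes the sketch repeatable.

```python
import random

rng = random.Random(2024)   # arbitrary seed, for repeatability only

# Ten simulated rolls of a six-sided die: integers from 1 to 6 inclusive
rolls = [rng.randint(1, 6) for _ in range(10)]
print(rolls)
```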
Bias
Systematically favors a certain outcome.
Sources of Bias
Factors that can lead to biased results in sampling.
Voluntary Response Bias
People choose themselves to participate; polarized responses.
Convenience Sampling
Ask people who are easy to find, friendly, or comfortable to ask.
Undercoverage
Subset of the population is left out of selection process.
Non-response Bias
Selected individuals cannot be contacted or refuse to participate.
Response Bias
Respondents give false or inaccurate answers; can be caused by question wording, the interviewer, or a wish to look good.
Wording of the Questions
Leading questions that can influence responses.
Observational Study
Observe outcomes without giving a treatment.
Experiment
Actively imposes a treatment on the subjects; randomly assigns experimental units to treatments.
Experimental Unit
Single individual or subject that receives a treatment.
Factor
The explanatory variable; what is being tested.
Level
A specific value of the factor.
Response Variable
What you are measuring with the experiment.
Treatment
Experimental condition applied to each unit.
Control Group
A baseline group used for comparison to judge the treatment's effectiveness; does NOT have to receive a placebo.
Placebo
A treatment with no active ingredients (provides a control).
Blinding
A method used so that subjects do not know whether they are in the treatment group or the control group.
Double Blinding
Neither subjects nor evaluators know which treatment is being given.
Principles of Experimental Design
Control, Replication, Randomization, Comparison.
Control in Experimental Design
Isolates effects of treatment variable by keeping all other variables constant.
Replication in Experimental Design
Uses enough experimental units in each treatment group to reduce the impact of chance variation from random assignment.
Randomization in Experimental Design
Uses chance to assign subjects to treatments to create similar treatment groups; reduces bias and allows cause-and-effect conclusions.
Comparison in Experimental Design
Measures responses of control and treatment groups to determine effectiveness of treatment.
Completely Randomized Design
Every unit is randomly assigned to one of the treatments.
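A sketch of a completely randomized design: shuffle all units, then deal them out to the treatments so the groups are equal in size (or as close as possible). The unit labels, treatment names, and function name are hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign every unit to exactly one treatment."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                     # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

units = [f"unit{i}" for i in range(1, 13)]    # 12 hypothetical experimental units
groups = completely_randomized(units, ["drug", "placebo"], seed=7)
print(groups)
```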
Randomized Block Design
Units are first grouped into blocks by a similar characteristic (blocking is deliberate, not random) and then randomly assigned to treatments within each block; reduces variation and controls a confounding variable.
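A sketch of a randomized block design: units are already sorted into blocks, and the random assignment happens separately inside each block. The blocks, labels, and function name are hypothetical.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments within each block; returns unit -> (block, treatment)."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                 # randomize only within this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks formed by age group
blocks = {
    "younger": [f"Y{i}" for i in range(1, 7)],
    "older": [f"O{i}" for i in range(1, 7)],
}

assignment = randomized_block(blocks, ["drug", "placebo"], seed=11)
print(assignment)
```

Because each block is split evenly between the treatments, the blocking variable (age group here) cannot be confounded with the treatment.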
Matched Pairs Design
Units are matched in pairs by similar characteristics, and then treatments are randomly assigned within each pair.
Confounding Variables
The effect of the variable on the response is indistinguishable from the effects of the factor being tested; happens in observational studies and in experiments where blocking should have been used but was not.
Law of Large Numbers
As an experiment is repeated, the experimental probability gets closer and closer to the true (theoretical) probability.
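A small simulation sketch of the Law of Large Numbers, assuming a fair coin (true probability of heads 0.5). The checkpoints, the seed, and the function name are arbitrary choices for this example.

```python
import random

def running_proportion(n_flips, seed=None):
    """Flip a fair coin n_flips times; record the proportion of heads at checkpoints."""
    rng = random.Random(seed)
    heads = 0
    checkpoints = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5           # True counts as 1 head
        if i in (10, 100, 1000, 10000, 100000):
            checkpoints[i] = heads / i
    return checkpoints

props = running_proportion(100_000, seed=0)
for n, p in props.items():
    print(n, p)
```

Early proportions can wander well away from 0.5; the later ones settle close to it.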
Probability
The proportion of time an outcome occurs over a long run of trials.
Sample Space (S)
Collection of all possible outcomes.
Events
Any subset of the sample space; denoted by a capital letter.
Complement
All outcomes NOT in the event.
Union
A or B: all the outcomes in either circle of a Venn diagram, including the overlap (𝐴∪𝐵).
Intersection
A and B: the outcomes in the overlap of the two circles, where both events happen (𝐴∩𝐵).
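Python sets mirror these ideas directly. The sketch below assumes one roll of a fair six-sided die with equally likely outcomes; the events A and B are made up.

```python
sample_space = {1, 2, 3, 4, 5, 6}   # S: one roll of a fair die
A = {2, 4, 6}                       # event A: roll is even
B = {4, 5, 6}                       # event B: roll is greater than 3

complement_A = sample_space - A     # all outcomes NOT in A
union = A | B                       # A or B (either, or both)
intersection = A & B                # A and B (both at once)

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return len(event) / len(sample_space)

print(complement_A, union, intersection)
print(prob(A), prob(union), prob(intersection))
```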