Statistics
Measurement, variation/variability, & comparison
Goals of statistics
Estimate the values of important parameters, measure variability, compare groups, test hypotheses
Sample of convenience
Collection of individuals that happen to be available at the time; bad because study sample and population of interest should be as similar as possible
Variable
Characteristic measured on individuals drawn from a population under study
Data
Measurements of 1+ variables made on a collection of individuals
Data formatting
Rows contain observations, columns correspond to variables
Response & explanatory variable
Response variable is predicted or explained from the explanatory variable
Response~outcome; explanatory~predictor
Populations
Parameters (Greek letters); the "true" value
Include all existing organisms of interest
Samples
Estimates (Roman letters); an approximation or guess about the "truth" based on a sample group chosen methodically & randomly
Parameters vs. estimates
Population parameters are constants; estimates are random variables that change from one sample to the next
Bias
Systematic discrepancy between estimates and the true population characteristic; if biased, not accurate
What makes an estimate accurate (unbiased)?
If the average of estimates obtained is centered on the true population value
Volunteer bias
Volunteers for a study are likely to be different on average from the population
How can we tell when an estimate is biased?
If experiment is repeated and estimate is consistently too high or too low
Precision
Measure of how far apart repeated estimates might be; a mathematical concept that can be derived
How do we know if estimates are precise?
Precise estimates are relatively close to each other
Effect of larger sample on precision
Larger samples yield more precise estimates
Properties of a good sample
Random selection of individuals (representative of the population of interest), independent selection of individuals, sufficiently large (large samples yield more precise estimates)
Random sample
Each member of population has an equal and independent chance of being selected
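A minimal sketch of drawing a random sample, assuming the population can be listed as ID numbers (the IDs, the seed, and the sample size of 25 are made up for illustration):

```python
import random

# Hypothetical population: ID numbers for 1,000 individuals.
population = list(range(1, 1001))

# Fixed seed only so the example is repeatable.
rng = random.Random(42)

# random.sample draws without replacement, so every individual has the
# same chance of being selected and nobody is picked twice.
sample = rng.sample(population, k=25)

print(sorted(sample))
```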
Experimental vs. observational studies
Experimental: researcher randomly assigns individuals to treatment groups; more powerful and can help determine cause-and-effect relationships
Observational: assignment of treatments is not made by researcher; can only assess associations between variables
Categorical variables
Dichotomous (binary), ordinal (ordered categories; ex. cancer stage, education level, recovery from an operation), nominal (categories have no natural ordering; ex. drug treatment, region, species type)
Numerical (quantitative) variables
Continuous (can be measured; ex. age, weight, miles traveled) or discrete (can be counted; ex. number of offspring, number of days)
Why does the type of variable matter?
Determines the way it's summarized (average vs. percentages), the type of statistical analysis, the graphical display format
Graphing categorical variables
Bar graph
Graphing numerical variables
Histogram, box plot, dot plot (usually for smaller amounts of data)
Features of a box plot
Center line in box: median
Top edge of box: 3rd quartile/75th percentile
Bottom edge of box: 1st quartile/25th percentile
Whiskers: extend to smallest and largest non-extreme observations
Extreme observations: observations more than 1.5 × IQR beyond the box edges (IQR = 3rd quartile - 1st quartile)
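A rough sketch of the numbers behind a box plot, using made-up data. Quartile conventions differ between textbooks and software; the "inclusive" method here is one assumption among several reasonable choices.

```python
import statistics

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 25]  # made-up observations

# Quartiles: 25th percentile, median, 75th percentile.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Observations more than 1.5 * IQR beyond the box edges count as extreme.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
extremes = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers reach the smallest and largest non-extreme observations.
non_extreme = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(non_extreme), max(non_extreme)

print(q1, median, q3, extremes, whisker_low, whisker_high)
```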
Graphing two numerical variables
Scatter plot, line graph, map
Graphing two categorical variables
Mosaic plot (always preferred!), grouped bar graph
Graphing a categorical and numerical variable
Multiple (stacked) histograms, side-by-side box plots
Bad graphs
Pie charts & 3D graphs (hard to make accurate interpretations & perspective skews visual perception)
Common graphing errors
Truncation of y-axis, cumulative graphs, ignoring conventions, mislabeled or missing axes
Guidelines for good graphics
Show the data
Represent magnitudes accurately
Draw graphical elements clearly, minimizing clutter (maximize data-to-ink ratio)
Make displays easy to interpret
Clearly identify axes
All figures should have captions
Be sure that figures with color are effective in b&w
How is location (central tendency) measured?
Mean, median, mode
Sample mean
Average of observations, center of gravity; find sum of observations and divide by count
Median
Middle measurement in a set of ordered data; with an even number of observations, average the two middle values
Mode
Most frequent measurement
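The three measures of location can be sketched on a small made-up data set:

```python
import statistics

observations = [3, 5, 5, 6, 8, 9]  # made-up values, already ordered

# Mean: sum of observations divided by the count.
mean = sum(observations) / len(observations)

# Median: even number of observations, so average the two middle values.
median = statistics.median(observations)

# Mode: the most frequent measurement.
mode = statistics.mode(observations)

print(mean, median, mode)
```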
How is variability (width or spread) measured?
Range, standard deviation, variance, interquartile range
Range
Maximum value - minimum value
Poor measure of distribution width/biased estimator of true population range; small samples tend to give lower estimates of range than large samples
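A small simulation sketch of why the sample range is a biased measure of width. It assumes a normal population with mean 0 and standard deviation 1; the helper name `average_sample_range` is made up for this example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is repeatable


def average_sample_range(n, repeats=2000):
    """Average (max - min) over many simulated samples of size n."""
    ranges = []
    for _ in range(repeats):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
    return statistics.mean(ranges)


small = average_sample_range(5)
large = average_sample_range(100)

# Small samples rarely include the most extreme individuals,
# so their range tends to come out lower.
print(small, large)
```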
Sample variance
s² = [(x1 - x̄)² + (x2 - x̄)² + ... + (xn - x̄)²] / (n - 1)
(Almost) average squared difference from the mean; in original units squared
Standard deviation
s = sqrt(s²)
Sigma is the true standard deviation, s is the sample standard deviation
Related to the average distance between the mean and each observation; measures variability/spread of a distribution
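The sample variance and standard deviation can be worked through step by step. The data are made up; the result is compared against Python's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
n = len(x)

x_bar = sum(x) / n                               # sample mean
squared_devs = [(xi - x_bar) ** 2 for xi in x]   # squared differences from the mean

s2 = sum(squared_devs) / (n - 1)  # sample variance (units squared)
s = math.sqrt(s2)                 # sample standard deviation (original units)

print(s2, statistics.variance(x))
print(s, statistics.stdev(x))
```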
Interpreting standard deviation in a normal distribution (bell curve)
About 68% (roughly 2/3) of data falls within 1 standard deviation of the mean
About 95% of data falls within 2 standard deviations of the mean
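These fractions can be checked against the standard normal distribution (mean 0, standard deviation 1), here with `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of landing within 1 and within 2 standard deviations of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(round(within_1_sd, 4), round(within_2_sd, 4))
```

The first value comes out a little above two-thirds (about 0.68) and the second is about 0.95.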
Skew
A measurement of asymmetry; the direction of skew is the direction in which the long, pointy tail of the distribution extends
Mean vs. median in skewed data
Right-skewed data: sample mean > sample median
Left-skewed data: sample mean < sample median
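A quick illustration with made-up data: one large value creates a long right tail and pulls the mean above the median; mirroring the data reverses the relationship.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 4, 20]      # long tail to the right
left_skewed = [-x for x in right_skewed]   # mirror image: long tail to the left

right_mean = statistics.mean(right_skewed)
right_median = statistics.median(right_skewed)
left_mean = statistics.mean(left_skewed)
left_median = statistics.median(left_skewed)

print(right_mean, right_median)  # mean is above the median
print(left_mean, left_median)    # mean is below the median
```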
Estimating standard deviation from a histogram
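Eyeball the values that cut off roughly the top 2.5% (U) and bottom 2.5% (L) of the observations
s ≈ (U - L) / 4 (about 95% of the data spans roughly 4 standard deviations)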
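A sketch of the same rule of thumb applied to simulated data instead of a histogram read by eye. It assumes roughly bell-shaped data; the population mean of 50 and standard deviation of 10 are made up.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the example is repeatable
data = [rng.gauss(50, 10) for _ in range(1000)]

ordered = sorted(data)
n = len(ordered)
k = round(0.025 * n)  # number of observations in each 2.5% tail

lower = ordered[k]          # L: cuts off roughly the bottom 2.5%
upper = ordered[n - 1 - k]  # U: cuts off roughly the top 2.5%

s_rough = (upper - lower) / 4
s_exact = statistics.stdev(data)

print(s_rough, s_exact)
```

The rough and exact values should land close together, though the shortcut is only as good as the bell-shape assumption.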
Does repeating an experiment result in the same results?
No, because of random variation (especially with small samples)
What is the key concept behind a sampling distribution, confidence intervals, and standard errors?
Variability and uncertainty of samples (and sample means)
Effect of increasing sample size on spread and variation
Larger sample size reduces the spread/variation of the sampling distribution of an estimate
Sampling distribution
Probability distribution of all values for an estimate we might have obtained when the population was sampled; illustrates how much the sample mean could vary and what values are typical
Only known in theory or via simulations
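A simulation sketch of a sampling distribution of the sample mean. It assumes a normal population with mean 100 and standard deviation 15 (made-up values); `sample_means` is a hypothetical helper for this example.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the example is repeatable


def sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n and return the mean of each."""
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(100, 15) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means


# Spread of the simulated sampling distribution for two sample sizes.
spread_small = statistics.stdev(sample_means(10))
spread_large = statistics.stdev(sample_means(100))

print(spread_small, spread_large)
```

The larger sample size gives a visibly narrower sampling distribution.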
Standard error
Quantifies the innate variability of an estimator (uncertainty); the standard deviation of the estimator's sampling distribution
Standard error of the mean
SE = s/sqrt(n)
Sigma notation is rarely used because the true standard deviation is rarely known; in most cases we only have a sample
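The standard error of the mean computed from a single made-up sample, using the sample standard deviation s in place of the unknown sigma:

```python
import math
import statistics

sample = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3, 5.9, 4.5, 5.1]  # made-up measurements
n = len(sample)

s = statistics.stdev(sample)  # sample standard deviation
se = s / math.sqrt(n)         # standard error of the mean

print(s, se)
```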
Is the sample mean a good estimator of the population mean?
Yes; with a random sample it is unbiased (on average, centered on the true population mean)
Do larger samples fix bias?
No; larger samples improve precision but do not remove a systematic discrepancy
Confidence interval
Quantifies uncertainty about a parameter; an interval (not point) estimate giving a range of plausible values for the true population parameter, e.g. the mean
How is a 95% confidence interval worded?
We are 95% confident that the true population value lies within the interval...
If we drew repeated samples, we expect that about 95% of the confidence intervals would contain the true population value
(NOT the probability that the true value lies in this particular interval; the parameter is a fixed constant)
2 standard error rule of thumb
Assuming a normally distributed population and/or a sufficiently large sample size, the interval from 2 standard errors below to 2 standard errors above the sample mean roughly approximates the 95% confidence interval for the mean
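A sketch of the 2 standard error rule of thumb on a made-up sample. This is the rough approximation described here, not an exact interval.

```python
import math
import statistics

sample = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5, 12.4, 11.1]  # made-up data

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval: mean plus or minus 2 standard errors.
lower = mean - 2 * se
upper = mean + 2 * se

print(f"mean = {mean:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```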
Confidence intervals of different samples...
Different samples yield different intervals
Intervals vary in width
~95% of intervals contain the true parameter value, ~5% don't (but in reality we never know which ones)
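A simulation sketch of this behavior. Unlike real life, the simulation knows the true mean (a made-up normal population with mean 50 and standard deviation 10), so it can count how often the 2 standard error interval captures it.

```python
import math
import random
import statistics

rng = random.Random(123)  # fixed seed so the example is repeatable

TRUE_MEAN, TRUE_SD = 50, 10  # known here only because we simulate
n, repeats = 50, 2000

hits = 0
for _ in range(repeats):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Each sample gives a different interval, with a different width.
    if mean - 2 * se <= TRUE_MEAN <= mean + 2 * se:
        hits += 1

coverage = hits / repeats
print(coverage)
```

The printed coverage should come out near 0.95.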
Effect of increasing sample size on confidence intervals
Larger sample size yields narrower confidence intervals
99% confidence intervals vs. 95% confidence intervals
99% intervals are wider than 95% intervals
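A sketch comparing the two widths, assuming a normal sampling distribution and a made-up standard error of 1.5. The multipliers come from `statistics.NormalDist().inv_cdf`.

```python
from statistics import NormalDist

se = 1.5  # hypothetical standard error

# Multipliers that leave 2.5% (for 95%) or 0.5% (for 99%) in each tail.
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)

width95 = 2 * z95 * se
width99 = 2 * z99 * se

print(round(z95, 3), round(z99, 3))
print(width95, width99)
```

Being more confident of capturing the true value requires a wider interval.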
Distribution of the data
Set of all values in sample