What is a population
the entire collection of individual/observational units that share a common property
What is a sample
a subset of the population
Observation
Set of one or more quantities (measurements) on a single observational unit
what is a parameter
a quantity describing a statistical population
what is an estimate
a quantity calculated from sample data to approximate a population parameter, aka statistic
what is a categorical variable
describes membership in a category or group; categorical variables do not have magnitude on a numerical scale
what are the types of categorical variables
nominal (named) or ordinal (ordered)
example of nominal variable
survival (alive or dead), eye color, breed of dog
example of an ordinal variable
life stage, size class (sm,m,lg), severity score
are months and weekdays categorical?
yes, they can be treated either nominally or ordinally depending on the context
what is a numerical variable
characteristics of observations that have magnitude on a numerical scale
what are the types of numerical variables
continuous and discrete
what is a continuous variable
can take any real number value, ex. degrees Celsius, cm
what is a discrete variable
takes only indivisible units, though the values can be non-integers, ex. age at death, # of eggs in a bird nest
Can continuous variables have an exact value?
the probability of any exact value is zero, whereas it is nonzero with discrete variables
what is a graph
visual representation of a relationship between variables
What is a bar graph
columns (bars) representing the distribution of a numerical variable against one or more categorical variables (better than pie charts)
what is an experimental study
researcher randomly assigns observational units to different groups (treatments). Researcher controls the treatments
what is an explanatory variable
the treatment variable that has been manipulated by the researcher (independent variable)
what is a response variable
the measured effect of the treatment (dependent variable)
what is an observational study
researchers have no control over which observational unit falls into which treatment. Passive observation
what is a scatter plot
graphical display of two numerical variables; each observation is represented as a point on a graph with 2 or 3 axes
what is a line graph
uses dots connected by line segments to display trends measured over time or other ordered states
what is a frequency distribution
a representation (either graphical or tabular) that displays the number of observations within a given interval of a quantitative variable
what is the mode
interval corresponding to the highest peak in the frequency distribution
what is Skew
asymmetry in the shape of a frequency distribution for a numerical variable
what is the primary goal of statistics
to infer/estimate an unknown characteristic of an entire population based on sample data
what does location tell us
something about the average/typical individual units
what does spread tell us
how much measurements vary among individual units (how widely scattered the values are around the centre/location)
what is the most important location statistic
the arithmetic mean
what are the most important spread statistics
variance and standard deviation
what is variance
average squared deviation of observations from the mean. measures the overall uncertainty/spread
what is the coefficient of variation (CV)
the standard deviation divided by the mean. It tells us how hard it is to guess a typical value (mean/location) relative to its size. A small CV means most values fall close to the mean relative to its magnitude; a large CV means guesses are more uncertain.
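A minimal sketch of the location and spread statistics above, using only Python's standard library. The `lengths` values are made up for illustration, and the variance here is the sample variance, which divides by n - 1 rather than n.

```python
import math
import statistics

# Hypothetical sample: body lengths in cm (made-up values for illustration)
lengths = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(lengths)      # location: arithmetic mean
s2 = statistics.variance(lengths)    # sample variance (divides by n - 1)
s = math.sqrt(s2)                    # sample standard deviation
cv = s / mean                        # coefficient of variation (unit free)

print(f"mean={mean:.2f} variance={s2:.2f} sd={s:.2f} cv={cv:.2f}")
```

Because the CV is a ratio, it can be compared between variables measured in different units.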
what is the median
the middle measurement/value of a distribution
what is the mean
arithmetic average, more sensitive to extreme values than the median
what is the median for an even number of observations
the average of the two central numbers
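A small sketch of the median rule for an even number of observations, and of why the mean is more sensitive to extreme values. The numbers are hypothetical.

```python
import statistics

# Hypothetical data with one extreme value (100)
values = [3, 5, 7, 100]

median = statistics.median(values)  # average of the two central values, 5 and 7
mean = statistics.mean(values)      # pulled upward by the extreme value

print(median)  # 6.0
print(mean)    # 28.75
```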
what is left skewed
few small values, >1/2 of values exceed the mean
what is right skewed
few large values, >1/2 of values are less than the mean
what is the interquartile range
the measure of spread for the median (Q3-Q1)
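A sketch of Q3 - Q1 with the standard library. Note that quartile conventions differ between textbooks and software; this uses the default ("exclusive") method of `statistics.quantiles`, and the data are hypothetical.

```python
import statistics

# Hypothetical sample with one large value
values = [2, 4, 5, 7, 8, 10, 20]

# n=4 asks for the three cut points that split the data into quarters
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)  # 4.0 7.0 10.0 6.0
```

The large value (20) does not affect the IQR here, which is why the IQR is paired with the median for skewed data.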
what are the advantages of a box plot
quickly shows where the most values lie (location) and how spread out the data are. Provides quick info on symmetry and skewness
what is accuracy
how close an estimate is to the population parameter (mean, median, or standard deviation)
what is precision
how much estimates vary across samples
accurate and precise
low sampling variation and low bias
accurate and imprecise
high sampling variation and low bias
inaccurate and precise
low sampling variation and high bias
inaccurate and imprecise
high sampling variation and high bias
random sampling
minimizes bias and allows us to quantify sampling variation
sample bias
occurs when some observational units in the target population have a higher or lower probability of being sampled than others, leads to inaccuracy
sampling variation
refers to the natural variation in statistics across different samples drawn from the same population
sampling of convenience
whomever you can get, some members of a population are systematically more likely to be selected in a sample than others, leads to inaccuracy
survivorship bias
occurs when we draw conclusions based on the individuals that remain observable (sample observations) while ignoring those that did not survive, failed, or disappeared (unobserved)
what is μ
population mean (mu)
what is σ
population standard deviation (sigma)
what is σ^2
population variance (sigma squared)
what is X̄
sample mean (X bar)
what is s
sample standard deviation
what is s^2
sample variance
properties of sampling distribution
1. the mean of the sample means (across all possible samples) is equal to the population mean
2. selection of observational units is unbiased and independent
3. under random sampling, increasing size reduces variability, more precise estimation
as sample size increases, sampling variability ______
decreases, yielding a more precise estimate
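A simulation sketch of this property, assuming a hypothetical normally distributed population (mean 50, SD 10). It draws many random samples at two sample sizes and compares how much the sample means vary.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values (mean 50, SD 10)
population = [random.gauss(50, 10) for _ in range(10_000)]

def sd_of_sample_means(n, reps=1000):
    """Standard deviation of the sample mean across `reps` random samples of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

small = sd_of_sample_means(10)    # sampling variation with n = 10
large = sd_of_sample_means(100)   # sampling variation with n = 100

print(f"n=10: {small:.2f}   n=100: {large:.2f}")
```

The spread of the sample means should shrink roughly in proportion to 1/√n, so the n = 100 value is expected to be about a third of the n = 10 value.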
what is Y
the sampling distribution of sample means
sampling error
how much a statistic calculated from a sample differs from the true population value due to random variation
criteria of a random sample
every observational unit in the population has an equal chance of being included, and the selection is independent
does random sampling ensure accuracy or precision
accuracy (it minimizes bias); precision improves only as sample size increases
what is a confidence interval
A range of values that is likely to contain the population parameter with a specified level of confidence. reflects the uncertainty of the mean estimate
what is the margin of error
the maximum typical deviation we expect between a sample estimate and the true population value due to sampling variability at a chosen confidence level
what does a larger confidence level (95% or 99%) tell us
provides a wider range that is more likely to contain the parameter. Values inside the interval are more plausible; those that lie outside are considered less plausible based on sample data
what does a 95% confidence interval tell us
we are 95% confident the true population mean lies between the lower and upper limits of the interval (NOT 95% probability)
higher confidence =
wider interval, less precision
lower confidence =
narrower interval, more precision
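A sketch of the confidence level versus width trade-off. As a loudly stated assumption, it uses a normal critical value from `statistics.NormalDist` because the standard library has no t-distribution; for a sample this small a t critical value would give somewhat wider intervals. The data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample (made-up values)
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def confidence_interval(data, level):
    """Approximate CI for the mean using a normal critical value (assumption)."""
    centre = mean(data)
    se = stdev(data) / math.sqrt(len(data))    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    margin = z * se                            # margin of error
    return centre - margin, centre + margin

ci95 = confidence_interval(sample, 0.95)
ci99 = confidence_interval(sample, 0.99)

print("95%:", ci95)
print("99%:", ci99)
```

Both intervals are centred on the sample mean; only the margin of error changes with the confidence level.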
what is normal distribution
a symmetric, bell-shaped distribution (bell curve) described by its mean and standard deviation
what is a t-distribution
It is bell-shaped and has a mean of zero, but has a larger standard deviation than the standard normal distribution, and therefore, has thicker tails than the standard normal distribution.
what is the t-statistic
measures the deviation of the sample mean from the hypothesized population mean in units of standard error, t = (X̄ - μ)/(s/√n). It is unit free and comparable across populations. Its distribution is universal and only varies as a function of sample size (degrees of freedom, n-1)
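A minimal sketch of the calculation, t = (sample mean - hypothesized mean) / (s / √n). The sample and the hypothesized mean `mu0` are made up for illustration.

```python
import math
import statistics

# Hypothetical sample and hypothesized population mean
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu0 = 4.0

n = len(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
t = (statistics.mean(sample) - mu0) / se      # unit-free t-statistic
df = n - 1                                    # degrees of freedom

print(f"t={t:.3f} with {df} degrees of freedom")
```

Turning `t` into a p-value requires the t-distribution with `df` degrees of freedom, which is not in the standard library; a statistics package or a t-table would be needed for that step.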
when do we use the t-distribution
when the sample size is small and the population standard deviation is unknown (estimated from the sample)
what is the square-root transformation
it compresses variation, stabilizing variance and reducing skewness; primarily used to normalize right-skewed data
What is Jensen's inequality?
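a mathematical rule used to compare the average of a function to the function of an average: for a convex function f, f(mean of X) ≤ mean of f(X), and the inequality reverses for a concave function. Used to calculate risk and inequality in curved functions; used for small sample sizes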
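A small numeric sketch of the rule with the square root, which is a concave function, so the average of the roots cannot exceed the root of the average. The values are hypothetical.

```python
import math
from statistics import fmean

# Hypothetical values (perfect squares keep the arithmetic easy to check)
values = [1.0, 4.0, 9.0, 16.0]

mean_of_roots = fmean(math.sqrt(v) for v in values)  # average of f(x)
root_of_mean = math.sqrt(fmean(values))              # f(average of x)

print(mean_of_roots)  # 2.5
print(root_of_mean)   # about 2.74
```

This is why averaging on a transformed scale and then back-transforming does not give the same answer as averaging on the original scale.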
is variance biased or unbiased when divided by n instead of n-1
BIASED: dividing by n underestimates the population variance on average; dividing by n-1 (the degrees of freedom) corrects this
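A simulation sketch of this bias, assuming a fair six-sided die as the hypothetical population (true variance 35/12). It averages both versions of the variance over many small random samples.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

die = [1, 2, 3, 4, 5, 6]
TRUE_VARIANCE = 35 / 12  # population variance of a fair die
n, reps = 5, 5000

sum_div_n = 0.0
sum_div_n1 = 0.0
for _ in range(reps):
    sample = random.choices(die, k=n)       # random sample with replacement
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    sum_div_n += ss / n                     # biased version
    sum_div_n1 += ss / (n - 1)              # unbiased version

avg_div_n = sum_div_n / reps
avg_div_n1 = sum_div_n1 / reps

print(f"true={TRUE_VARIANCE:.3f} divide by n={avg_div_n:.3f} divide by n-1={avg_div_n1:.3f}")
```

The divide-by-n average should sit near (n-1)/n of the true variance, while the divide-by-(n-1) average should sit near the true value.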
what is scientific evidence
information, facts, or data that support or challenge a claim, prediction, assumption, or hypothesis
statistical hypothesis framework
a quantitative method of statistical inference that allows us to generate evidence for or against a hypothesis
Null Hypothesis (H0)
nothing systematic is going on
Alternative hypothesis (HA)
Hypothesis in which there are nonzero effects and there are differences between treatments
The frequentist hypothesis testing framework
evaluates how compatible the data are with an assumed model (H0), measured by P-values
what does a small p value mean (<0.05)
strong evidence against the null hypothesis, reject null
what does a large p value mean (>0.05)
the data are compatible with the null hypothesis (insufficient evidence against it); fail to reject H0. This is not proof that H0 is true
what does the significance level alpha represent
the probability of rejecting the null hypothesis when it is actually true (a Type I error or "false positive")
Type I Error
H0 is true, we reject H0 (false positive)
Type II Error
H0 is false, we fail to reject H0 (false negative)
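A simulation sketch of the Type I error rate. As stated assumptions, it uses a two-sided z-test with a known standard deviation (so the standard library's `NormalDist` is enough) and simulated studies in which H0 is true by construction; all numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the run is repeatable

ALPHA = 0.05  # significance level
N = 25        # observations per simulated study
REPS = 4000   # number of simulated studies

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, assuming sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# H0 is true in every study: the data really come from mean 0, SD 1.
false_positives = sum(
    z_test_p_value([random.gauss(0, 1) for _ in range(N)]) < ALPHA
    for _ in range(REPS)
)
type_i_rate = false_positives / REPS

print(f"share of true nulls rejected: {type_i_rate:.3f}")
```

The share of rejections should land close to `ALPHA`, which is what the significance level controls.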
if p<α
reject the null hypothesis
if p>α
fail to reject the null hypothesis
1-β (power)
probability of correctly rejecting the null hypothesis when it is truly false
what is a one sample t test
tests whether the mean of a single sample differs from a hypothesized population mean
assumptions of one sample t test
the sample is random and independent, the variable of interest is assumed to follow normal distribution for small sample sizes
two sample t test
a statistical method used to compare the means of 2 groups of subjects
two sample t test assumptions
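random and independent samples; the standard (Student's) version also assumes normally distributed data and equal variances in the two groups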
what is a paired design
both treatments are applied to every sampled unit; minimizes the impact of variability among units, thus increasing precision
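A sketch of how a paired design is typically analysed: the test works on the within-unit differences, so variation between units drops out. The `before` and `after` measurements are hypothetical.

```python
import math
import statistics

# Hypothetical measurements on the same five units under two treatments
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

# One difference per unit: the unit's own baseline cancels out
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se                           # paired t-statistic, H0: mean difference = 0

print(diffs, f"t={t:.2f}", f"df={n - 1}")
```

The units differ a lot from each other (9 to 13 before treatment), yet the differences are nearly constant, which is what makes the paired comparison precise.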
what is the F test
ratio of the variance among group means to the variance within groups; a large value indicates that the group means differ significantly
Welch's t-test
compares the means of two groups and can be used even when the variances of the two groups are not equal
Welch's t-test assumptions
normal distribution, independent + random, does not assume equal variance, degrees of freedom can be non-whole numbers to provide more accurate results accounting for unequal variances
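A sketch of the Welch calculation, including the Welch-Satterthwaite degrees of freedom, which are usually not a whole number. The two groups are hypothetical and were chosen to have clearly unequal variances.

```python
import math
import statistics

# Hypothetical groups with unequal variances and unequal sizes
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

n1, n2 = len(group_a), len(group_b)
v1 = statistics.variance(group_a) / n1  # s1^2 / n1
v2 = statistics.variance(group_b) / n2  # s2^2 / n2

# Welch's t: no pooled variance, each group keeps its own
t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(v1 + v2)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"t={t:.3f} df={df:.3f}")
```

A pooled two-sample test would use n1 + n2 - 2 = 9 degrees of freedom here; the Welch value is lower because the group variances differ.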
ANOVA
One continuous response variable and one categorical predictor variable, uses F statistic
what does ANOVA assume
independent, random, normally distributed, equal variance
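A by-hand sketch of the F statistic for a one-way ANOVA, computed as the mean square among groups divided by the mean square within groups. The three groups are hypothetical.

```python
from statistics import fmean

# Hypothetical response values for three treatment groups
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [5.0, 6.0, 7.0],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = fmean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Variation of group means around the grand mean
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
# Variation of observations around their own group mean
ss_within = sum((x - fmean(g)) ** 2 for g in groups.values() for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

print(f"F={f_stat:.2f} with ({k - 1}, {n_total - k}) degrees of freedom")
```

A large F only says that at least one group mean differs; a post hoc test is still needed to find which ones.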
Tukey's honestly significant difference (HSD) test
post hoc test (after ANOVA), determines which group means differ