population
the set of all “subjects” relevant to the scientific hypothesis under examination
variables
characteristics that differ among individuals
parameters
quantities describing a population (denoted by Greek letters)
census
a collection of data where the population is examined
random sample
each and every member of the population has an equal chance of being selected and each member is selected independently of others
mean
denoted by a bar
standard deviation
denoted by an “s”
sample
the subset of cases selected from a statistical population that are actually examined during a particular study
sample statistics
calculated from the collected sample and used to estimate the population parameters (denoted by roman letters)
how to get a good sample
take a random sample; be unbiased and precise
to get a good sample
carefully define your statistical population and select a sample that is as representative of the population as possible, where each subject is selected randomly and measurements are precise
bad samples
volunteer sample or a sample of convenience
experimental study
assigning treatment randomly, creating groups, imposing change
observational study
relying on comparisons of already existing conditions
2 types of variables
numerical (quantitative) and categorical (qualitative)
2 types of numerical variable
interval (arbitrary zero) and ratio (true zero)
2 types of categorical variables
nominal (no order) and ordinal (ordered)
frequency distribution
describes the number of times each value of a variable occurs
histograms
used for numerical data - x axis has a continuous scale, data are “binned” into continuous categories, the bins are touching
histograms y-axis can be
frequency (count of observations in each bin), proportion (fraction of the total observations in each bin), and density (proportion of the total observations per unit of bin width)
location or central tendency
where a distribution is centered; commonly measured using the mean
spread or scale
how spread out a distribution is; commonly measured using the standard deviation
shape or skew
distributions with a long tail on one side or the other
mean
arithmetic average
median
middle of the data
mode
most commonly occurring observations
range
the most basic measure of spread (max − min); not very informative
variance
“expected” squared difference between an observation and the mean
standard deviation is
positive square root of the variance
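The mean, variance, and standard deviation cards above can be checked numerically. A minimal sketch using Python's standard library, with made-up sample values:

```python
import statistics

data = [4.1, 5.0, 3.8, 6.2, 5.5]      # hypothetical sample values
xbar = statistics.mean(data)           # arithmetic average
var = statistics.variance(data)        # sample variance (divides by n - 1)
sd = statistics.stdev(data)            # positive square root of the variance
```

Note that `statistics.variance` is the *sample* variance (denominator n − 1), which matches the "expected squared difference" used to estimate a population parameter.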
what is meant by “estimation”
it’s using the sample data to learn about the population
estimation
the process of inferring a population parameter from sample data
uncertainty
a situation in which something is not known; in statistics it is the error of an estimate
Sampling distribution
the distribution of all the values for an estimate that we might have obtained when we sampled a population
a 95% confidence interval is a
range of values, calculated from sample data, that would contain the true population parameter in 95 out of 100 samples if the sampling process were repeated
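A 95% confidence interval for a mean can be sketched with the standard library. This uses the large-sample (z) approximation with made-up data; for small samples you would replace z with a t critical value:

```python
import math
from statistics import NormalDist, mean, stdev

sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]  # hypothetical data
z = NormalDist().inv_cdf(0.975)              # ≈ 1.96 for 95% confidence
se = stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
lo, hi = mean(sample) - z * se, mean(sample) + z * se
```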
uncertainty
decreases and precision increases with sample size
hypothesis testing
to determine whether an estimate can be explained simply by chance or whether it reflects something real
Null hypothesis
a specific statement made about the population for the sake of argument; it forces us to take a skeptical view
null hypothesis is used to
create a null model, compare test statistic calculated from the sample to the model
H0 is rejected if
the test statistic would be surprising (very unlikely) under the null model
P means
probability of observing a test statistic as extreme as, or more extreme than, the one observed, assuming H0 is true
significance level α
a probability used as a criterion for rejecting the null hypothesis
P-value > α
fail to reject H0
P-value < α
reject H0
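The two decision rules above amount to a single comparison; a tiny sketch (the function name is made up for illustration):

```python
def decide(p_value, alpha=0.05):
    # Reject H0 only when the p-value falls below the significance level.
    return "reject H0" if p_value < alpha else "fail to reject H0"
```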
two-tailed tests
deviation in either direction would reject the null hypothesis
type I error (α)
rejecting a true null hypothesis (false positive)
type II error (β)
Failing to reject a false null hypothesis (false negative)
Power
the probability of correctly rejecting a false H0
Power depends on
how different the truth is from the null hypothesis, type I error rate, and sample size
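The dependence of power on effect size and sample size can be illustrated by simulation. This is a sketch, not a standard routine: it assumes a two-sided z-test of H0: μ = 0 with known sd = 1 and made-up parameter values:

```python
import random
from statistics import NormalDist, mean

def power_sim(effect, n, alpha=0.05, reps=2000, seed=1):
    """Estimate the power of a two-sided z-test of H0: mu = 0 when the
    true mean is `effect` and the population sd is 1 (simulation sketch)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5        # z statistic with known sd = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps
```

Running it with a larger n (or a larger effect) yields noticeably higher estimated power, matching the card above.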
things to consider when designing an experiment
reduce bias and decrease sampling error
reduce bias
have a control group, use randomization, use blinding
decrease sampling error
use replication, ensure balance, use blocking, implement extreme treatments
Control group
units that are similar to the treatment units except that they do not receive the treatment
random assignment
units that are otherwise “identical” are assigned to be controls or treatments
blinding
concealing whether a participant is in the control or treatment group from the participants (single blind) and sometimes also from the researchers (double blind)
replication
application of a treatment to multiple, independent experimental subjects or units
balance
an equal number of units in the control and treatment minimizes the sampling error in both
blocking
divide experimental units into groups (blocks) that share known confounding variables
extreme treatments
a treatment you may add to an experiment to see if by doing more (or less) of a treatment will elicit more (or less) of an effect
probability distributions
the probability with which each possible observation of a variable occurs
normal distribution
continuous probability distribution, bell shaped curve, symmetrical around the mean
binomial distribution
discrete probability distribution; the outcome of a number of Bernoulli trials
Bernoulli trials
only 2 possible outcomes, and outcomes are independent of each other
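The binomial probability of k successes in n Bernoulli trials can be computed directly from its formula; a small stdlib sketch (the function name is made up):

```python
import math

def binom_pmf(k, n, p):
    # Probability of exactly k successes in n independent Bernoulli trials,
    # each with success probability p.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)
```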
Poisson distribution
frequency distribution of events that occur rarely and randomly
Poisson conditions
the probability of 2 or more occurrences in a single sample subdivision is negligibly small, the probability of one occurrence is proportional to the size of the subdivision, outcomes are independent, and the probability of an occurrence is identical for all sample subdivisions
analysis of frequencies
compares the observed frequency of observations in different categories with the expected frequency under a null hypothesis
goodness-of-fit assumptions
no more than 20% of categories have expected counts <5, no category with expected count <1
how to calculate a G-test
calculate expected frequencies, calculate G based on the dissimilarity between observed and expected frequencies, compare G to the tabulated χ² distribution
G test df
= k - p - 1, k= categories, p = estimated parameters
contingency analysis
asks whether 2 categorical variables are associated with one another
extrinsic hypothesis
derived from information other than the data you are analyzing
intrinsic hypothesis
expected frequencies are derived from the data you are analyzing
contingency df
k-p-1 or (r-1)(c-1), r = rows, c = columns
normal distribution defined by two parameters
mean and standard deviation
properties of a normal distribution
continuous distribution; probability measured as the area under the curve; symmetrical; about 2/3 of the area under the curve lies within one standard deviation of the mean
central limit theorem
the mean of a large number of measurements randomly sampled from a non-normal (or normal) population is approximately normally distributed
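The central limit theorem can be demonstrated by simulation: means of samples drawn from a strongly skewed (exponential) population still pile up in a roughly normal distribution. A sketch with arbitrary settings:

```python
import random
from statistics import mean, stdev

rng = random.Random(0)
# Means of 3000 samples (n = 50 each) from an exponential population
# with population mean 1 and sd 1.
sample_means = [mean(rng.expovariate(1.0) for _ in range(50))
                for _ in range(3000)]
```

The distribution of `sample_means` is centered near the population mean (1) with spread near 1/√50 ≈ 0.14, despite the skew of the underlying population.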
Student’s t distribution
used in place of the Z distribution when the standard error of the mean must be estimated from the sample; gives the probability of obtaining a particular sample mean from the sampling distribution of means
Z vs t
t distribution is more spread out because of uncertainty about the true population standard deviation
student’s t df
n - 1, n = sample size
single sample t - test
compares the mean of a random sample to a population mean proposed in a null hypothesis
single-sample t-test assumptions
the variable is normally distributed in the population, and the data are a random sample of the population
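The single-sample t statistic is easy to compute by hand; a stdlib sketch with made-up measurements (the resulting t would then be compared to a t table with df degrees of freedom):

```python
import math
from statistics import mean, stdev

sample = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3]   # hypothetical measurements
mu0 = 5.0                                  # population mean under H0
t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))
df = len(sample) - 1
```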
paired samples
individual observations in 2 samples are connected, i.e. an individual before and after treatment
paired t-test
difference between two paired observations, testing whether the mean difference between paired measurements equals some specified value (usually zero)
paired t-test assumptions
that the differences are normally distributed and that each pair is independent of the others
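A paired t-test reduces to a single-sample t-test on the pairwise differences; a sketch with made-up before/after values, testing a mean difference of zero:

```python
import math
from statistics import mean, stdev

before = [12.0, 14.5, 11.2, 13.8, 12.9]   # hypothetical paired measurements
after  = [11.1, 13.9, 11.0, 12.7, 12.1]
diffs = [a - b for a, b in zip(after, before)]   # one difference per pair
t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
df = len(diffs) - 1
```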
when can we ignore violations of assumptions
small deviations; large sample sizes; non-normality can be ignored if the deviations are small and the sample is large (central limit theorem)
what to do when assumptions are violated
ignore violations, transformation, permutation/randomization/resampling methods
transformation
changes data to another scale of measurement to fit assumptions
data transformation by
applying the same mathematical formula to each observation, needs to be applied to each individual data point, must be backwards convertible to the original data
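A log transformation is a common example that meets all three requirements above: the same formula is applied to every observation, and the original data can be recovered by exponentiating. A sketch with made-up right-skewed data:

```python
import math

data = [1.2, 3.5, 10.4, 28.0, 95.1]    # right-skewed hypothetical data
logged = [math.log(x) for x in data]   # same formula applied to every point
back = [math.exp(y) for y in logged]   # backwards convertible to the original
```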
ANOVA
analysis of variance; determines whether there is a significant difference between the means of two or more unpaired groups
ANOVA assumptions
independence, normality, homogeneity of variance (homoscedasticity), random sampling
total sum of squares (SST)
measures the total variation in the dataset
between-group sum of squares (SSB or SSG)
this measures the variation between the group means and the grand mean
error sum of squares (SSE) or within-group SS (SSW)
this measures the variation within each group
R²
summarizes the contribution of the group difference to the total variation
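The sums-of-squares cards above obey the identity SST = SSB + SSE, and R² follows directly. A sketch of the decomposition with made-up groups:

```python
from statistics import mean

groups = [[4.1, 4.5, 3.9], [5.2, 5.8, 5.5], [6.1, 6.4, 6.0]]  # hypothetical
grand = mean(x for g in groups for x in g)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_total = sum((x - grand) ** 2 for g in groups for x in g)
r_squared = ss_between / ss_total   # share of total variation between groups
```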
ANOVA is a general linear model (GLM)
a mathematical representation of the relationship between a response variable and one or more explanatory variables
GLM fixed
categories of the explanatory variable are pre-determined (drug trial dosage)
GLM random
randomly sampled from a larger pool of groups, groups are not of particular interest
Tukey’s test
tests which treatments/groups differ from each other; compares all possible pairs of means to infer which one(s) differ while controlling the type I error rate; only performed if the ANOVA rejected H0
balanced design
all treatment groups have the same n; use Tukey’s test
unbalanced design
not all treatment groups have the same n; use the Tukey-Kramer test
Tukey’s df
= N - k, N = total sample size (sum for all groups), k = total number of groups in the ANOVA
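The df formula above is simple arithmetic; a sketch for a hypothetical balanced design with three groups of eight:

```python
group_sizes = [8, 8, 8]     # n per group in a balanced design (hypothetical)
N = sum(group_sizes)        # total sample size
k = len(group_sizes)        # number of groups in the ANOVA
df = N - k
```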