why do we need descriptive stats
to interpret results
facilitates predictions from patterns
describes information (communication)
real world events are probabilistic (based on probability)
variability is a _____, not an exception
rule
we use distributions to map out how often different outcomes happen, and use these to determine probability
how do we get probability curves?
measure samples from the same population, and use that data to build the distribution (curve)
many things happen in a normal distribution, and we can use that to make predictions
central tendency
a center, representative value
median (middle number)
mean (average)
mode (most frequently occurring value)
central tendencies on a skewed curve
mode sits at the peak, median falls in the middle, mean is pulled toward the tail
variance
how spread out and different the data is
sd is used to represent
variance (SD is the square root of the variance, expressed in the original units)
how does a sample size impact variance
big sample size → less variance
ecological validity and sd
high ecological validity lowers SD (less variance)
bivariate stats
the relation between continuous or categorical variables
pearson’s coefficient
describes the direction and strength of a relationship between variables, offers a shortcut to describing data
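A minimal sketch of computing Pearson's r in Python; the paired values below are invented purely for illustration:

```python
import numpy as np
from scipy import stats

# hypothetical paired measurements (e.g., hours studied vs. exam score)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 60, 68, 70, 75, 79])

r, p = stats.pearsonr(hours, score)   # direction and strength of the linear relationship
print(f"r = {r:.2f}, p = {p:.3f}")    # r near +1/-1 = strong, near 0 = weak
```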
issues with descriptive stats
interpretation can be complicated
data can change but have the same range and central tendency
in these situations, box plots don’t work anymore (violin plots better)
so, we need something else
why do we normalize data
it is often challenging and/or misleading to use raw data to make predictions
for ex, hard to find where the 95% prob cut off lines are
normalizing data
transforming the data (through a linear transformation) so that it has a mean of 0 and SD of 1 on the x-axis
easier to communicate info
cutoff for 95% confidence is [-1.96,1.96]
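As a sketch, z-score normalization in Python (the raw scores are made up):

```python
import numpy as np

x = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 11.0])  # raw scores (illustrative)

# linear transformation: subtract the mean, divide by the SD
z = (x - x.mean()) / x.std(ddof=1)

print(z.mean().round(3), z.std(ddof=1).round(3))  # ~0 and 1
# values outside [-1.96, 1.96] fall beyond the central 95% of a standard normal
```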
goal in understanding data
we aim to understand the population, but often population parameters are unknown/unknowable
so, we collect samples and use them to make educated inferences about the population
how do we normally estimate?
we estimate within a range, rather than pinning down a single value
inferential stats predicts a range of an interval, that we hope (to a high probability) is correct
confidence interval - theoretically
we are finding a range that is very likely to contain the true population parameter (often the mean)
uses a sample statistic (sample mean) as a point estimate
incorporates sampling variability (through SEM and a critical value [often a t-critical value])
how we build our confidence interval
use sample stats:
sample mean (assuming it's close to Mu)
standard error of the mean (from sample variance and sample size)
use these on the sampling distribution of sample means
x̄ ± 1.96 (z critical value) * (σ/√n)
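A sketch of that calculation in Python, assuming (unrealistically) that the population SD σ is known so the z critical value applies; the data are invented:

```python
import numpy as np
from scipy import stats

x = np.array([101, 98, 105, 110, 96, 103, 99, 107])  # a sample (made up)
sigma = 5.0                              # population SD, assumed known here
x_bar = x.mean()
sem = sigma / np.sqrt(len(x))
z_crit = stats.norm.ppf(0.975)           # ≈ 1.96 for a 95% interval

ci = (x_bar - z_crit * sem, x_bar + z_crit * sem)
print(ci)
```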
central limit theorem
the shape of the sampling distribution of sample means
with a sample size big enough, the distribution is approximately normal
even if the population distribution is non-normal
parameters of central limit theorem
mean = Mu
standard deviation = σ/√(n) = SEM
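A small simulation illustrating the CLT: sample means drawn from a skewed (exponential) population still form an approximately normal sampling distribution with mean ≈ Mu and SD ≈ σ/√n. The population and sample sizes here are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 50            # exponential(scale=1): mean = SD = 1

# draw many samples and keep each sample's mean
sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

print(sample_means.mean())             # ≈ mu
print(sample_means.std(ddof=1))        # ≈ sigma / sqrt(n) ≈ 0.141
```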
Standard Error of the Mean
The SEM quantifies the typical distance a sample mean is likely to be from the actual population mean
a smaller SEM indicates a more precise estimate of the population mean
how can we calculate CI with only sample statistics?
Law of Large Numbers - as sample size increases, sample parameters will get closer to population parameters
so, the sample mean (x-bar) will become closer to the population mean (Mu)
we use the standard deviation from a single sample (s) to estimate population standard deviation (σ)
we have to do this, since we often don’t know the parameters of the population
symbols and what they mean
x-bar - sample mean
Mu - population mean
σ - population SD
s - sample SD
n - sample total
in CI calculations, when would we replace the z-critical values with t-based critical values?
when n is small (and, more generally, whenever the population σ is unknown and estimated by s)
x-bar ± t critical value * SEM (s/√n)
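A sketch of a t-based 95% CI built from sample statistics only, using scipy's t quantile for the critical value (data invented):

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2])  # small sample (made up)
n = len(x)
x_bar = x.mean()
sem = x.std(ddof=1) / np.sqrt(n)        # s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)   # replaces 1.96 when n is small

ci = (x_bar - t_crit * sem, x_bar + t_crit * sem)
print(ci)
```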
as n increases, what happens to range of sampling distribution of sample means
as n increases → range decreases → CI range decreases
this means our statistics are becoming more accurate
how does n size impact SEM
n increases → range becomes more narrow → SEM gets smaller
degrees of freedom
how many independent pieces of info (not relying on info received elsewhere or estimated) we have available to estimate a parameter
so, whenever we estimate a parameter, we are “using up” some of that freedom
when do we divide by n and when by n-1?
if we are describing the variability within a sample only, we may divide by n
ex, if we are using the sample sd as a descriptive statistic
if we are making inferences about a population, we need to divide by n-1
ex, in our sample sd stat we need to divide by n-1
sample mean vs variance calculation
sample mean - divide by n, because all n observations are independent pieces of information
variance/sd - divide by n-1, because the deviations depend on the sample mean (one degree of freedom is used up)
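numpy makes this distinction explicit through its ddof ("delta degrees of freedom") argument; a quick sketch with made-up data:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

descriptive_var = np.var(x, ddof=0)   # divide by n (describing this sample only)
inferential_var = np.var(x, ddof=1)   # divide by n-1 (estimating the population)

print(descriptive_var, inferential_var)  # 4.0 vs ~4.571
```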
null hypothesis testing
H0 = baseline model saying “there is no effect, no difference, or no relationship in the population”
we start with this, to test if a prediction is statistically possible/meaningful
HA/H1 = alternative hypothesis saying there is an effect, and a mean difference doesn’t equal zero
example of importance of alternate hypothesis
alt hypothesis determines what data you calculate
if your alternative only focuses on a specific direction (A is better than B), your inference will only look at one end of the data
if your alternative is more general (there is a statistical difference between A and B), your inference will consider both sides of the data
important to do this to not limit yourself
the 2 types of errors in making decisions in stats
type 1 error
type 2 error
type 1 error
rejecting H0 when it is actually true
saying there is an effect when there isn’t one
a false alarm
type 2 error
accepting H0, when it is not true
saying there is no effect when there is one
a miss
type 1 error details
alpha set before collecting data (often 0.05, so 5%)
with alpha = 0.05, you will make a type 1 error about 5% of the time
defined on the null curve
what is alpha? (type 1 error)
the significance level, used as a cutoff for deciding when to reject/accept H0
used to carve the rejection region
alpha used on a null curve
critical values cut the null curve so that alpha/2 lies outside each cutoff (one tail on each side)
if the sample data falls outside of these critical values, we can reject null (H0)
type 2 error details
based on the alternate hypothesis curve
β is something we measure, dependent on:
effect size
sample size
significance level
variance
if the sample data falls short of the critical values (inside the β region), we accept the null
type 2 error (β) in relation to other parameters
bigger effect size → smaller β
larger n → shrinks standard error → smaller β
smaller significance level (alpha) → stricter threshold → rejecting H0 is harder → increases β
larger variance (σ²) → harder to detect effects → increases β
statistical power
the probability of correctly rejecting H0 when it is false (correctly determining an effect)
power = 1 - β
we want a small β
before conducting research, the researchers estimate the needed sample size to achieve acceptable power (80%, so β=.20)
AKA the sensitivity of a study
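A sketch of that kind of prospective power analysis using statsmodels; the assumed effect size of 0.5 is an arbitrary illustration:

```python
from statsmodels.stats.power import TTestIndPower

# how many participants per group for 80% power at alpha = .05,
# assuming a medium effect (Cohen's d = 0.5)?
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 64 per group
```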
as sd decreases, what happens to β and alpha
the overlap between the null and alternative curves becomes smaller
this helps us minimize errors, to decrease both β and alpha
p-values
the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
quantifies the improbability of this data being generated under the null
does NOT tell us the probability of null being true
low vs high p-value
a low p-value suggests that the observed data is unlikely if the null hypothesis is correct, providing strong evidence for the alternative hypothesis
ronald a fisher (1920-30s)
introduced levels of significance as convenient thresholds for interpreting whether data provide evidence against H0.
Neyman and Pearson (1930s)
developed hypothesis testing framework for Type I error and Type II error
Their work put the p-value into a decision making context
CI and P value: two sides of the same coin
CI uses sample distribution, centered on sample mean
asks if H0 falls inside my plausible range
P-value uses the null hypothesis distribution, centered on the null value (e.g., a mean difference of 0)
asks how much of the variation could be due to chance
asks how extreme my data is if H0 were true
data that is seemingly not critical can be made critical if we have…
a big enough sample size
problem with relying on P value
does not tell us how big an effect is
even if we have calculated a statistical significance, it may be tiny IRL
we can get around this by calculating …..
effect size
the magnitude of an effect/strength of an association
standardized mean differences
independent of sample size
scale free, allows for comparison across studies
2 statistical approaches to evaluate an inference about a population
null hypothesis significance testing (NHST)
confidence intervals (CI)
should give us the same conclusions
NHST evaluations
start with null
compute a test statistic relative to the null
compare to critical values (or p-values)
decide whether to reject null or not
binary outcome: reject vs not reject
CI in making evaluations
construct an interval around the sample estimate
if the interval excludes the null value, that implies significance at the corresponding alpha
CI also shows plausible effect sizes, not just a decision
when do we rely on the t distribution
in CI calculations, we often can’t use z critical values because we don’t know population sd
z critical values are rarely used for inference
df calculation (in general)
n - 1
big sample size → big df (good!!!!)
t distribution
a family of distributions
looks like the standard normal (bell shaped, centered at 0)
has heavier tails to account for extra uncertainty from estimating σ
as n increases, t approaches the normal standard z
takes into account degrees of freedom (bigger df → close to normal)
bigger df leads to what (in t distributions)
big df → means a big sample → thinner curve → thinner confidence intervals
rule of thumb for df in t distributions
the benefits of an increased sample size asymptote at n=30 (df=29)
3 types of t tests
one sample t-test → tests if a sample differs from a known/hypothesized population mean
paired sample t-test → tests if means differ across two related measurements (e.g. before vs after)
independent two-sample t test → tests if two independent group means differ
one sample t test
determines whether mean of single sample differs significantly from specified population mean/hypothesized value
evaluates the null hypothesis that the sample comes from a population with the specified mean, under assumptions of approximate normality and independent observations
is the average score in one group far enough from a comparison value that chance alone is unlikely to explain it?
one sample t-test: CI approach
propose a null hypothesis
collect a sample
compute descriptive statistics (x-bar, s, n, df, SEM)
calculate CI based on df (quantile of the t-distribution with df = n-1 at probability 1 - α/2)
calculation based on sample distribution
make decision (if CI includes null, fail to reject null)
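The same steps as a Python sketch; the hypothesized mean and data are invented:

```python
import numpy as np
from scipy import stats

mu_0 = 100                                              # null/hypothesized population mean
x = np.array([104, 98, 110, 107, 101, 96, 112, 105])   # sample (made up)

n = len(x)
x_bar, s = x.mean(), x.std(ddof=1)
sem = s / np.sqrt(n)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)            # quantile at 1 - alpha/2

lo, hi = x_bar - t_crit * sem, x_bar + t_crit * sem
print((lo, hi), "reject H0" if not (lo <= mu_0 <= hi) else "fail to reject H0")
```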
one sample t-test: NHST approach
propose null
collect data
construct the null hypothesis distribution (what would be expected by CHANCE alone), and calculate descriptive stats
compute t value, and make a decision
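Or, letting scipy do the NHST version in one call (same made-up sample as in the sketch above):

```python
import numpy as np
from scipy import stats

x = np.array([104, 98, 110, 107, 101, 96, 112, 105])   # sample (made up)
t_obs, p = stats.ttest_1samp(x, popmean=100)            # H0: mu = 100
print(t_obs, p)   # reject H0 if p < alpha (equivalently, |t_obs| > the critical t)
```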
when to reject with t values
reject H0 if |t_obs| > t_(1 - α/2, df)
we are comparing our observed t-value to the critical one
t test tables
gives us our critical t value
a table of df by alpha
meet in the middle to find our critical value
lets us compare the absolute value of our observed t-value to the critical
if our calculated t-value falls within the critical region, the result would occur by chance less than 5% of the time, so we reject H0
one sample t-test: two tailed
alt hypothesis predicts a difference but not direction, critical region is split on BOTH ends of the distribution
α/2 = 0.025 in each tail (because α is split across both sides)
more conservative and requires a larger effect to reject null (since α is divided across both tails)
one sample t-test: one-tailed test
alt hypothesis predicts a specific direction, so critical region is entirely on only one side of the distribution
α = 0.05 on one side
more statistical power for detecting an effect but blind to effects in opposite direction
not ideal in the research world
assumptions of one sample t test
scale of measurement - dependent variable is continuous (interval or ratio scale)
normality - the CLT covers large samples; for small samples, the raw data should be close to normal
independence - each obs is independent of the others (no repeated or paired)
population variance unknown
paired sample t test
often same individuals used twice
controls for individual differences
higher statistical power with small samples
but, there are carryover/practice effects and this method is not always possible
paired sample t-test: NHST approach
propose null hypothesis
now we have two conditions, so we can use difference scores between the conditions
construct null hypothesis distribution and calc descriptive stats of the difference scores
compute t value and make a decision
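A sketch with made-up before/after scores; scipy's paired test is equivalent to a one-sample t test on the difference scores:

```python
import numpy as np
from scipy import stats

before = np.array([12, 15, 11, 14, 13, 16, 12, 15])   # same participants, time 1
after  = np.array([14, 16, 13, 15, 15, 18, 13, 17])   # same participants, time 2

t_paired, p_paired = stats.ttest_rel(after, before)
t_diff, p_diff = stats.ttest_1samp(after - before, popmean=0)  # identical result

print(t_paired, p_paired)
print(t_diff, p_diff)
```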
common features of pair-sample t tests
same participants
two measurements are meaningfully linked (before vs after, often within-subject design, etc.)
tests whether the mean difference between paired observations is significantly different than 0
assumptions of paired sample t test
pairs are meaningfully matched (but still independent)
differences between pairs are approximately normally distributed
paired sample vs one sample t tests
basically the same!
the same process is done, except with paired sample the distribution is the differences between 2 conditions
goal of t tests
to help us make a decision about the null hypothesis
t statistic theoretically
t observed = effect/ variability of the effect
the effect would be Xbar - Mu, how far the sample mean deviates from the null
the variability of the effect is the SEM, variability expected by chance
the two critical values are used to make the critical region (1 - α /2, df)
used to determine statistical significance
steps of t-statistics
compute the t-statistic (effect/variability of the effect)
determine critical t value (with α and df)
compare
if the absolute value of our t statistic is greater than the critical t values, we reject H0
independent sample t test
tests whether two independent groups differ significantly from each other
works great for naturally distinct groups
requires larger sample size
more variability from individual differences
often cross-cultural, cross-sectional, gender differences, clinical vs non-clinical populations, teaching method comparisons, etc.
assumptions of independent sample t test
groups are independent
data in each group are roughly normal
equal variance across groups!!!
biggest differences in calculating independent t tests compared to the others
in the t statistic, the observed diff and the SEM must be calculated differently
in the critical t value, the degrees of freedom must be calculated differently
calculating t statistic for independent sample t tests
observed diff/SEM
for the observed diff, we just subtract one sample mean (xbar2) from the other (xbar1)
the SEM is more complicated, since with two independent samples there is no single shared estimate of variance
SEM in t stat calculation for independent sample t tests: 2 ways
variance sum law
pooled variance
variance sum law
if two variables are independent, the variance of their sum (or difference) equals the sum of their variances
so, we calculate each sample's squared SEM (s²/n) separately, add them, and take the square root
pooled variance
instead of estimating two variances, we combine them into a single pooled estimate
this one is more accepted
every sample will have its own amount of variability, and sampling error will always be included in the calculation of variance
the solution is to take a weighted average of the two sample variances
by pooling the two samples, we reduce the impact of sampling error
this also takes into account the size of both samples
what is sampling error
the difference between a sample statistic and the true population parameter
arises because the sample is only an approximation of the entire population
can be decreased with an increased sample size
SEM calculation for independent sample t tests
the square root of the pooled variance divided by each sample size: √(s²p/n1 + s²p/n2)
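A sketch of the pooled-variance SEM and the resulting t statistic computed by hand; the two groups are invented:

```python
import numpy as np

g1 = np.array([23, 27, 31, 29, 25, 26])        # group 1 (made up)
g2 = np.array([20, 22, 25, 24, 21, 23, 22])    # group 2 (made up)
n1, n2 = len(g1), len(g2)

# pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
sem = np.sqrt(sp2 / n1 + sp2 / n2)

t_obs = (g1.mean() - g2.mean()) / sem
df = n1 + n2 - 2
print(t_obs, df)
```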
benefits of pooled variance
more accurate estimate
higher statistical power
historical and computational simplicity
connection to ANOVA
df in independent sample t test
df = n1 + n2 - 2
when finding our critical t value, our df changes
it becomes bigger since we have two sample sizes
most important assumptions with independent sample t test
equal variance across groups
2 ways we can test if there is equal variance across groups
levene’s test - a statistical method to check if variances are equal across two or more groups, used in ANOVA too
common rule of thumb - check if the ratio of the larger sample variance to the smaller sample variance is less than 4
or, less than 2 if looking at sample standard deviations
more risky and less conservative
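Both checks sketched in Python, using made-up groups:

```python
import numpy as np
from scipy import stats

g1 = np.array([23, 27, 31, 29, 25, 26])
g2 = np.array([20, 22, 25, 24, 21, 23, 22])

# Levene's test: a non-significant p suggests the equal-variance assumption holds
stat, p = stats.levene(g1, g2)
print(p)

# rule of thumb: ratio of larger to smaller sample variance under 4
ratio = max(g1.var(ddof=1), g2.var(ddof=1)) / min(g1.var(ddof=1), g2.var(ddof=1))
print(ratio < 4)
```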
consequences if the equal variance assumption isn’t met
type 1 error increases
in some situations, it decreases too (if the larger n has the larger σ), making it almost impossible to find an effect
if the smaller n has the larger σ, the type 1 error rate can rise to roughly 1 in 3
measuring effect in 3 t tests
one-sample: Xbar - Mu
paired: Xbar1 - Xbar2
independent: Xbar1 - Xbar2
SEM in the 3 t tests
one-sample: s/√n
paired: sdifferences/√npair
independent: √(pooled variance/n1 + pooled variance/n2)
df in the 3 t tests
one sample: n-1
paired: npair - 1
independent: (n1 - 1) + (n2 -1)
what is calculated the same in all 3 t tests?
t critical value and confidence interval
what does n represent in paired sample t tests
the number of paired/grouped samples, not just individual samples
for ex, participant 1’s before and after are seen as 1, not split into 2
solution for when the two samples’ variances don’t match
using welch’s t test
changes how SEM and df are calculated
welch’s t test
corrects df and SEM to allow for a more conservative estimate
as variance ratio grows and n1 shrinks, df shrinks
as sample sizes become increasingly unequal, the df will shrink
ideally, we want a ratio of 1
welch’s t test allows for type 1 error rate to stay around 0.05, no matter the inequality of variance
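In scipy, Welch's version is simply the independent t test with equal_var=False; a sketch with invented groups where the variances differ:

```python
import numpy as np
from scipy import stats

g1 = np.array([23, 27, 31, 29, 25, 26, 35, 19])   # more variable group (made up)
g2 = np.array([20, 22, 25, 24, 21, 23, 22])

student = stats.ttest_ind(g1, g2, equal_var=True)    # assumes equal variances
welch   = stats.ttest_ind(g1, g2, equal_var=False)   # corrects SEM and df

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
```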
is increasing sample size to get a statistical P (0.05) p-hacking?
no, because we aren’t lying
how n impacts test statistic and p values
bigger n → test statistic grows
big n → even tiny effects can yield small p-values (statistical significance)
small n → moderate effects may fail to reach significance
P values do not tell us how big real effects are
raw effect size in relation to t statistic
the numerator
effect size characteristics
magnitude - quantifies how big an effect is
independent from sample size - makes it a better measure of practical importance than the p-value (which is heavily influenced by number of observations, n)
complements statistical significance - report alongside statistical significance tests, to give the complete picture of the research outcome
cohen’s d
effect size measure in t tests
the standardized difference between two means (divided by sd)
0.2 (small)
0.5 (medium)
0.8 (large)
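A sketch of Cohen's d for two independent groups, computed by hand with the pooled SD as the standardizer (the data are invented):

```python
import numpy as np

g1 = np.array([23, 27, 31, 29, 25, 26])
g2 = np.array([20, 22, 25, 24, 21, 23, 22])
n1, n2 = len(g1), len(g2)

# pooled standard deviation
sp = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))

d = (g1.mean() - g2.mean()) / sp      # standardized mean difference
print(d)   # ~0.2 small, ~0.5 medium, ~0.8 large
```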