why do we need descriptive stats
to interpret results
facilitates predictions from patterns
describes information (communication)
real world events are probabilistic (based on probability)
variability is a _____, not an exception
rule
we use distributions to map out how often different outcomes happen, and use these to determine probability
how do we get probability curves?
measure samples from the same population, and use that data to build the distribution (curve)
many things happen in a normal distribution, and we can use that to make predictions
central tendency
a center, representative value
median (middle number)
mean (average)
mode (most frequently occurring value)
central tendencies on a skewed curve
mode sits at the peak, median falls in the middle, mean is pulled toward the tail
variance
how spread out and different the data is
sd is used to represent
variance (SD is the square root of the variance, expressed in the original units)
how does a sample size impact variance
big sample size → less variance
ecological validity and sd
high ecological validity lowers SD (less variance)
bivariate stats
the relation between continuous or categorical variables
pearson’s coefficient
describes the direction and strength of a relationship between variables, offers a shortcut to describing data
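A minimal sketch of computing Pearson's r in Python; the paired values below are invented purely for illustration:

```python
import numpy as np
from scipy import stats

# hypothetical paired measurements (e.g., hours studied vs. exam score)
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 60, 68, 70, 75, 79])

r, p = stats.pearsonr(hours, score)   # direction and strength of the linear relationship
print(f"r = {r:.2f}, p = {p:.3f}")    # r near +1/-1 = strong, near 0 = weak
```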
issues with descriptive stats
interpretation can be complicated
data can change but have the same range and central tendency
in these situations, box plots don’t work anymore (violin plots better)
so, we need something else
why do we normalize data
it is often challenging and/or misleading to use raw data to make predictions
for ex, hard to find where the 95% prob cut off lines are
normalizing data
transforming the data (through a linear transformation) so that it has a mean of 0 and SD of 1 on the x-axis
easier to communicate info
cutoff for 95% confidence is [-1.96,1.96]
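As a sketch, z-score normalization in Python (the raw scores are made up):

```python
import numpy as np

x = np.array([12.0, 15.0, 9.0, 20.0, 14.0, 11.0])  # raw scores (illustrative)

# linear transformation: subtract the mean, divide by the SD
z = (x - x.mean()) / x.std(ddof=1)

print(z.mean().round(3), z.std(ddof=1).round(3))  # ~0 and 1
# values outside [-1.96, 1.96] fall beyond the central 95% of a standard normal
```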
goal in understanding data
we aim to understand the population, but often population parameters are unknown/unknowable
so, we collect samples and use them to make educated inferences about the population
how do we normally estimate?
we estimate within a range, rather than pinning down a single value
inferential stats predicts a range of an interval, that we hope (to a high probability) is correct
confidence interval - theoretically
we are finding a range that is very likely to contain the true population parameter (often the mean)
uses a sample statistic (sample mean) as a point estimate
incorporates sampling variability (through SEM and a critical value [often a t-critical value])
how we build our confidence interval
use sample stats:
sample mean (assuming it's close to Mu)
standard error of the mean (from sample variance and sample size)
use these on the sampling distribution of sample means
x̄ ± 1.96 (z critical value) * (σ/√n)
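A sketch of that calculation in Python, assuming (unrealistically) that the population SD σ is known so the z critical value applies; the data are invented:

```python
import numpy as np
from scipy import stats

x = np.array([101, 98, 105, 110, 96, 103, 99, 107])  # a sample (made up)
sigma = 5.0                              # population SD, assumed known here
x_bar = x.mean()
sem = sigma / np.sqrt(len(x))
z_crit = stats.norm.ppf(0.975)           # ≈ 1.96 for a 95% interval

ci = (x_bar - z_crit * sem, x_bar + z_crit * sem)
print(ci)
```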
central limit theorem
the shape of the sampling distribution of sample means
with a sample size big enough, the distribution is approximately normal
even if the population distribution is non-normal
parameters of central limit theorem
mean = Mu
standard deviation = σ/√(n) = SEM
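A small simulation illustrating the CLT: sample means drawn from a skewed (exponential) population still form an approximately normal sampling distribution with mean ≈ Mu and SD ≈ σ/√n. The population and sample sizes here are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 50            # exponential(scale=1): mean = SD = 1

# draw many samples and keep each sample's mean
sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

print(sample_means.mean())             # ≈ mu
print(sample_means.std(ddof=1))        # ≈ sigma / sqrt(n) ≈ 0.141
```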
Standard Error of the Mean
The SEM quantifies the typical distance a sample mean is likely to be from the actual population mean
a smaller SEM indicates a more precise estimate of the population mean
how can we calculate CI with only sample statistics?
Law of Large Numbers - as sample size increases, sample parameters will get closer to population parameters
so, the sample mean (x-bar) will become closer to the population mean (Mu)
we use the standard deviation from a single sample (s) to estimate population standard deviation (σ)
we have to do this, since we often don’t know the parameters of the population
symbols and what they mean
x-bar - sample mean
Mu - population mean
σ - population SD
s - sample SD
n - sample total
in CI calculations, when would we replace the z-critical values with t-based critical values?
when n is small (and, more generally, whenever the population σ is unknown and estimated by s)
x-bar ± t critical value * SEM (s/√n)
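A sketch of a t-based 95% CI built from sample statistics only, using scipy's t quantile for the critical value (data invented):

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2])  # small sample (made up)
n = len(x)
x_bar = x.mean()
sem = x.std(ddof=1) / np.sqrt(n)        # s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)   # replaces 1.96 when n is small

ci = (x_bar - t_crit * sem, x_bar + t_crit * sem)
print(ci)
```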
as n increases, what happens to range of sampling distribution of sample means
as n increases → range decreases → CI range decreases
this means our statistics are becoming more accurate
how does n size impact SEM
n increases → range becomes more narrow → SEM gets smaller
degrees of freedom
how many independent pieces of info (not relying on info received elsewhere or estimated) we have available to estimate a parameter
so, whenever we estimate a parameter, we are “using up” some of that freedom
when do we divide by n and when by n-1?
if we are describing the variability within a sample only, we may divide by n
ex, if we are using the sample sd as a descriptive statistic
if we are making inferences about a population, we need to divide by n-1
ex, in our sample sd stat we need to divide by n-1
sample mean vs variance calculation
sample mean - divide by n, because all n observations are independent pieces of information
variance/sd - divide by n-1, because the deviations depend on the sample mean (one degree of freedom is used up)
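numpy makes this distinction explicit through its ddof ("delta degrees of freedom") argument; a quick sketch with made-up data:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

descriptive_var = np.var(x, ddof=0)   # divide by n (describing this sample only)
inferential_var = np.var(x, ddof=1)   # divide by n-1 (estimating the population)

print(descriptive_var, inferential_var)  # 4.0 vs ~4.571
```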
null hypothesis testing
H0 = baseline model saying “there is no effect, no difference, or no relationship in the population”
we start with this, to test if a prediction is statistically possible/meaningful
HA/H1 = alternative hypothesis saying there is an effect, and a mean difference doesn’t equal zero
example of importance of alternate hypothesis
alt hypothesis determines what data you calculate
if your alternative only focuses on a specific direction (A is better than B), your inference will only look at one end of the data
if your alternative is more general (there is a statistical difference between A and B), your inference will consider both sides of the data
important to do this to not limit yourself
the 2 types of errors in making decisions in stats
type 1 error
type 2 error
type 1 error
rejecting H0 when it is actually true
saying there is an effect when there isn’t one
a false alarm
type 2 error
accepting H0, when it is not true
saying there is no effect when there is one
a miss
type 1 error details
alpha set before collecting data (often 0.05, so 5%)
with alpha = 0.05, you will make a type 1 error about 5% of the time
defined on the null curve
what is alpha? (type 1 error)
the significance level, used as a cutoff for deciding when to reject/accept H0
used to carve the rejection region
alpha used on a null curve
critical values cut the null curve so that alpha/2 lies outside each cutoff (one tail on each side)
if the sample data falls outside of these critical values, we can reject null (H0)
type 2 error details
based on the alternate hypothesis curve
β is something we measure, dependent on:
effect size
sample size
significance level
variance
if the sample data falls short of the critical values (inside the β region), we accept the null
type 2 error (β) in relation to other parameters
bigger effect size → smaller β
larger n → shrinks standard error → smaller β
smaller significance level (alpha) → stricter threshold → rejecting H0 is harder → increases β
larger variance (σ²) → harder to detect effects → increases β
statistical power
the probability of correctly rejecting H0 when it is false (correctly determining an effect)
power = 1 - β
we want a small β
before conducting research, the researchers estimate the needed sample size to achieve acceptable power (80%, so β=.20)
AKA the sensitivity of a study
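A sketch of that kind of prospective power analysis using statsmodels; the assumed effect size of 0.5 is an arbitrary illustration:

```python
from statsmodels.stats.power import TTestIndPower

# how many participants per group for 80% power at alpha = .05,
# assuming a medium effect (Cohen's d = 0.5)?
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 64 per group
```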
as sd decreases, what happens to β and alpha
the overlap between the null and alternative curves becomes smaller
this helps us minimize errors, to decrease both β and alpha
p-values
the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
quantifies the improbability of this data being generated under the null
does NOT tell us the probability of null being true
low vs high p-value
a low p-value suggests that the observed data is unlikely if the null hypothesis is correct, providing strong evidence for the alternative hypothesis
ronald a fisher (1920-30s)
introduced levels of significance as convenient thresholds for interpreting whether data provide evidence against H0.
Neyman and Pearson (1930s)
developed hypothesis testing framework for Type I error and Type II error
Their work put the p-value into a decision making context
CI and P value: two sides of the same coin
CI uses sample distribution, centered on sample mean
asks if H0 falls inside my plausible range
P-value uses the null hypothesis distribution, centered on the null value (e.g., a mean difference of 0)
asks how much of the variation could be due to chance
asks how extreme my data is if H0 were true
data that is seemingly not critical can be made critical if we have…
a big enough sample size
problem with relying on P value
does not tell us how big an effect is
even if we have calculated a statistical significance, it may be tiny IRL
we can get around this by calculating …..
effect size
the magnitude of an effect/strength of an association
standardized mean differences
independent of sample size
scale free, allows for comparison across studies
2 statistical approaches to evaluate an inference about a population
null hypothesis significance testing (NHST)
confidence intervals (CI)
should give us the same conclusions
NHST evaluations
start with null
compute a test statistic relative to the null
compare to critical values (or p-values)
decide whether to reject null or not
binary outcome: reject vs not reject
CI in making evaluations
construct an interval around the sample estimate
if the interval excludes the null value, that implies significance at the corresponding alpha
CI also shows plausible effect sizes, not just a decision
when do we rely on the t distribution
in CI calculations, we often can’t use z critical values because we don’t know population sd
z critical values are rarely used for inference
df calculation (in general)
n - 1
big sample size → big df (good!!!!)
t distribution
a family of distributions
looks like the standard normal (bell shaped, centered at 0)
has heavier tails to account for extra uncertainty from estimating σ
as n increases, t approaches the normal standard z
takes into account degrees of freedom (bigger df → close to normal)
bigger df leads to what (in t distributions)
big df → means a big sample → thinner curve → thinner confidence intervals
rule of thumb for df in t distributions
the benefits of an increased sample size asymptote at n=30 (df=29)
3 types of t tests
one sample t-test → tests if a sample differs from a known/hypothesized population mean
paired sample t-test → tests if means differ across two related measurements (e.g. before vs after)
independent two-sample t test → tests if two independent group means differ
one sample t test
determines whether mean of single sample differs significantly from specified population mean/hypothesized value
evaluates the null hypothesis that the sample comes from a population with the specified mean, under assumptions of approximate normality and independent observations
is the average score in one group far enough from a comparison value that chance alone is unlikely to explain it?
one sample t-test: CI approach
propose a null hypothesis
collect a sample
compute descriptive statistics (x-bar, s, n, df, SEM)
calculate CI based on df (quantile of the t-distribution with df = n-1 at probability 1 - α/2)
calculation based on sample distribution
make decision (if CI includes null, fail to reject null)
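The same steps as a Python sketch; the hypothesized mean and data are invented:

```python
import numpy as np
from scipy import stats

mu_0 = 100                                              # null/hypothesized population mean
x = np.array([104, 98, 110, 107, 101, 96, 112, 105])   # sample (made up)

n = len(x)
x_bar, s = x.mean(), x.std(ddof=1)
sem = s / np.sqrt(n)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)            # quantile at 1 - alpha/2

lo, hi = x_bar - t_crit * sem, x_bar + t_crit * sem
print((lo, hi), "reject H0" if not (lo <= mu_0 <= hi) else "fail to reject H0")
```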
one sample t-test: NHST approach
propose null
collect data
construct the null hypothesis distribution (what would be expected by CHANCE alone), and calculate descriptive stats
compute t value, and make a decision
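Or, letting scipy do the NHST version in one call (same made-up sample as in the sketch above):

```python
import numpy as np
from scipy import stats

x = np.array([104, 98, 110, 107, 101, 96, 112, 105])   # sample (made up)
t_obs, p = stats.ttest_1samp(x, popmean=100)            # H0: mu = 100
print(t_obs, p)   # reject H0 if p < alpha (equivalently, |t_obs| > the critical t)
```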
when to reject with t values
reject H0 if |t_obs| > t_(1 - α/2, df)
we are comparing our observed t-value to the critical one
t test tables
gives us our critical t value
a table of df by alpha
meet in the middle to find our critical value
lets us compare the absolute value of our observed t-value to the critical
if our calculated t-value falls within the critical region, the result would occur by chance less than 5% of the time, so we reject H0
one sample t-test: two tailed
alt hypothesis predicts a difference but not direction, critical region is split on BOTH ends of the distribution
α/2 = 0.025 in each tail (because α is split across both sides)
more conservative and requires a larger effect to reject null (since α is divided across both tails)
one sample t-test: one-tailed test
alt hypothesis predicts a specific direction, so critical region is entirely on only one side of the distribution
α = 0.05 on one side
more statistical power for detecting an effect but blind to effects in opposite direction
not ideal in the research world
assumptions of one sample t test
scale of measurement - dependent variable is continuous (interval or ratio scale)
normality - the CLT covers large samples; for small samples, the raw data should be close to normal
independence - each obs is independent of the others (no repeated or paired)
population variance unknown
paired sample t test
often same individuals used twice
controls for individual differences
higher statistical power with small samples
but, there are carryover/practice effects and this method is not always possible
paired sample t-test: NHST approach
propose null hypothesis
now we have two conditions, so we can use difference scores between the conditions
construct null hypothesis distribution and calc descriptive stats of the difference scores
compute t value and make a decision
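A sketch with made-up before/after scores; scipy's paired test is equivalent to a one-sample t test on the difference scores:

```python
import numpy as np
from scipy import stats

before = np.array([12, 15, 11, 14, 13, 16, 12, 15])   # same participants, time 1
after  = np.array([14, 16, 13, 15, 15, 18, 13, 17])   # same participants, time 2

t_paired, p_paired = stats.ttest_rel(after, before)
t_diff, p_diff = stats.ttest_1samp(after - before, popmean=0)  # identical result

print(t_paired, p_paired)
print(t_diff, p_diff)
```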
common features of pair-sample t tests
same participants
two measurements are meaningfully linked (before vs after, often within-subject design, etc.)
tests whether the mean difference between paired observations is significantly different than 0
assumptions of paired sample t test
pairs are meaningfully matched (but still independent)
differences between pairs are approximately normally distributed
paired sample vs one sample t tests
basically the same!
the same process is done, except with paired sample the distribution is the differences between 2 conditions
goal of t tests
to help us make a decision about the null hypothesis
t statistic theoretically
t observed = effect/ variability of the effect
the effect would be Xbar - Mu, how far the sample mean deviates from the null
the variability of the effect is the SEM, variability expected by chance
the two critical values are used to make the critical region (1 - α /2, df)
used to determine statistical significance
steps of t-statistics
compute the t-statistic (effect/variability of the effect)
determine critical t value (with α and df)
compare
if the absolute value of our t statistic is greater than the critical t values, we reject H0
independent sample t test
tests whether two independent groups differ significantly from each other
works great for naturally distinct groups
requires larger sample size
more variability from individual differences
often cross-cultural, cross-sectional, gender differences, clinical vs non-clinical populations, teaching method comparisons, etc.
assumptions of independent sample t test
groups are independent
data in each group are roughly normal
equal variance across groups!!!
biggest differences in calculating independent t tests compared to the others
in the t statistic, the observed diff and the SEM must be calculated differently
in the critical t value, the degrees of freedom must be calculated differently
calculating t statistic for independent sample t tests
observed diff/SEM
for the observed diff, we just subtract one sample mean (xbar2) from the other (xbar1)
the SEM is more complicated, since with two independent samples there is no single shared estimate of variance
SEM in t stat calculation for independent sample t tests: 2 ways
variance sum law
pooled variance
variance sum law
if two variables are independent, the variance of their sum (or difference) equals the sum of their variances
so, we calculate each sample's squared SEM (s²/n) separately, add them, and take the square root
pooled variance
instead of estimating two variances, we combine them into a single pooled estimate
this one is more accepted
every sample will have its own amount of variability, and sampling error will always be included in the calculation of variance
the solution is to take a weighted average of the two sample variances
by pooling the two samples, we reduce the impact of sampling error
this also takes into account the size of both samples
what is sampling error
the difference between a sample statistic and the true population parameter
arises because the sample is only an approximation of the entire population
can be decreased with an increased sample size
SEM calculation for independent sample t tests
the square root of the pooled variance divided by each sample size: √(s²p/n1 + s²p/n2)
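A sketch of the pooled-variance SEM and the resulting t statistic computed by hand; the two groups are invented:

```python
import numpy as np

g1 = np.array([23, 27, 31, 29, 25, 26])        # group 1 (made up)
g2 = np.array([20, 22, 25, 24, 21, 23, 22])    # group 2 (made up)
n1, n2 = len(g1), len(g2)

# pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
sem = np.sqrt(sp2 / n1 + sp2 / n2)

t_obs = (g1.mean() - g2.mean()) / sem
df = n1 + n2 - 2
print(t_obs, df)
```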
benefits of pooled variance
more accurate estimate
higher statistical power
historical and computational simplicity
connection to ANOVA
df in independent sample t test
df = n1 + n2 - 2
when finding our critical t value, our df changes
it becomes bigger since we have two sample sizes
most important assumptions with independent sample t test
equal variance across groups
2 ways we can test if there is equal variance across groups
levene’s test - a statistical method to check if variances are equal across two or more groups, used in ANOVA too
common rule of thumb - check if the ratio of the larger sample variance to the smaller sample variance is less than 4
or, less than 2 if looking at sample standard deviations
more risky and less conservative
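Both checks sketched in Python, using made-up groups:

```python
import numpy as np
from scipy import stats

g1 = np.array([23, 27, 31, 29, 25, 26])
g2 = np.array([20, 22, 25, 24, 21, 23, 22])

# Levene's test: a non-significant p suggests the equal-variance assumption holds
stat, p = stats.levene(g1, g2)
print(p)

# rule of thumb: ratio of larger to smaller sample variance under 4
ratio = max(g1.var(ddof=1), g2.var(ddof=1)) / min(g1.var(ddof=1), g2.var(ddof=1))
print(ratio < 4)
```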
consequences if the equal variance assumption isn’t met
type 1 error increases
in some situations, it decreases too (if the larger n has the larger σ), making it almost impossible to find an effect
if the smaller n has the larger σ, the type 1 error rate can rise to roughly 1 in 3
measuring effect in 3 t tests
one-sample: Xbar - Mu
paired: Xbar1 - Xbar2
independent: Xbar1 - Xbar2
SEM in the 3 t tests
one-sample: s/√n
paired: sdifferences/√npair
independent: √(pooled variance/n1 + pooled variance/n2)
df in the 3 t tests
one sample: n-1
paired: npair - 1
independent: (n1 - 1) + (n2 -1)
what is calculated the same in all 3 t tests?
t critical value and confidence interval
what does n represent in paired sample t tests
the number of paired/grouped samples, not just individual samples
for ex, participant 1’s before and after are seen as 1, not split into 2
solution for when the two samples’ variances don’t match
using welch’s t test
changes how SEM and df are calculated
welch’s t test
corrects df and SEM to allow for a more conservative estimate
as variance ratio grows and n1 shrinks, df shrinks
as sample sizes become increasingly unequal, the df will shrink
ideally, we want a ratio of 1
welch’s t test allows for type 1 error rate to stay around 0.05, no matter the inequality of variance
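In scipy, Welch's version is simply the independent t test with equal_var=False; a sketch with invented groups where the variances differ:

```python
import numpy as np
from scipy import stats

g1 = np.array([23, 27, 31, 29, 25, 26, 35, 19])   # more variable group (made up)
g2 = np.array([20, 22, 25, 24, 21, 23, 22])

student = stats.ttest_ind(g1, g2, equal_var=True)    # assumes equal variances
welch   = stats.ttest_ind(g1, g2, equal_var=False)   # corrects SEM and df

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
```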
is increasing sample size to get a statistical P (0.05) p-hacking?
no, because we aren’t lying
how n impacts test statistic and p values
bigger n → test statistic grows
big n → even tiny effects can yield small p-values (statistical significance)
small n → moderate effects may fail to reach significance
P values do not tell us how big real effects are
raw effect size in relation to t statistic
the numerator
effect size characteristics
magnitude - quantifies how big an effect is
independent from sample size - makes it a better measure of practical importance than the p-value (which is heavily influenced by number of observations, n)
complements statistical significance - report alongside statistical significance tests, to give the complete picture of the research outcome
cohen’s d
effect size measure in t tests
the standardized difference between two means (divided by sd)
0.2 (small)
0.5 (medium)
0.8 (large)
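A sketch of Cohen's d for two independent groups, computed by hand with the pooled SD as the standardizer (the data are invented):

```python
import numpy as np

g1 = np.array([23, 27, 31, 29, 25, 26])
g2 = np.array([20, 22, 25, 24, 21, 23, 22])
n1, n2 = len(g1), len(g2)

# pooled standard deviation
sp = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))

d = (g1.mean() - g2.mean()) / sp      # standardized mean difference
print(d)   # ~0.2 small, ~0.5 medium, ~0.8 large
```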