BILD 5 FINAL EXAM

Last updated 7:54 PM on 3/13/26

38 Terms

P-Value vs Effect Size

P-values indicate whether a difference is statistically significant, but don’t describe how large that difference is

don’t capture whether difference is biologically meaningful

Effect size: quantity that captures magnitude of the effect

significant result could have tiny effect size
non-significant result could have large effect size

Type I vs Type II errors

Type I: false positive, erroneously conclude there is a difference when in reality, none exists

reject the null when null is actually true

Type II: false negative, fail to detect a difference when one does exist

fail to reject the null when null is false

Alpha and Type I error

The significance level (alpha) sets rejection fence:

when alpha = 0.05, we only reject the null in most extreme 5% of t-scores
when alpha = 0.01, we only reject null in most extreme 1% of t-scores

Lower alpha → stricter standard for rejecting the null

as we lower alpha, more cautious about rejecting null; reduces likelihood of type I error

Type I error rate = alpha!

alpha = probability of incorrectly rejecting the null

Trade-off between Type I and Type II error

Lowering alpha:

Good: more cautious about rejecting null → reduced type I error rate
Bad: might end up missing effect that DOES exist → increased type II error rate

With lower alpha, becomes harder to reject the null

less likely to reject null by mistake (reduce type I error)
makes it harder to reject null when you should (increase type II error)

For a fixed sample size, reducing one type of error increases the other, so the error rate can never reach 0; type I and type II errors are at odds with each other

Type II error and power + power analysis

Type II error (β): if alternative is true, what’s the probability we miss the effect?

Power: if alternative is true, what’s the probability we detect the effect?

we want high power
study with low probability of type II error = high statistical power
- power = 1 − β (the complement of the type II error rate)

Common target for statistical power is 80%:

if alternative is true (an effect DOES exist), we correctly detect it 80% of the time

Before conducting experiment, researchers perform power analysis:

before you start, you usually have some idea of expected effect size and SD
“what sample size do I need to achieve a desired power?”

Factors affecting statistical power

Sample size

the higher the sample size, the higher the power (power rises with n, but not in direct proportion)
more likely we are to detect a difference if one exists

Significance level

higher the significance level, higher the power
if we increase alpha, don’t need as extreme of a t-score to reject the null
if alternative is true (truly is a difference), it is easier to discover it
- Caveat: increasing alpha increases type I error

Effect size

higher the effect size, higher the power
easier to detect big difference compared to small difference

Standard deviation

higher the standard deviation, lower the power
- large standard deviation: hard to detect an effect if it does exist
- small standard deviation: easier to detect difference
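
The four factors above can be explored numerically. The following is a rough Python sketch, not the course's R workflow: it assumes a two-sided comparison of two equal-sized groups and uses a normal (z) approximation in place of the t distribution. The function name `approx_power` and all input values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power for a two-sided comparison of two group means.

    Assumption: normal (z) approximation, so small-sample values are
    slightly optimistic compared with an exact t-based calculation.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two group means
    se_diff = sd * sqrt(2 / n_per_group)
    # Rejection fence for a two-sided test at this alpha
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How far the alternative shifts the test statistic away from zero
    shift = effect / se_diff
    # Chance of landing beyond either fence when the alternative is true
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

baseline = approx_power(effect=5, sd=10, n_per_group=64)
print(f"baseline:            {baseline:.2f}")
print(f"larger sample size:  {approx_power(5, 10, 128):.2f}")
print(f"higher alpha (0.10): {approx_power(5, 10, 64, alpha=0.10):.2f}")
print(f"larger effect size:  {approx_power(8, 10, 64):.2f}")
print(f"larger SD:           {approx_power(5, 15, 64):.2f}")
```

With these hypothetical inputs the baseline sits close to the common 80% target, and each change moves power in the direction listed on the card.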

The problem of multiple comparisons + Bonferroni correction

The more t-tests you do, the greater the chance of making at least one false positive

if significance level = 5%, each individual test has a 5% chance of being a false positive (type I error) when its null is true

Bonferroni Correction: divide alpha (ie 5%) by number of comparisons to determine a Bonferroni-corrected alpha

ex. if you perform 10 t-tests, new alpha threshold is 0.05/10 = 0.005
- for any individual comparison to be significant, p-value must be < 0.005
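
A small Python sketch of the arithmetic, using the card's own numbers (alpha = 0.05, 10 tests). The family-wise error formula 1 − (1 − alpha)^m is an added assumption: it holds only if the tests are independent and every null is true. Function names are illustrative.

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-corrected threshold for each individual comparison."""
    return alpha / n_tests

def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests.

    Assumption: tests are independent and every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** n_tests

corrected = bonferroni_alpha(0.05, 10)
uncorrected_fwer = familywise_error_rate(0.05, 10)
corrected_fwer = familywise_error_rate(corrected, 10)

print(f"corrected alpha per test: {corrected:.3f}")
print(f"chance of >=1 false positive, uncorrected: {uncorrected_fwer:.2f}")
print(f"chance of >=1 false positive, corrected:   {corrected_fwer:.3f}")
```

Under those assumptions, ten uncorrected tests carry roughly a 40% chance of at least one false positive, while the corrected threshold keeps the overall chance just under 5%.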

P-hacking and Preregistration

P-hacking: repeatedly re-analyzing your data until you find a statistically significant result

undermines reproducibility of scientific research
perform more comparisons = significantly increase chance of at least one type I error (false positive)

Preregistration: specifying your study design, hypotheses, and analysis plans before collecting data

holds yourself accountable to original research objectives, reduces temptation to p-hack
avoids manipulating data to fit the narrative you want

Analysis of Variance (ANOVA)

Compare all the means simultaneously in one test: (compares 3+ groups)

Null: the mean is the SAME across all groups
Alternative: the mean is NOT the same
- at LEAST one group is different from the rest

To perform an ANOVA, R calculates:

an F statistic
a p-value, which tells us if F statistic large enough to be statistically significant
- large F statistic = big deviation from null = low p-value

Variance and the F-statistic

Variance within each group: members of each group will differ due to random variation

Variance between groups: how different are sample means from each other?

F statistic = variance between groups / variance within groups

if F-statistic is much greater than 1:
- likely to reject the null! (the p-value makes the final call)
- variance between groups is high and variance within groups is small; within each group, individuals are relatively similar
the larger the F statistic, the more likely we are to reject the null
a smaller F statistic (close to 1 or below) suggests our data are more consistent with the null
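
To make the ratio concrete, here is a minimal Python sketch of a one-way ANOVA F statistic. It assumes the standard mean-square definitions (between-group sum of squares divided by k − 1, within-group sum of squares divided by N − k), which the card summarizes as "variance between / variance within". The data are invented.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group variance / within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # How far each group mean sits from the grand mean, weighted by group size
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # How far each observation sits from its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: group means differ a lot, spread within groups is small
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)
print(f"F = {f_value:.1f}")
```

Here the between-group mean square is 13 and the within-group mean square is 1, so F is 13, well above 1.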

Analysis of Variance, R outputs

If p-value is 1.99e-12 (R's notation for 1.99 × 10⁻¹²):

probability of getting an F-statistic at least as large as ours, IF NULL IS TRUE
since p-value is < 0.05, we reject the null: at least one group is statistically different from the others

However, ANOVA by itself doesn’t tell you which group(s) is/are different

Tukey’s post-hoc Test and Interpretation

Post-hoc test: statistical test performed after initial test (ie, ANOVA) to find which pairs of groups are significantly different

Tukey’s post-hoc test: most common after ANOVA, if you reject the null

first, run ANOVA; if reject null:
run post-hoc test
- TukeyHSD(result)
- HSD stands for honestly significant difference

If the p-value for a pair of groups is < 0.05, that pair is significantly different!

draw significance bars to indicate statistical significance between groups

Verifying normality in ANOVA

For ANOVA, we try to verify the data meet the assumptions using AS FEW TESTS as possible

within each group, determine the residuals
- Residuals: difference between each data point and its group mean
- analyzing residuals can show non-normality

Steps:

1) use aov function and save output to object
2) extract residuals as vector
3) make plot of residuals
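
The card's steps use R's aov function. As a language-neutral illustration of what a residual is, this Python sketch subtracts each group's mean from its own data points and pools the results into one list, so a single normality check can cover all groups at once. The group names and values are hypothetical.

```python
from statistics import mean

def group_residuals(groups):
    """Return every residual (value minus its own group mean) as one flat list."""
    residuals = []
    for values in groups.values():
        group_mean = mean(values)
        residuals.extend(v - group_mean for v in values)
    return residuals

# Hypothetical measurements for three treatment groups
data = {
    "control": [4.0, 5.0, 6.0],
    "low": [6.0, 7.0, 8.0],
    "high": [9.0, 11.0, 13.0],
}
residuals = group_residuals(data)
print(residuals)
```

Because each group is centered on its own mean, the pooled residuals sum to zero; their shape can then be inspected with one histogram or one normality test.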

Non-parametric tests

Sometimes, your data may never meet the assumptions of a t-test/ANOVA (parametric test) regardless of transforming

can use non-parametric tests, which do not assume normality
- tests other attributes of data, ie median, rank order
- generally have lower power

Paired t-test → Wilcoxon signed rank test

2-sample t-test → Mann-Whitney U test

ANOVA with Tukey’s post-hoc test → Kruskal-Wallis test with Dunn’s post-hoc test

Covariance

After finding (x - mean of x) × (y - mean of y) for every data point, we average the products (R's cov() divides the sum by n - 1 rather than n)

more points in positive quadrants, positive covariance = positive association
more points in negative quadrants, negative covariance = negative association
equally many points in both positive and negative quadrants, covariance near zero = no linear association

Only sign of covariance is meaningful:

same data, different units → different covariance
covariance changes even though underlying relationship between variables did not change
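
A short Python sketch of the unit problem, assuming the sample covariance (sum of products divided by n − 1). The heights and masses are invented; the same heights are expressed in meters and in centimeters.

```python
from statistics import mean

def sample_covariance(xs, ys):
    """Average product of deviations from the means (n - 1 in the denominator)."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Hypothetical data: the same four individuals, height in two different units
height_m = [1.5, 1.6, 1.7, 1.8]
height_cm = [h * 100 for h in height_m]
mass_kg = [50.0, 58.0, 66.0, 74.0]

cov_m = sample_covariance(height_m, mass_kg)
cov_cm = sample_covariance(height_cm, mass_kg)
print(f"covariance with height in m:  {cov_m:.2f}")
print(f"covariance with height in cm: {cov_cm:.2f}")
```

The covariance becomes 100 times larger after the unit change, yet its sign stays positive, which is why only the sign is interpreted.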

Pearson correlation coefficient

Normalize covariance by dividing it by the product of the two standard deviations; the result is r, called the Pearson correlation coefficient (always between -1 and +1)

is not sensitive to units
r tells us if two variables rise and/or fall together

Does NOT imply a causal relationship between two variables!

r only tells us whether two variables rise and/or fall together
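
A minimal Python sketch of r, written as the sum of cross-products divided by the square root of the product of the two sums of squares (equivalent to covariance divided by the product of the standard deviations). The data are hypothetical; the point is that changing units leaves r unchanged.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both spreads."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the same heights in meters and in centimeters
height_m = [1.5, 1.6, 1.7, 1.8]
height_cm = [h * 100 for h in height_m]
mass_kg = [50.0, 60.0, 63.0, 75.0]

r_m = pearson_r(height_m, mass_kg)
r_cm = pearson_r(height_cm, mass_kg)
print(f"r with height in m:  {r_m:.3f}")
print(f"r with height in cm: {r_cm:.3f}")
```

Both calls give the same value (about 0.98 for these made-up numbers), because the unit factor cancels between numerator and denominator.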

Correlation vs Regression

Correlation: goal is to quantify the association between two numerical variables

NOT trying to fit a line to the data
JUST seeing if two variables rise/fall together

Regression: goal is to predict the value of one variable from another

clearly defined independent and dependent variable
we ARE trying to fit a straight line to the data

Linear model/regression model

Goal: predict the dependent variable from the independent variable

draws straight line through the data, creates equation y = mx + b (here slope m = b1 and intercept b = b0)

Sample statistics (b1 and b0): estimate the population parameters (β1 and β0)

class 20

Least Squares Approach

Residual: difference between each y-value and model’s prediction

across all data points, calculate sum of squared residuals
better the model fits our data, the smaller the sum of squared residuals
- y-value should be closer to model’s prediction

Best fit line (aka regression line): one that minimizes sum of squared residuals

say this line was obtained using the “least squares approach”
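
As an illustration, the least squares line for one predictor has a closed form: the slope is the sum of cross-products divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means. The Python sketch below uses that formula on invented data and then checks that nudging the line makes the sum of squared residuals larger.

```python
from statistics import mean

def least_squares_fit(xs, ys):
    """Return (b1, b0): the slope and intercept that minimize squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b1, b0

def sum_squared_residuals(xs, ys, b1, b0):
    """Sum of squared differences between each y and the line's prediction."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying close to a straight line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b0 = least_squares_fit(xs, ys)
best_ss = sum_squared_residuals(xs, ys, b1, b0)
print(f"slope b1 = {b1:.2f}, intercept b0 = {b0:.2f}")
print(f"SS of best-fit line:        {best_ss:.3f}")
print(f"SS with slope nudged +0.1:  {sum_squared_residuals(xs, ys, b1 + 0.1, b0):.3f}")
```

Any other slope or intercept gives a larger sum of squared residuals than the fitted pair, which is what "best fit" means here.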

Slope of the regression line

Slope is the biology!

ex. plant mass = 0.92 x nitrogen levels + 10.7
for every 1 ppm increase in nitrogen, plant mass increases by 0.92g

Y-intercept has no biological significance

you should avoid extrapolating too far beyond the data

The better our model, the lower SS(model) is compared to SS(mean)

SS measures error of the model
Lower SS(model) → our model’s predictions are closer to the actual data

class 20

R² and its meaning

R²: proportion of the total variance of y that is explained by x

The sum of squares went down from 16.05 → 8.58, a 47% decrease!

there is a 47% decrease in error when we take nitrogen levels into account
47% of the total variance in plant mass is explained by the soil nitrogen content

SS(mean) is often called SS(total)

represents the total variability in the y variable

SS(model) is often called SS(residuals)

represents the residuals of the final model
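
The 47% figure can be reproduced from the two sums of squares on this card. A tiny Python sketch, assuming the usual definition R² = (SS(total) − SS(residuals)) / SS(total); the function name is illustrative.

```python
def r_squared(ss_total, ss_residual):
    """Proportion of the total variability in y that the model accounts for."""
    return (ss_total - ss_residual) / ss_total

# Values from the plant mass vs nitrogen example on this card
r2 = r_squared(ss_total=16.05, ss_residual=8.58)
print(f"R^2 = {r2:.2f}")
```

The result rounds to 0.47, matching the 47% decrease in error described above.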

Null distribution of b1 values + comparing t-score with the null

theoretical distribution of all possible b1 values IF THE NULL IS TRUE

due to random sampling, even when the null is true, the slope of our sample (b1) won’t be exactly zero
the larger the magnitude of the t-score, the more likely we are to reject the null

Comparing our t-score with the null

if null is true, we expect the t-score to be close to zero
p-value: probability of getting OUR slope / OUR t-score or a more extreme value if the null is true
if p < alpha, REJECT the null

Interpreting linear regression in R

Only care about the slope!

Since p-value of bite slope (11.677) is 0.0393:

We reject the null, there is a relationship that exists between the variables
IF THE NULL WERE TRUE, we would have a 0.039 chance of getting a t-score (or a slope) at least as extreme as ours
p < alpha

Overfitting

when model tries TOO hard to match the data

even if the model perfectly describes the data we have, it will perform poorly on new, unseen data
we never expect R² to be 1
if you draw a squiggly line, unlikely that it is the true relationship

Comparing Residuals

Residuals: difference between each measurement and model

when relationship is truly linear, some points are above and some are below the regression line
- there is no pattern to the residuals; they are normally distributed
when relationship is non-linear, there is a pattern to the residuals
- residuals are NOT normal

Transforming residuals for regression

1) Plot the residuals and run a KS test on the residuals

if normal, linear regression is valid
if non-normal:

2) make a histogram of the x and y variables

is one variable clearly skewed?
is one variable mostly normal but has a clear outlier?
- transform skewed variable or remove the outlier

3) run a KS test on the new model residuals

Chi-squared goodness-of-fit test

Used to perform hypothesis tests on categorical data:

Step 1: calculate a chi-squared (χ²) test statistic

Step 2: predict the null distribution of χ²

Step 3: compare our value of χ² to the null distribution

Basically, compare observed frequency of category to expected frequency of category
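
Step 1 can be sketched in Python using the standard goodness-of-fit formula, the sum over categories of (observed − expected)² / expected. The counts below are hypothetical: 60 individuals, with a null hypothesis that predicts a 3:1 split between two categories.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 60 individuals, null predicts a 3:1 ratio
observed = [50, 10]
expected = [45.0, 15.0]

chi2 = chi_squared_statistic(observed, expected)
print(f"chi-squared = {chi2:.2f}")
```

The statistic is 0 when observed counts match the expected counts exactly and grows as they diverge; whether a given value is significant depends on the null distribution in steps 2 and 3.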

Statistical Tests Summary

One sample t-test: 1 numerical

Paired t-test: two categories; 1 numerical, 1 categorical

Two-sample t-test: two categories; 1 numerical, 1 categorical

ANOVA: 3+ categories; 1 numerical, 1 categorical

Linear regression: 2 numerical

Chi-squared goodness of fit: 1 categorical

Correlation is also appropriate for quantifying association – it does NOT attempt to predict Y from X

Manipulative vs Natural Experiments

Manipulative: experimenter actively controls the independent variable

more directly establish causality between independent and dependent variables
however, might not be practically and/or ethically feasible

Natural/Observational: experimenter relies on pre-existing differences in independent variable

challenging to implement a control group, random assignment of subjects to treatment vs control groups, and blinding

Random vs Systematic Error

Random: fluctuations in our data that occur just by chance

ie natural variation in blood pressure between individuals

Systematic: error that consistently skews our data in one direction

ie meditation group starts to eat more healthily and exercise more
includes confounding variables, non-random assignment, experimental bias

Minimizing systematic error

Control groups: isolate the effect of the treatment (IV) on DV

to minimize confounding variables, keep control group maximally similar to treatment group

Random assignment: each experimental subject should have equal chance of being assigned to treatment or control group

Blinding: if participants or researchers are aware of the treatment, that knowledge may consciously or subconsciously influence their behavior

Single-blind: participants unaware if they have been assigned to treatment or control
Double-blind: BOTH participants AND researchers are unaware who is assigned to treatment or control

Sample Size and Random Error

If we only give each cream to one person, exposes us to LOTS of random error

any observed difference could be significantly affected by natural variability between the two people

When you increase sample size, you give the skin creams to more people

increasing sample size reduces the influence of random error
- averages out random variation across more people
any one individual now exerts a weaker pull on the overall mean

When increasing sample size, standard error of the mean decreases!

smaller n → larger SE
larger n → smaller SE
- standard error is a measure of the random error!
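
The relationship between n and SE follows from the standard formula SE = SD / √n. A brief Python sketch with an assumed standard deviation of 8 (an arbitrary illustrative value):

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return sd / sqrt(n)

# Hypothetical spread between individuals, held fixed while n grows
sd = 8.0
for n in (4, 16, 64):
    print(f"n = {n:>2}: SE = {standard_error(sd, n):.1f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.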

Experimental Unit + Between-subjects vs. within-subjects

Experimental unit: an individual or group that is assigned a treatment independently of every other unit

Between-subjects / between-groups: each experimental unit experiences EITHER one condition OR the other
- unpaired
- independent groups design
Within-subjects / within-groups: each experimental unit experiences BOTH conditions
- paired
- repeated measures design

Sample size and replicates (biological vs technical)

Sample Size: total number of experimental units across entire experiment

Biological replicates: number of experimental units that experience each condition

captures biological variability

Technical replicates: number of measurements you take per experimental unit

increases precision of measurement
does NOT capture any additional biological variation

Pseudoreplication and how to avoid it

Pseudoreplication: when you erroneously report the sample size as being higher than it actually is

ex. if you take 4 measurements per plant, you think sample size = 6 plants x 4 measurements = 24
- not correct! 4 measurements are technical replicates

To avoid pseudoreplication:

Ask yourself: how do I increase sample size?
- add more plants!
  - adding more experimental units / biological replicates
- DO NOT take more measurements from each plant!
  - that only adds more technical replicates!
increasing sample size = capturing more biological variability
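
One common way to handle this in practice, shown here as a Python sketch with invented heights, is to average the technical replicates for each plant first, so the analysis sees one value per experimental unit. Averaging is an illustrative choice, not something the card prescribes.

```python
from statistics import mean

# Hypothetical heights (cm): 6 plants, 4 technical replicates per plant
measurements = {
    "plant_1": [10.1, 10.3, 9.9, 10.1],
    "plant_2": [12.0, 12.2, 11.8, 12.0],
    "plant_3": [9.5, 9.4, 9.6, 9.5],
    "plant_4": [11.1, 11.0, 11.2, 11.1],
    "plant_5": [10.8, 10.6, 11.0, 10.8],
    "plant_6": [12.5, 12.4, 12.6, 12.5],
}

# Collapse technical replicates: one mean value per plant
plant_means = {plant: mean(values) for plant, values in measurements.items()}

n_measurements = sum(len(values) for values in measurements.values())
sample_size = len(plant_means)

print(f"total measurements taken: {n_measurements}")
print(f"sample size (plants):     {sample_size}")
```

There are 24 measurements but the sample size is 6, because only the plants are independent experimental units.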

Measurement Validity

Does my experimental system actually address the research question that I’m asking?

To validate, you need additional control groups:

Negative control: checks if experimental system can detect a lack of change when we expect it to
- ie mice receive an injection of DMSO solvent WITHOUT a supplement
- establish baseline comparison WITHOUT any treatment
Positive control: checks if experimental system can detect a change when we expect it to
- ie mice receive an injection of a supplement KNOWN to raise blood insulin levels
- verifies our system CAN detect a positive result

For both positive and negative controls, we KNOW what to expect