BIOL 300 post MT 1

122 Terms

1
New cards

what two main questions can we answer about a population mean (mu) using a sample?

  1. estimation: what is the plausible range of values for mu? - this is answered with a confidence interval

  2. hypothesis testing: could mu be a specific, particular value? - this is answered with a hypothesis test

2
New cards

what is the sampling distribution of the mean (y-bar)?

It is the theoretical distribution of sample means (y-bar) that would be obtained from taking an infinite number of samples of the same size (n) from a population

3
New cards

what is the formula for the standard deviation of the sampling distribution of the mean?

sigma/sqrt(n)
where sigma is the true population standard deviation 

4
New cards

what is the standard error of the mean SE_y-bar?

It is the estimate of the standard deviation of the sampling distribution of the mean. It is calculated using the sample standard deviation (s) 

SE_Y-bar = s / sqrt(n) 

5
New cards

When is the sampling distribution of the mean (Y-bar) normally distributed? 

  1. When the variable (Y) is normally distributed in the population 

  2. When the sample size (n) is large, even if the population is not normally distributed 

6
New cards

What is the Central Limit Theorem?

The theorem states that the sum or mean of a large number of random measurements sampled from any population is approximately normally distributed
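The theorem can be checked with a small simulation. The sketch below is only an illustration in Python (standard library only); the exponential population, the sample size n = 50, and the number of repeated samples are all assumptions chosen for the demo, not values from the course. It draws many samples from a strongly right-skewed population and checks that the sample means behave like a normal distribution with standard deviation sigma/sqrt(n).

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 50              # size of each sample (hypothetical choice)
num_samples = 5000  # number of repeated samples (hypothetical choice)

# Exponential population with rate 1: right-skewed, mean 1, SD 1.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1

# For a normal distribution, about 95% of values lie within 1.96 SD of the mean.
within = sum(abs(m - 1.0) <= 1.96 * expected_sd for m in sample_means)
fraction_within = within / num_samples

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
print(f"fraction within 1.96 SD of the mean: {fraction_within:.3f}")
```

Even though individual exponential values are far from normal, the fraction printed on the last line should land close to 0.95.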

7
New cards

what is the correct interpretation of a 95% confidence interval? 

If we were to take many samples and calculate 95% CI for each one, 95% of those intervals would contain the true population mean (mu) 

8
New cards

what is the 2SE rule of thumb?

It is a rough approximation for a 95% confidence interval calculated as Y-bar ± 2SE_Y-bar 

9
New cards

If we know the population standard deviation (sigma), what is the formula for a 95% CI? 

Y-bar ± 1.96(sigma/sqrt(n))
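The repeated-sampling interpretation of a 95% CI can be illustrated with this formula. The following is a minimal Python sketch (standard library only); the population values mu = 10 and sigma = 2, the sample size, and the number of repetitions are hypothetical choices for the demo. It builds Y-bar ± 1.96(sigma/sqrt(n)) for many simulated samples and counts how often the interval contains mu.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0  # hypothetical population mean and SD
n = 25                 # hypothetical sample size
reps = 4000            # number of simulated samples

half_width = 1.96 * sigma / n ** 0.5  # 1.96 * sigma / sqrt(n)

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    y_bar = statistics.fmean(sample)
    # Does this sample's interval contain the true mean?
    if y_bar - half_width <= mu <= y_bar + half_width:
        covered += 1

coverage = covered / reps
print(f"{coverage:.3f} of the intervals contain mu")
```

The printed proportion should be close to 0.95, which is what "95% confidence" refers to: the long-run success rate of the procedure, not the probability for any single interval.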

10
New cards

When we don’t know the population standard deviation (sigma) what do we use instead?

We use the sample standard deviation (s) as an estimate 

11
New cards

What is the general formula for a confidence interval for the mean (mu) using the t-distribution? 

Y-bar ± t_a(2),df * (s/sqrt(n))

aka

Y-bar ± t_a(2),df * SE_Y-bar 

12
New cards

what is the t-statistic and why do we use it?

The t-statistic is t = (Y-bar - mu) / (s/sqrt(n))

  • we use it instead of the Z-statistic when we do not know the population standard deviation (sigma) and have to estimate it with the sample standard deviation (s). 

13
New cards

what distribution does the t-statistic follow? 

Student's t-distribution

14
New cards

how does the t-distribution compare to the Z-distribution (standard normal)?

The t-distribution has “fatter tails” (higher probabilities at the tails) to account for the extra uncertainty from using s to estimate sigma

15
New cards

what parameter defines the shape of a specific t-distribution?

The degrees of freedom (df)

16
New cards

How are degrees of freedom calculated for a one-sample confidence interval or t-test? 

df = n - 1, where n is the sample size 

17
New cards

In the notation t_a(2),df what does alpha represent for a confidence interval?

alpha = 1 - confidence level 

  • for 95% CI, alpha = 0.05 

  • for 99% CI, alpha = 0.01 

18
New cards

A sample of n = 8 snakes has Y-bar = 1.375 and s = 0.324. What are the degrees of freedom (df)?

df = n - 1 = 8 - 1 = 7

19
New cards

For n = 8, Y-bar = 1.375, s = 0.324. What is the Standard Error (SE_Y-bar)? 

SE_Y-bar = s / sqrt(n) = 0.324 / sqrt(8) = 0.115
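The snake example can be carried one step further to a 95% confidence interval. This is a minimal Python sketch of that arithmetic; the critical value 2.365 is the two-tailed t_0.05(2),7 read from a standard t-table, which is an assumption here because the cards do not list it.

```python
import math

n = 8
y_bar = 1.375
s = 0.324

df = n - 1             # degrees of freedom
se = s / math.sqrt(n)  # standard error of the mean

# Two-tailed critical value t_0.05(2),7 from a standard t-table
# (assumed; not given in the cards).
t_crit = 2.365

lower = y_bar - t_crit * se
upper = y_bar + t_crit * se

print(f"df = {df}, SE = {se:.3f}")
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```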

20
New cards

what are the four steps of hypothesis testing?

  1. state hypotheses

  2. calculate test statistic 

  3. compare to critical value 

  4. draw conclusions 

21
New cards

what is the purpose of a one-sample t-test?

It compares the mean (Y-bar) of a random sample from a normal population to a hypothesized population mean mu_0

22
New cards

What are the null and alternative hypotheses for a one-sample t-test?

H_0: mu = mu_0 (the population mean is equal to the hypothesized value) 

H_A: mu ≠ mu_0 (the population mean is not equal to the hypothesized value) 

23
New cards

What is the formula for the one-sample t-test statistic?

t = (Y-bar - mu_0) / (s/sqrt(n)) or

t = (Y-bar - mu_0) / SE_Y-bar

24
New cards

what are the assumptions of one-sample t-test?

  1. The variable is normally distributed 

  2. The sample is a random sample 

25
New cards

We test H_0: mu = 98.6 degrees F. Our sample data is n = 24, Y-bar = 98.28, s = 0.940. Calculate the t-statistic.

t = (Y-bar - mu_0) / (s / sqrt(n)) = (98.28 - 98.6) / (0.940 / sqrt(24)) = -1.67.

26
New cards

The critical value for alpha=0.05 and df=23 is +/- 2.07. Our t-statistic is -1.67. Do we reject H_0

No. The calculated t-statistic (-1.67) is not more extreme than the critical value (-2.07). It falls in the non-rejection region, so P > 0.05. We cannot reject the null hypothesis 
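The body-temperature calculation and decision can be reproduced in a few lines. This Python sketch uses only the numbers given in the cards (n = 24, Y-bar = 98.28, s = 0.940, mu_0 = 98.6, critical value 2.07); the variable names are just illustrative.

```python
import math

mu_0 = 98.6                      # hypothesized population mean
n, y_bar, s = 24, 98.28, 0.940   # sample size, sample mean, sample SD
t_crit = 2.07                    # two-tailed critical value for alpha = 0.05, df = 23

se = s / math.sqrt(n)            # standard error of the mean
t = (y_bar - mu_0) / se          # one-sample t-statistic

# Reject H_0 only if t is more extreme than the critical value in either tail.
reject = abs(t) > t_crit

print(f"t = {t:.2f}, reject H_0: {reject}")
```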

27
New cards


With a larger sample (df = 129), the critical value is +/- 1.98. Our t-statistic is -5.44. Do we reject H_0?

The calculated t-statistic (-5.44) is much more extreme than the critical value (-1.98). We reject the null hypothesis and conclude the mean body temperature is not 98.6 degrees F.

28
New cards

what does it mean for a statistical test to be robust?

A method is robust if the answer it gives is not sensitive to modest departures from its assumptions 

29
New cards

which t-test is considered robust?

A one-sample t-test (and by extension, the paired t-test)

30
New cards

Why is the central limit theorem important for robustness?

It is one of the main reasons many statistical tests (like t-tests) are considered robust. These tests assume normality, and the CLT makes sample means approximately normal for large samples even when the underlying data are not.

31
New cards

what is the main difference between a paired design and a 2-sample design?

In a paired design, each data point in one group has a direct, one-to-one correspondence with a data point in the other group. In a 2-sample design, the two groups are independent.

32
New cards

What is the benefit of a paired design?

It allows us to account for a lot of extraneous variation, because each member of a pair shares much in common (eg. the same person, same plot of land, same twin)

33
New cards

what are some examples of a paired design?

  • measuring something before and after a stimulus on the same object

  • applying a treatment to one arm and a placebo to the other arm of the same person 

  • using identical twins, with one getting a treatment and one not 

  • splitting a plot of land in half, fertilizing one side and not the other

  • comparing water quality “upstream” and “downstream” from the same power plant 

34
New cards

how does a paired t-test work?

It first calculates the difference (d) for each pair. Then it performs a one-sample t-test on that single list of differences.

35
New cards

What are the hypotheses for a paired t-test (when testing for any difference)?

H_0: The mean difference is zero (mu_d = 0)

H_A: The mean difference is not zero (mu_d ≠ 0)

36
New cards

what is the formula for the t-statistic in a paired t-test?

t = (d-bar - 0)/ (s_d / sqrt(n))

  • where “d-bar” is the mean of the differences, “s_d” is the standard deviation of the differences and “n” is the number of pairs 
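A paired t-statistic can be computed by hand in a few steps. The Python sketch below uses made-up before/after measurements on five subjects (hypothetical data, purely for illustration): it takes the difference for each pair, then applies the one-sample formula to the differences.

```python
import math
import statistics

# Hypothetical measurements on the same 5 subjects.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 14]

# Step 1: one difference per pair.
d = [a - b for a, b in zip(after, before)]

# Step 2: one-sample t-test on the differences, with mu_d = 0 under H_0.
n = len(d)                     # number of pairs
d_bar = statistics.mean(d)     # mean of the differences
s_d = statistics.stdev(d)      # sample SD of the differences
t = (d_bar - 0) / (s_d / math.sqrt(n))
df = n - 1

print(f"d = {d}, d-bar = {d_bar}, t = {t:.3f}, df = {df}")
```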

37
New cards

How are degrees of freedom (df) calculated for a paired t-test?

df = n-1, where n is the number of pairs

38
New cards

what are the assumptions of a paired t-test? 

  1. The pairs are sampled independently and randomly 

  2. The differences (d) are normally distributed. (The individual measurements do not have to be) 

39
New cards

What is the goal of a two-sample (unpaired) t-test?

To compare the means of a numerical variable for two independent groups.

40
New cards

What is the parameter of interest in a two-sample test?

The difference between the two population means (mu_1 - mu_2)

41
New cards

what is our estimate of the difference between population means?

The difference between our sample means (Y-bar_1 - Y-bar_2)

42
New cards

What is “pooled variance” (s_p²) and why is it used?

It is a weighted average of the variances from the two samples. We use it when we assume both populations have the same variance.

43
New cards

What is the formula for pooled variance (s_p²)?

s_p² = (df_1 * s_1² + df_2 * s_2²) / (df_1 + df_2)

where df_1 = n_1 - 1 and df_2 = n_2 - 1 

44
New cards

what is the formula for the standard error of the difference between two means?

SE = sqrt( s_p² / n_1 + s_p² / n_2 )

45
New cards

How are the degrees of freedom (df) calculated for a two-sample t-test?

df = df_1 + df_2 = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2

46
New cards

What are the hypotheses for a two-sample t-test (when testing for any difference)?

H_0: The population means are equal 

(mu_1 = mu_2), or (mu_1 - mu_2 = 0).

H_A: The population means are not equal

(mu_1 ≠ mu_2), or (mu_1 - mu_2 ≠ 0).

47
New cards

What is the formula for the two-sample t-test statistic?

t = ( (Y-bar_1 - Y-bar_2) - 0 ) / SE

  • where SE is the standard error of the difference 
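The pieces of the two-sample t-test (pooled variance, standard error, df, and t) fit together as follows. This is a minimal Python sketch using hypothetical summary statistics for two groups; none of the numbers come from the course.

```python
import math

# Hypothetical summary statistics for two independent groups.
n_1, y_bar_1, s_1 = 10, 5.0, 1.0
n_2, y_bar_2, s_2 = 12, 4.0, 1.2

df_1 = n_1 - 1
df_2 = n_2 - 1

# Pooled variance: average of the two sample variances, weighted by df.
s_p2 = (df_1 * s_1 ** 2 + df_2 * s_2 ** 2) / (df_1 + df_2)

# Standard error of the difference between the two means.
se = math.sqrt(s_p2 / n_1 + s_p2 / n_2)

# Test statistic under H_0: mu_1 - mu_2 = 0.
t = ((y_bar_1 - y_bar_2) - 0) / se
df = df_1 + df_2

print(f"s_p^2 = {s_p2:.3f}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
```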

48
New cards

What are the assumptions of a two-sample t-test? 

  1. both samples are random samples 

  2. both populations have normal distributions 

  3. the variance of both populations is equal

49
New cards

What is a common wrong way to compare the means of two groups?

Concluding that Group 1 is significantly different from a value, but Group 2 is not, therefore Group 1 and Group 2 are different. This is a common logic error.

50
New cards

when visually comparing 95% CIs for two means, when can you be sure they are significantly different?

when the 95% CIs do not overlap at all.

51
New cards

When visually comparing 95% CIs for two means, when can you be sure they are NOT significantly different?

When the 95% CI for one group overlaps the point estimate (the mean) of the other group 

52
New cards

When visually comparing the 95% CIs, what does it mean if the CIs overlap but do not cross each other’s point estimate? 

The result is ambiguous: whether the means differ significantly cannot be judged by eye. You must perform a formal hypothesis test (like a 2-sample t-test) to be sure.

53
New cards

What are the hypotheses for a test comparing the variances of two groups? 

H_0: sigma_1² = sigma_2² (The population variances are equal) 

H_A: sigma_1² ≠ sigma_2² (The population variances are not equal) 

54
New cards

What is Levene’s test used for?

It compares the variances of two or more groups.

55
New cards

In the R command leveneTest(data = D, Y ~ X, center = mean), which variable is numerical and which is categorical?

The convention is numerical ~ categorical. So, Y is the numerical variable and X is the categorical (group) variable.

56
New cards

An R output for Levene’s test shows a p-value (“Pr(>F)”) of 0.4897. How do you interpret this?

The p-value (0.4897) is greater than 0.05. Therefore, we fail to reject the H_0 that the population variances are equal.

57
New cards

Why is Levene’s test preferred over the F-test for comparing variances? 

The F-test is not robust. It is very sensitive to its assumption that both distributions are normal, whereas Levene’s test is more reliable. 

58
New cards

What are the three main assumptions for t-tests?

  1. The sample(s) are random 

  2. The populations are normally distributed

  3. (for 2-sample t-tests) the populations have equal variances.

59
New cards

What are three ways to detect deviations from normality?

  1. Histograms

  2. Quantile plots (QQ plots) 

  3. The Shapiro-Wilk test 

60
New cards

How do you interpret a normal quantile plot (QQ plot)?

If the points fall on a straight line, it indicates the data fits the normal distribution. Points that curve away from the line indicate a lack of fit. 

61
New cards

On a normal QQ plot, what does a positively (right) skewed distribution look like?

The points curve up and away from the line at the high end. 

62
New cards

On a normal QQ plot, what does a negatively (left) skewed distribution look like?

The points curve down and away from the line at the low end.

63
New cards

On a normal QQ plot, what does a symmetric with fat tails distribution look like?

The points curve up at the high end and down at the low end (forming a slight S-shape) 

64
New cards

What is a Shapiro-Wilk test used for?

To test statistically whether a set of data comes from a normal distribution.

65
New cards

What are the H_0 and H_A for a Shapiro-Wilk test?

H_0: The data are from a normal distribution.

H_A: The data are not from a normal distribution.

66
New cards

What are the options, in rough order, when statistical assumptions are violated?

  1. Ignore: If sample sizes are large, the tests are often robust

  2. Transformations: (eg. log-transform)

  3. Permutation tests

  4. bootstrapping 

  5. non-parametric tests 

67
New cards

When can we often ignore violations of normality?

When sample sizes are large. The Central Limit Theorem (CLT) states that the means of large samples are approximately normally distributed, even if the underlying data are not.

68
New cards

What test is ideal if you have unequal variance between two groups?

Welch’s t-test

69
New cards

What is a data transformation?

It changes each data point by some simple mathematical formula, applying the same formula to every individual.

70
New cards

what is the log-transformation (eg. Y’ = ln[Y]) especially good at fixing?

Data that is skewed to the right.

71
New cards

What are three signs that a log-transformation might be useful?

  1. The frequency distribution is skewed to the right

  2. The variance seems to increase as the mean gets larger (when comparing groups) 

  3. The variable is the result of multiplying or dividing other components 

72
New cards

What is the correct way to work with log-transformed data?

First, transform each individual data point (eg. ln(Y)). Then, calculate the mean, SD, and CI using those new transformed values. 

73
New cards

What is the wrong way to find the mean of log-transformed data? 

To calculate the mean of the original data first, and then take the log of that mean (eg. ln(Y-bar)). The log of the mean does not equal the mean of the log values
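The difference between the two orders of operation is easy to see numerically. This Python sketch uses three made-up values (hypothetical data); Python's math.log, like R's log(), is the natural log.

```python
import math
import statistics

y = [1, 10, 100]  # hypothetical right-skewed data

# Correct: transform each data point first, then average.
log_y = [math.log(value) for value in y]
mean_of_logs = statistics.fmean(log_y)

# Wrong: average the raw data first, then take the log.
log_of_mean = math.log(statistics.fmean(y))

print(f"mean of ln(Y): {mean_of_logs:.3f}")
print(f"ln of Y-bar:   {log_of_mean:.3f}")
```

Here the mean of the logs equals ln(10), while the log of the mean is ln(37), so the two approaches give clearly different answers.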

74
New cards

What R function calculates the natural log (base e)? 

The log() function. It can be applied to a single number or a whole vector.

75
New cards

what is the “flaw of averages” 

The idea that designing for the “average” often means designing for no one. The mean is a useful prediction, but the probability of observing the mean value for multiple variables all at the same time is extremely low.

76
New cards

What does the phrase “Correlation does not imply causation” mean?

Just because two variables are associated (correlated) does not mean that one causes the other.

77
New cards

What is a spurious correlation?

A strong correlation that appears in a dataset purely by chance, like the correlation between Nicolas Cage films and swimming pool drownings. 

78
New cards

What is data dredging?

The practice of performing many, many statistical tests on a dataset until you find a relationship that appears significant, even though it is just a random match.

79
New cards

What is a confounding variable?

An unmeasured, “sneaky third” variable that is the actual cause of both of the variables you are examining, making them look causally linked when they are not directly related.

80
New cards

What is the classic example of a confounding variable?

Shark attacks and ice cream sales are highly correlated. The confounding variable is temperature (or season); hot weather causes both more people to swim (raising shark attacks) and more people to buy ice cream.

81
New cards

What is the difference between an observational and experimental study?

experimental: The researcher assigns treatments randomly to individuals.

observational: The researcher does not assign the treatments; they only observe. 

82
New cards

What is the main benefit of an experimental study?

random assignment averages out the effects of confounding variables, making it easier to determine a causal relationship. 

83
New cards

what are the two main goals of experimental design?

  1. reduce bias 

  2. reduce sampling error (which increases precision and power) 

84
New cards

what are the three main design features that reduce bias? 

  1. controls 

  2. random assignment 

  3. blinding 

85
New cards

what is a control group?

A group that is identical to the experimental group in all respects except for the treatment itself 

86
New cards

what is a placebo?

A sham treatment given to a control group. It helps account for the placebo effect, where patients improve simply because they believe they are receiving treatment 

87
New cards

Why is a control group important for “independent recovery”? 

People often seek treatments when they feel their worst, and may improve naturally over time. A control group shows what would have happened without the treatment, providing a baseline for comparison. 

88
New cards

What is blinding? 

The process of preventing the participant (single blind) or both the participant and the researcher (double-blind) from knowing which treatment is being administered. 

89
New cards

Why is blinding important?

Unblinded studies often find much larger (and likely biased) effects, suggesting that knowing who got the treatment can influence the results.

90
New cards

What are the three main design features that reduce sampling error?

  1. replication 

  2. balance 

  3. blocking 

91
New cards

what is replication?

carrying out the study on multiple independent units.

92
New cards

If you put 4 plants in a “control” chamber and 4 plants in a “treatment” chamber, what is your true sample size (n) for each group?

n = 1 for each group. The chamber is the independent unit, not the plants. The 4 plants in each chamber are pseudo-replicates.

93
New cards

What is balance in experimental design?

Having equal or nearly equal sample sizes (n1 ≈ n2) in each treatment group.

94
New cards

Why is a balanced design more powerful?

for a fixed total sample size, the standard error is smallest (and precision is highest) when the sample sizes in each group are equal.
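The claim can be checked by trying every way of splitting a fixed total between two groups. This Python sketch assumes a total of 20 units and a pooled variance of 1 held constant across splits (both hypothetical), and uses the two-sample standard error sqrt(s_p²/n_1 + s_p²/n_2).

```python
import math

total = 20   # hypothetical fixed total sample size
s_p2 = 1.0   # hypothetical pooled variance, held constant across splits


def se_difference(n_1, n_2, pooled_var):
    """Standard error of the difference between two means."""
    return math.sqrt(pooled_var / n_1 + pooled_var / n_2)


# Every split with at least 2 units per group.
splits = [(n_1, total - n_1) for n_1 in range(2, total - 1)]

# The split with the smallest standard error.
best = min(splits, key=lambda split: se_difference(split[0], split[1], s_p2))

for n_1, n_2 in [(2, 18), (5, 15), (10, 10)]:
    print(f"n1 = {n_1:2d}, n2 = {n_2:2d}: SE = {se_difference(n_1, n_2, s_p2):.3f}")
print(f"smallest SE at split {best}")
```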

95
New cards

What is blocking?

Grouping similar experimental units together before randomly applying treatments. For example, grouping patients by “hospital” or plots of land by “field”

96
New cards

How does blocking increase precision? 

It accounts for variation between the blocks. This removes that variation from the “noise” (error), making it easier to see the “signal” (the treatment effect).

97
New cards

what statistical test is used to compare the means of two groups?

two-sample t-test 

98
New cards

what is the null hypothesis (H0) for a two-sample t-test?

H0: mu1 = mu2 (the means of the two populations are equal)

99
New cards

What is the problem with conducting multiple t-tests if you have more than two groups (eg. 4 groups)?

The probability of making at least one Type I error (a false positive) becomes much greater than the significance level (alpha) you set for a single test

100
New cards

If you have 4 groups, how many unique pairwise comparisons are possible?

The formula is 4-choose-2 = 4! / (2! * 2!) = 6
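The count of comparisons also shows how quickly the Type I error risk grows. The Python sketch below computes 4-choose-2 and then the chance of at least one false positive across all comparisons, using 1 - (1 - alpha)^k. That formula assumes the k tests are independent, which is a simplifying assumption (pairwise t-tests on the same groups are not fully independent), so treat the result as a rough guide.

```python
import math

groups = 4
alpha = 0.05  # significance level for each individual test

# Number of unique pairwise comparisons: groups-choose-2.
comparisons = math.comb(groups, 2)

# Probability of at least one Type I error across all comparisons,
# assuming the tests are independent (a simplification).
familywise_error = 1 - (1 - alpha) ** comparisons

print(f"{comparisons} comparisons")
print(f"P(at least one false positive) is about {familywise_error:.3f}")
```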