Biostatistics - Exam 3

82 Terms

1

Define correlation

Describes the linear association between two numerical variables

2

What are the two components of describing correlation?

1. Sign / direction of relationship

2. Magnitude / strength: how tightly the data cluster

3

What are the three possible explanations for a correlation between two variables?

1. The result occurred by chance

2. One variable influences another

3. Variables are influenced by some other variable

4

What are the two ways in which variables can be influenced by a third variable?

1. A third variable (C) could influence both variables being examined (A and B).

2. One examined variable (A) could influence a third variable (C), which in turn influences the other examined variable (B)

5

What is the correlation coefficient?

Describes the strength and direction of the linear association between two numerical variables; r ranges from -1 to +1
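A minimal sketch in Python (standard library only) of how r can be computed: the sum of cross-products of x and y divided by the square root of the product of their sums of squares. The function name `pearson_r` and the example data are hypothetical, for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired numerical data (hypothetical helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products: how x and y vary together
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squares: how each variable varies on its own
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```

The sign of the cross-product sum gives the direction; dividing by the sums of squares scales r so its magnitude reflects only how tightly the points follow a line.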

6

What are some caveats of correlation?

1. Correlation does NOT imply causation

2. Be careful of a third variable that may influence both variables

3. The correlation coefficient measures linear association only; it does not address non-linear relationships

4. Confidence intervals and hypothesis tests are possible, but remember that statistical significance is not the same as biological significance

7

What kind of ratio is the correlation coefficient?

Signal / Noise

8

What are the assumptions for correlation test?

1. Relationship between x and y is linear

2. Frequency distributions of x and y are separately normal

3. Variance of x doesn't change with y (and vice versa), giving a circular or elliptical cloud of points, not a funnel-shaped one

9

What do you do if correlation assumptions are untrue?

Attempt to transform the data, e.g., by taking the log (or natural log) of the values.
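As an illustration, assuming a hypothetical exponential relationship between x and y, taking the natural log of y makes the relationship linear, so r computed on the transformed data is essentially 1. The helper `pearson_r` and the data are hypothetical, not from the source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired numerical data (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data with a curved (exponential) relationship
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(xi) for xi in x]

r_raw = pearson_r(x, y)                         # below 1: the raw relationship is not linear
r_log = pearson_r(x, [math.log(v) for v in y])  # about 1: log(y) is linear in x
print(round(r_raw, 3), round(r_log, 3))
```

In real data the transformed relationship will not be perfectly linear; the point is only that a log transform can straighten a curved pattern so the linear-correlation assumptions hold better.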

10

What does Spearman's rank correlation do?

Measures the strength and direction of linear association between ranks of two variables

11

How is Spearman's rank correlation performed?

Rank each variable separately from 1 to n, using midranks (averages) when several values tie for the same rank. Then calculate the correlation coefficient on the ranks (called rs in the Spearman test) and compare rs to the critical value in a statistical table.
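The ranking-then-correlating steps can be sketched in Python as follows. The names `midranks`, `pearson_r`, and `spearman_rs` are hypothetical helpers, and the critical-value lookup is not shown.

```python
import math

def midranks(values):
    """Rank values from lowest (rank 1) to highest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Ordinary correlation coefficient (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_x = sum((a - x_bar) ** 2 for a in x)
    ss_y = sum((b - y_bar) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

def spearman_rs(x, y):
    """Spearman's rs: the correlation coefficient computed on the ranks of each variable."""
    return pearson_r(midranks(x), midranks(y))

print(midranks([10, 20, 20, 40]))                  # [1.0, 2.5, 2.5, 4.0]
print(spearman_rs([1, 2, 3, 4, 5], [1, 4, 9, 16, 100]))  # 1.0
```

Because only ranks are used, the curved relationship and the extreme value (100) in the second example do not lower rs: any monotonic increase gives rs = 1.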

12

When would you want to use Spearman's rank correlation?

When the assumptions of the correlation test fail

13

What is linear regression?

Method that predicts the value of one numerical variable from that of another.

14

How is linear regression different from correlation?

The two variables are not treated equally: the explanatory variable (x) is used to predict the response variable (y).

15

What does regression measure?

How steeply y changes with changes in x

16

How does linear regression try to find the best fit?

It finds the line with the smallest sum of squared deviations in y between observed and predicted values (least squares)
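A minimal least-squares sketch, assuming a hypothetical helper `fit_line` and example data: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept makes the line pass through the point of means. The same few lines also show prediction (card 17) and residuals, observed y minus predicted y (card 20).

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sp_xy / ss_x        # slope: change in y per unit change in x
    a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)

y_hat = [a + b * xi for xi in x]                   # predicted y for each observed x
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # observed y - predicted y
print(round(a, 3), round(b, 3))  # 2.2 0.6
```

No other straight line gives a smaller sum of squared residuals for these data; least-squares residuals also always sum to (essentially) zero.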

17

What can the regression line equation be used for?

To predict values of y for a known x

18

What are the assumptions of linear regression?

1. Linear relationship between x and y

2. Frequency distribution of y values for each x value is normal

3. Variance of y doesn't change with x

4. Each measured y at a given x is a random sample from a population of y-measurements.

19

If the variance of y doesn't change with x, what will the cloud of points look like?

A circular cloud of points, not a funnel-shaped one

20

How are residuals calculated for linear regression?

Observed y - predicted y

21

What does the Standard Error of slope measure in linear regression?

Measure of uncertainty of sample estimate of slope

22

What does the confidence interval of slope measure in linear regression?

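A range of values likely to contain the population slope (slope ± critical t value × standard error of slope, df = n - 2); like the standard error, it reflects the uncertainty of the sample estimate of the slope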

23

For what part of linear regression can t-tests be done?

The slope (null hypothesis: the population slope is 0)
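A minimal sketch of the slope t-test, with a hypothetical helper `slope_t_test` and example data: the standard error of the slope is sqrt(MSresidual / Σ(xi - x̄)²), where MSresidual = SSresidual / (n - 2), and t = slope / SE for the null hypothesis that the slope is 0. Looking up the critical t (df = n - 2) or p-value is left to a table or to R.

```python
import math

def slope_t_test(x, y):
    """Return slope b, its standard error, t = b / SE, and df = n - 2 (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    a = y_bar - b * x_bar
    # Residual sum of squares and residual mean square (n - 2 df)
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ms_resid = ss_resid / (n - 2)
    se_b = math.sqrt(ms_resid / ss_x)  # standard error of the slope
    return b, se_b, b / se_b, n - 2

# Hypothetical example data
b, se_b, t, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(se_b, 4), round(t, 3), df)  # 0.2828 2.121 3
```

The same standard error is what the confidence interval for the slope is built from.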

24

What does an ANOVA for linear regression compare?

The variance of the predicted values of y around the mean of y (regression mean square) with the variance of the residuals (residual mean square)

25

What should a plot of residuals look like?

Roughly symmetric, with equal variance above and below the y = 0 line and little to no curvature from left to right along the x-axis

26

What can residual plots be used for?

They help in assessing whether the assumptions of linear regression are met

27

What kind of curve does normal distribution have?

A bell-shaped curve

28

What are mean, median, and mode in normal distribution?

They are all the same

29

What can normal distribution be fully described by?

Its mean and standard deviation

30

What do most non-parametric tests do to the data?

They rank the data

31

Can you extrapolate based on linear regression?

No; extrapolation (predicting y beyond the range of observed x values) is not valid

32

Why is extrapolation not valid?

You don't know how the relationship behaves beyond the range of observed values.

33

Narrower prediction interval means __ precision?

higher

34

Wider prediction interval means __ precision?

lower

35

What are the two ways in which predictions can happen for linear regression?

1. Predict the mean y for a given x (confidence band)

2. Predict a single y for a given x (prediction interval)
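The difference between the two kinds of prediction can be sketched with their standard errors (hypothetical helper `prediction_ses`, example data). For the mean y at x0 the standard error is sqrt(MSresidual × (1/n + (x0 - x̄)²/Σ(xi - x̄)²)); for a single y it adds 1 inside the parentheses, so it is always larger. Each interval would be ŷ ± critical t × SE with n - 2 df; x0 should lie within the observed x range, since extrapolation is not valid.

```python
import math

def prediction_ses(x, y, x0):
    """Predicted y at x0 plus SEs for the mean y and for a single y (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    a = y_bar - b * x_bar
    ms_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y0_hat = a + b * x0
    # Uncertainty in where the regression line (the mean y) lies at x0
    se_mean = math.sqrt(ms_resid * (1 / n + (x0 - x_bar) ** 2 / ss_x))
    # Adds the scatter of individual points around the line
    se_single = math.sqrt(ms_resid * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_x))
    return y0_hat, se_mean, se_single

y0_hat, se_mean, se_single = prediction_ses([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3)
print(round(y0_hat, 3), round(se_mean, 3), round(se_single, 3))  # 4.0 0.4 0.98
```

The wider interval for a single y is what "lower precision" means on the prediction-interval cards.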

36

What plot can be done to detect non-normality and unequal variance?

Residual plot

37

How is a residual plot done?

Plot residuals (y - ŷ) against x

ŷ = predicted value

38

What three features should a residual plot have?

1. Points roughly symmetric above and below the y = 0 line.

2. Little to no curvature from left to right along x-axis

3. Approximately equal variance of points above and below line at all values of x.

39

What is the R^2 value?

Fraction of variation in y that is "explained" by x.

40

What is the equation for R^2?

R^2 = SSregression / SStotal

SS = sum of squares

41

How is SS total calculated?

Formula: SSTotal = Σ(yi - ȳ)²

yi: The observed values of the dependent variable.

ȳ: The mean of the observed values of the dependent variable.

42

How is SS regression calculated?

SSregression = Σ(ŷi - ȳ)²

ŷi: The predicted values of the dependent variable.

ȳ: The mean of the observed values of the dependent variable.
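SStotal, SSregression, and R^2 can be computed directly from these definitions. The sketch below (hypothetical helper `regression_ss`, example data) also computes SSresidual and the regression ANOVA's F ratio, MSregression / MSresidual with 1 and n - 2 df. For simple linear regression, SStotal = SSregression + SSresidual, and R^2 equals r squared.

```python
def regression_ss(x, y):
    """Sums of squares, R^2, and F for a least-squares line (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                   # Σ(yi - ȳ)²
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)          # Σ(ŷi - ȳ)²
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Σ(yi - ŷi)²
    r2 = ss_regression / ss_total
    f = (ss_regression / 1) / (ss_residual / (n - 2))               # MSregression / MSresidual
    return ss_total, ss_regression, ss_residual, r2, f

ss_total, ss_regression, ss_residual, r2, f = regression_ss([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3), round(f, 3))  # 0.6 4.5
```

R^2 says how much of the variation is explained; the F ratio (compared with a critical F value) says whether that explained variation is statistically significant.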

43

Why is the normal distribution used as an assumption for many statistical tests?

It is very common in nature

44

The normal distribution is ________ around its mean.

symmetric

45

How are 2/3 of random draws from the normal distribution related to the mean?

About 2/3 (roughly 68%) of random draws are within one standard deviation of the mean.

46

How are 95% of random draws from the normal distribution related to the mean?

About 95% of random draws are within two standard deviations of the mean (more precisely, within 1.96 standard deviations)
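These percentages can be checked with Python's standard-library `statistics.NormalDist` (a sketch; the variable names are illustrative). Because a normal distribution is fully described by its mean and standard deviation, the standard normal (mean 0, SD 1) is enough.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

within_1_sd = z.cdf(1) - z.cdf(-1)  # probability of a draw within 1 SD of the mean
within_2_sd = z.cdf(2) - z.cdf(-2)  # probability of a draw within 2 SD of the mean
print(round(within_1_sd, 4), round(within_2_sd, 4))  # 0.6827 0.9545

# SD distance that captures exactly the central 95%
print(round(z.inv_cdf(0.975), 2))  # 1.96
```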

47

What do the characteristics of the normal distribution allow for?

Ready assessment of probability and statistical analysis.

48

Many statistical tests assume that the data, or the error associated with the data, are ___________?

Normally distributed

49

What are the three possibilities when assumptions are untrue?

1. Ignore assumptions

2. Transform data

3. Non-parametric tests

50

What are non-parametric tests?

Tests that do not require one to assume a certain distribution for raw data.

51

What are the three common assumptions for statistical analysis?

1. Random samples

2. Normality

3. Equal variance

52

Which assumptions are tested?

Normality and equal variance

53

What plot(s) can examine normality?

Histograms

Q-Q plots

54

What does normal data look like in a histogram?

Normal data is roughly symmetric and bell-shaped, not skewed

55

What does normal data look like in a Q-Q plot?

Data should follow a relatively straight line, but can wiggle a little.

56

What plot(s) can be used to assess equal variance?

Boxplots

57

What does data with equal variance look like on a boxplot?

Interquartile Ranges (boxes) are similar size

Whisker lengths are similar

58

What hypothesis test can be used to assess normality?

Shapiro-Wilk test

59

What is the R command for a Shapiro-Wilk test?

shapiro.test()

60

What is the null hypothesis for a Shapiro-Wilk test?

The data set is normally distributed.

61

For a p-value less than the significance level (0.05) in a Shapiro-Wilk test, what would you conclude?

Reject null hypothesis (that data is normally distributed) and conclude that the data set is not normally distributed

62

Most non-parametric data tests use____

ranks

63

How is data ranked in most non-parametric tests?

From lowest to highest: the lowest value gets rank 1, the next lowest gets rank 2, and so on.

64

What does a Mann-Whitney test do?

Compares central tendencies of two groups using ranks

65

What is a Mann-Whitney test called in R?

A Wilcoxon test

66

How is a Mann-Whitney test performed?

Data from both groups are ranked together, and the ranks of all individuals in each group are summed.

67

What does U stand for?

The test statistic

68

What is U?

Number of times an individual from population one has a lower rank than an individual from population two, out of all pairwise comparisons.

69

For a two-tailed Mann-Whitney U test, how many U values are calculated?

Two: U1 for group 1 and U2 = n1n2 - U1 for group 2

70

If more than one U is calculated, what U is picked?

The larger of the two

71

What is the equation for U?

U1 = n1n2 + n1(n1 + 1)/2 - R1

R1 = sum of all ranks for first group

n1 = sample size of group 1

n2 = sample size of group 2
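A minimal sketch of the calculation, assuming hypothetical data and the hypothetical helper names `midranks` and `mann_whitney_u`: both groups are ranked together (midranks for ties), R1 is summed, U1 = n1n2 + n1(n1 + 1)/2 - R1, and U2 = n1n2 - U1. The last line checks that U1 equals the number of pairwise comparisons in which the group 1 value is lower, matching the definition of U given earlier. In practice R's wilcox.test() does this work; the critical-value lookup is not shown.

```python
def midranks(values):
    """Rank values from lowest (rank 1) to highest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based position of the tied run
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return (U1, U2) from the rank sum of group 1 (hypothetical helper)."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))  # rank both groups together
    r1 = sum(ranks[:n1])                           # R1: sum of ranks for group 1
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 - u1
    return u1, u2

# Hypothetical example data
g1 = [1.1, 2.3, 3.0]
g2 = [2.5, 4.2, 5.0, 6.1]
u1, u2 = mann_whitney_u(g1, g2)
print(u1, u2, max(u1, u2))  # 11.0 1.0 11.0

# U1 counts the pairs in which the group 1 value is lower than the group 2 value
assert u1 == sum(1 for a in g1 for b in g2 if a < b)
```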

72

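If U (the larger of the two) is greater than or equal to the critical value, what happens?

The null hypothesis can be rejected. (Tables based on the smaller U reverse the comparison: reject if the smaller U is less than or equal to its critical value.)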

73

In R, the ranks and test statistic are all computed using the ___?

Wilcoxon test

74

What is the R command for a wilcoxon test?

wilcox.test()

75

What are the assumptions of the Mann-Whitney U test (Wilcoxon test)?

Both samples are random samples

Both populations have the same shape of distribution

76

What are the assumptions of a correlation test?

1. Linear

2. X and Y normally distributed

3. Constant variance

4. Random samples

77

What are the options when assumptions for correlation test fail?

Transformation of data, Spearman's rank correlation

78

When do you use Spearman's rank correlation?

When data violates normality assumption

When data has outliers

79

What is an R^2 value and how does it relate to ANOVA?

R^2 describes the fraction of variance in a response variable that is accounted for by an explanatory variable. An ANOVA tests whether that relationship is statistically significant.

80

Describe the general approach to problems

1. Determine objective

2. What do the data look like? Plot them and evaluate whether the assumptions are true for the test you want to do.

3. Are the variables normally distributed? Is the variance equal, or does it change?

81

What is the general approach to regression problems?

Same as the general approach to problems, with these additional steps:

1. If assumptions aren't true, try to transform the data

2. If the data can't be transformed, use a non-parametric test (Spearman's or Wilcoxon)

82

How do you know whether to use Spearman's or Wilcoxon test?

If you want to know if two variables are related, use Spearman's.

If you want to know if two groups are different, use Wilcoxon.