OPT 323 Biostatistics

Last updated 1:00 AM on 1/19/26

80 Terms

1

What is a population?

collection of people that we want to generalize a set of findings to

2

What is a sample?

small part of population that we study to determine the generalities we are interested in = should be representative of the population

3

What is a discrete numeric variable?

numeric variable that can only take on certain discrete values with gaps or interruptions in the values that the variable can assume (usually integers)

ex) count data of patients

4

What is a continuous numeric variable?

numeric variable that can technically be measured with unlimited precision with NO gaps in values that the variable could assume

ex) BP, IOP

5

What is an ordered categorical variable?

categorical variable with a “value” that can take on a logical order, sequence or rank

ex) level of physical fitness

6

What is an unordered categorical variable?

categorical variable with a “value” that is NOT able to be organized in a logical order, sequence or rank

ex) eye colour

7

What is a dichotomous variable?

variable only consists of 2 categories

8

Ex) Defining a cataract as either nuclear, cortical, or posterior subcapsular is an example of what type of variable?

categorical, unordered

9

Ex) Defining a cataract as either 1+, 2+, 3+, or 4+ is an example of what type of variable?

categorical, ordered (ordinal) = the grades look like discrete numbers, but they are ranks rather than counts

10

What is an independent variable?

variable that is manipulated by the experimenter and that does not depend on any other variables = “predictor variable” = X axis

11

What is a dependent variable?

variable that is not manipulated by the experimenter and that does depend on the other variables = "outcome variable" = Y axis

12

What is the mean?

average of all values in a data set

13

What is the median?

middle value in a data set once the values are sorted (average of the 2 middle values if the count is even)

14

What is the mode?

most commonly occurring value in a data set

15

What is variance?

reflects how different each data point is from the mean = based on the squared differences from the mean

16

What is standard deviation?

square root of the variance
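
The descriptive statistics on the last few cards can be checked with Python's standard `statistics` module. This is a minimal sketch; the IOP readings are made-up values for illustration, and `variance`/`stdev` here are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical IOP readings in mmHg (illustrative values, not course data)
iop = [12, 14, 14, 15, 16, 18, 21]

mean = statistics.mean(iop)          # sum of the values / number of values
median = statistics.median(iop)      # middle value once the data are sorted
mode = statistics.mode(iop)          # most commonly occurring value
variance = statistics.variance(iop)  # sample variance (divides by n - 1)
sd = statistics.stdev(iop)           # square root of the sample variance

print(mean, median, mode, round(variance, 2), round(sd, 2))
```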

17

What % of data falls within 1 standard deviation of the mean in a normal distribution?

68%

18

What % of data falls within 2 standard deviations of the mean in a normal distribution?

95%

19

What % of data falls within 3 standard deviations of the mean in a normal distribution?

99.7%
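
The 68 / 95 / 99.7 figures can be reproduced from the normal curve itself. A minimal sketch using `statistics.NormalDist` from the standard library; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, "SD:", round(share_within(k) * 100, 1), "%")
```

The printed values are close to the rounded 68%, 95% and 99.7% on the cards, and they hold only for normally distributed data.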

20

What is a normal distribution? What 2 things do we need to define it?

symmetric bell curve distribution of data defined solely by the mean and standard deviation

21

When might we use a t distribution?

useful when the normal distribution is a poor fit because the sample is small (esp n < 30) and the population standard deviation is unknown = shape of the distribution changes based on the degrees of freedom

22

The smaller the degrees of freedom, the __________ the peak and the ___________ the tails of a t distribution.

smaller degrees of freedom:

lower peak

higher tails

23

What is a null hypothesis?

H0 = there is no statistical difference between the 2 groups

24

What is an alternative hypothesis?

Ha = there is a statistical difference between the 2 groups

25

What is the p-value?

probability of observing data at least as extreme as our data set, given that the null hypothesis is true

26

What does it mean if we have a large p-value that is larger than our pre-set alpha value?

we do not have enough evidence to reject the null hypothesis = we fail to reject the null (this does not prove the null is true)

27

What does it mean if we have a small p-value that is smaller than our pre-set alpha value?

we do have evidence to reject the null hypothesis = supports the alternative

28

Essentially, a smaller p-value indicates that there is ___________ support for our alternative hypothesis.

stronger

29

What is an independent t-test?

determines whether the 2 means collected from 2 independent sample groups are significantly different

ex) 1 group receives drug, 1 group receives placebo

30

What is a dependent t-test?

determines whether the 2 means collected from 1 dependent sample group are significantly different

ex) 2 measurements conducted on the same person at different times

31

How do we determine cumulative incidence from a contingency table?

= exposed people with disease / total exposed people with and without disease

can also do with unexposed

32

How do we determine relative risk from a contingency table?

cohort study:

= cumulative incidence in exposed / cumulative incidence in unexposed

33

How do we determine odds from a contingency table?

= exposed people with disease / exposed people without disease

can also do with unexposed

34

How do we determine odds ratio from a contingency table?

case control study:

= odds in exposed / odds in unexposed

35

Does the relative risk or odds ratio always overestimate the risk?

odds ratio = it lies further from 1 than the relative risk, especially when the outcome/disease is common

36

Ex) from this data of Yellow Fever in Memphis, what is the cumulative incidence in the exposed?

= exposed people with disease / total exposed people with and without disease

= 4204 / 6000

= 70%

37

Ex) from this data of Yellow Fever in Memphis, what is the cumulative incidence in the unexposed?

= unexposed people with disease / total unexposed people with and without disease

= 946 / 14,000

= 6.8%

38

Ex) from this data of Yellow Fever in Memphis, what is the relative risk for the outcome?

= cumulative incidence in exposed / cumulative incidence in unexposed

= 70% / 6.8%

= 10.3x

39

Ex) from this data of Yellow Fever in Memphis, what is the odds in the exposed?

= exposed people with disease / exposed people without disease

= 4204 / 1796

= 2.34

40

Ex) from this data of Yellow Fever in Memphis, what is the odds in the unexposed?

= unexposed people with disease / unexposed people without disease

= 946 / 13,054

= 0.072

41

Ex) from this data of Yellow Fever in Memphis, what is the odds ratio?

= odds in exposed / odds in unexposed

= (4204 / 1796) / (946 / 13,054)

= 32.30
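
The Yellow Fever calculations can be reproduced in a few lines. A minimal sketch using only the counts quoted on the cards (4204 of 6000 exposed and 946 of 14,000 unexposed had the disease); the variable names are chosen for this example.

```python
# Counts taken from the Yellow Fever cards
exposed_disease, exposed_total = 4204, 6000
unexposed_disease, unexposed_total = 946, 14000

# People without disease = group total minus people with disease
exposed_healthy = exposed_total - exposed_disease        # 1796
unexposed_healthy = unexposed_total - unexposed_disease  # 13054

# Cumulative incidence = with disease / everyone in the group
ci_exposed = exposed_disease / exposed_total
ci_unexposed = unexposed_disease / unexposed_total
relative_risk = ci_exposed / ci_unexposed

# Odds = with disease / without disease
odds_exposed = exposed_disease / exposed_healthy
odds_unexposed = unexposed_disease / unexposed_healthy
odds_ratio = odds_exposed / odds_unexposed

print(round(ci_exposed, 3), round(ci_unexposed, 3), round(relative_risk, 2))
print(round(odds_exposed, 2), round(odds_unexposed, 3), round(odds_ratio, 2))
```

Without intermediate rounding the relative risk comes out slightly higher (about 10.4) than the 10.3 obtained from the rounded percentages. The odds ratio of about 32.3 is far larger than the relative risk because the disease is common in the exposed group.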

42

Ex) from this data of SCO Honors students, what is the cumulative incidence in the exposed?

41.9%

43

Ex) from this data of SCO Honors students, what is the cumulative incidence in the unexposed?

29.9%

44

Ex) from this data of SCO Honors students, what is the relative risk?

1.40

45

Ex) from this data of SCO Honors students, what is the odds in the exposed?

0.720

46

Ex) from this data of SCO Honors students, what is the odds in the unexposed?

0.426

47

Ex) from this data of SCO Honors students, what is the odds ratio?

1.69

48

What is the Chi-squared test for independence?

tests the association between 2 categorical variables using a p-value to assess H0 (no association) and Ha (association)
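
As a rough illustration of how the test statistic and p-value arise for a 2x2 table, here is a sketch in plain Python. The function name `chi_squared_2x2` and the example counts are made up for this example, no continuity correction is applied, and the p-value shortcut is valid only for 1 degree of freedom (which is what a 2x2 table has).

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if the two variables were independent (H0)
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(statistic >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = exposed / unexposed, columns = disease / no disease
chi2, p = chi_squared_2x2(30, 20, 15, 35)
print(round(chi2, 2), round(p, 4))
```

A p-value below the pre-set alpha would lead us to reject H0 (no association) for these made-up counts.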

49

What is a type I error?

rejecting the null hypothesis when it is actually true = false positive = worst kind of error!

ex) convicting someone of a crime they did not commit

ex) approving an IOP drop as working "better" than timolol when it is not

50

What is a type II error?

failing to reject the null hypothesis when the alternative hypothesis is actually true = false negative

ex) a guilty person is set free

ex) not approving an IOP drop even though it actually is "better" than timolol

51

Ex) what is the number of false positives for this data of SCO Honors students?

25

52

Ex) what is the number of false negatives for this data of SCO Honors students?

23

53

What is sensitivity?

proportion of subjects with the disease who have a positive test result = how good the test is at detecting true positives out of all people with disease

54

How do we calculate sensitivity?

= # true positives / all people with disease

55

What is specificity?

proportion of subjects without the disease who have a negative test result = how good the test is at detecting true negatives out of all people without disease

56

How do we calculate specificity?

= # true negatives / all people without disease

57

What is positive predictive value?

proportion of subjects who test positive that actually have the condition = how good the test is at detecting people with disease out of all people who test positive

58

How do we calculate positive predictive value?

= # true positives / all people who test positive
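
The three formulas for sensitivity, specificity and positive predictive value can be put side by side. A minimal sketch with hypothetical counts for a 2x2 diagnostic table; the counts are illustrative only and are not taken from the course data.

```python
# Hypothetical 2x2 diagnostic table (illustrative counts only)
tp, fn = 9, 5  # people WITH disease: test positive / test negative
fp, tn = 1, 5  # people WITHOUT disease: test positive / test negative

sensitivity = tp / (tp + fn)  # true positives / all people with disease
specificity = tn / (tn + fp)  # true negatives / all people without disease
ppv = tp / (tp + fp)          # true positives / all people who test positive

print(round(sensitivity * 100, 1), round(specificity * 100, 1), round(ppv * 100, 1))
```

Sensitivity and specificity divide by the disease status (the columns of the table), while positive predictive value divides by the test result (the row).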

59

Ex) from this data set, what is the sensitivity?

64.3%

60

Ex) from this data set, what is the specificity?

83.3%

61

Ex) from this data set, what is the positive predictive value?

90%

62

What is a correlation coefficient (r)?

used to assess the strength of the correlation between 2 continuous variables

63

What does the + or - mean for correlation coefficient (r)?

+ means positive correlation

- means negative correlation

THINK: reflects the slope of the line

64

What is the range of values for correlation coefficient (r)?

r ranges from -1.0 to +1.0 = strength depends on the size of r (|r|), not the sign

|r| = 1.0 perfectly correlated

|r| ≥ 0.8 strong correlation

|r| < 0.8 but ≥ 0.5 fairly strong correlation

|r| < 0.5 weak correlation

r = 0.0 no correlation
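
A short sketch of how r is computed and then labelled with the cutoffs above. It assumes the cutoffs apply to the size of r regardless of sign; the function names `pearson_r` and `describe_strength` are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how each varies on its own
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

def describe_strength(r):
    """Label the strength of r using the cutoffs from the card (sign ignored)."""
    size = abs(r)
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "fairly strong"
    if size > 0:
        return "weak"
    return "none"

r = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # y falls as x rises, so r is negative
print(r, describe_strength(r))
```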

65

What is simple linear regression?

linear model where one outcome is predicted from one predictor variable with a best-fit line

66

What is the formula for simple linear regression?

y = mx + b

where y is the dependent variable, m is the slope, x is the independent variable, and b is the y intercept
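
To show where m and b come from, here is a minimal least-squares sketch in plain Python. The function name `fit_line` and the data points are made up for this example.

```python
def fit_line(xs, ys):
    """Return the slope m and y-intercept b of the least-squares best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The best-fit line always passes through the point (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data lying exactly on y = 2x + 1
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)
```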

67

What is multiple regression?

linear model where one outcome is predicted from two or more predictor variables

68

What is the formula for multiple regression?

y = β1x1 + β2x2 + ... + βnxn + b = sum of (each beta coefficient × its independent variable) plus the constant b

69

What is the constant in multiple regression?

y-intercept = value of the dependent variable in a regression equation when its independent variable(s) equal 0

70

What is the beta coefficient in multiple regression?

degree of change in the dependent variable for every 1-unit change in a certain independent variable, with the other independent variables held constant

ex) if beta is 0.2, then for every one unit increase in x there is a 0.2 increase in y
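
The beta = 0.2 example can be checked numerically. A minimal sketch; the function name `predict`, the constant and the second coefficient are hypothetical values chosen for illustration.

```python
def predict(constant, betas, xs):
    """Multiple regression prediction: constant + sum of (beta * independent variable)."""
    return constant + sum(beta * x for beta, x in zip(betas, xs))

constant = 10.0       # hypothetical y-intercept
betas = [0.2, -1.5]   # hypothetical beta coefficients for x1 and x2

before = predict(constant, betas, [50, 2])
after = predict(constant, betas, [51, 2])  # x1 up by 1 unit, x2 held constant

print(before, after, round(after - before, 2))
```

The difference between the two predictions equals the beta for x1, which is what the beta coefficient describes.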

71

What is the coefficient p-value in multiple regression?

tells us whether or not an independent variable is statistically significant

ex) if the p-value is less than the cutoff, the independent variable is stat significant

72

What is the standard error in multiple regression?

another way to tell us how well the linear regression line fits the data = average distance that the observed values fall from the regression line

73

Does a smaller or larger standard error indicate that the model is better able to fit the data?

smaller SE

74

What is the R-squared value in multiple regression?

tells us how well the linear regression line "fits" the data = the proportion of the variance in the dependent variable that can be explained by the independent variables

ex) an R2 value of 0.16 tells us that only 16% of the variance in the dependent variable can be explained by the independent variables
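
R-squared can be computed directly from the observed values and the model's predictions. A minimal sketch; the function name `r_squared` and the numbers are made up for this example.

```python
def r_squared(observed, predicted):
    """Proportion of the variance in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    # Variation left over after the model (observed vs predicted)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observed values around their mean
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_residual / ss_total

observed = [1, 2, 3]
print(r_squared(observed, [1, 2, 3]))  # predictions match the data exactly
print(r_squared(observed, [2, 2, 2]))  # predicting the mean every time
```

Perfect predictions give the top of the range, and a model that only ever predicts the mean explains none of the variance.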

75

What is the possible range of values of the R-squared value in multiple regression?

0 to 1

0 indicates that the response variable is not explained by the predictor variable at all

1 indicates that the response variable is completely explained by the predictor variable w/o error

76

What is logistic regression?

a form of regression used only when the outcome is a categorical variable (e.g. outcome of seeing an optometrist or not, having disease or not) = since there is no linear relationship between x and y, we model the log of the odds of the outcome (the logit) as a linear function of x instead

77

How do we typically use logistic regression?

use formula to calculate probability that an observation takes on a value of 1 = use this formula to predict whether something will be 1 = use predetermined probability threshold to classify an observation as 1 or 0
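
The predict-then-classify workflow can be sketched as follows. The coefficients and the 0.5 threshold are hypothetical, and the function names `predicted_probability` and `classify` are made up for this example; in practice the coefficients come from fitting the model to data.

```python
import math

def predicted_probability(constant, betas, xs):
    """Turn the linear predictor (the log of the odds) into a probability between 0 and 1."""
    log_odds = constant + sum(beta * x for beta, x in zip(betas, xs))
    return 1 / (1 + math.exp(-log_odds))

def classify(probability, threshold=0.5):
    """Classify an observation as 1 if its probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

# Hypothetical model with one predictor (e.g. age in years)
p = predicted_probability(constant=-4.0, betas=[0.08], xs=[60])
print(round(p, 3), classify(p))
```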

78

How does linear vs logistic regression differ in terms of the dependent variable?

linear = continuous

logistic = categorical

79

How does linear vs logistic regression differ in terms of the outcome variable?

linear = continuous

logistic = probability

80

How does linear vs logistic regression differ in terms of the method used to find best fit equation?

linear = ordinary least squares

logistic = maximum likelihood estimation
