Biostats Final

60 Terms

1
New cards

Why do we collect data?

To determine whether the variability we observe is real or just due to chance

2
New cards

Systematic error

The error always goes in the same direction

  • Ex: If we measure weight with an uncalibrated scale, it will systematically add to or subtract from every measurement

3
New cards

Random error

Due to chance, the error can go in either direction

  • Ex: If we take two samples of the class and measure weight, the difference between the samples can go in either direction

4
New cards

Quantitative data

Numeric

5
New cards

Qualitative data

Categorical

6
New cards

Types of quantitative data

Continuous

Discrete

7
New cards

Types of qualitative data

Binary

Nominal

Ordinal

8
New cards

Continuous variables

Variables that can take any value on a continuous scale

  • Ex: weight, BMI, height

9
New cards

Discrete variable

Variable that describes the values of countable events, usually based on whole numbers

  • Ex: number of siblings, age…

10
New cards

Binary variable

Variable that describes the values of any event that has only two categories

  • Ex: death (yes, no), physically active (active, non-active)

11
New cards

Nominal variable

Variable that describes the values of any event that has two or more categories without order

  • Ex: who lives with you? (live alone, partner, friend, family…), NBA team (Lakers, Bulls…)

12
New cards

Ordinal variable

Variable that describes the values of any event that has two or more categories with an order

  • Ex: grade in the course (A, B, C…), BMI (normal, overweight, obese…)

13
New cards

Mean (central tendency)

sum of the values of one variable, divided by the number of values

14
New cards

Median

A central tendency estimate that is exactly in the middle of the ordered sample, dividing the sample in half

  • Less sensitive to outliers than the mean

    • Even n: average of the values at positions n/2 and (n+2)/2

    • Odd n: the value at position (n+1)/2
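As a minimal sketch of the position rule above (the function name `median_by_position` and the weights are illustrative, not from the course), the median can be found by sorting the sample and picking the middle position(s):

```python
def median_by_position(values):
    """Median: position (n+1)/2 for odd n, average of positions n/2 and (n+2)/2 for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index (n - 1) // 2
        return ordered[(n - 1) // 2]
    # Even n: 1-based positions n/2 and (n+2)/2 are 0-based n//2 - 1 and n//2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_sample = [70, 55, 90, 62, 81]         # hypothetical weights in kg
even_sample = [70, 55, 90, 62, 81, 300]   # same sample plus one outlier

print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

Adding the outlier moves the median only from 70 to 75.5, while the mean of the same data jumps from 71.6 to about 109.7.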

15
New cards

Variability measures

Min and Max values (range, also called amplitude)

Position measures

Variance

Standard deviation

Coefficient of variation

Standard error

16
New cards

Min and Max values (range)

The difference between the extreme values

17
New cards

Position measures

Measures that separate the observations in equal parts (or almost), like the median

  • Ex 1: Quintiles

    • 1 (20%), 2 (20%), 3 (20%), 4 (20%), 5 (20%)

  • Ex 2: Percentiles

    • P10 (10%), P5 (5%), P50 (50%)

18
New cards

Variance

The average of the squared differences between each observation and the overall mean

  • Squared because the raw differences can be negative and would otherwise cancel out

  • Result is in squared units, so it doesn’t have the same unit as the individual values or the mean

19
New cards

Standard deviation

Square root of the variance

  • How far, on average, each value is from the mean

  • Uses the same unit as the original variable
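The variance and standard deviation cards can be sketched together. This is an illustration with made-up heights; dividing by n - 1 for a sample is an assumption here, since the cards only say "average".

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean (n - 1 divisor for a sample, n otherwise)."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1 if sample else n)

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data

var_cm2 = variance(heights_cm)   # units: cm squared
sd_cm = math.sqrt(var_cm2)       # units: cm, same as the data

print(var_cm2, sd_cm)
```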

20
New cards

Standard Error

Shows how much the sample mean is likely to vary from the population mean due to random error or sampling

  • Smaller SE suggests a more accurate representation of the population mean, while a larger SE indicates more uncertainty
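A common way to compute the standard error of the mean is SD divided by the square root of n. The card does not give the formula, so take the sketch below (with hypothetical data) as one standard choice rather than the course's definition.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

small = [160, 165, 170, 175, 180]   # hypothetical heights in cm
large = small * 4                   # same spread, four times as many observations

print(standard_error(small))
print(standard_error(large))
```

The larger sample has the smaller SE, which matches the idea that more data gives a more precise estimate of the population mean.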

21
New cards

Coefficient of variation

(Standard deviation/mean)*100

  • The ratio of the standard deviation to the mean, often expressed as a percentage

  • Allows for comparisons between data sets with different means or units
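A small sketch of the formula above, using hypothetical heights recorded in two different units to show that the result does not depend on the unit:

```python
import statistics

def coefficient_of_variation(values):
    """(standard deviation / mean) * 100, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heights_cm = [160, 165, 170, 175, 180]      # hypothetical data
heights_m = [h / 100 for h in heights_cm]   # same data in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)
```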

22
New cards

Interquartile range

Difference between the 75th percentile and the 25th percentile
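A hedged sketch using Python's `statistics.quantiles` on made-up values. Software packages use slightly different percentile conventions, so the exact quartiles may differ from those produced by other tools.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data

# n=4 splits the data into quartiles and returns the three cut points
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(q1, q3, iqr)
```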

23
New cards

What can we use categorical variables to measure?

Frequency

  • Count, raw number of events

Probability

  • The proportion of events relative to the sample size

    • % of individuals with cancer

Odds

  • Chance

  • Comparison of individuals

    • Odds of having cancer compared to not having cancer

24
New cards

Probability

number of favorable outcomes/ number of possible outcomes

25
New cards

multiplicative rule for probability

used for the probability of the occurrence of both of two events, A and B, when the events are independent

  • Prob(A and B) = Prob(A) x Prob(B)

26
New cards

Additive rule for probability

used for the occurrence of at least one of event A or event B (either)

  • Prob(A or B) = Prob(A) + Prob(B) - Prob(A and B)
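Both probability rules can be checked by counting outcomes. The sketch below uses one roll of a fair six-sided die; the events `even` and `at_most_two` are illustrative choices that happen to be independent.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one fair six-sided die

def prob(event):
    """Favorable outcomes / possible outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
at_most_two = {1, 2}

# Additive rule: subtract the overlap so it is not counted twice
p_either = prob(even) + prob(at_most_two) - prob(even & at_most_two)

# Multiplicative rule (these two events are independent)
p_both = prob(even) * prob(at_most_two)

print(p_either, prob(even | at_most_two))   # both computed two ways
print(p_both, prob(even & at_most_two))
```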

27
New cards

Odds

Compares the probability of a successful event with the probability of a failure/unsuccessful event

  • Odds = Prob(event) / (1 - Prob(event))

  • Can take values from 0 to infinity (not a percentage)
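A short sketch of converting a probability into odds (the function name is illustrative):

```python
def odds_from_probability(p):
    """Odds = p / (1 - p); undefined when p equals 1."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)

for p in (0.2, 0.5, 0.8):
    print(p, odds_from_probability(p))
```

A probability of 0.5 gives odds of 1 (success and failure equally likely), and the odds grow without limit as the probability approaches 1.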

28
New cards

Prevalence

represents the burden of disease at a particular point in time

  • number of people with the disease at particular point in time/total population

29
New cards

Incidence

represents the burden of new cases from a disease

risk=cumulative incidence= number of new cases of disease in period/number initially disease-free
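To contrast prevalence with cumulative incidence, here is a worked example with entirely hypothetical counts for a population followed for one year:

```python
population = 1000      # hypothetical total population at the start
existing_cases = 50    # people who already have the disease at the start
new_cases = 19         # new cases among the disease-free during follow-up

# Prevalence: existing cases / total population at one point in time
prevalence = existing_cases / population

# Cumulative incidence (risk): new cases / number initially disease-free
disease_free = population - existing_cases
cumulative_incidence = new_cases / disease_free

print(prevalence, cumulative_incidence)
```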

30
New cards

Normal distribution

Used for continuous variables

Symmetrical around the mean

Bell-shaped

Describes biological events well

Values in the middle of the distribution are more frequent

Is tall and narrow when the standard deviation is low

Short and wide for higher standard deviation

31
New cards

Central limit theorem

Principle that the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population’s distribution
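The principle can be illustrated with a small simulation. This is a sketch under stated assumptions: the exponential population, the sample size of 50, the 2000 repetitions, and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)   # fixed seed so the sketch is reproducible

def sample_mean(n):
    # Exponential population (rate 1): strongly right-skewed, mean 1, SD 1
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

# The sample means cluster around the population mean (1.0) ...
print(statistics.mean(means))
# ... with a spread close to SD / sqrt(n) = 1 / sqrt(50), about 0.141
print(statistics.stdev(means))
```

Even though the population is far from normal, a histogram of `means` would look roughly bell-shaped.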

32
New cards

Ways to check whether a distribution is normal

skewness

kurtosis

statistical test

visual interpretation

33
New cards

skewness

a measure of lack of symmetry. values close to 0 indicate symmetry, as in a normal distribution

34
New cards

kurtosis

a measure that describes how heavily the tails of a distribution differ from the tails of a normal distribution

  • values close to 3 indicate normal distribution
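Skewness and kurtosis can be sketched from the moments of the data. The version below uses the simple population-moment formulas (an assumption; statistical packages often apply small-sample corrections) and the convention in which a normal distribution has kurtosis 3.

```python
def central_moment(values, k):
    """Average of (x - mean)^k over the data."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Non-excess kurtosis: equals 3 for a normal distribution
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # hypothetical data
right_skewed = [1, 1, 1, 2, 10]   # long tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

The symmetric sample has skewness 0, while the sample with a long right tail has a clearly positive skewness.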

35
New cards

What is the problem if the distribution is not normal?

The MEAN can be a misleading central tendency measure, because extreme values pull it toward the tail. Instead we can:

  • use other measures like the MEDIAN

  • categorize the variable

  • make a transformation

36
New cards

Correlation

relationship between two numeric variables

measures the degree to which the variables are related

coefficient values ( r ) range from -1 to 1

  • -1: perfect negative correlation

  • 0: no correlation

  • 1: perfect positive correlation

37
New cards

R squared

Indicates the percentage of the variability of the outcome that is explained by the exposure
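A minimal sketch of the correlation coefficient r and of R squared, with hypothetical data. Taking R squared as the square of r assumes a simple linear model with a single exposure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two numeric variables."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours_active = [1, 2, 3, 4, 5]    # hypothetical exposure
fitness_score = [2, 4, 5, 4, 5]   # hypothetical outcome

r = pearson_r(hours_active, fitness_score)
r_squared = r ** 2   # share of outcome variability explained (one exposure)

print(r, r_squared)
```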

38
New cards

Pearson correlation test

used for continuous variables

at least one should have a normal distribution

39
New cards

Spearman correlation test

based on ranks

continuous or ordinal variables

used when we don’t have normal distributions
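Because Spearman works on ranks, it can be sketched by ranking each variable and applying the rank-difference formula. The simplified version below assumes there are no tied values; the data are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) to n; this simple version assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

dose = [1, 2, 3, 4, 5]
response = [1, 8, 27, 64, 1000]   # always increasing, but far from a straight line

print(spearman_rho(dose, response))
```

The relationship is perfectly monotonic, so the rank-based coefficient is 1 even though the raw values are not linearly related.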

40
New cards

null hypothesis

hypothesis that there is no significant difference btw specified populations, and any observed difference is due to chance or error

  • Example when looking at mean physical activity btw English and Spanish speakers

    • mean physical activity is NOT different

41
New cards

alternative hypothesis

there is a significant difference btw the specified populations

  • Example when looking at mean physical activity btw English and Spanish speakers

    • mean physical activity is different

42
New cards

Type I Error

Rejects the null hypothesis when the null hypothesis is TRUE

Says there’s a difference when in fact there isn’t

Its probability is the significance level, usually 5% (we reject the null when the p-value is <0.05)

43
New cards

Type II Error

Fails to reject the null hypothesis when the null hypothesis is FALSE

Doesn’t say there’s a difference when in fact there is a difference

44
New cards

Confidence interval

Range of values that represents the uncertainty of our estimate, based on a sampling distribution

  • Usually we use 95%
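One common way to build a 95% confidence interval for a mean is mean ± 1.96 × SE. The sketch below assumes that normal approximation (for small samples a t-distribution multiplier is usually preferred) and uses hypothetical data.

```python
import math
import statistics

def mean_ci_95(values):
    """Approximate 95% CI for the mean: mean +/- 1.96 * SE (normal approximation)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    margin = 1.96 * se
    return mean - margin, mean + margin

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data
lower, upper = mean_ci_95(heights_cm)

print(lower, upper)
```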

45
New cards

ANOVA

association of a numeric outcome with a categorical exposure with TWO OR MORE categories

based on independent samples

46
New cards

T-test

numeric outcome, binary exposure

comparison of means btw TWO INDEPENDENT groups

  • example: comparing if the mean physical activity is the same in males and females
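The comparison of two independent means can be sketched by computing the t statistic by hand. This version assumes equal variances in the two groups (the pooled Student's t-test) and uses made-up data; turning the statistic into a p-value requires the t distribution, which is not shown here.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """t statistic and degrees of freedom for two independent groups (equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, df

males = [3, 4, 5, 6, 7]     # hypothetical hours of physical activity per week
females = [2, 3, 4, 5, 6]

t, df = pooled_t_statistic(males, females)
print(t, df)
```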

47
New cards

Paired sample

Either same individuals with two measures over time or

pair of individuals, with each having one measure

48
New cards

Types of categorical variables

Dichotomous (two categories)

Polytomous (three or more categories)

Ordinal (categories have a specific order)

49
New cards

Tests that use 2×2 tables

Chi-squared

Fisher Exact

McNemar Test

50
New cards

Tests that use 2×K tables

Chi-squared

Linear trend

51
New cards

2×2 contingency table

[Image: 2×2 contingency table (cells a, b, c, d; totals m1, m2, n1, n2)]
52
New cards

Longitudinal estimation: Incidence/Prevalence

ICexp=a/m1

ICnexp=b/m2

53
New cards

Longitudinal estimation: Odds

ODDSexp= a/c

ODDSnexp= b/d

54
New cards

Case-control estimations: Exposure prevalence

PRexp=a/n1

PRnexp=c/n2

55
New cards

Case-control estimations: Odds

ODDSexp=a/b

ODDSnexp=c/d
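The longitudinal and case-control formulas can be computed side by side. The table layout below is an assumption inferred from the formulas, since the original table image is not available: a and b in the top row, c and d in the bottom row, m1 and m2 as column totals, n1 and n2 as row totals. The counts are hypothetical.

```python
# Assumed layout (inferred from the formulas, not confirmed by the source):
#            col 1   col 2   total
#   row 1      a       b      n1
#   row 2      c       d      n2
#   total      m1      m2
a, b, c, d = 30, 10, 70, 90   # hypothetical counts

m1, m2 = a + c, b + d   # column totals
n1, n2 = a + b, c + d   # row totals

longitudinal = {
    "IC_exp": a / m1, "IC_nexp": b / m2,     # incidence / prevalence
    "ODDS_exp": a / c, "ODDS_nexp": b / d,   # odds
}
case_control = {
    "PR_exp": a / n1, "PR_nexp": c / n2,     # exposure prevalence
    "ODDS_exp": a / b, "ODDS_nexp": c / d,   # odds
}

print(longitudinal)
print(case_control)
```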

56
New cards

Chi-squared test

Compares the observed values in each of the categories of the table with the expected values
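A sketch of the observed-versus-expected comparison, assuming the usual expected count of (row total × column total) / N under no association, with hypothetical counts:

```python
def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

observed = [[10, 20],
            [20, 10]]   # hypothetical 2x2 table

statistic, df = chi_squared(observed)
print(statistic, df)
```

Here every expected count is 15, so the statistic is 4 × (5² / 15), about 6.67 with 1 degree of freedom, which exceeds the 5% critical value of 3.84.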

57
New cards

Degrees of freedom

An estimation of the number of independent categories in a particular statistical test

58
New cards

Fisher Exact Test

Used when the chi-squared approximation is not good

Used when expected values are too small

  • Total N<20, regardless of the expected values

  • Total N btw 20 and 40, with expected values <5

Computationally “heavy”

Uses the exact probabilities of the hypergeometric distribution
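The exact probabilities can be sketched with `math.comb`. The two-sided p-value below sums the probabilities of all tables (with the same margins) that are no more likely than the observed one; that is one common definition, stated here as an assumption. The example counts are hypothetical.

```python
from fractions import Fraction
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of the 2x2 table [[a, b], [c, d]] given its margins."""
    n = a + b + c + d
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))

def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = Fraction(0)
    # x is the top-left cell; its range is fixed by the margins
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        if p <= p_observed:
            total += p
    return total

p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(float(p_value))
```

Exact fractions avoid rounding problems when comparing table probabilities, at the cost of the "heavy" computation the card mentions for large counts.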

59
New cards

Difference btw correlation and regression

With correlation we can only see how much two variables are related to each other

  • A correlation of 0.80 means that the two variables are positively and strongly correlated

With the regression model, we can estimate how much one variable is affecting the other

  • A regression coefficient of 2.0 means that, on average, each one-unit increase in the exposure is associated with a 2.0-unit increase in the outcome

60
New cards

Residual

the difference btw the observed values and the values estimated by the regression
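A minimal sketch of a simple linear regression and its residuals, assuming ordinary least squares with one exposure and hypothetical data:

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one exposure x and one outcome y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

exposure = [1, 2, 3, 4, 5]               # hypothetical data
outcome = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(exposure, outcome)
estimated = [intercept + slope * v for v in exposure]

# Residual = observed value minus the value estimated by the regression
residuals = [obs - est for obs, est in zip(outcome, estimated)]

print(intercept, slope)
print(residuals)
```

The slope of about 1.97 reads as: each one-unit increase in the exposure goes with an average increase of about 1.97 units in the outcome. With least squares and an intercept, the residuals sum to zero.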