Why do we collect data?
To determine whether the variability we observe is real or due to chance
Systematic error
The error goes in only one direction
Ex: if we measure weight with an uncalibrated scale, it will add or subtract weight systematically
Random error
Due to chance, the error can go in either direction
Ex: if we take two samples of the class and measure weight, the difference can go in either direction
Quantitative data
Numeric
Qualitative data
Categorical
Types of quantitative data
Continuous
Discrete
Types of qualitative data
Binary
Nominal
Ordinal
Continuous variables
Variables that take values on a continuous scale
Ex: weight, BMI, height
Discrete variable
Variable that describes the values of finite (countable) events, usually whole numbers
Ex: number of siblings, age…
Binary variable
Variable that describes the values of any event that has only two categories
Ex: death (yes, no), physically active (active, non-active)
Nominal variable
Variable that describes the values of any event that has two or more categories without order
Ex: who are the people that live with you? (live alone, partner, friend, family…), NBA team (Lakers, Bulls….)
Ordinal variable
Variable that describes the values of any event that has two or more categories with an order
Ex: grade in the course (A,B,C…), BMI (normal, overweight, obese…)
Mean (central tendency)
sum of the values of one variable, divided by the number of values
Median
A central tendency estimate that is exactly in the middle of the sample, dividing the sample in half
Can be used to address outlier issues
Position rule: if n is even, the median is the average of the values at positions n/2 and (n+2)/2
If n is odd, the median is the value at position (n+1)/2
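A minimal Python sketch of this position rule on made-up numbers; the median_by_position helper is only illustrative, and the standard library's statistics.median gives the same result:

```python
# Median via the position rule, compared against statistics.median (hypothetical data).
from statistics import median

def median_by_position(values):
    """Odd n: value at position (n+1)/2; even n: average of the values
    at positions n/2 and (n+2)/2 (positions are 1-based)."""
    v = sorted(values)
    n = len(v)
    if n % 2 == 1:
        return v[(n + 1) // 2 - 1]          # convert 1-based position to 0-based index
    return (v[n // 2 - 1] + v[n // 2]) / 2  # values at positions n/2 and (n+2)/2

print(median_by_position([3, 1, 7, 5, 9]), median([3, 1, 7, 5, 9]))  # 5 5
print(median_by_position([3, 1, 7, 5]), median([3, 1, 7, 5]))        # 4.0 4.0
```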
Variability measures
Min and Max values (range)
Position measures
Variance
Standard deviation
Coefficient of variation
Standard error
Min and Max values (range)
The difference between the extreme values
Position measures
Measures that divide the observations into (approximately) equal parts, like the median
Ex 1: Quintiles
1 (20%), 2 (20%), 3 (20%), 4 (20%), 5 (20%)
Ex 2: Percentiles
P10 (10%), P5 (5%), P50 (50%)
Variance
The average of the squared differences between each observation and the overall mean
Squared because the differences can have negative values
Result doesn’t have the same unit as the individual values or the mean
Standard deviation
Square root of the variance
How far, on average, each value is from the mean
Uses the same unit as the original variable
Standard Error
Shows how much the sample mean is likely to vary from the population mean due to random error or sampling
Smaller SE suggests a more accurate representation of the population mean, while a larger SE indicates more uncertainty
Coefficient of variation
(Standard deviation/mean)*100
The ratio of the standard deviation to the mean, often expressed as a percentage
Allows for comparisons between data sets with different means or units
Interquartile range
The range between the 25th and 75th percentiles
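A minimal Python sketch computing the spread measures above on a hypothetical sample of weights (all numbers invented for illustration):

```python
# Variance, standard deviation, standard error, coefficient of variation and IQR.
import math
import statistics as st

sample = [62, 70, 68, 74, 66, 80, 72, 69]   # hypothetical weights in kg

mean = st.mean(sample)
var = st.variance(sample)             # sample variance (divides by n - 1), units: kg^2
sd = st.stdev(sample)                 # standard deviation, back in kg
se = sd / math.sqrt(len(sample))      # standard error of the mean
cv = sd / mean * 100                  # coefficient of variation, in %

q1, q2, q3 = st.quantiles(sample, n=4)   # 25th, 50th and 75th percentiles
iqr = q3 - q1                            # interquartile range

print(f"mean={mean:.1f} kg  variance={var:.1f} kg^2  sd={sd:.1f} kg")
print(f"se={se:.2f} kg  cv={cv:.1f}%  IQR={iqr:.1f} kg")
```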
What can we use categorical variables to measure?
Frequency
Count, raw number of events
Probability
The proportion of events relative to the sample size
% of individuals with cancer
Odds
Chance
Compares individuals with the event to individuals without it
Ex: odds of having cancer compared to not having cancer
Probability
number of favorable outcomes/ number of possible outcomes
multiplicative rule for probability
used for the probability that both of two events, A and B, occur (assumes A and B are independent)
Prob(A and B) = Prob(A) × Prob(B)
Additive rule for probability
used for the probability that at least one of event A or event B occurs (either)
Prob(A or B) = Prob(A) + Prob(B) − Prob(A and B)
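A minimal sketch of both rules using a fair six-sided die as a hypothetical example (events A and B chosen arbitrarily); fractions.Fraction keeps the arithmetic exact:

```python
# Additive and multiplicative rules on a fair die.
from fractions import Fraction

outcomes = set(range(1, 7))
A = {2, 4, 6}     # event A: the roll is even
B = {1, 2}        # event B: the roll is at most 2

def p(event):
    """Probability = number of favorable outcomes / number of possible outcomes."""
    return Fraction(len(event), len(outcomes))

# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p(A) + p(B) - p(A & B), "=", p(A | B))    # both 2/3

# Multiplicative rule (A and B independent here): P(A and B) = P(A) x P(B)
print(p(A) * p(B), "=", p(A & B))               # both 1/6
```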
Odds
Considers the probability of a successful event compared to a probability of a failure/unsuccessful event
Can present values from 0-infinity (not percentage)
Prevalence
represents the burden of disease at a particular point in time
number of people with the disease at a particular point in time / total population
Incidence
represents the burden of new cases from a disease
risk = cumulative incidence = number of new cases of disease in the period / number initially disease-free
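A minimal sketch of prevalence, cumulative incidence, and odds using hypothetical counts:

```python
# Prevalence, cumulative incidence (risk) and odds from invented counts.
total_population = 10_000
cases_now = 250                               # people with the disease at one point in time
prevalence = cases_now / total_population     # burden of disease at that point

initially_disease_free = 9_750
new_cases = 120                               # new cases during the follow-up period
cumulative_incidence = new_cases / initially_disease_free   # risk

odds = prevalence / (1 - prevalence)          # P(event) / P(no event), ranges 0 to infinity

print(f"prevalence={prevalence:.3f}  incidence={cumulative_incidence:.3f}  odds={odds:.3f}")
```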
Normal distribution
Used for continuous variables
Symmetrical around the mean
Bell-shaped
Describes biological events well
Values in the middle of the distribution are more frequent
Is tall and narrow when the standard deviation is low
Short and wide for higher standard deviation
Central limit theorem
Principle that the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population’s distribution
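A minimal simulation sketch of the theorem: sample means drawn from a clearly non-normal (exponential) population still cluster symmetrically around the population mean (sample size and number of repetitions are arbitrary choices):

```python
# Central limit theorem: distribution of sample means from a skewed population.
import random
import statistics as st

random.seed(1)

sample_means = [
    st.mean(random.expovariate(1.0) for _ in range(50))   # mean of one sample of n = 50
    for _ in range(2_000)                                  # repeated 2,000 times
]

# The means concentrate around the population mean (1.0) with spread ~ 1/sqrt(50)
print(round(st.mean(sample_means), 2), round(st.stdev(sample_means), 3))
```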
parameters to know if the distribution is normal
skewness
kurtosis
statistical test
visual interpretation
skewness
a measure of lack of symmetry. values close to 0 indicate normal distribution, or symmetry
kurtosis
a measure that describes how heavily the tails of a distribution differ from the tails of a normal distribution
values close to 3 indicate normal distribution
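A minimal sketch of checking these parameters with SciPy (assumes numpy and scipy are installed; the simulated data are only for illustration):

```python
# Skewness, kurtosis and one statistical test of normality (Shapiro-Wilk).
import numpy as np
from scipy.stats import skew, kurtosis, shapiro

rng = np.random.default_rng(0)
x = rng.normal(loc=70, scale=10, size=500)       # hypothetical, roughly normal data

print("skewness:", skew(x))                      # close to 0 for a normal distribution
print("kurtosis:", kurtosis(x, fisher=False))    # close to 3 (fisher=False = not excess kurtosis)
print("Shapiro-Wilk p-value:", shapiro(x).pvalue)
```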
What is the problem if the distribution is not normal?
Can't use the MEAN as the central tendency measure, because it will be biased; instead we must:
use other measures like the MEDIAN
categorize the variable
apply a transformation
Correlation
relationship between two numeric variables
measures the degree to which the variables are related
coefficient values (r) range from -1 to 1
-1: perfect negative correlation
0: no correlation
1: perfect positive correlation
R squared
Indicates the percentage of the variability of the outcome that is explained by the exposure
Pearson correlation test
used for continuous variables
at least one should have a normal distribution
Spearman correlation test
based on ranks
continuous or ordinal variables
used when we don’t have normal distributions
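A minimal sketch of both correlation tests with SciPy on hypothetical paired data (variable names and values invented); R squared here is just the squared Pearson r:

```python
# Pearson and Spearman correlation on invented paired data.
from scipy.stats import pearsonr, spearmanr

hours_exercise = [1, 3, 2, 5, 4, 6, 7, 8]
fitness_score  = [52, 60, 55, 70, 66, 75, 80, 83]

r, p_pearson = pearsonr(hours_exercise, fitness_score)    # continuous, ~normal data
rho, p_spear = spearmanr(hours_exercise, fitness_score)   # rank-based alternative

print(f"Pearson r={r:.2f} (p={p_pearson:.3f}), R^2={r**2:.2f}")
print(f"Spearman rho={rho:.2f} (p={p_spear:.3f})")
```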
null hypothesis
hypothesis that there is no significant difference between the specified populations, and any observed difference is due to chance or error
Example when looking at mean physical activity between English and Spanish speakers
mean physical activity is NOT different
alternative hypothesis
there is a significant difference between the specified populations
Example when looking at mean physical activity between English and Spanish speakers
mean physical activity IS different
Type 1 Hypothesis Error
Rejecting the null hypothesis when the null hypothesis is TRUE
Says there's a difference when in fact there isn't one
Its probability is the significance level, usually set at 5% (p < 0.05)
Type II Error
Failing to reject the null hypothesis when the null hypothesis is FALSE
Doesn't detect a difference when in fact there is one
Confidence interval
Represents the variability of our measure, based on a sampling distribution
Usually we use 95%
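A minimal sketch of a 95% confidence interval for a mean using the common normal-approximation formula mean ± 1.96 × SE (an assumed shortcut; small samples would normally use a t-based multiplier):

```python
# 95% confidence interval for a sample mean (normal approximation).
import math
import statistics as st

sample = [68, 72, 65, 70, 74, 69, 71, 66, 73, 70]   # hypothetical values

mean = st.mean(sample)
se = st.stdev(sample) / math.sqrt(len(sample))      # standard error of the mean
lower, upper = mean - 1.96 * se, mean + 1.96 * se

print(f"mean={mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```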
ANOVA
association of a numeric outcome with a categorical exposure with TWO OR MORE categories
based on independent samples
T-test
numeric outcome, binary exposure
comparison of means between TWO INDEPENDENT groups
example: comparing if the mean physical activity is the same in males and females
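A minimal sketch of both tests with SciPy on hypothetical physical-activity data (minutes per week; groups and values invented for illustration):

```python
# Independent-samples t-test (two groups) and one-way ANOVA (two or more groups).
from scipy.stats import ttest_ind, f_oneway

males   = [150, 180, 120, 200, 160, 170]
females = [140, 130, 190, 150, 160, 145]
group_c = [100, 110, 95, 120, 105, 115]    # a third group, to illustrate ANOVA

t_stat, p_t = ttest_ind(males, females)            # numeric outcome, binary exposure
f_stat, p_f = f_oneway(males, females, group_c)    # numeric outcome, 2+ categories

print(f"t-test: t={t_stat:.2f}, p={p_t:.3f}")
print(f"ANOVA:  F={f_stat:.2f}, p={p_f:.3f}")
```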
Paired sample
Either the same individuals with two measures over time, or
pairs of individuals, each contributing one measure
Types of categorical variables
Dichotomous (two categories)
Polytomous (three or more categories)
Ordinal (categories have a specific order)
Use 2×2 tables
Chi-squared
Fisher Exact
McNemar Test
Use 2×K tables
Chi-squared
Linear trend
2×2 contingency table (columns: exposed / unexposed, with totals m1 and m2; rows: cases / non-cases, with totals n1 and n2; cells a, b, c, d)
Longitudinal estimation: Incidence/Prevalence
ICexp=a/m1
ICnexp=b/m2
Longitudinal estimation: Odds
ODDSexp= a/c
ODDSnexp= b/d
Case-control estimations: Exposure prevalence
PRexp=a/n1
PRnexp=c/n2
Case-control estimations: Odds
ODDSexp=a/b
ODDSnexp=c/d
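A minimal sketch of these estimations, assuming the table layout implied by the formulas above (columns = exposed / unexposed with totals m1 / m2, rows = cases / non-cases with totals n1 / n2); all counts are hypothetical:

```python
# 2x2 estimations: incidence, exposure prevalence and odds from invented counts.
a, b = 40, 20            # cases:     exposed, unexposed
c, d = 60, 80            # non-cases: exposed, unexposed
m1, m2 = a + c, b + d    # column totals (exposed, unexposed)
n1, n2 = a + b, c + d    # row totals (cases, non-cases)

# Longitudinal (cohort) estimations
ic_exp, ic_nexp = a / m1, b / m2          # incidence in exposed vs unexposed
odds_exp, odds_nexp = a / c, b / d        # odds of disease in exposed vs unexposed

# Case-control estimations
pr_exp, pr_nexp = a / n1, c / n2          # exposure prevalence in cases vs controls
oddsc_exp, oddsc_nexp = a / b, c / d      # odds of exposure in cases vs controls

print("risk ratio:", ic_exp / ic_nexp)
print("odds ratio:", odds_exp / odds_nexp)
```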
Chi-squared test
Compares the observed values in each of the categories of the table with the expected values
Degrees of freedom
An estimation of the number of independent categories in a particular statistical test
Fisher Exact Test
Used when the chi-squared approximation is not good
Used when expected values are too small
Total N<20, independent of the expected values
Total N btw 20 and 40, with expected values <5
Computationally “heavy”
Uses the exact probabilities of the hypergeometric distribution
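A minimal sketch of the chi-squared and Fisher exact tests with SciPy on a hypothetical 2×2 table (same invented counts as above):

```python
# Chi-squared test (observed vs expected counts) and Fisher's exact test.
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = np.array([[40, 20],     # cases:     exposed, unexposed
                  [60, 80]])    # non-cases: exposed, unexposed

chi2, p_chi, dof, expected = chi2_contingency(table)
print(f"chi-squared={chi2:.2f}, degrees of freedom={dof}, p={p_chi:.4f}")
print("expected counts:\n", expected)     # what the test compares the observed counts against

odds_ratio, p_fisher = fisher_exact(table)   # preferred when expected counts are small
print(f"Fisher exact: OR={odds_ratio:.2f}, p={p_fisher:.4f}")
```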
Difference between correlation and regression
With correlation we can only see how much two variables are related to each other
Correlation of 0.80 means that two variables are positively and strongly correlated
With the regression model, we can estimate how much one variable is affecting the other
A regression coefficient of 2.0 means that, on average, each one-unit increase in the exposure increases the outcome by 2.0 units
Residual
the difference between the observed values and the values estimated by the regression
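A minimal sketch of a simple linear regression with SciPy's linregress, showing the regression coefficient (slope), R squared, and the residuals; the data are invented for illustration:

```python
# Simple linear regression: slope, R^2 and residuals (observed minus estimated).
import numpy as np
from scipy.stats import linregress

exposure = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
outcome  = np.array([3.1, 5.2, 6.8, 9.1, 11.2, 12.8, 15.1, 17.0])

fit = linregress(exposure, outcome)
predicted = fit.intercept + fit.slope * exposure
residuals = outcome - predicted                 # the regression residuals

print(f"slope={fit.slope:.2f} outcome units per exposure unit")
print(f"r={fit.rvalue:.2f}, R^2={fit.rvalue**2:.2f}")
print("residuals:", np.round(residuals, 2))
```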