Biological Data Analysis


Last updated 3:05 PM on 4/2/26

187 Terms

1
New cards

survivorship bias

bias that arises from only having data on individuals that survived an incident or selection process

2
New cards

random sampling qualifications

  1. each individual has an equal chance of being selected

  2. selection is independent between individuals

3
New cards

precision

minimizing sampling error

4
New cards

accuracy

minimizing bias

5
New cards

categorical variables

qualitative groupings with no inherent magnitude on a numerical scale

6
New cards

nominal variables

categorical variables with no inherent order

7
New cards

ordinal variables

categorical variables that have an intrinsic order

8
New cards

numeric (quantitative) variables

have a magnitude on a numerical scale

9
New cards

continuous variables

numeric variables for data containing any real number within some bounds

10
New cards

discrete variables

numeric variables for data containing only whole numbers

11
New cards

ratio vs interval scale

ratio scales have a true zero point representing the absence of the variable, while interval scales do not.

12
New cards

frequency distribution

the number of times each value occurs in a sample

commonly displayed as a histogram

13
New cards

Descriptive Statistics

quantities that capture features of the sample data

14
New cards

arithmetic mean

sum of the observations divided by the number of observations

strongly affected by outliers

15
New cards

median

middlemost measurement

somewhat resistant to outliers

16
New cards

geometric mean

used to summarize data when variables are multiplicative
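
As a minimal sketch of why the geometric mean suits multiplicative data, the example below uses hypothetical yearly growth factors (the numbers are assumptions, not part of the notes). The geometric mean is the single factor that, applied every year, gives the same overall change.

```python
import math

# Hypothetical yearly population growth factors (illustrative values only).
growth_factors = [1.10, 0.90, 1.20]
n = len(growth_factors)

# Geometric mean: n-th root of the product of the values.
geo_mean = math.prod(growth_factors) ** (1 / n)

# Equivalent form: exponentiate the arithmetic mean of the logs.
geo_mean_logs = math.exp(sum(math.log(g) for g in growth_factors) / n)

# The arithmetic mean overstates the typical multiplicative change.
arith_mean = sum(growth_factors) / n
```

Applying `geo_mean` three times reproduces the overall change of 1.10 * 0.90 * 1.20, which the arithmetic mean does not.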

17
New cards

standard deviation

square root of variance

s = sqrt(s^2)

18
New cards

variance

spread of data

s^2 = sum((observation - mean)^2) / (n - 1)

19
New cards

coefficient of variation

used for ratio scale variables (spread is expected to increase with mean)

CV=s/sample mean
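
The variance, standard deviation, and coefficient of variation formulas above can be sketched as follows. The sample values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
mean = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = sd / mean
```

The hand-computed variance should agree with `statistics.variance(sample)`, which also divides by n - 1.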

20
New cards

IQR

interquartile range: the range spanning the middle 50% of the data (third quartile - first quartile)

useful for skewed data

21
New cards

sampling distribution

distribution of values for an estimate that we might obtain with repeated sampling of a population

22
New cards

standard error

standard deviation of sampling distribution

SE(mean) = s/sqrt(n)

23
New cards

Confidence Interval

range of values that are likely to contain the target parameter

95% CI: sample mean ± 1.96 * SE(mean) for normally distributed data
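
A minimal sketch of the standard error and the approximate 95% confidence interval, using a hypothetical sample. The multiplier 1.96 comes from the normal distribution; with small samples a critical value from the t-distribution is commonly used instead, so treat this as an approximation.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.2]

n = len(sample)
mean = statistics.fmean(sample)
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)

# Standard error of the mean: s / sqrt(n).
se = s / math.sqrt(n)

# Approximate 95% confidence interval for the population mean.
ci_low = mean - 1.96 * se
ci_high = mean + 1.96 * se
```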

24
New cards

Probability

proportion of times an event would occur if a random trial were repeated many times

value between 0 and 1, where probabilities of all possibilities must sum to 1

25
New cards

Probability Mass Function

function giving the probability of each possible value of a discrete variable

26
New cards

Probability Density Function

function giving the probability density of each possible value of a continuous variable; probabilities are areas under the curve

27
New cards

Marginal Probability

the probability of a single event occurring, independent of other variables

ex. P(A)

28
New cards

Conditional Probability

measures the likelihood of an event occurring given that another event has already occurred

ex. P(A|B)

29
New cards

Union Probability

measures the likelihood that at least one of multiple events occurs

ex. P(A ∪ B)

30
New cards

Complement Probability

calculates the likelihood of an event not occurring

ex. P(A^c) = 1 - P(A)

31
New cards

Intersection/ Joint Probability

the likelihood of two or more independent or dependent events occurring simultaneously

ex. P(A,B)

32
New cards

P(A and B)

P(A)*P(B) if A and B are independent; in general, P(A)*P(B|A)

33
New cards

P(A or B)

P(A) + P(B) - P(A,B)
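
The addition rule can be checked by enumerating a small sample space. The sketch below assumes two fair six-sided dice (an illustrative choice, not from the notes) and uses exact fractions. Because the two dice are independent, the joint probability also equals the product of the marginals.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function outcome -> bool."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def a(o):
    return o[0] == 6   # event A: first die shows 6

def b(o):
    return o[1] == 6   # event B: second die shows 6

p_a = prob(a)
p_b = prob(b)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_or_b = prob(lambda o: a(o) or b(o))
```

Here `p_a_or_b` equals `p_a + p_b - p_a_and_b`; subtracting the joint probability avoids counting the double six twice.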

34
New cards

Bayes' Theorem

Bayes' Theorem helps us update probabilities based on prior knowledge and new evidence

P(A|B) = (P(B|A) × P(A)) / P(B)

35
New cards

P(A given B)

P(A,B)/P(B)
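
A worked sketch of conditional probability and Bayes' theorem, using a hypothetical diagnostic test. All three input probabilities are made-up assumptions for illustration. Here A is "has the disease" and B is "tests positive".

```python
# Hypothetical inputs (assumptions for illustration only).
p_a = 0.01               # prior P(A): prevalence of the disease
p_b_given_a = 0.95       # P(B|A): positive test given disease
p_b_given_not_a = 0.05   # P(B|not A): positive test given no disease

# Joint probability P(A,B) = P(B|A) * P(A).
p_a_and_b = p_b_given_a * p_a

# P(B) by summing over both ways a positive test can happen.
p_b = p_a_and_b + p_b_given_not_a * (1 - p_a)

# Conditional probability P(A|B) = P(A,B) / P(B), which is Bayes' theorem.
p_a_given_b = p_a_and_b / p_b
```

With these numbers the posterior is about 0.16: even after a positive test the disease remains unlikely, because the prior is so low.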

36
New cards

probability distribution

the mathematical function that gives the probabilities of occurrence of possible outcomes

Used to: fit models to data, represent uncertainty in parameters, and portray prior information

37
New cards

Normal Distribution

continuous

symmetrical around its mean

unimodal

probability density is highest at the mean

38
New cards

normal distribution ranges

x (-inf, +inf)

mu (-inf, +inf)

sigma > 0

39
New cards

Z-score

number of standard deviations a value lies from the mean; test statistic for a normal distribution

Z = (X - mu)/sigma
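
A minimal sketch of the Z-score calculation. The observation, mean, and standard deviation are hypothetical values, and the tail probability uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical observation from a population with mean 100 and sd 15.
z = z_score(130.0, mu=100.0, sigma=15.0)

# Probability of a value at least this large under the standard normal.
upper_tail = 1 - NormalDist().cdf(z)
```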

40
New cards

log normal distribution

continuous

positive only

positive right skew

41
New cards

Log normal distribution ranges

x (0, +inf)

mu (-inf, +inf)

sigma > 0

42
New cards

central limit theorem

the sum or mean of a large number of measurements randomly sampled from (almost) ANY distribution is approximately normally distributed
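
The theorem can be illustrated by simulation. The sketch below draws repeated samples from an exponential distribution with rate 1 (a strongly right-skewed distribution with mean 1 and standard deviation 1). The sample size, number of repeats, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_mean(n):
    """Mean of n draws from an exponential distribution with rate 1."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(5000)]

# The sample means should centre on the population mean (1) with a
# spread close to sigma / sqrt(n) = 1 / sqrt(50).
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = 1 / n ** 0.5
```

A histogram of `means` should look roughly bell-shaped even though the individual draws are heavily skewed.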

43
New cards

location moment

mean

44
New cards

spread moment

variance

45
New cards

symmetry moment

skewness

46
New cards

heavy tailed-ness moment

kurtosis

47
New cards

t-distribution ranges

x (-inf, +inf)

mu (-inf, +inf)

sigma > 0

v (nu) > 0

48
New cards

poisson distribution

discrete

non-negative only (zero is possible)

only one parameter (lambda)

normal is a good approximator when lambda is large

often used for counts

49
New cards

poisson distribution ranges

x (non-negative whole numbers: 0, 1, 2, ...)

lambda > 0

50
New cards

lambda

= mean = variance (sigma^2)
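
A short numerical check that the mean and variance of a Poisson distribution both equal lambda. The value of lambda is a hypothetical choice, and the sum is truncated at 60 counts, beyond which the probabilities are negligible for this lambda.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0                # hypothetical rate
ks = range(60)           # truncated support; the tail beyond is negligible
probs = [poisson_pmf(k, lam) for k in ks]

total = sum(probs)
mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
```

Count data whose sample variance is clearly larger than the sample mean is overdispersed relative to this model; clearly smaller is underdispersed.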

51
New cards

underdispersed population

variance < mean

52
New cards

overdispersed

variance > mean

53
New cards

binomial distribution

discrete

non-negative only (zero successes is possible)

used for success/fail trials

54
New cards

binomial distribution ranges

x (# of successes) (whole numbers from 0 to n)

n (# of trials) (whole numbers > 0)

p (probability of a success) [0,1]
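
A minimal sketch of the binomial probability mass function, with hypothetical values of n and p. Setting n = 1 gives the Bernoulli special case.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x): probability of x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3           # hypothetical number of trials and success probability
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                               # should be 1
mean = sum(x * q for x, q in enumerate(probs))   # should be n * p

# Bernoulli special case: a single trial succeeds with probability p.
bernoulli_success = binomial_pmf(1, 1, p)
```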

55
New cards

bernoulli distribution

a special case of binomial distribution where n (number of trials) =1

56
New cards

multinomial distribution

generalization of the binomial, when there are >2 categories

ex. dice

57
New cards

gamma distribution

continuous

positive only

flexible

multiple common parameterizations

58
New cards

gamma distribution ranges

x [0, +inf)

alpha (shape parameter) > 0

theta (scale parameter) > 0

59
New cards

Beta distribution

continuous

bounded between 0 and 1

used for proportions

60
New cards

Beta distribution ranges

x (0,1)

alpha >0

beta > 0

61
New cards

Directed Acyclic Graphs (DAGs)

can use these to denote causal relationships between variables

62
New cards

DAG features

node

edge

direction

acyclic (no path leads from a node back to itself)

63
New cards

confounding variables

variables that influence both the dependent and independent variables, causing a spurious correlation between them

64
New cards

Fork example

sunburns <- sun intensity -> ice cream sales (sun intensity is a common cause of both)

65
New cards

pipe example

day of the year -> temperature -> how fast ice cream melts (temperature mediates the effect)

66
New cards

the collider

quality of ice cream -> ice cream sales <- outside temperature (ice cream sales is a common effect of both)

67
New cards

hypothesis testing

compares collected data to expectations under a null hypothesis to determine how unlikely the data are

68
New cards

p-value

probability of obtaining a value that is as or more extreme than the observed value, given the null hypothesis is true

69
New cards

test statistic

value calculated from the data that is used to evaluate how unlikely your data are, given the null hypothesis is true.

70
New cards

p-value < α

reject the null hypothesis

71
New cards

α

probability of committing a type one error (false positive). Generally 0.05

72
New cards

β

probability of committing a type two error (false negative)

73
New cards

p-value > α

fail to reject the null hypothesis; the data are compatible or consistent with the null hypothesis

74
New cards

Confidence intervals

range of values that are likely to contain the target parameter

75
New cards

Permutation tests

generates a null distribution of the test statistic by repeatedly rearranging values

slightly less power than parametric tests (like t-tests) when sample sizes are small

76
New cards

performing a permutation test

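  1. calculate the test statistic for the observed data

  2. randomly rearrange the data into new groups

  3. calculate the test statistic for the permuted data

  4. repeat steps 2 and 3 1000+ times to generate the distribution of the test statistic under the null hypothesis

  5. calculate the p-value as the proportion of permuted test statistics that are as or more extreme than the observed one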
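
The steps above can be sketched as a two-group permutation test on the difference in means. The data, group names, number of permutations, and seed are all hypothetical. Adding 1 to the numerator and denominator is one common convention that keeps the estimated p-value above zero.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # rearrange into new groups
        diff = (statistics.fmean(pooled[:n_a])
                - statistics.fmean(pooled[n_a:]))
        if abs(diff) >= abs(observed):            # as or more extreme
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical measurements (illustrative values only).
treated = [12.1, 13.4, 11.8, 14.0, 13.1, 12.7]
control = [10.2, 11.0, 9.8, 10.9, 11.4, 10.1]
observed_diff, p_value = permutation_test(treated, control)
```

Because every treated value exceeds every control value in this made-up data, very few rearrangements match the observed difference and the p-value comes out small.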

77
New cards

Bootstrap

resampling data with replacement to approximate the sampling distribution of an estimate

useful for finding standard error or confidence interval for a parameter estimate

78
New cards

performing bootstrapping

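  1. sample with replacement from the original sample, keeping the same sample size

  2. calculate the estimate (e.g. the median) from the bootstrapped data

  3. repeat 10000+ times

  4. calculate SE as the standard deviation of the bootstrap estimates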
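
The bootstrap procedure above can be sketched for the standard error of a median. The sample values, number of resamples, and seed are hypothetical choices.

```python
import random
import statistics

def bootstrap_medians(sample, n_boot=10000, seed=0):
    """Medians of n_boot resamples drawn with replacement from sample."""
    rng = random.Random(seed)
    n = len(sample)
    medians = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)   # sampling with replacement
        medians.append(statistics.median(resample))
    return medians

# Hypothetical measurements (illustrative values only).
sample = [3.1, 4.7, 5.0, 5.2, 6.8, 7.4, 9.9]
medians = bootstrap_medians(sample)

# Bootstrap standard error: standard deviation of the bootstrap estimates.
se_median = statistics.stdev(medians)
```

The same list of bootstrap medians could also be used for a confidence interval, for example by taking its 2.5th and 97.5th percentiles.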

79
New cards

ways to reduce bias

control group - does not receive treatment but is exposed to the same conditions

randomization - random assignment of treatments

blinding

80
New cards

single blinding

hides treatment details from participants to prevent behavioral bias

81
New cards

double blinding

hides treatment details from participants and researchers to prevent expectancy effects

82
New cards

ways to reduce sampling error

replication - application of each treatment to multiple, independent units OR application of multiple, identical treatments to a single unit

balance - equal sample size in all groups

blocking - grouping of units that share properties (like location); within each block, treatments are randomly assigned

83
New cards

statistical power

probability that a random sample will lead to rejection of a false null hypothesis

1 - beta

84
New cards

Type 1 error

rejecting the null hypothesis when the null hypothesis is true

False +

alpha

85
New cards

Type 2 error

failing to reject the null hypothesis when the null hypothesis is false

False -

Beta

86
New cards

Statistical Power is most affected by

Sample Size and Variance of Data

87
New cards

chi squared goodness of fit test

compares frequency data to a probability model stated by the null hypothesis

common for hypothesis testing

88
New cards

chi squared test statistic (χ^2)

sum((observed - expected)^2 / expected)

89
New cards

chi squared distribution

only affected by k (degrees of freedom)

k = df

90
New cards

k

# categories - 1 - # parameters estimated from the data

91
New cards

chi squared assumptions

random and independent sampling

no category has an expected frequency < 1

no more than 20% of categories should have expected frequencies < 5
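
A minimal sketch of the goodness-of-fit statistic, its degrees of freedom, and the expected-frequency checks. The observed counts are hypothetical rolls of a die tested against a fair-die null model; the function names are illustrative, not from any library.

```python
def chi_squared_gof(observed, null_probs, n_estimated=0):
    """Chi-squared statistic and degrees of freedom for frequency data."""
    total = sum(observed)
    expected = [total * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = categories - 1 - parameters estimated from the data.
    df = len(observed) - 1 - n_estimated
    return statistic, df, expected

def assumptions_ok(expected):
    """No expected frequency below 1, and at most 20% of them below 5."""
    share_below_5 = sum(1 for e in expected if e < 5) / len(expected)
    return min(expected) >= 1 and share_below_5 <= 0.20

# Hypothetical counts from 60 rolls of a die (illustrative values only).
observed = [8, 12, 9, 11, 6, 14]
statistic, df, expected = chi_squared_gof(observed, [1 / 6] * 6)
checks_pass = assumptions_ok(expected)
```

The statistic here is about 4.2 with 5 degrees of freedom, below the 0.05 critical value of roughly 11.07, so this made-up data would not lead to rejecting the fair-die null hypothesis.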

92
New cards

one sample t-test

compares the mean of a sample with some "null mean"

93
New cards

t-distribution

continuous and symmetrical. Nu controls "heavy tailed-ness"

94
New cards

t-distribution test statistic

(sample mean - null mean) / SE of the sample mean

95
New cards

T-test df

n-1 = v
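
A minimal sketch of the one-sample t statistic and its degrees of freedom. The sample and the null mean are hypothetical; turning t into a p-value requires the t-distribution with n - 1 degrees of freedom, which is not computed here.

```python
import math
import statistics

def one_sample_t(sample, null_mean):
    """t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # SE of the sample mean
    t = (statistics.fmean(sample) - null_mean) / se
    return t, n - 1

# Hypothetical measurements tested against a null mean of 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
t_stat, df = one_sample_t(sample, null_mean=5.0)
```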

96
New cards

Paired t-test

compares the mean of the differences between paired measurements to a null mean (usually 0)

controls for variation among sampling units (e.g. plots) that is otherwise difficult to control for

97
New cards

paired t-test test statistic

(mean difference - null mean)/SE of the mean difference
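
A paired t-test amounts to a one-sample t-test on the within-pair differences. The sketch below uses hypothetical before/after measurements on four plots.

```python
import math
import statistics

def paired_t(before, after, null_mean=0.0):
    """t statistic and degrees of freedom for a paired t-test."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)    # SE of the mean difference
    t = (statistics.fmean(diffs) - null_mean) / se
    return t, n - 1

# Hypothetical measurements on the same four plots (illustrative values only).
before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
t_stat, df = paired_t(before, after)
```

The differences are 2, 3, 1, and 3, giving a mean difference of 2.25 and n - 1 = 3 degrees of freedom, where n is the number of pairs.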

98
New cards

Unpaired t-test

compares the mean of one sample with the mean of another, independent sample

99
New cards

Unpaired t-test test statistic

(sample 1 mean - sample 2 mean) / SE of (sample 1 mean - sample 2 mean)

100
New cards

Unpaired t-test df

v = n1 + n2 - 2
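
A minimal sketch of the unpaired t statistic. It assumes the pooled-variance (equal variance) form of the standard error, which is the version consistent with n1 + n2 - 2 degrees of freedom; the data are hypothetical.

```python
import math
import statistics

def unpaired_t(sample1, sample2):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / df
    # SE of the difference between the two sample means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(sample1) - statistics.fmean(sample2)) / se
    return t, df

# Hypothetical samples (illustrative values only).
sample1 = [1, 2, 3, 4, 5]
sample2 = [2, 4, 6, 8, 10]
t_stat, df = unpaired_t(sample1, sample2)
```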
