survivorship bias
only having data on individuals that lived through an incident
random sampling qualifications
each individual has an equal chance of being selected
selection is independent between individuals
precision
minimizing sampling error
accuracy
minimizing bias
categorical variables
qualitative groupings with no inherent magnitude on a numerical scale
nominal variables
categorical variables with no obvious order
ordinal variables
categorical variables that have an intrinsic order
numeric (quantitative) variables
have a magnitude on a numerical scale
continuous variables
numeric variables for data containing any real number within some bounds
discrete variables
numeric variables for data containing only whole numbers
ratio vs interval scale
ratio scales have a true zero point representing the absence of the variable, while interval scales do not.
frequency distribution
the number of times each value occurs in a sample
histograms
graphical displays of a frequency distribution, with bar height showing the count in each interval
Descriptive Statistics
quantities that capture features of the sample data
arithmetic mean
regular old averaging
very impacted by outliers
median
middlemost measurement
somewhat resistant to outliers
geometric mean
used to summarize data when variables are multiplicative
standard deviation
square root of variance
s = √(s²)
variance
spread of data
s² = Σ(observation − mean)² / (n − 1)
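The variance and standard-deviation formulas above can be checked with a short pure-Python sketch (the data values are made up for illustration):

```python
import math

def sample_variance(xs):
    # s^2 = sum((observation - mean)^2) / (n - 1)
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    # s = sqrt(s^2)
    return math.sqrt(sample_variance(xs))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
print(sample_variance(data))  # 32/7 ≈ 4.571
print(sample_sd(data))
```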
coefficient of variation
used for ratio scale variables (spread is expected to increase with mean)
CV=s/sample mean
IQR
range with middle 50% of the data
useful for skewed data
sampling distribution
distribution of values for an estimate that we might obtain with repeated sampling of a population
standard error
standard deviation of sampling distribution
SE(mean) = s/sqrt(n)
Confidence Interval
range of values that are likely to contain the target parameter
sample mean ± 1.96 * SE(mean) for normally distributed data
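The standard-error and confidence-interval cards above combine into one small sketch (data values are illustrative):

```python
import math
import statistics

def mean_ci_95(xs):
    # sample mean ± 1.96 * SE(mean), where SE(mean) = s / sqrt(n)
    m = statistics.mean(xs)
    se = statistics.stdev(xs) / math.sqrt(len(xs))
    return (m - 1.96 * se, m + 1.96 * se)

low, high = mean_ci_95([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(round(low, 3), round(high, 3))
```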
Probability
proportion of times event occurs if repeating random trial many times
value between 0 and 1, where probabilities of all possibilities must sum to 1
Probability Mass Function
probability of all possibilities for discrete data
Probability Density Function
probability of all possibilities for continuous data
Marginal Probability
the probability of a single event occurring, independent of other variables
ex. P(A)
Conditional Probability
measures the likelihood of an event occurring given that another event has already occurred
ex. P(A|B)
Union Probability
measures the likelihood that at least one of multiple events occurs
ex. P(A u B)
Complement Probability
calculates the likelihood of an event not occurring
ex. P(Ac)
Intersection/ Joint Probability
the likelihood of two or more independent or dependent events occurring simultaneously
ex. P(A,B)
P(A and B)
P(A) × P(B|A); reduces to P(A) × P(B) only when A and B are independent
P(A or B)
P(A) + P(B) - P(A,B)
Bayes' Theorem
Bayes' Theorem helps us update probabilities based on prior knowledge and new evidence
P(A|B) = (P(B|A) × P(A)) / P(B)
P(A given B)
P(A,B)/P(B)
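Bayes' theorem and the law of total probability can be illustrated with a classic screening-test calculation (all numbers here are hypothetical, chosen only to show the mechanics):

```python
def bayes(p_b_given_a, p_a, p_b):
    # P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * p_a / p_b

# hypothetical numbers: 1% base rate, 99% true-positive rate,
# 5% false-positive rate
p_a = 0.01
p_b_given_a = 0.99
# law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + 0.05 * (1 - p_a)
posterior = bayes(p_b_given_a, p_a, p_b)
print(round(posterior, 3))  # ≈ 0.167: most positives are false positives
```

Even with an accurate test, a low base rate keeps the posterior probability modest.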
probability distribution
the mathematical function that gives the probabilities of occurrence of possible outcomes
Used to: fit models to data, represent uncertainty in parameters, and portray prior information
Normal Distribution
continuous
symmetrical around its mean
unimodal
probability density is highest at the mean
normal distribution ranges
x (-inf, +inf)
u (-inf, +inf)
sigma > 0
Z-score
test statistic for a normal distribution
(X-u)/sigma
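The Z-score formula above is one line of code (the mean-100, sd-15 scale is just an illustrative choice):

```python
def z_score(x, mu, sigma):
    # how many standard deviations x lies from the mean
    return (x - mu) / sigma

# e.g., a score of 130 on an IQ-style scale with mean 100 and sd 15
print(z_score(130, 100, 15))  # 2.0
```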
log normal distribution
continuous
positive only
positive right skew
Log normal distribution ranges
x (0, +inf)
u (-inf, +inf)
sigma > 0
central limit theorem
the sum or mean of measurements randomly sampled from ANY distribution is approximately normally distributed (for sufficiently large samples)
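A quick simulation shows the theorem in action: the population below is exponential (strongly right-skewed), yet its sample means behave like a normal distribution centered on the population mean (sample sizes here are arbitrary):

```python
import random
import statistics

random.seed(0)

# exponential population with mean 1.0 -- very non-normal --
# but means of repeated samples of size 50 cluster around 1.0
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]
print(round(statistics.mean(sample_means), 2))   # close to 1.0
print(round(statistics.stdev(sample_means), 2))  # close to 1/sqrt(50) ≈ 0.14
```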
location moment
mean
spread moment
variance
symmetry moment
skewness
heavy tailed-ness moment
kurtosis
t-distribution ranges
x (-inf, +inf)
u (-inf, +inf)
sigma > 0
v (nu) > 0
poisson distribution
discrete
positive only
only one parameter (lambda)
normal is a good approximator when lambda is large
often used for counts
poisson distribution ranges
x (whole numbers ≥ 0)
lambda > 0
lambda
λ = mean = variance (s²)
underdispersed population
variance < mean
overdispersed
variance > mean
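The mean-equals-variance property of the Poisson can be checked by simulation. Python's standard library has no Poisson sampler, so this sketch uses Knuth's classic algorithm (an implementation choice, not something from the cards):

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    # Knuth's algorithm: count events until the running product
    # of uniform draws falls below e^(-lambda)
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(42)
draws = [poisson_sample(4.0, rng) for _ in range(20000)]
print(round(statistics.mean(draws), 1))      # ≈ lambda = 4
print(round(statistics.variance(draws), 1))  # ≈ lambda as well
```

Real count data whose variance differs markedly from the mean would be under- or overdispersed relative to this ideal.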
binomial distribution
discrete
positive only
used for success/fail trials
binomial distribution ranges
x (# of successes) (whole numbers from 0 to n)
n (# of trials) (whole numbers > 0)
p (probability of a success) [0,1]
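The binomial probability mass function follows directly from these three parameters (the coin-flip numbers are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# probability of exactly 2 heads in 4 fair coin flips
print(binom_pmf(2, 4, 0.5))  # 0.375

# as a PMF must, it sums to 1 over all possible success counts
print(sum(binom_pmf(k, 4, 0.5) for k in range(5)))  # 1.0
```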
bernoulli distribution
a special case of binomial distribution where n (number of trials) =1
multinomial distribution
generalization of the binomial, when there are >2 categories
ex. dice
gamma distribution
continuous
positive only
flexible
multiple common parameterizations
gamma distribution ranges
x [0, +inf)
alpha (shape parameter) > 0
theta (scale parameter) > 0
Beta distribution
continuous
bound between 0 and 1
used for proportions
Beta distribution ranges
x (0,1)
alpha >0
Beta > 0
Directed Acyclic Graphs (DAGs)
can use these to denote causal relationships between variables
DAG features
node
edge
direction
a/cyclic
confounding variables
influences both the dependent and independent variables, causing a spurious correlation
Fork example
sun intensity, sunburns, ice cream sales
pipe example
day of the year, temperature, how fast ice cream melts
the collider
ice cream sales, quality of ice cream, outside temperature
hypothesis testing
compares collected data to expectations under a null hypothesis to determine how unlikely the data are
p-value
probability of obtaining a value that is as or more extreme than the observed value, given the null hypothesis is true
test statistic
value calculated from the data that is used to evaluate how unlikely your data are, given the null hypothesis is true.
p-value < α
reject the null hypothesis
α
probability of committing a type one error (false positive). Generally 0.05
β
probability of committing a type two error (false negative)
p-value > α
The data are compatible or consistent with the null hypothesis
Confidence intervals
range of values that are likely to contain the target parameter
Permutation tests
generates a null distribution of the test statistic by repeatedly rearranging values
slightly less power than parametric tests (like t-tests) when sample sizes are small
performing a permutation test
calculate test statistic
randomly rearrange data into new groups
calculate test statistic for permuted data
repeat 1000+ times to generate sampling distribution of test statistic under the null hypothesis
calculate p-value
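The five steps above can be sketched in pure Python for a difference in group means (group values are invented for illustration):

```python
import random
import statistics

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)  # step 1
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):                      # step 4: repeat many times
        rng.shuffle(pooled)                      # step 2: rearrange groups
        diff = (statistics.mean(pooled[:n_a])
                - statistics.mean(pooled[n_a:])) # step 3: permuted statistic
        if abs(diff) >= abs(observed):           # "as or more extreme"
            extreme += 1
    return extreme / n_perm                      # step 5: the p-value

a = [5.1, 4.8, 6.0, 5.5, 5.9]  # illustrative measurements
b = [4.2, 4.5, 4.1, 4.8, 4.4]
p = permutation_test(a, b)
print(p)  # small: groups differ more than chance rearrangement predicts
```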
Bootstrap
resampling data with replacement to approximate the sampling distribution of an estimate
useful for finding standard error or confidence interval for a parameter estimate
performing bootstrapping
sample with replacement from original sample
calculate the estimate (e.g., the median) from the bootstrapped data
repeat 10000+ times
calculate SE
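The bootstrap steps above, applied to the median, look like this in stdlib Python (the data and resample count are illustrative):

```python
import random
import statistics

def bootstrap_se_median(sample, n_boot=5000, seed=1):
    rng = random.Random(seed)
    medians = []
    for _ in range(n_boot):
        # sample with replacement, same size as the original sample
        resample = [rng.choice(sample) for _ in range(len(sample))]
        medians.append(statistics.median(resample))
    # SE = standard deviation of the bootstrap sampling distribution
    return statistics.stdev(medians)

data = [3.1, 4.7, 5.0, 5.2, 6.8, 7.4, 8.0, 9.9, 10.3, 12.1]  # illustrative
se = bootstrap_se_median(data)
print(round(se, 2))
```

A percentile confidence interval would come from the same `medians` list, e.g. its 2.5th and 97.5th percentiles.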
ways to reduce bias
control group - does not receive treatment but is exposed to the same conditions
randomization - random assignment of treatments
blinding
single blinding
hides treatment details from participants to prevent behavioral bias
double blinding
hides treatment details from participants and researchers to prevent expectancy effects
ways to reduce sampling error
replication - application of each treatment to multiple, independent units OR application of multiple, identical treatments to a single unit
balance - equal sample size in all groups
blocking - grouping of units that share properties (like location); within each block, treatments are randomly assigned
statistical power
probability that a random sample will lead to rejection of a false null hypothesis
1 - beta
Type 1 error
rejecting the null hypothesis when the null hypothesis is true
False +
alpha
Type 2 error
failing to reject the null hypothesis when the null hypothesis is false
False -
Beta
Statistical Power is most affected by
Sample Size and Variance of Data
chi squared goodness of fit test
compares frequency data to a probability model stated by the null hypothesis
common for hypothesis testing
chi squared test statistic (χ²)
Σ((observed − EV)² / EV)
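The χ² statistic is a direct sum over categories (the die-roll counts are invented for illustration):

```python
def chi2_stat(observed, expected):
    # sum over categories of (observed - expected)^2 / expected
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 die rolls vs. a fair-die null model (expected 10 per face)
observed = [8, 12, 10, 10, 10, 10]
expected = [10.0] * 6
print(chi2_stat(observed, expected))  # 0.8
```

With k = 6 − 1 = 5 degrees of freedom, a value this small is entirely consistent with a fair die.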
chi squared distribution
only affected by k (degrees of freedom)
k = df
k
# categories - 1 - # parameters estimated from the data
chi squared assumptions
random and independent sampling
expected frequencies >1
no more than 20% of categories should have expected frequencies < 5
one sample t-test
compares the mean of a sample with some ānull meanā
t-distribution
continuous and symmetrical. ν (nu) controls āheavy tailed-nessā
t-distribution test statistic
(sample mean - null mean) / SE of the sample mean
T-test df
n-1 = v
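The one-sample t statistic and its degrees of freedom reduce to a few lines (the measurements and null mean are made up):

```python
import math
import statistics

def one_sample_t(sample, null_mean):
    # t = (sample mean - null mean) / SE(mean), with df = n - 1
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - null_mean) / se, n - 1

# do these made-up measurements differ from a null mean of 10.0?
t, df = one_sample_t([9.8, 10.2, 10.1, 9.9, 10.4, 10.0], 10.0)
print(round(t, 3), df)  # small t: data are compatible with the null
```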
Paired t-test
compares the mean difference between two sample means to a null mean
controls for variation among plots that is otherwise difficult to control for
paired t-test test statistic
(mean difference - null mean)/SE of the mean difference
Unpaired t-test
compares the difference of one sample mean from another sample mean
Unpaired t-test test statistic
(sample 1 mean - sample 2 mean)/SE of sample 1 mean - sample 2 mean
Unpaired t-test df
v = n1 + n2 - 2