Sample standard deviation
Measure of the spread or variability of a sample data set: the average distance of each data point from the sample mean.
Continuous numeric
Any value within a given range, includes fractions and decimals
Ordinal categorical
Represents categories with order and rank, such as ‘low,’ ‘medium,’ and ‘high’
Nominal categorical
No inherent order; groups items into categories without ranking
Continuous categorical
Grouping continuous (measurable) variables into distinct categories
Binary data
Categorises information/data into only two possible categories
Expected value
Average of all possible outcomes for a random variable, calculated by multiplying each outcome by its probability and summing the results
Variance
How spread out a set of data is from the mean value. It is calculated as the average of the squared differences from the mean.
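Both definitions above reduce to short sums for a discrete random variable. A minimal Python sketch (the outcomes and probabilities are made-up illustration data):

```python
# Hypothetical discrete random variable: a loaded die.
values = [1, 2, 3, 4, 5, 6]
probs  = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

# Expected value: multiply each outcome by its probability and sum.
expected = sum(x * p for x, p in zip(values, probs))

# Variance: the expected squared deviation from the mean.
variance = sum(p * (x - expected) ** 2 for x, p in zip(values, probs))

print(expected)  # 4.5
print(variance)  # 3.25
```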
Random variables
Assigns a numerical value to each outcome of a random phenomenon
Pr(X)
Probability someone/something is X (independent)
Pr(Y)
Probability someone/something is Y (dependent)
Pr(Y|X)
Probability someone/something is Y given they’re X
Continuous random variable
Variable that can take on any value within a given range
Standard error
The standard deviation of a statistic’s sampling distribution, used to measure how much a sample statistic (like the mean) is likely to differ from the true population parameter
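For a sample mean, the standard error is the sample standard deviation divided by the square root of the sample size. A sketch using Python's standard library (the sample values are made up):

```python
import math
import statistics

sample = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]   # made-up sample data

s = statistics.stdev(sample)               # sample SD (n - 1 denominator)
se = s / math.sqrt(len(sample))            # standard error of the mean
print(se)
```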
Sampling distribution
Probability distribution of a statistic (like the mean or standard deviation) that is calculated from multiple, repeated random samples taken from the same population.
t-distribution
For small sample sizes or when the population standard deviation is unknown
F-distribution
Probability distribution of the ratio of two independent chi-square distributed random variables, each divided by its degrees of freedom. Commonly used in ANOVA tests and linear regression modelling.
X2-distribution (chi-squared distribution)
With k degrees of freedom, the distribution of the sum of the squares of k independent standard normal variables
Z-distribution
A special normal distribution where the mean is 0 and the SD is 1
Z-score
Indicates how many standard deviations a data point is from the mean in a Z-distribution.
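The Z-score formula is (x − mean) / SD. A one-line sketch (the IQ-style numbers are illustrative only):

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies from the mean.
    return (x - mean) / sd

# A value of 130 in a distribution with mean 100 and SD 15:
print(z_score(130, 100, 15))  # 2.0
```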
T-score
Standardises data, allowing for comparisons between different datasets. Used to find the upper and lower bounds of a CI for a population mean, especially with small sample sizes. A larger t-score suggests a greater difference between the sample mean and the population mean, while a smaller t-score indicates more similarity.
pnorm
Calculates the cumulative distribution function (CDF) of the normal distribution: the probability that the corresponding continuous variable takes a value less than or equal to the argument of the function (R-function)
dnorm
Calculates the normal probability density function. Returns the height of the normal curve at a specific point, which represents the relative likelihood of that value occurring, but not a direct probability (R-function)
qnorm
When given an area (probability), finds the boundary value (quantile) that determines that area (R-function)
rnorm
Generates a random vector of numbers drawn from a normal distribution (R-function)
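Python's standard library offers close analogues of these four R functions via `statistics.NormalDist`, which can help connect the definitions; a sketch (the numeric arguments are illustrative):

```python
import random
from statistics import NormalDist

std = NormalDist(mu=0, sigma=1)  # the Z-distribution

p = std.cdf(1.96)        # like pnorm(1.96): P(Z <= 1.96), about 0.975
d = std.pdf(0)           # like dnorm(0): curve height at 0, about 0.399
q = std.inv_cdf(0.975)   # like qnorm(0.975): boundary value, about 1.96
r = [random.gauss(0, 1) for _ in range(5)]  # like rnorm(5)
```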
cor
Calculates correlation coefficients between variables; supports Pearson, Spearman, and Kendall correlation, and can be applied to individual vectors or entire data frames to generate correlation matrices (R-function)
prop.test
For testing the null hypothesis that the proportions (probabilities of success) in several groups are the same, or that they equal certain values (R-function)
lm
Used to fit linear models. Can be used to carry out regression, single stratum analysis of variance and analysis of covariance (R-function)
summary
Produces a summary of the results of a fitted model (R-function)
chisq.test
Conducts Chi-squared tests for independence and goodness of fit (R-function).
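The test statistic behind `chisq.test` is the sum of (observed − expected)² / expected over all categories. A hand-rolled sketch of just the statistic, with made-up counts:

```python
# Chi-squared goodness-of-fit statistic (illustrative counts).
observed = [48, 35, 17]
expected = [50, 30, 20]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)
```

The statistic would then be compared against a chi-squared distribution with the appropriate degrees of freedom to obtain a p-value.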
predict
Used to generate predictions based on fitted models, estimating outcomes for new data. (R-function)
fit
A general term for the model object returned by an R model-fitting function such as lm, describing the statistical model that best fits a dataset.
TukeyHSD
Creates a set of confidence intervals on the differences between the means of the levels of a factor, with the specified family-wise probability of coverage (R-function)
y-bar
The sample mean, represents the average value of a set of observations in a dataset.
x-bar
The sample mean of the independent variable, representing its average value in a dataset.
p-hat
The sample proportion, representing the estimated probability of a certain outcome in a dataset.
Null hypothesis
A statement that there is no effect or no difference, used as a starting point for statistical analysis and hypothesis testing.
Alternate hypothesis
A statement that indicates the presence of an effect or a difference, opposing the null hypothesis in statistical testing.
P-value
the probability of obtaining results at least as extreme as the ones observed, assuming the null hypothesis is true
μd
represents the mean difference between paired observations in hypothesis testing.
Normal model
A statistical model that represents data distributions characterized by a bell-shaped curve, defined by its mean and standard deviation.
chi-squared model test
A statistical method used to determine if there is a significant association between categorical variables by comparing observed frequencies to expected frequencies.
Linear regression model
Used to model the relationship between two variables and estimate the value of a response using a line of best fit
Binomial model
A statistical model that describes the number of successes in a fixed number of independent Bernoulli trials, used for binary outcome predictions.
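The binomial probability of exactly k successes in n trials is C(n, k)·pᵏ·(1 − p)ⁿ⁻ᵏ. A short stdlib sketch (the coin-flip numbers are illustrative):

```python
import math

def binom_pmf(k, n, p):
    # P(exactly k successes in n independent Bernoulli trials,
    # each with success probability p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 10 fair coin flips:
print(binom_pmf(2, 10, 0.5))  # 45/1024, about 0.0439
```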
Bernoulli trial
A random experiment with exactly two possible outcomes, commonly referred to as success and failure. Each trial is independent, and the probability of success remains constant.
Hypothesis test
A statistical method used to determine whether there is enough evidence to reject a null hypothesis in favor of an alternative hypothesis.
True difference
The actual difference in outcomes between two groups being compared in a statistical hypothesis test.
β1
slope coefficient in a simple linear regression model, representing the average change in the dependent variable (Y) for a one-unit increase in the independent variable (X)
β0
intercept term in a regression model, representing the average value of the dependent variable (Y) when all independent variables (X) are zero
β̂0
refers to the estimated y-intercept of a regression line, based on a sample of data
β̂1
the estimated slope from the sample data
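The least-squares estimates β̂1 and β̂0 can be computed directly from the sample: β̂1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², and β̂0 = ȳ − β̂1·x̄. A Python sketch with made-up data:

```python
# Least-squares estimates for simple linear regression (made-up data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Estimated slope: covariance of x and y over variance of x.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))

# Estimated intercept: the line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar
```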
β̂₂
the estimated coefficient for the second predictor variable in a multiple regression model.
β₂
true regression coefficient for the second predictor variable in the population regression model.
Appropriate multiplier
the critical value (such as a z* or t* value) that multiplies the standard error when constructing a confidence interval, chosen to match the desired confidence level.
R2
represents the proportion of variance in the dependent variable that can be explained by the independent variables in a regression model.
Non-constant variance
in regression, the situation where the variance of the residuals changes across fitted values rather than staying constant, violating a key model assumption
Independence assumption
is the assumption that the observations are statistically independent of each other in a given analysis or model.
dependence assumption
is the assumption that the observations are correlated or related in some way, and thus may influence each other in statistical analysis.
linearity assumption
is the assumption that the relationship between the independent and dependent variables can be accurately modeled as a straight line.
F-value
Used in ANOVA and regression analysis to check whether your model explains a significant amount of variation in the outcome variable Y
ANOVA
a statistical test used to determine if there are significant differences between the means of three or more groups
X2-test
a statistical hypothesis test used to determine if there is a significant difference between observed and expected frequencies
Selection Bias
a systematic error that occurs when a sample used for analysis does not accurately represent the population it is intended to study, leading to skewed and unreliable conclusions
Information bias
a systematic error resulting from inaccurate measurement or reporting of data, which can lead to distorted outcomes and conclusions.
Research studies
are investigations designed to test hypotheses, analyze data, and derive conclusions about health-related questions.
CARE
refers to the process of ensuring that health interventions are effectively implemented and sustained in practice, focusing on patient-centered outcomes and quality of care.
Type I error
occurs when a null hypothesis is incorrectly rejected, suggesting a false positive result.
Type II error
occurs when a false null hypothesis is not rejected, indicating a false negative result.
Population parameters
are numerical characteristics or measures of a population, such as means or variances, that summarize the data.
Strata/stratum
are distinct subgroups within a population, often used in stratified sampling to ensure representation across different segments.
Var[Y]
represents the variance of the random variable Y, indicating the spread or dispersion of its possible values.
Sample
a subset of individuals or observations selected from a larger population to make inferences about the whole group
Population
the entire, well-defined set of individuals, organisms, or data points (e.g., all patients with a specific disease, all hospitals in a region) that a researcher aims to study and make inferences about
Sampling
the process of selecting a representative subset of individuals from a larger population to make inferences about health-related characteristics
Continuous variables
Can take on any value, e.g. Height, Weight, Age, Blood pressure, often interested in the mean or average value
Investigating samples
Summarise in tables and graphs; for categorical data present proportions or percentages; for continuous data we usually want to know where the centre is (central tendency) and how spread out the data is
Errors
Two different sorts
1) Errors that make our answers more uncertain i.e. more variability (can’t be avoided)
2) Errors that move us away from the truth (important to avoid)
95% confidence interval
A range of values, calculated from sample data, that is likely to contain the true population parameter (such as the mean) 95% of the time if the study were repeated
Calculation: Sample Mean ± (Critical Value × Standard Error)
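The calculation above can be sketched in Python's standard library. This uses the large-sample z multiplier (≈1.96); with a small sample like this one, a t multiplier would be more appropriate, and the data are made up:

```python
import math
import statistics
from statistics import NormalDist

sample = [12.0, 15.0, 14.0, 10.0, 13.0, 12.0, 14.0, 16.0]  # made-up data
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# Critical value for 95% confidence: qnorm(0.975) in R terms.
z = NormalDist().inv_cdf(0.975)                # about 1.96

lower, upper = mean - z * se, mean + z * se    # Sample Mean +/- (CV x SE)
print(lower, upper)
```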