Biostats

Last updated 3:58 PM on 1/22/26

91 Terms

1
New cards

statistics

way to make sense of data

science that deals with collection, classification, analysis, and interpretation of numerical data

science of making decisions regarding the characteristics of observations based on information obtained from a randomly selected sample of a group

affects personal decision-making

2
New cards

probability

likelihood something will happen; outcome is uncertain

3
New cards

research involves

formulating a question of interest

designing a study

collecting data (statistics)

analyzing data (statistics)

interpreting results (statistics)

drawing conclusions (statistics)

4
New cards

anecdotal evidence

informal observations

ex. older men seem to gamble more than younger men

may be true, but is based on small samples that don’t represent the entire population of interest

need formal study with statistical evidence about age and risk-taking behavior

5
New cards

good research question should state

groups of interest

response of interest

broad vs. focused question

6
New cards

population

set of all subjects of interest

7
New cards

sample

subset of population of interest on which you collect data

8
New cards

census

collecting data on everyone

9
New cards

difficulty with census

expensive

more complex than taking a sample

difficult to complete (some individuals are hard to locate)

populations are dynamic and constantly changing

10
New cards

descriptive statistics

used to summarize collected data

11
New cards

inferential statistics

used to draw conclusions about a population, based on data obtained from a sample of population

12
New cards

parameter

numerical summary of the population

we want to make inferences on parameters

true value of parameter is unknown

denoted with Greek letters

13
New cards

statistic

numerical summary of the sample

calculate from sample data

denoted with lowercase letters, bars, and hats

14
New cards

sample data and statistical inference

sample data are an approximate (imperfect) reflection of population data

sample may not match population

statistical inference describes what is likely happening in the population based on observed sample data

must understand variation in data

15
New cards

variable

any characteristic observed in a study

16
New cards

categorical

each observation belongs to one of a set of categories

contain descriptive words/phrases

17
New cards

quantitative

observations take on numeric values

18
New cards

discrete

countable set of distinct possible values (often counts)

0, 1, 2, …

19
New cards

continuous

continuum of infinitely many possible values

ex. 1:54.2, 1:54.90

20
New cards

dichotomous

only 2 categories

dead or alive

21
New cards

nominal

two or more categories, but no intrinsic/natural ordering

ex. blood type

22
New cards

ordinal

categories have a natural ordering

ex. years in college

23
New cards

distribution

possible values a variable can take on and the occurrence of those values

24
New cards

frequency

number of observations in each category

25
New cards

proportion

number of observations in each category divided by the total number of observations (relative frequency)

26
New cards

percentage

proportion multiplied by 100

27
New cards

modal category

category with highest frequency
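
The four cards above (frequency, proportion, percentage, modal category) can be worked through on one small data set. This is only a sketch: the blood-type values are made up for illustration, and the variable names are not from the course.

```python
from collections import Counter

# Hypothetical blood types recorded for 10 participants (made-up data).
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

frequency = Counter(blood_types)  # number of observations in each category
n = len(blood_types)
proportion = {cat: count / n for cat, count in frequency.items()}  # relative frequency
percentage = {cat: 100 * p for cat, p in proportion.items()}       # proportion x 100
modal_category = frequency.most_common(1)[0][0]                    # highest frequency

for cat in sorted(frequency):
    print(f"{cat}: frequency={frequency[cat]}, "
          f"proportion={proportion[cat]:.2f}, percentage={percentage[cat]:.0f}%")
print("modal category:", modal_category)
```

The proportions always add up to 1 and the percentages to 100, which is a quick check on any frequency table.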

28
New cards

sample data are an approximate (imperfect) reflection of population data

sometimes what we see in a sample is not exactly how things are in the population

29
New cards

statistical inference

describing what you think is likely to be happening in the population based on your observed sample data

to do this, we need to understand variation in our data

30
New cards

data visualization: categorical variables

once collected, all data must be sorted, cleaned, and arranged to see what is going on before we perform statistical analyses on it

31
New cards

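graphs for categorical and quantitative variables

pie chart (categorical)

bar plot (categorical)

dot plot (quantitative)

stem and leaf plot (quantitative)

histogram (quantitative; summarizes the data, does not display exact values)

time series plot (quantitative data collected over time)

scatterplot (two quantitative variables)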

32
New cards

the graph you choose depends on

type of data you have

features of the data you want to highlight

33
New cards

informal observations constitute anecdotal evidence

ex. older men seem to gamble more than younger men

anecdotal evidence may be true, but is often based on small samples that are not representative of an entire population of interest

we are unable to collect data from all men, so we need a strategy for determining who and how many men to collect data from to make conclusions about age and risk-taking behavior

34
New cards

sampling frame

list of subjects in the population from which the sample is taken

35
New cards

method

the method used to collect data is called the sampling design

36
New cards

non-random sampling methods are likely to suffer from

bias

37
New cards

sampling

process of selecting units (cases; persons, objects, events)

38
New cards

probability sampling

units selected randomly; units have a known probability of being selected

simple random sampling

stratified sampling

cluster sampling

39
New cards

non-probability sampling

units selected non-randomly

volunteer sample; convenience sample

40
New cards

simple random sampling

every member has equal probability of being selected

ex. random number generator

issue: underrepresentation of a certain group in your population

solution: draw a larger sample to ensure representation
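
A minimal sketch of simple random sampling with a random number generator, assuming the sampling frame is just a list of ID numbers. The frame size, sample size, and seed are made up for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for 100 subjects.
sampling_frame = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the example is reproducible
# sample() draws without replacement, and every subject is equally likely to be chosen.
sample = rng.sample(sampling_frame, k=10)
print(sorted(sample))
```

With only 10 draws, a small subgroup in the frame can easily be missed, which is the underrepresentation issue noted above.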

41
New cards

stratified

divide sampling frame into strata (subpopulations) then randomly select from within each stratum
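
Stratified sampling could be sketched as below. The function name, the class-year strata, and the choice to draw the same number from every stratum are all assumptions made for this example; other allocation rules are possible.

```python
import random

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Group the sampling frame into strata, then draw a simple random sample from each."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical frame: 100 (student_id, class_year) pairs, 25 per class year.
years = ["first-year", "sophomore", "junior", "senior"] * 25
frame = [(i, year) for i, year in enumerate(years)]

picked = stratified_sample(frame, stratum_of=lambda unit: unit[1],
                           n_per_stratum=5, rng=random.Random(0))
for year, units in picked.items():
    print(year, [student_id for student_id, _ in units])
```

Every stratum is guaranteed to appear in the sample, which is what a simple random sample cannot promise.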

42
New cards

cluster

sampling in stages

useful if target population is very large + you don’t really have a sampling frame

break group into clusters/natural groups -> randomly select group of clusters -> obtain sampling frame from each cluster -> draw random sample from each cluster
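
The staged procedure above might look like this in code. It is a sketch under assumptions: the schools, roster sizes, and function name are hypothetical, and each chosen cluster's roster stands in for the sampling frame obtained from that cluster.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Two-stage sketch: randomly pick clusters, then randomly sample units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)        # stage 1: select clusters
    return {name: rng.sample(clusters[name], n_per_cluster)  # stage 2: sample within each
            for name in chosen}

# Hypothetical clusters: 6 schools, each with its own roster of 30 students.
clusters = {f"school_{s}": [f"s{s}_student_{i}" for i in range(30)] for s in range(6)}

picked = cluster_sample(clusters, n_clusters=2, n_per_cluster=5, rng=random.Random(1))
for school, students in picked.items():
    print(school, students)
```

Only the selected clusters need a roster, which is why this design helps when no frame exists for the whole population.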

43
New cards

how do you select individuals to participate in your study?

ideally, you want participants to be a representative sample from your population so that your statistical inference can be generalizable to the population

44
New cards

bias

present when the results of the sample are not representative of the population

45
New cards

sampling bias (coverage bias)

results from the sampling method

sample not actually random

sampling frame does not represent entire population

46
New cards

nonresponse bias

occurs when people do not participate

participants may have different characteristics than non-participants

participants may only respond to some questions, generating missing data

47
New cards

response bias

occurs when participants give inaccurate answers

participants lie or misremember

questions can be confusing/misleading

48
New cards

nonprobability sampling

sometimes probability sampling is difficult, not possible, or inappropriate for public health issues

ex. homeless populations are both hard to identify and not easily accessible

the goal may be to enhance insight or understanding of a small or specific social unit/group

if random sampling doesn’t make sense…use nonprobability sampling methods

49
New cards

bar plot

can use either the frequency or percent on y-axis

50
New cards

dot plot

horizontal line shows range of values for the variable of interest

each dot represents an observation

dot plots show exact data

51
New cards

stem and leaf plot

vertical line separates the stem from the leaf

the stem (left) shows all digits except the last one

the leaf (right) shows the last digit

stem and leaf plots show exact data values
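
A small sketch of how a stem and leaf plot is built, assuming non-negative whole-number data so that the last digit is the leaf and the remaining digits are the stem. The scores are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers; leaf = last digit, stem = the rest."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

scores = [72, 85, 91, 68, 77, 85, 90, 64, 79, 88]  # made-up exam scores
plot = stem_and_leaf(scores)

# Print every stem from smallest to largest, even one with no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because each leaf is an actual digit from the data, the original values can be read back off the plot.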

52
New cards

histogram

useful to get a sense of the shape of the distribution of data

range of values for the variable of interest on x-axis

values are grouped into equal width intervals

frequency or relative frequency of occurrence for groups of values on y-axis

summarizes quantitative data; it does not display exact values

too few intervals may not be informative or useful

too many intervals may make it too difficult to see trends

6-10 intervals is usually appropriate
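
To make the grouping into equal-width intervals concrete, here is a rough sketch that counts observations per interval and prints a text histogram. The ages are made up, the function name is hypothetical, and it assumes the data are not all the same value (otherwise the interval width would be 0). Only 4 intervals are used to keep the output short.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would compute to index n_bins, so clamp it into the last interval.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return width, counts

ages = [21, 23, 25, 25, 28, 30, 34, 35, 35, 36, 40, 41, 47, 52, 61]  # made-up ages
width, counts = histogram_counts(ages, n_bins=4)

for i, count in enumerate(counts):
    start = min(ages) + i * width
    right = "]" if i == len(counts) - 1 else ")"  # only the last interval includes its right end
    print(f"[{start:.0f}, {start + width:.0f}{right}: {'*' * count}")
```

The output shows counts per interval only, which is why a histogram cannot be used to recover exact values.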

53
New cards

time series plot

display data that are collected over time

x-axis as time

y-axis as variable of interest

trends are more easily identified when we connect the points with lines

54
New cards

scatterplots

useful to explore the relationship between two continuous variables

55
New cards

experimental

participants assigned to experimental conditions; response variable/outcome of interest is then observed

experimental conditions: treatments

can establish cause and effect; random assignment reduces the potential for confounding variables to affect results; typically has a control group and a treatment group

56
New cards

observational

researchers observe both the response and explanatory variables without assigning a “treatment”

non-experimental

cannot establish cause and effect; confounding variables can influence results; comparison group; random sampling

57
New cards

why randomly assign individuals to treatment and control groups?

allows us to make sure groups are balanced with respect to other characteristics

comparing results between groups allows us to determine if intervention was effective

these allow us to attribute any observed effects as the result of experimental assignment (rather than confounding variables); can conclude a causal effect

58
New cards

control

compare treatment of interest to control group

59
New cards

randomize

randomly assign subjects to treatment and control groups

60
New cards

replicate

collect a sufficiently large sample size or replicate the entire study

61
New cards

block

account for variables known or suspected to affect the response of interest

62
New cards

placebo

“fake” treatment, often used as control group

63
New cards

placebo effect

showing change despite being on the placebo

64
New cards

blinding

experimental units don’t know which group they are in

65
New cards

double-blind

both experimental units and researchers don’t know the group assignment

66
New cards

multifactorial experimental studies

categorical explanatory variables in experiments may be referred to as factors

sometimes it may be of interest to evaluate the effect of multiple factors

67
New cards

blocking in experimental studies

blocking creates groups (blocks) that are similar with respect to the blocking variable; then treatment is assigned

separate participants into groups by whether they already use

randomly assign the treatment or placebo within each block
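
Random assignment within blocks could be sketched as follows. The participants, the generic blocking labels "A" and "B", and the even split between treatment and placebo are assumptions for illustration only, not details from the course example.

```python
import random

def blocked_assignment(participants, block_of, rng):
    """Within each block, shuffle participants and split them between treatment and placebo."""
    blocks = {}
    for p in participants:
        blocks.setdefault(block_of(p), []).append(p)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for p in shuffled[:half]:
            assignment[p] = "treatment"
        for p in shuffled[half:]:
            assignment[p] = "placebo"
    return assignment

# Hypothetical participants labeled with a generic blocking variable ("A" or "B").
participants = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p4", "A"),
                ("p5", "B"), ("p6", "B"), ("p7", "B"), ("p8", "B")]

assignment = blocked_assignment(participants, block_of=lambda p: p[1], rng=random.Random(7))
for participant, group in sorted(assignment.items()):
    print(participant, group)
```

Each block ends up with the same number of treated and placebo participants, so the blocking variable cannot differ between the two groups.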

68
New cards

factors vs. blocking

factors are conditions we can impose on the experimental units (explanatory variables; ex treatment vs. control)

blocking variables are characteristics that the experimental units come with, which we would like to control for

blocking in experimental studies is just like stratifying

69
New cards

designs of observational studies

cross-sectional

longitudinal

association does not imply causation - only examines associations

70
New cards

cross-sectional

one time point; a “snapshot”

71
New cards

longitudinal

same participants studied multiple times over time

72
New cards

confounding

occurs when a third variable is associated with both the explanatory and response variable

73
New cards

sometimes study results cannot be replicated because of

different sampling techniques (different kinds of people are enrolled in the study)

different explanatory variables are examined

poor data management/analysis

the finding was spurious to begin with

74
New cards

mean

sum of observations divided by the number of observations

highly influenced by outliers

75
New cards

median

middle value of ordered data

when the number of observations is odd, the median is the middle value of the ordered data

when the number of observations is even, the median is the average of the two middle data points

fairly resistant to outliers
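
A quick illustration of why the mean is sensitive to outliers while the median is resistant, using Python's standard `statistics` module. The income values are made up.

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # made-up incomes, in thousands
with_outlier = incomes + [400]   # add one extreme observation

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)  # even n: average of the two middle values

print("mean:  ", mean_before, "->", round(mean_after, 1))
print("median:", median_before, "->", median_after)
```

With these numbers the mean moves from 35 to about 95.8 once the outlier is added, while the median only moves from 35 to 36.5.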

76
New cards

range

difference between the largest and smallest observations

range = max - min

severely affected by outliers

77
New cards

standard deviation

represents a type of average distance of an observation from the mean

quantifies variability observed in the data

has same unit of measurement as the original data

s is never negative; when all observations take on the same value, s is 0

the larger the standard deviation, the greater the variability in the data

not resistant to outliers

78
New cards

variance

square of standard deviation

the variance is measured in the squared units of the original data

we report the standard deviation more frequently than the variance in summary statistics
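
The standard deviation and variance cards can be checked with a short calculation. This sketch assumes the usual sample formulas, which divide by n - 1; the cards themselves do not state the divisor. The heights are made up.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed sample formula)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance; same units as the data."""
    return math.sqrt(sample_variance(values))

heights_cm = [160, 165, 170, 175, 180]  # made-up heights
print(sample_variance(heights_cm))       # in squared centimeters
print(sample_sd(heights_cm))             # in centimeters
print(sample_sd([5, 5, 5, 5]))           # identical observations: no variability
```

Here the variance is 62.5 square centimeters and the standard deviation is about 7.9 cm, which is easier to interpret because it is in the same units as the heights.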

79
New cards

histograms can be used to understand the

distribution of data and describe the overall pattern

80
New cards

unimodal

one peak

81
New cards

symmetric

mirror image when folded in half

mean and median are approximately equal

mean is an appropriate measure of central tendency

82
New cards

bell-shaped

follows a bell-shaped curve

83
New cards

bimodal

two peaks

84
New cards

left-skewed

left tail is longer than the right (skew is in the direction of the tail)

mean is less than the median

mean pulled in the direction of the long left tail

in highly skewed distributions, the median is preferred over the mean as a measure of central tendency (it better represents what is typical)

85
New cards

right-skewed

right tail is longer than the left (skew is in the direction of the tail)

mean is greater than the median

mean is pulled in the direction of the long right tail

in highly skewed distributions, the median is preferred over the mean as a measure of central tendency (it better represents what is typical)

86
New cards

uniform distribution

all values seem approximately equally likely

87
New cards

percentiles

value such that p percent of the observations fall below or at that value

median - 50th percentile

first quartile - 25th percentile

third quartile - 75th percentile

88
New cards

finding quartiles

  1. order your data

  2. identify the middle of the data (the median)

  3. examine the lower half of the data defined by the median. The median of the lower half is the 25th percentile (first quartile)

  4. examine the upper half of the data defined by the median. The median of the upper half is the 75th percentile (third quartile)
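
The four steps above can be sketched as a small function. One assumption is labeled in the code: when the number of observations is odd, the median itself is left out of both halves. Textbooks and software differ on this convention, so other tools may report slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves steps.

    Assumption: when n is odd, the median is excluded from both halves.
    Requires at least 2 observations.
    """
    data = sorted(values)             # step 1: order the data
    n = len(data)
    median = statistics.median(data)  # step 2: middle of the data
    lower = data[: n // 2]            # step 3: lower half
    upper = data[(n + 1) // 2:]       # step 4: upper half
    return statistics.median(lower), median, statistics.median(upper)

odd_result = quartiles([7, 1, 3, 9, 5, 11, 13])        # 7 observations
even_result = quartiles([2, 4, 6, 8, 10, 12, 14, 16])  # 8 observations
print(odd_result)
print(even_result)
```

For the odd-sized data set the quartiles are 3, 7, and 11; for the even-sized one they are 5, 9, and 13.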

89
New cards

interquartile range

distance between the third and first quartiles

IQR = Q3 - Q1

resistant to outliers

range of middle half of the data

90
New cards

potential outlier

below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR

possible for an observation to fall outside of these bounds and not truly be an outlier
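
The 1.5 x IQR rule above can be written as two short helper functions. The function names and the data are made up for illustration, and Q1 and Q3 are passed in as given values rather than computed.

```python
def outlier_fences(q1, q3):
    """Lower and upper bounds from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values, q1, q3):
    """Observations falling below the lower bound or above the upper bound."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Made-up data; Q1 = 3 and Q3 = 11 are taken as given for this example.
data = [1, 3, 5, 7, 9, 11, 30]
fences = outlier_fences(3, 11)
flagged = potential_outliers(data, 3, 11)
print(fences)
print(flagged)
```

Here the IQR is 8, so the bounds are -9 and 23, and only the observation 30 is flagged. Whether a flagged value is truly an outlier still needs a judgment about the data.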

91
New cards

five number summary

minimum value, Q1, median, Q3, maximum value

displayed in boxplot

whiskers extend out to the smallest and largest observations that are not potential outliers; potential outliers are indicated with circles