Biostats Unit 1


79 Terms

New cards

Inferential statistics

the process of reaching decisions about a large body of data when all we have is a small part of it

New cards

Data

raw material of statistics

New cards

Statistics

a field of study concerned with:

1) Collection, organization, summarization, & analysis of data

2) the drawing of inferences about a body of data when only a part of the data is observed

3) interpret & communicate results

-a statistic (singular): an estimated value calculated from a sample; it varies from sample to sample

New cards

Where do we get data?

1) Routinely kept records

2) Surveys

3) Experiment

4) External sources

New cards

Variable

a characteristic that takes on different values in different persons, places, or things

New cards

Quantitative variable

One that can be measured in the usual way & has units. Conveys info about an amount (does it make sense to find the avg?)

Best graphs:
✅ Histogram (especially for large data sets)
✅ Box-and-whisker (for comparing medians and outliers)
✅ Dotplot (good for small datasets)
✅ Stem-and-leaf (also good for small to medium datasets)

New cards

Qualitative variable

one that has categories

-we count the # in each category, called the frequency

e.g.: eye color, ethnic group, medical diagnoses

-Bar graph best for this type of data

New cards

Random variable

a variable whose values arise as a result of chance factors & so can’t be exactly predicted in advance

New cards

Observations/measurements

Values from the measurement procedure

New cards

Discrete variable

characterized by gaps or interruptions in the values that it can assume

e.g.: coin flip (heads or tails)

-# of pts seen each day

New cards

Continuous random variable

-doesn’t possess any gaps or interruptions

-it can assume any value within a specified relevant interval of values

-e.g: weight, height, wrist circumference

New cards

Population

the collection of all entities for which we have an interest

-entities=cases=units

New cards

Population of values

-the values we generate by measuring a variable on each of the entities

New cards

N: size of the finite population (fixed # of values/entities)

New cards

Infinite population

a population with an endless # of values

New cards

Parameter (fixed #)

an exact value calculated from a pop

New cards

Sample

a part of the pop

-we use n=size of sample

New cards

Measurement

an assignment of #s to objects or events according to a set of rules

-various scales result from the different rules

New cards

Nominal scale

Definition: Categorizes data without any order or ranking.
Examples: Gender, blood type, types of bacteria, colors.
How to Identify: Look for data that are names, labels, or categories without a meaningful sequence.


New cards

Mutually exclusive

in 1 & only 1 category or group

New cards

Collectively exhaustive

all possible categories are listed

New cards

Ordinal scale

Definition: Categorizes data with a meaningful order, but the intervals between categories are not equal.
Examples: Pain severity (mild, moderate, severe), class rank, stages of cancer.
How to Identify: Look for rankings or ordered categories where the difference between ranks isn’t uniform.


New cards

Interval scale

Definition: Numerical data with equal intervals between values but no true zero point.
Examples: Temperature in Celsius or Fahrenheit, dates on a calendar.
How to Identify: Check for numeric data where subtraction makes sense, but there’s no absolute zero (e.g., 0°C doesn’t mean “no temperature”).


New cards

Ratio scale

Definition: Numerical data with equal intervals and a true zero, allowing for meaningful ratios.
Examples: Height, weight, age, enzyme activity, heart rate.
How to Identify: Look for numeric data with a true zero (e.g., 0 means none of the quantity), and ratios make sense (e.g., 20 kg is twice as heavy as 10 kg).


New cards

Simple random sample

If a sample of size n is drawn from a pop of size N in such a way that every possible sample of size n has the same chance of being selected

New cards

Sampling with replacement

every member of the pop is available at each draw

-every member could be selected again (& again)

New cards

Sampling w/o replacement

we would not record the value of the variable for any member that has already been selected

-usually do this
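
A minimal sketch of the two sampling schemes, assuming a tiny made-up population of five labeled members and Python's standard `random` module (the labels and seed are illustrative only):

```python
import random

rng = random.Random(42)                  # fixed seed so the run is repeatable
population = ["A", "B", "C", "D", "E"]   # hypothetical population, N = 5

# With replacement: every member is available at each draw, so repeats can occur.
with_replacement = rng.choices(population, k=3)

# Without replacement: a member already selected cannot be drawn again.
without_replacement = rng.sample(population, k=3)

print(with_replacement)
print(without_replacement)
```

`rng.sample` gives every possible group of 3 the same chance of selection, which is the simple random sample idea from the earlier card.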

New cards

Research study

a scientific study of a phenomenon of interest. Involves designing the sampling protocols, collecting & analyzing the data, & providing valid conclusions based on the analysis

New cards

Experiment

a special type of research study in which observations are made after specific manipulations of conditions have been done

New cards

Systematic sampling

often used in the health sciences; first determine the total number of records needed for the study

-use a random # as the starting point -> call it record x

-pick a second number, the sampling interval k, calculated from the total # of records and the # of records desired

-Data in sample: x, x + k, x + 2k, etc.
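
The steps above can be sketched roughly as follows. This assumes the sampling interval is k = (total # of records) // (# of records desired), which is a common convention but is not spelled out on the card; the function name and record list are hypothetical.

```python
import random

def systematic_sample(records, n_desired, seed=None):
    """Return records x, x + k, x + 2k, ... starting from a random record x."""
    rng = random.Random(seed)
    total = len(records)
    if not 0 < n_desired <= total:
        raise ValueError("n_desired must be between 1 and len(records)")
    k = total // n_desired        # sampling interval (assumed convention)
    x = rng.randrange(k)          # random start among the first k records (0-based index)
    return [records[x + i * k] for i in range(n_desired)]

records = list(range(1, 101))     # hypothetical record numbers 1..100
sample = systematic_sample(records, n_desired=10, seed=1)
print(sample)                     # 10 record numbers, each 10 apart
```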

New cards

Scientific method

a process by which scientific info is collected, analyzed & reported in order to produce unbiased & replicable results in an effort to provide an accurate representation of the observable phenomena

New cards

Ordered array

A listing of the values of a collection (set) in order of magnitude from smallest to largest (in population or sample)

New cards

Relative frequency of occurrence

proportion of values falling in each class
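
As a small illustration, assuming a made-up sample of 8 blood types, the relative frequency of each class is its count divided by n:

```python
from collections import Counter

blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]  # hypothetical sample, n = 8
counts = Counter(blood_types)                            # frequency of each category
n = len(blood_types)

# Relative frequency = frequency / n; the proportions add up to 1.
relative_frequency = {category: count / n for category, count in counts.items()}
print(relative_frequency)
```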

New cards

Measure of central tendency

a single value that is considered typical of the set of data, i.e., the “average” value

There are 3: mean (avg), median, mode

New cards

(Useful) properties of mean

1) Uniqueness: for a given set of data, there is 1 & only 1 mean

2) Simplicity: easy to calculate & understand

3) Each & every data value is used in the calculation (good); however, extreme values can have an undue influence on the mean

New cards

Skewed

If the graph (histogram or frequency polygon) of a distribution is asymmetric, the distribution is said to be

New cards

Positively skewed/skewed to the right

If a distribution is not symmetric because it has a long tail to the right, we say that the distribution is

-if the mean is greater than the mode

-the mean is larger than the median because the extreme high values pull the mean to the right, while the median remains near the center of the data.
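
A quick check of the mean-versus-median point, using made-up lengths of stay (in days) with one extreme high value; the data are purely illustrative:

```python
import statistics

lengths_of_stay = [2, 3, 3, 4, 4, 5, 30]   # hypothetical data with a long right tail

mean_stay = statistics.mean(lengths_of_stay)      # pulled upward by the value 30
median_stay = statistics.median(lengths_of_stay)  # stays near the center of the data

print(mean_stay, median_stay)
```

Here the median is 4 while the mean is a little over 7, so the single extreme value moves the mean but not the median.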

New cards

Negatively skewed/skewed to the left

If a distribution is not symmetric because its graph extends further to the left than to the right, that is, if it has a long tail to the left, we say that the distribution is

-mean is less than mode


New cards

Dispersion

refers to the variety that a set of observations exhibits

-conveys information regarding the amount of variability present in a set of data. If all the values are the same, there is none; if they are not all the same, it is present in the data

-if the values are widely scattered, it is greater


New cards

Interquartile range (IQR)

the difference between the third and first quartiles

-Q3-Q1

A large value indicates a large amount of variability among the middle 50% of the relevant observations, and a small value indicates a small amount of variability among the relevant observations

New cards

Kurtosis

a measure of the degree to which a distribution is “peaked” or flat in comparison to a normal distribution whose graph is characterized by a bell-shaped appearance.


New cards

Outlier

a data point that differs significantly from other observations

New cards

Objective probability

comes from objective processes

2 divisions: classical or relative frequency

New cards

Classical probability

-(a priori probability) is based on counting theory & the idea that all outcomes are equally likely to occur

New cards

Relative frequency probability

(a posteriori or experimental)

is based on the repeatability of the same process & the ability to count the # of repetitions & the # of times the event of interest occurs

New cards

Subjective probability

holds that probability measures the confidence that an individual has in the truth of a particular proposition. Does not need repeatability, can be used for an event that might only happen once

-We use Bayesian methods


New cards

Prior probability

of an event is probability based on prior knowledge, prior experience, or results from prior data collection

New cards

Posterior probability

of an event is the probability obtained by using new information to update or revise the prior probability

New cards

Conditional probability

When probabilities are calculated with a subset of the total group as the denominator

New cards

Joint probability

Sometimes we want to find the probability that a subject picked at random from a group of subjects possesses two characteristics at the same time.


New cards

Descriptive statistics

provide simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs.


New cards

Biostatistics

a field of study that uses statistical methods to analyze data related to living organisms.

New cards

Stratified sampling

researchers divide subjects into subgroups called strata based on characteristics that they share (e.g., race, gender, educational attainment). Once divided, each subgroup is randomly sampled using another probability sampling method.

1. Identify Strata (Groups):
- Group students by class standing: Freshmen, Sophomores, Juniors, Seniors.
2. Determine the Population Size (N):
- N = 100 students.
3. Decide the Sample Size (n):
- n = 20 students.
4. Determine the Proportion from Each Stratum:
- Freshmen: 40/100 × 20 = 8 students.
- Sophomores: 30/100 × 20 = 6 students.
- Juniors: 20/100 × 20 = 4 students.
- Seniors: 10/100 × 20 = 2 students.
5. Randomly Select Students from Each Stratum:
- Randomly select 8 Freshmen.
- Randomly select 6 Sophomores.
- Randomly select 4 Juniors.
- Randomly select 2 Seniors.
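
The worked example above can be sketched in code. The stratum sizes (40, 30, 20, 10) come from the example; the student labels, function name, and seed are hypothetical, and simple rounding is assumed to be good enough (in general the rounded counts may not add up to n exactly).

```python
import random

def proportional_allocation(strata_sizes, n):
    """Sample size per stratum = stratum size / N * n, rounded to a whole number."""
    total = sum(strata_sizes.values())   # population size N
    return {name: round(size * n / total) for name, size in strata_sizes.items()}

strata_sizes = {"Freshmen": 40, "Sophomores": 30, "Juniors": 20, "Seniors": 10}
allocation = proportional_allocation(strata_sizes, n=20)

# Build made-up student labels, then draw a simple random sample within each stratum.
rng = random.Random(0)
students = {name: [f"{name}-{i}" for i in range(1, size + 1)]
            for name, size in strata_sizes.items()}
sample = {name: rng.sample(students[name], k=allocation[name]) for name in strata_sizes}

print(allocation)
```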


New cards

Stem-and-leaf display

a method of visualizing data by splitting each data point into two parts: a "stem" representing the leading digits and a "leaf" representing the last digit
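
A minimal sketch of building such a display, assuming non-negative whole-number data where the stem is the tens part and the leaf is the ones digit (the ages below are made up):

```python
def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    ordered = sorted(values)
    # Include every stem between the smallest and largest, even if it has no leaves.
    stems = {stem: [] for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1)}
    for value in ordered:
        stems[value // 10].append(value % 10)
    return stems

ages = [23, 25, 31, 34, 34, 47, 52, 58]   # hypothetical ages
display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```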


New cards

Box and whisker plot

a visual representation of data distribution that displays the minimum, first quartile, median, third quartile, and maximum values of a dataset using a box with lines extending outwards ("whiskers") to show the spread of the data, including potential outliers


New cards

Location parameter

a statistical measurement that indicates the center of a distribution

Common ones:

Mean: The most common location parameter, often represented by the symbol μ
Median: The value in the middle of ordered values
Mode: The most common value in a distribution

New cards

Frequency distribution

a graph or table that shows how often a value occurs in a data set.

New cards

Coefficient of variation

the ratio of the standard deviation to the mean. The higher the value, the greater the level of dispersion around the mean. It is generally expressed as a percentage.
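
A short sketch of the calculation, assuming the sample standard deviation (n − 1 in the denominator) is the one intended; the weights are made up:

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

weights_kg = [60, 70, 80]                  # hypothetical weights: mean 70, s = 10
cv = coefficient_of_variation(weights_kg)
print(f"CV = {cv:.1f}%")
```

Because the units cancel, the CV can be used to compare variability between variables measured on different scales.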


New cards

Box and Whisker outlier boundaries

Lower Bound: Q1−1.5×IQR
-anything below this number
Upper Bound: Q3+1.5×IQR
-anything above this number
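
The boundaries can be sketched as below. The data are made up, and the quartiles come from Python's `statistics.quantiles` with `method="inclusive"`; hand methods taught in class may give slightly different quartiles, so treat the quartile step as an assumption.

```python
import statistics

def outlier_bounds(q1, q3):
    """Return (lower bound, upper bound) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [10, 12, 13, 14, 15, 16, 17, 18, 40]   # hypothetical values
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
lower, upper = outlier_bounds(q1, q3)

# Anything below the lower bound or above the upper bound is flagged.
outliers = [value for value in data if value < lower or value > upper]
print(lower, upper, outliers)
```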


New cards

Sensitivity

a measure of how well a test can identify people who have a condition or outcome

=P(Test Positive | Has Disease)

= Number who tested positive and have disease / Total number who have disease

New cards

Specificity

a statistical measure of how well a test can identify people who do not have a condition

= P(Test Negative | No Disease)
= Number who tested negative and don’t have disease / Total number who don’t have disease

New cards

Predictive value positive

the probability that someone with a positive test result actually has the disease or condition

= P(Has Disease | Test Positive)

New cards

Predictive value negative

the probability that someone with a negative test result is actually free of disease
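
All four screening-test measures can be read off a 2×2 table of counts. A sketch, assuming made-up counts (tp = true positives, fp = false positives, fn = false negatives, tn = true negatives) and a hypothetical function name:

```python
def screening_measures(tp, fp, fn, tn):
    """Compute the four screening-test measures from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),                # P(Test Positive | Has Disease)
        "specificity": tn / (tn + fp),                # P(Test Negative | No Disease)
        "predictive_value_positive": tp / (tp + fp),  # P(Has Disease | Test Positive)
        "predictive_value_negative": tn / (tn + fn),  # P(No Disease | Test Negative)
    }

# Hypothetical study: 100 people with the disease, 900 without.
measures = screening_measures(tp=90, fp=40, fn=10, tn=860)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note the different denominators: sensitivity and specificity condition on true disease status, while the predictive values condition on the test result.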

New cards

Bayes’ theorem

a mathematical formula that uses new information to update the prior probability of an event into a posterior probability

P(A | B) = P(B | A) × P(A) / P(B)
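
One common use in this unit is getting the predictive value positive from sensitivity, specificity, and the disease prevalence (the prior probability). A sketch with made-up numbers and a hypothetical function name:

```python
def predictive_value_positive(sensitivity, specificity, prevalence):
    """P(Disease | Test Positive) via Bayes' theorem."""
    true_positive = sensitivity * prevalence               # P(T+ | D) * P(D)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(T+ | no D) * P(no D)
    return true_positive / (true_positive + false_positive)

# Hypothetical test: 90% sensitive, 95% specific, disease prevalence 1%.
ppv = predictive_value_positive(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(f"{ppv:.3f}")
```

With these numbers the result is roughly 0.154: even a fairly accurate test gives mostly false positives when the disease is rare.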

New cards

Complement probability

the probability that the event does not occur


New cards

Addition rule

to find the probability of either event A or event B occurring, you add the individual probabilities of each event, then subtract the probability of both events occurring simultaneously
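
A small worked example, assuming a made-up group of 100 patients in which 30 have hypertension (A), 20 have diabetes (B), and 10 have both. Fractions are used so the arithmetic is exact.

```python
from fractions import Fraction

def prob_a_or_b(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_a = Fraction(30, 100)         # hypothetical P(hypertension)
p_b = Fraction(20, 100)         # hypothetical P(diabetes)
p_a_and_b = Fraction(10, 100)   # hypothetical P(both)

p_either = prob_a_or_b(p_a, p_b, p_a_and_b)
print(p_either)   # 2/5, i.e. 40 of the 100 patients have at least one condition
```

Subtracting P(A and B) avoids counting the 10 patients with both conditions twice.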


New cards

Multiplication rule

the probability of two events occurring together is P(A and B) = P(A) × P(B | A). If the events are independent, P(B | A) = P(B), so it reduces to multiplying their individual probabilities: P(A) × P(B).

Quick Test for Independence:

Calculate P(A)×P(B).
Compare it to P(A and B)
If they are equal, the events are independent.
If they are not equal, the events are dependent.
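
The quick test can be sketched as below. The probabilities are made up, and a small tolerance is assumed because decimal probabilities are stored inexactly on a computer:

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Events are independent when P(A) * P(B) equals P(A and B)."""
    return math.isclose(p_a * p_b, p_a_and_b, abs_tol=tol)

print(are_independent(0.5, 0.4, 0.2))   # 0.5 * 0.4 = 0.2, so independent
print(are_independent(0.5, 0.4, 0.3))   # 0.2 != 0.3, so dependent
```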


New cards

Class interval widths

1. Identify the Range of the Data

The range is the difference between the maximum and minimum values in your data set

2. Choose the Number of Classes

The number of classes (k) can be chosen based on guidelines such as Sturges' Rule:

k = 1 + 3.322 log10(n), where n is the number of data points.

Alternatively, you can choose a reasonable number of classes (usually between 5 and 20, depending on the data size).

3. Calculate the Class Width

The class width (w) is calculated as:

w=Range/number of classes

Round up the class width to a convenient number for easy interpretation.
If you get a decimal, round up to the next whole number to ensure full coverage.
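
The three steps can be sketched as follows. Rounding k up to a whole number is an assumption (the card only says to round the width up), and the sample size and data range are made up:

```python
import math

def sturges_classes(n):
    """Number of classes from Sturges' rule, k = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(minimum, maximum, k):
    """Class width w = range / k, rounded up to the next whole number."""
    return math.ceil((maximum - minimum) / k)

k = sturges_classes(100)                     # 1 + 3.322 * 2 = 7.644 -> 8 classes
w = class_width(minimum=12, maximum=95, k=k) # range 83 / 8 = 10.375 -> width 11
print(k, w)
```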


New cards

Measurement scales

1. Nominal Scale

Definition: Categorizes data without any order.
Characteristics:
- Names, labels, or categories only
- No ranking or meaningful order
- Can't do arithmetic on them
Examples:
- Gender (male, female, other)
- Blood type (A, B, AB, O)
- Eye color (blue, brown, green)

🔸 2. Ordinal Scale

Definition: Categorizes data with a meaningful order, but the differences between categories are not measurable or consistent.
Characteristics:
- Ranked or ordered
- Intervals are not equal or meaningful
- Arithmetic is not valid
Examples:
- Survey ratings (satisfied, neutral, dissatisfied)
- Education level (high school, bachelor's, master's)
- Class rankings (1st, 2nd, 3rd)

🔹 3. Interval Scale

Definition: Numerical data with meaningful differences between values, but no true zero point.
Characteristics:
- Ordered and equally spaced
- No absolute zero (zero does not mean "none")
- Ratios don’t make sense
Examples:
- Temperature in °C or °F (0° ≠ "no temperature")
- IQ scores
- Dates on a calendar

🔸 4. Ratio Scale

Definition: Like the interval scale, but has a true zero, so ratios are meaningful.
Characteristics:
- Ordered, equal intervals, and a true zero point
- All arithmetic operations are valid
Examples:
- Age
- Height
- Weight
- Distance
- Time duration
- Money


New cards

Cumulative frequency

New cards

Sturges rule

-an equation to help one choose the number of bins in a histogram or frequency distribution which helps organize the data into a clear representation without it being too broad or too restricted


New cards

Which graphs provide a smooth vision of the data?

Histogram & Box and whisker

New cards

Which graph best shows interval spacings between data points?

dot plots

New cards

Which graph(s) can handle a large amount of data over a large range of values?

-Stem & Leaf

-Histogram

-Box & whisker

New cards

Median position for odd sample sizes

New cards

P(A∪B) (“or” probability)

= P(A) + P(B) − P(A∩B)

New cards

Bell shaped graph

New cards

Uniform histogram

New cards

False positive

= P(Test Positive | No Disease)
= Number who tested positive but don’t have disease / Total number who don’t have disease