Dichotomous Graphical Summary
Bar chart
Dichotomous Numerical Summary
Frequency table
Categorical Graphical Summary
Bar chart
Categorical Numerical Summary
Frequency Table
Ordinal Graphical Summary
Histogram
Ordinal Numerical Summary
Frequency table + cumulative frequency
Discrete Graphical Summary
Boxplot
Discrete Numerical Summary
Frequency table + cumulative frequency
Continuous Graphical Summary
Boxplot
Continuous Numerical Summary
SD, mean, variance, median, IQR, mode
SD, Mean, Variance, Range
More affected by outliers
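As a rough illustration of why SD, mean, variance, and range are more affected by outliers, the sketch below summarizes a small made-up data set before and after one extreme value is added. The data values and the helper name `summarize` are assumptions for illustration only.

```python
import statistics

# Hypothetical data; the extra value 50 plays the role of an outlier.
values = [2, 3, 3, 4, 5]
with_outlier = values + [50]

def summarize(data):
    """Return a few numerical summaries of a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),        # sample standard deviation
        "range": max(data) - min(data),
        "median": statistics.median(data),
    }

before = summarize(values)
after = summarize(with_outlier)

for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

In this made-up example the mean, SD, and range change a great deal, while the median barely moves.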
Role of Probability
Probabilities are numbers that reflect the likelihood that a particular event occurs
Statistical inference involves making generalizations or inferences about unknown population parameters based on sample statistics
A population parameter is any summary measure computed on a population (e.g., the population mean, which is denoted μ; the population variance, which is denoted σ²)
In General, (Role Of Probability)
Select a sample from the population of interest
Measure the characteristic under study
Summarize this characteristic in our sample
Make inferences about the population based on what we observe in the sample
Probability Basics
Probability reflects the likelihood that an outcome will occur
0 ≤ probability ≤ 1
Probability of 0 means no chance that a particular event will occur
Probability of 1 indicates that an event is certain to occur
Sampling
Population size = N, sample size = n
When we select a sample from a population, we want that sample to be representative of the population
Two Main Types of Sampling
Probability sampling: each member of the population has a known probability of being selected
Non-probability sampling: each member of the population is selected without the use of probability
Probability sampling: each member of the population has a known probability of being selected
If we select subjects at random (e.g. by simple random sampling), then each subject has the same probability of being selected. This means each subject is equally likely to be selected.
Probability Sampling
Simple random sampling
Systematic sampling
Stratified sample
Cluster sampling
Multistage sampling
Simple random sampling
Need to build a sampling frame
Select n individuals at random (on each draw, every individual has the same probability, 1/N, of being selected)
Most useful with small population
Need to build a sampling frame
A complete list or enumeration of all members of population N
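A minimal sketch of simple random sampling from a sampling frame, assuming the frame is available as a Python list. The function name `simple_random_sample`, the `seed` parameter, and the made-up frame of 100 labels are all assumptions for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the frame, each equally likely to be chosen."""
    rng = random.Random(seed)      # seeded only to make the example repeatable
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: a complete list of N = 100 population members.
frame = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```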
Systematic sampling
Start with the sampling frame; determine the sampling interval (N/n); select the first person at random from the first N/n individuals, then select every (N/n)th person thereafter
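Systematic sampling can be sketched as follows, assuming for simplicity that N is a multiple of n so the sampling interval is a whole number. The function name `systematic_sample` and the made-up frame are assumptions for illustration.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th member."""
    N = len(frame)
    k = N // n                      # sampling interval (assumes N is a multiple of n)
    rng = random.Random(seed)
    start = rng.randrange(k)        # random position within the first k members
    return [frame[start + i * k] for i in range(n)]

# Hypothetical frame of N = 100 members, numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```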
Stratified sample
Organize population into mutually exclusive groups or strata
Organize population into mutually exclusive groups or strata
These groups are different from each other (e.g. demographic groups)
Individuals within each group are similar to each other
Select individuals at random within each stratum
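One way to sketch stratified sampling: group the population into strata, then draw a simple random sample within each stratum. The function name `stratified_sample`, the equal per-stratum sample size, and the made-up age groups are assumptions, not part of the original notes.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Randomly select n_per_stratum individuals from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)   # mutually exclusive groups
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: (id, age_group) pairs.
population = [(i, "under_50" if i % 2 else "50_plus") for i in range(100)]
sample = stratified_sample(population, lambda p: p[1], 5, seed=0)
print(sample)
```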
Cluster sampling
When clusters exist which are very similar
When clusters exist which are very similar
Groups are similar to each other (natural groups, e.g. neighborhoods, zip code)
Within each group subjects may be quite different
Then, we sample everyone from specific clusters
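A minimal sketch of cluster sampling: choose some clusters, then include everyone in the chosen clusters. Selecting the clusters at random is an assumption here, as are the function name `cluster_sample` and the made-up neighborhoods.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters clusters at random and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # cluster names only
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters (e.g., neighborhoods) mapping to member ids.
clusters = {
    "north": [1, 2, 3],
    "south": [4, 5],
    "east": [6, 7, 8, 9],
    "west": [10],
}
chosen = cluster_sample(clusters, 2, seed=0)
print(chosen)
```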
Multistage sampling
Combine types of sampling techniques
Non-Probability Sampling
Used in practice because it is sometimes not possible to generate a sampling frame
Convenience sampling
Quota sampling
Convenience sampling
Non-probability sample (not for inference)
For preliminary data
Not representative
Quota sampling
Select a predetermined number of individuals into sample from groups of interest
Similar to stratified sampling in that the groups are non-overlapping and different from each other. However, the quotas do not have to represent the population, and individuals within each group are selected by convenience sampling
Sampling variability
Inferences are made about a large number of individuals in a population based on a study of only a small fraction of that population (i.e., the sample)
If a study is replicated or repeated on another sample from the population, it is possible that we might observe slightly different results (slightly different sample)
From any given population, there are many different samples that can be selected. The results based on each sample can vary, and this variability is called sampling variability
When we make estimates about population parameters based on sample statistics, it is extremely important to quantify the precision in our estimates
The probability distribution of a statistic produced by repeatedly selecting samples of the same size and computing the desired statistic is called the sampling distribution, e.g., sampling distribution of the sample mean
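The sampling distribution of the sample mean can be approximated by simulation: draw many samples of the same size and record the mean of each. The population below (mean 100, SD 15), the sample size of 25, and the 2,000 repetitions are all made-up values for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(100, 15) for _ in range(10_000)]

n = 25               # sample size, the same for every sample
repetitions = 2_000  # number of repeated samples

# Each entry is the mean of one simple random sample of size n.
sample_means = [statistics.mean(rng.sample(population, n))
                for _ in range(repetitions)]

mu = statistics.mean(population)
sd_of_means = statistics.stdev(sample_means)
print(f"population mean: {mu:.2f}")
print(f"mean of sample means: {statistics.mean(sample_means):.2f}")
print(f"SD of sample means: {sd_of_means:.2f}")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of individual values, which is one way to quantify the precision of the estimate.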
Conditional probability
Probability of outcome in a specific subpopulation or subsample
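As a small illustration, a conditional probability can be computed by restricting attention to the subgroup and then taking the proportion with the outcome. The records, group labels, and function name `conditional_probability` below are made up for this sketch.

```python
# Hypothetical records: (age_group, has_outcome)
records = (
    [("under_50", True)] * 5 + [("under_50", False)] * 45
    + [("50_plus", True)] * 15 + [("50_plus", False)] * 35
)

def conditional_probability(records, group):
    """P(outcome | group): proportion with the outcome within one subgroup."""
    in_group = [outcome for g, outcome in records if g == group]
    return sum(in_group) / len(in_group)

p_overall = sum(outcome for _, outcome in records) / len(records)
p_given_50_plus = conditional_probability(records, "50_plus")
p_given_under_50 = conditional_probability(records, "under_50")
print(p_overall, p_given_50_plus, p_given_under_50)
```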
Sensitivity and specificity
Screening tests are not used to make medical diagnoses but instead to identify individuals most likely to have a certain condition.
Some examples are PSA test for prostate cancer, mammograms for breast cancer, and serum and ultrasound assessments for prenatal diagnosis.
Evaluate the performance of the screening test (with dichotomous results)
Test comes back as one of two responses
But test classification can be different from the truth
Test comes back as one of two responses
Positive (+), i.e., according to the test, you have the disease
Negative (-), i.e., according to the test, you do not have the disease
But test classification can be different from the truth
You actually have the disease (+)
You actually don’t have the disease (-)
Sensitivity
True positive fraction
Probability that a diseased person screens positive = P (screen + | disease)
Ability of test to correctly identify those with disease
Specificity
True negative fraction
Probability that a disease-free person screens negative = P (screen - | disease free)
Ability of test to correctly identify those without disease
Positive; disease
True positive (TP) = have disease and test positive
Positive; no disease
False positive (FP) = do not have disease but test positive
Negative; disease
False negative (FN) = have disease but test negative
Negative; no disease
True negative (TN) = do not have disease and test negative
Sensitivity =
TP / (TP + FN)
Specificity =
TN / (FP + TN)
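The sensitivity and specificity formulas can be checked with a short calculation. The counts below are invented for illustration and do not come from any real screening test.

```python
def sensitivity(tp, fn):
    """True positive fraction: P(screen + | disease) = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative fraction: P(screen - | disease free) = TN / (FP + TN)."""
    return tn / (fp + tn)

# Hypothetical screening results for 1,000 people.
tp, fn = 90, 10    # 100 people with the disease
fp, tn = 50, 850   # 900 people without the disease

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.3f}")
print(f"specificity = {spec:.3f}")
```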
P (disease | screen positive) asks
What is the probability that I have the disease if my screening test comes back positive?
P (disease | screen positive) =
Positive predictive value (PPV)
P (disease free | screen negative) =
Negative predictive value (NPV)
Positive Predictive Value
TP / (TP + FP)
Negative Predictive Value
TN / (FN + TN)
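The predictive values can be computed from the same kind of 2×2 table of test result versus true disease status. The counts below are invented for illustration only.

```python
def positive_predictive_value(tp, fp):
    """P(disease | screen positive) = TP / (TP + FP)."""
    return tp / (tp + fp)

def negative_predictive_value(tn, fn):
    """P(disease free | screen negative) = TN / (FN + TN)."""
    return tn / (fn + tn)

# Hypothetical screening results for 1,000 people.
tp, fp = 90, 50    # 140 people screen positive
fn, tn = 10, 850   # 860 people screen negative

ppv = positive_predictive_value(tp, fp)
npv = negative_predictive_value(tn, fn)
print(f"PPV = {ppv:.3f}")
print(f"NPV = {npv:.3f}")
```

With these made-up counts the disease is uncommon, so the PPV is noticeably lower than the NPV even though most people with the disease screen positive.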
Independence
Two events, A and B, are independent if P (A | B) = P (A) or if P (B | A) = P (B)
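A minimal sketch of checking independence from counts, comparing P(A | B) with P(A) using exact fractions. The function name `is_independent` and both sets of counts are assumptions for illustration; with real sample data the two probabilities will rarely match exactly.

```python
from fractions import Fraction

def is_independent(a_and_b, a_not_b, b_not_a, neither):
    """Return True when P(A | B) equals P(A) for the given cell counts."""
    total = a_and_b + a_not_b + b_not_a + neither
    p_a = Fraction(a_and_b + a_not_b, total)
    p_a_given_b = Fraction(a_and_b, a_and_b + b_not_a)
    return p_a_given_b == p_a

# Hypothetical counts: P(A) = 50/100 and P(A | B) = 20/40, so they match.
independent = is_independent(20, 30, 20, 30)

# Hypothetical counts: P(A) = 50/100 but P(A | B) = 30/40, so they differ.
dependent = is_independent(30, 20, 10, 40)

print(independent, dependent)
```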