Truth
Something we can never know with certainty
Elements of a good scientific research question
Clarifies your primary goal
Identifies the scope of inference
Interesting to others
Can be answered with data
Grounded in theory
Feasible
Goals of scientific research
Description: numerically summarize a quantity of interest
Prediction: forecast some future event based on historical or current data
Explanation: understand whether a particular variable is a cause of some other variable
Explanatory variable
variable inducing a causal effect
Response variable
variable receiving causal effects
Quantitative
numeric: continuous or discrete
Qualitative
categorical, factor: nominal or ordinal
Continuous variable
measured on a continuous scale, to any number of decimal places (ex. height, weight, temperature)
Discrete variable
can only take on whole-number values (ex. number of students in a classroom)
Nominal variable
categories do not have an order
Ordinal
categories have an order (ex. small < medium < large)
Sample size
affects the shape of the distribution
can lead to misleading patterns or conclusions if it is too small
can increase uncertainty if too small
DAGs
used to visualize the causal assumptions of a hypothesis
communicates your causal assumptions
Cause
refers to a variable or event that produces a change in another variable
Counterfactual
refers to a hypothetical scenario
The fork
Confounder
Can lead to spurious relationships
Can mask real causal relationships between explanatory and response
non-directed (backdoor) path, open until you condition on the confounder
The pipe
Mediator
Conditioning on it causes post-treatment bias when estimating the total effect
directed path, open until you condition on the mediator
The inverted fork
Collider
non-directed path, closed unless you condition on the collider
Total effect
indirect effects + direct effects
Observational Study
sensitive to confounding variables
inferring causation requires formal causal inference approaches for statistical analysis
Experimental Study
randomization breaks association between treatments and other variables
causal inference approaches can be necessary to avoid post-treatment bias
Randomization
The defining feature of an experiment
Populations
entire group of individuals or units you are interested in studying
Samples
a subset of the population that is observed or measured
Parameter
quantities describing the population whose true values are unknown
Statistic
a numerical summary calculated from a sample
estimate
the value of a statistic used as an approximation of a parameter
estimand
parameter of interest
Sampling distribution
distribution of possible outcomes of an estimate based on our sampling process
Precision
measure of how consistent estimates are when we repeatedly sample from a population
describes sampling error
Accuracy
describes bias
How to maximize precision
replication
How to maximize accuracy
doing random sampling
Random trial
each test is a random process
outcome
end results of each trial
sample space
set of all possible outcomes
Frequentist
probability is the proportion of trials n in which we observe the event of interest, X, as n grows large
Bayesian
probability is a strength of belief
Kolmogorov’s Axioms of probability
Rule 1: the probability of any event X is non-negative
P(X) ≥ 0
Rule 2: the probability of observing some outcome in the sample space is certain (1)
P(Ω)=1
Rule 3: Addition rule (for mutually exclusive events)
P(A or B) = P(A) + P(B)
Joint probability
probability that two events both occur, P(A and B)
Marginal probability
the total probability of one event, summed across all outcomes of the other variable
Independent events
two events are independent if the occurrence of one does not affect the probability of the other
P(A and B) = P(A) × P(B)
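A minimal Python sketch of the multiplication rule, using two fair coin flips as an illustrative example (numbers assumed for demonstration):

```python
# Checking independence for a toy example: two fair coin flips.
# A = "first flip heads", B = "second flip heads".
# If A and B are independent, P(A and B) should equal P(A) * P(B).
p_a, p_b = 0.5, 0.5
p_a_and_b = 0.25  # HH is one of four equally likely outcomes

print(p_a_and_b == p_a * p_b)  # True: the events are independent
```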
Mutually exclusive
two events that cannot occur at the same time
Example: an individual cannot test positive and negative for the infection at the same time
Conditional probability
defined as the probability of event A given that we know event B is true
P(A|B)= P(A and B)/P(B)
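A quick numerical sketch of the formula, with made-up joint and marginal probabilities:

```python
# Illustrative example: conditional probability from the joint and
# marginal probabilities (numbers are assumed for demonstration).
p_a_and_b = 0.12   # P(A and B): joint probability
p_b = 0.30         # P(B): marginal probability of B

# P(A|B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # ≈ 0.4
```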
Random variable
a variable whose observed values involve an element of chance
Probability distribution
probability of observing each mutually exclusive outcome
Probability density
how probability is distributed across the possible values of a continuous random variable
Probability mass
the probability assigned to a value or interval of a random variable; for a continuous variable it equals the area under the probability density function over the interval of interest
Empirical Rule
68% of observations are within 1 standard deviation of the mean
95% of the observations are within 2 standard deviations of the mean
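The empirical rule can be checked by simulation with the standard library; the seed and sample size below are arbitrary choices:

```python
import random

# Check the empirical rule by simulation: draw from a normal distribution
# and count the share of observations within 1 and 2 standard deviations.
random.seed(42)
draws = [random.gauss(mu=0, sigma=1) for _ in range(100_000)]

within_1sd = sum(abs(x) <= 1 for x in draws) / len(draws)
within_2sd = sum(abs(x) <= 2 for x in draws) / len(draws)
print(round(within_1sd, 3), round(within_2sd, 3))  # roughly 0.683 and 0.954
```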
Parameter estimation
assumes a single true parameter value
point estimates
represents the single best estimate of the parameter of interest
sampling distribution
the probability distribution used to describe a sample estimate
an illustration of uncertainty about the estimates taken from samples
The mean in a sampling distribution
equals the true parameter value (for an unbiased estimator)
Standard error
standard deviation of the sampling distribution
Law of large numbers
as the sample size N increases, the point estimate ultimately converges on the true parameter value
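A simulation sketch of the law of large numbers using coin flips with true p = 0.5 (seed and sample sizes are arbitrary):

```python
import random

# Law of large numbers: the sample proportion of heads (true p = 0.5)
# gets closer to the true parameter value as the sample size grows.
random.seed(1)
flips = [random.random() < 0.5 for _ in range(200_000)]

for n in (100, 10_000, 200_000):
    print(n, sum(flips[:n]) / n)  # estimates converge toward 0.5
```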
Central Limit Theorem
the distribution of a sample estimate will be approximately normal at large sample sizes, regardless of the shape of the underlying probability distribution
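A sketch of the central limit theorem: means of repeated samples from a skewed (exponential) distribution pile up around the true mean with a normal shape. Sample size, number of repetitions, and seed are assumptions for illustration:

```python
import random
import statistics

# Central limit theorem sketch: sample means from a skewed exponential
# distribution (true mean 1.0) are approximately normal, and their
# spread shrinks like 1/sqrt(n).
random.seed(7)
n = 50  # sample size per repeated sample
means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(5_000)  # 5,000 repeated samples
]

print(round(statistics.mean(means), 2))   # close to the true mean, 1.0
print(round(statistics.stdev(means), 2))  # close to 1 / sqrt(50) ≈ 0.14
```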
Confidence Interval
uses info from the standard error to quantify a range of plausible values for the true parameter at a given level of confidence
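A sketch of an approximate 95% confidence interval for a proportion using the normal approximation; the counts are illustrative, not from any real data:

```python
import math

# Approximate 95% CI for a proportion (normal approximation), with
# illustrative numbers: 37 successes out of 120 trials.
successes, n = 37, 120
p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion

# 1.96 is the standard normal quantile for 95% confidence
lower = p_hat - 1.96 * se
upper = p_hat + 1.96 * se
print(round(lower, 3), round(upper, 3))
```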
Steps of null hypothesis significance testing
1) Specify the null and alternative hypothesis
2) Determine the test statistic and significance value
3) Compute the sampling distribution for the null hypothesis
4) Compute p-value
5) Make a decision
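The five steps above can be sketched for a one-proportion z-test; the hypotheses and counts are made-up for illustration:

```python
import math

# NHST sketch for a single proportion: H0: p = 0.5 vs Ha: p != 0.5,
# observing 60 successes in 100 trials (illustrative numbers).
successes, n, p0 = 60, 100, 0.5

# 2)-3) Test statistic: z = (p_hat - p0) / SE under the null
p_hat = successes / n
se_null = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null

# Standard normal CDF via the error function (stdlib only)
def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# 4) Two-tailed p-value
p_value = 2 * (1 - normal_cdf(abs(z)))
print(round(z, 2), round(p_value, 4))  # z = 2.0, p ≈ 0.0455

# 5) Decision at significance level alpha = 0.05
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```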
Null hypothesis
H0, represents the hypothesis of “no effect”
Alternative hypothesis
Ha, represents the opposite of the null hypothesis
Null distribution
sampling distribution for the null hypothesis
p-value
probability of getting an estimate as or more extreme than our sample estimate, assuming the null is true
Type I error
We reject a null hypothesis that is actually true
Type II error
Failing to reject a null hypothesis that is false
Factors that affect Type II errors
low sample size and high variability
small effect size
low significance value (α)
One-tailed test
the alternative hypothesis is directional
significance value (α)
the probability threshold for rejecting the null; corresponds to the area in the tail(s) of the null distribution
Two-tailed test
does not specify directionality of a potential effect
prior probability
P(Hypothesis), quantitative statement of your degree of belief about the hypothesis prior to collecting new data
Likelihood
P(Data|Hypothesis), probability of new data you observe given that the hypothesis is true
Marginal Likelihood
P(Data), overall probability of the data integrated across all possible hypotheses
Posterior Probability
P(Hypothesis|Data), our updated probability given the new data
Point estimates
represent the single best estimate of the parameters of interest
Steps of Bayesian inference
1) Specify the prior distribution
2) Quantify the likelihood
3) Quantify the marginal likelihood
4) Quantify the posterior distribution
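The four Bayesian steps can be sketched on a toy problem: deciding between two hypotheses about a coin. The hypotheses, prior, and data below are all assumptions for illustration:

```python
from math import comb

# Toy Bayesian update: is a coin fair (p = 0.5) or biased (p = 0.7)?
# We observe 8 heads in 10 flips (illustrative data).
heads, n = 8, 10
hypotheses = {"fair": 0.5, "biased": 0.7}

# 1) Prior distribution: equal belief in each hypothesis
prior = {"fair": 0.5, "biased": 0.5}

# 2) Likelihood: P(Data | Hypothesis) from the binomial formula
likelihood = {
    h: comb(n, heads) * p**heads * (1 - p) ** (n - heads)
    for h, p in hypotheses.items()
}

# 3) Marginal likelihood: P(Data), summed over all hypotheses
marginal = sum(prior[h] * likelihood[h] for h in hypotheses)

# 4) Posterior: P(Hypothesis | Data) = prior * likelihood / marginal
posterior = {h: prior[h] * likelihood[h] / marginal for h in hypotheses}
print({h: round(p, 3) for h, p in posterior.items()})
```

With these numbers the data favor the biased coin, so the posterior shifts most of the belief toward it.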
Binomial distribution
probability distribution for the number of successes of a binary variable across n trials
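A minimal sketch of the binomial probability mass function, with n = 10 fair-coin trials chosen as an illustrative case:

```python
from math import comb

# Binomial PMF: P(k successes in n trials with success probability p).
def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative case: 10 flips of a fair coin (n = 10, p = 0.5)
pmf = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print(round(binomial_pmf(5, 10, 0.5), 4))  # most likely outcome: 0.2461
print(round(sum(pmf), 4))                  # probabilities sum to 1
```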
Statistical models
quantitative representations of our scientific models
consist of an equation, or a set of equations, that describe single variables or, more often in causal inference, the relationships between variables