Population
the entire group of individuals about which one wants information
Sample
the subset of the population from which data is actually collected, chosen to represent the population.
Bias
In statistical sampling, bias is the tendency of a sampling method to systematically favor certain outcomes over others.
Convenience sampling (and its problems)
Convenience sampling selects the individuals who are easiest to reach, which often leads to bias.
The main issue with convenience sampling is that it produces unrepresentative data.
Voluntary response (and its problems)
Individuals choose for themselves whether to participate. This attracts people who feel strongly about the issue and who often share similar opinions, so they are overrepresented in the results.
Voluntary response samples, like convenience samples, ultimately result in bias.
Simple random sample (SRS) + steps in constructing an SRS
a sample chosen in such a way that every group of n individuals in the population has an equal chance of being selected as the sample.
Steps in constructing this type of sample: (1) label each individual in the population with a distinct numerical label from 1 to N; (2) use a random number generator to obtain n different integers from 1 to N; (3) select the individuals whose labels match those integers.
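The three steps above can be sketched in Python. This is only an illustration: the function name simple_random_sample, the seed argument, and the list of 20 students are assumptions made up for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n from a list of individuals (illustrative helper)."""
    rng = random.Random(seed)
    N = len(population)
    # Step 1: label each individual 1..N (label i refers to population[i - 1]).
    labels = range(1, N + 1)
    # Step 2: draw n different labels at random (no label can repeat).
    chosen_labels = rng.sample(labels, n)
    # Step 3: select the individuals whose labels were drawn.
    return [population[label - 1] for label in chosen_labels]

# Hypothetical population of N = 20 students.
students = [f"student_{i}" for i in range(1, 21)]
srs = simple_random_sample(students, n=5, seed=1)
print(srs)
```

Because random.sample treats every set of n labels as equally likely, every group of n individuals has the same chance of being the sample.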
Sampling with replacement vs sampling without replacement
With replacement: a selected integer can be selected again by the random number generator, so the same individual can appear more than once.
Without replacement: once an integer has been selected, it cannot be selected again.
For constructing an SRS, always choose sampling without replacement.
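A short Python sketch of the difference, assuming ten hypothetical labels 1 to 10. In the standard library, random.choices draws with replacement and random.sample draws without replacement.

```python
import random

rng = random.Random(0)
labels = list(range(1, 11))  # hypothetical labels 1..10

# With replacement: the same label may be drawn more than once.
with_replacement = rng.choices(labels, k=5)

# Without replacement: every drawn label is different.
without_replacement = rng.sample(labels, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```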
Stratified random sample + when it’s a preferred method over a simple random sample
A stratified random sample is obtained by classifying the population into groups of similar individuals (strata), choosing a separate SRS in each stratum, and combining these SRSs to form the sample.
It is the preferred method when individuals within each stratum are similar to one another but there are large differences between strata.
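A minimal Python sketch of a stratified random sample. The function name stratified_sample, the grade-level strata, and the choice of the same sample size in every stratum are assumptions made for illustration.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Take a separate SRS inside each stratum and combine them (illustrative)."""
    rng = random.Random(seed)
    # Classify the population into strata.
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Choose an SRS within each stratum, then combine the SRSs.
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, n_per_stratum))
    return combined

# Hypothetical population: 30 freshmen and 30 seniors.
school = [("freshman", i) for i in range(30)] + [("senior", i) for i in range(30)]
strat = stratified_sample(school, stratum_of=lambda p: p[0], n_per_stratum=4, seed=3)
print(strat)
```

Every stratum is guaranteed to appear in the sample, which an SRS of the whole population does not guarantee.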
Cluster sample + when it’s a preferred method over a simple random sample
A cluster sample is obtained by classifying the population into groups of individuals that are located close to each other (clusters) and then choosing an SRS of the clusters. All individuals in the chosen clusters are included in the sample.
It is the preferred method when each cluster resembles the population on a smaller scale.
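A minimal Python sketch of a cluster sample. The function name cluster_sample and the six homerooms are hypothetical. Note that the random selection is applied to whole clusters, not to individuals.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose an SRS of clusters and include everyone in them (illustrative)."""
    rng = random.Random(seed)
    # Randomly choose which clusters are in the sample.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every individual in a chosen cluster is included.
    return [person for name in chosen for person in clusters[name]]

# Hypothetical population: 6 homerooms of 5 students each.
homerooms = {
    f"homeroom_{k}": [f"hr{k}_student_{i}" for i in range(5)]
    for k in range(1, 7)
}
clus = cluster_sample(homerooms, n_clusters=2, seed=7)
print(clus)
```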
Undercoverage and why it’s a problem
Undercoverage occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample.
This is a problem because the sample systematically underrepresents those individuals, so results from the survey/poll can be biased (ex: an estimated mean that is consistently too high or too low).
Non-response vs voluntary response
Non-response occurs when an individual chosen for the sample can’t be contacted or refuses to participate. It differs from voluntary response because individuals in a voluntary response sample have chosen themselves to be in the sample, whereas non-response can only take place after a sample has been chosen.
(END OF 4.1) Factors that could lead to bias
age
ethnicity
gender
personality/characteristics of the interviewer
(START OF 4.2 AND 4.3) Observational study vs experiment
observational study: observes individuals and measures variables of interest but does not attempt to influence the responses.
experiment: researchers deliberately impose treatments (conditions) on individuals in order to measure their responses.
explanatory variable vs response variable
explanatory variable: variable that may help explain or predict changes in a response variable
response variable: variable that measures an outcome of a study.
Treatment
a specific condition applied to the individuals in an experiment
Experimental units
the objects to which a treatment is randomly assigned
Subjects
experimental units that are human beings
Factors
explanatory variables in experiments
Levels
a specific value of an explanatory variable (factor) in an experiment
Confounding + how it leads to incorrect conclusions about a response
when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
This can lead to incorrect conclusions because a change in the response may be credited to the explanatory variable when it was actually caused by the confounding variable.
Placebo effect
describes how some subjects respond favorably to any treatment, even an inactive one (placebo).
Single-blind experiment
an experiment in which either the subjects or the people who interact with them and measure the response variable, but not both, are unaware of which treatment a subject received.
Double-blind experiment
an experiment in which neither the subjects nor those who interact with them and measure the response variable know which treatment a subject has received.
Purpose of randomizing
Randomizing (random assignment) helps create treatment groups that are roughly equivalent at the start of the experiment, so the effects of other variables are balanced across the groups.
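Random assignment can be sketched in Python as shuffling the experimental units and dealing them out to the treatments. The function name randomly_assign, the plants, and the two treatments are hypothetical examples.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to treatments in turn (illustrative)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {treatment: [] for treatment in treatments}
    for position, unit in enumerate(shuffled):
        treatment = treatments[position % len(treatments)]
        groups[treatment].append(unit)
    return groups

# Hypothetical experiment: 12 plants, 2 treatments.
plants = [f"plant_{i}" for i in range(1, 13)]
groups = randomly_assign(plants, ["fertilizer", "no fertilizer"], seed=8)
for treatment, members in groups.items():
    print(treatment, members)
```

Because chance decides which plant lands in which group, no characteristic of the plants can systematically favor one treatment.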
? Why is it important to control variables in an experiment ?
It’s important to control variables because keeping other variables the same for all experimental units prevents them from becoming confounded with the treatment and reduces variability in the response variable.
? How could you choose 3 individuals (without replacement) from a group of 10 ?
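You could label the individuals 1 to 10 and use a random number generator to pick 3 different integers from 1 to 10, ignoring any repeats. The individuals with those labels are chosen.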
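One way to sketch this in Python is to draw random integers from 1 to 10 one at a time and skip any integer that has already been drawn. The function name choose_without_replacement and the seed value are assumptions for the example.

```python
import random

def choose_without_replacement(N, n, seed=None):
    """Draw labels from 1..N until n different labels are collected (illustrative)."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < n:
        label = rng.randint(1, N)  # randint includes both endpoints
        if label not in chosen:    # a repeated label is ignored
            chosen.append(label)
    return chosen

picked = choose_without_replacement(N=10, n=3, seed=42)
print(picked)
```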
Using replication in an experiment
Using replication means using an adequate number of experimental units (subjects) so that a difference in the effects of the treatments can be distinguished from chance variation due to the random assignment.
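(Note: in experimental design, replication means using enough experimental units; in other fields it often means repeating the whole experiment.)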
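Blocking an experiment + its purpose
A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. Blocking an experiment means forming these blocks and then randomly assigning treatments within each block.
purpose: to account for the variability in the response that is due to the blocking variable, making differences between the treatments easier to see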
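A minimal Python sketch of a blocked experiment, in which units are first grouped into blocks and treatments are then randomly assigned separately inside each block. The function name randomized_block_design, the sunny/shady blocks, and the treatment names are hypothetical.

```python
import random

def randomized_block_design(units, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block (illustrative)."""
    rng = random.Random(seed)
    # Form blocks of similar units before any treatment is assigned.
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    # Carry out a separate random assignment inside each block.
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for position, unit in enumerate(shuffled):
            assignment[unit] = treatments[position % len(treatments)]
    return assignment

# Hypothetical units: 6 plots in a sunny field and 6 in a shady field.
plots = [("sunny", i) for i in range(6)] + [("shady", i) for i in range(6)]
assignment = randomized_block_design(
    plots, block_of=lambda plot: plot[0], treatments=["new", "old"], seed=4
)
for plot, treatment in assignment.items():
    print(plot, treatment)
```

Each treatment appears equally often in the sunny block and in the shady block, so sunlight cannot be confounded with the treatment.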
Blocking vs stratifying
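Blocking: used in experiments; experimental units are organized into groups (blocks) of similar units, and treatments are randomly assigned within each block.
Stratifying: used in sampling; individuals in a population are organized into groups of similar individuals (strata), and an SRS is chosen from each stratum.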
Two types of matched pairs designs
In the first type, each subject receives both treatments, in a random order.
In the second type, each subject is paired with a similar subject, and within each pair one subject is randomly assigned to each treatment.
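Both matched pairs designs can be sketched in Python. The subject names, the pairings, and the treatment labels "A" and "B" are hypothetical, chosen only to show where the randomization happens in each design.

```python
import random

rng = random.Random(5)
treatments = ("A", "B")

# Type 1: every subject receives both treatments; chance decides the order.
subjects = ["s1", "s2", "s3", "s4"]
orders = {subject: rng.sample(treatments, 2) for subject in subjects}

# Type 2: similar subjects are paired; chance decides who in each pair gets A.
pairs = [("p1", "p2"), ("p3", "p4"), ("p5", "p6")]
pair_assignment = {}
for first, second in pairs:
    first_treatment, second_treatment = rng.sample(treatments, 2)
    pair_assignment[first] = first_treatment
    pair_assignment[second] = second_treatment

print(orders)
print(pair_assignment)
```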
Sampling variability + how it’s related to sample size
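Sampling variability refers to the fact that different random samples of the same size from the same population produce different estimates/results. Larger random samples tend to produce estimates that vary less from sample to sample (less sampling variability).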
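A small simulation can illustrate sampling variability and how it depends on sample size. Everything here is made up for the illustration: the population is 10,000 simulated values, and the sample sizes 10 and 100 and the 500 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(2024)
# Hypothetical population of 10,000 values.
population = [rng.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repetitions=500):
    """Standard deviation of the sample mean across many random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)
print("spread of sample means, n = 10: ", round(spread_small, 2))
print("spread of sample means, n = 100:", round(spread_large, 2))
```

The sample means differ from sample to sample in both cases, but they vary less when each sample has 100 individuals than when it has 10.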
? What does it mean if the results of a study or experiment are statistically significant ?
If the results of a study or experiment are statistically significant, they are too unusual to be explained by chance alone.
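One way to judge whether a result is "too unusual" is to simulate what chance alone would produce by re-doing the random assignment many times. The sketch below uses made-up response values for a treatment group and a control group; the function name randomization_p_value and the 2000 repetitions are assumptions for the example.

```python
import random
import statistics

def randomization_p_value(group_a, group_b, repetitions=2000, seed=None):
    """Share of re-randomizations with a difference at least as large as observed."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    combined = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(repetitions):
        rng.shuffle(combined)  # re-assign responses to groups by chance alone
        new_a = combined[:len(group_a)]
        new_b = combined[len(group_a):]
        difference = statistics.mean(new_a) - statistics.mean(new_b)
        if abs(difference) >= abs(observed):
            at_least_as_extreme += 1
    return at_least_as_extreme / repetitions

# Made-up responses, for illustration only.
treatment = [12, 15, 14, 16, 13, 17, 15, 16]
control = [9, 11, 10, 12, 8, 11, 10, 9]
p_value = randomization_p_value(treatment, control, seed=11)
print(p_value)
```

A small proportion means that chance alone rarely produces a difference as large as the observed one, which is what "statistically significant" describes.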
? What is necessary in order to make an inference about a population from a sample ?
To make an inference about a population, the individuals in the sample must be selected at random from that population.
? What is the best strategy for making an inference about cause and effect ?
The best strategy for making an inference about cause and effect is a well-designed experiment in which treatments are randomly assigned to individuals.
? If a randomized experiment is infeasible or impossible, what are the five criteria for establishing cause and effect from observational studies ?
The five criteria are:
the association is strong;
the association is consistent;
larger values of the explanatory variable are associated with stronger responses;
the alleged cause precedes the effect in time;
and the alleged cause is plausible.