statistics
study of variation (how data vary, allowing us to draw reliable conclusions from the data)
individual
an object described in a set of data (people, animals, things)
variable
characteristic that can take different values for different individuals
categorical variable
takes values that are labels, which place each individual into a group called a category
quantitative variable
number values that are quantities (counts or measurements)
discrete variables
numeric/quantitative variables that have a countable (finite) number of possible values between any two values (typically whole numbers)
continuous variables
numeric/quantitative variables that have an infinite number of possible values between any two values
distribution
tells us what values the variable takes and how often it takes each value
frequency table
shows number of individuals having each value
relative frequency table
shows the proportion or percentage of individuals having each value
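A minimal sketch of both tables, assuming a made-up list of eye colors for 10 individuals (the data and the function names are hypothetical, not from the deck):

```python
from collections import Counter

# Hypothetical data: one categorical variable (eye color) for 10 individuals.
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

def frequency_table(values):
    """Number of individuals having each value."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Proportion of individuals having each value."""
    n = len(values)
    return {value: count / n for value, count in Counter(values).items()}

freq = frequency_table(eye_colors)
rel_freq = relative_frequency_table(eye_colors)
```

The relative frequencies are just the counts divided by the number of individuals, so they always add up to 1 (or 100%).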
population (of interest)
entire group of individuals we want information about
census
collects data from every individual in the population
sample
a subset of individuals in the population from which we collect data
sample survey
a study that collects data from a sample to learn about the population from which the sample was selected
random sampling
involves using a chance process to determine which members of a population are chosen for the sample
statistic
a number that describes some characteristic of a sample
parameter
a number that describes some characteristic of the population
mean
average
parameter: μ (mu) = …
statistic: x̄ (x-bar) = …
standard deviation
measure of spread
parameter: σ (lowercase sigma) = …
statistic: s_x (s with subscript x) = …
proportion
percent
parameter: p = …
statistic: p̂ (p-hat) = …
statistical inference
using information from a sample to infer, or draw conclusions about, the population
sampling frame
list of the items or people that have a chance to be chosen for the sample; it should match the population of interest, because results can be generalized only to the population actually sampled from
observational study
observes individuals and measures variables of interest, but does not attempt to influence the responses (cannot prove causation from this study)
experiment
deliberately imposes treatments on experimental units to measure their responses (can prove causation)
simple random sample (SRS)
a sample chosen in such a way that every group of n individuals in the population has an equal chance to be selected as the sample (every combination is possible); selected by pooling all names together and using a random process to pick n names, without grouping individuals beforehand
sampling without replacement
an individual from a population can only be selected once
sampling with replacement
an individual from a population can be selected more than once
3 ways to take an SRS
random number table
calculator/random integer generator
slips in a hat
*review processes in notes
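A software analogue of "slips in a hat", sketched under the assumption that the population is a roster of ID numbers 1 to 50 (hypothetical). `random.sample` draws without replacement and `random.choices` draws with replacement:

```python
import random

# Hypothetical population: ID numbers 1..50 standing in for names on a roster.
population = list(range(1, 51))

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; every group of n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def sample_with_replacement(population, n, seed=None):
    """Draw n individuals with replacement; the same individual can repeat."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)

srs = simple_random_sample(population, n=5, seed=1)
with_repl = sample_with_replacement(population, n=5, seed=1)
```

The `seed` argument is only there to make the draw repeatable for checking; leaving it as `None` gives a fresh random draw each time.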
sampling variability (or sampling error)
the fact that statistical measures taken from good random samples will yield different statistics from sample to sample
bias
procedure generates samples (often from bad sampling techniques) that yield statistical measures that consistently underestimate or consistently overestimate the parameter of interest
unbiased estimator
a statistic whose mean, over many repeated samples, is equal to the true population parameter
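A small simulation sketch of sampling variability and unbiasedness, assuming a made-up population of 1,000 scores (all numbers here are hypothetical). Each SRS gives a different sample mean, yet the average of many sample means lands close to the population mean μ:

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 1,000 scores between 50 and 100.
population = [rng.randint(50, 100) for _ in range(1000)]
mu = statistics.mean(population)  # the parameter

# Take 2,000 SRSs of size 25 and record each sample mean (the statistic).
sample_means = [statistics.mean(rng.sample(population, 25))
                for _ in range(2000)]

# Sampling variability: the sample means differ from sample to sample.
spread_of_means = statistics.stdev(sample_means)

# Unbiasedness: their average is close to the parameter mu.
average_of_means = statistics.mean(sample_means)
```

A biased procedure would instead produce sample means whose average sits consistently above or below μ.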
strata
groups of individuals in a population that share characteristics thought to be associated with the variables being measured in a study (singular form is stratum)
stratified random sample
sample selected by choosing an SRS from each stratum and combining the SRSs into one overall sample; when done right, it will produce statistics with less sampling variability (closer to the true parameter)
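A minimal sketch of a stratified random sample, assuming three hypothetical strata (grade levels) of 30 students each and an SRS of 4 from every stratum:

```python
import random

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Take an SRS from each stratum and combine them into one sample."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical strata: students grouped by grade level.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 31)],
    "sophomores": [f"S{i}" for i in range(1, 31)],
    "juniors": [f"J{i}" for i in range(1, 31)],
}

stratified = stratified_random_sample(strata, n_per_stratum=4, seed=3)
```

Every stratum is guaranteed to be represented, which is what reduces sample-to-sample variability when the strata really do differ on the variable being measured.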
cluster
a group of individuals in the population that are located near each other (heterogeneous mix of people)
cluster sample
a sample selected by randomly choosing clusters and including each member of the selected clusters in the sample
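A minimal sketch of a cluster sample, assuming four hypothetical homerooms as clusters. Whole clusters are chosen at random, and then every member of each chosen cluster is included:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical clusters: homerooms in one school.
clusters = {
    "room_101": ["Ana", "Ben", "Cy"],
    "room_102": ["Dee", "Eli", "Fay"],
    "room_103": ["Gus", "Hal", "Ivy"],
    "room_104": ["Jo", "Kai", "Lu"],
}

chosen_rooms, cluster_members = cluster_sample(clusters, n_clusters=2, seed=5)
```

Contrast with stratified sampling: there, some individuals come from every group; here, all individuals come from some groups.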
systematic random sample
sample selected from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter (no statistical advantage, just easier to obtain)
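A minimal sketch of a systematic random sample, assuming a hypothetical ordered list of 100 individuals and k = 10:

```python
import random

def systematic_random_sample(ordered_population, k, seed=None):
    """Pick a random start among the first k, then take every kth individual."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k individuals
    return ordered_population[start::k]

# Hypothetical ordered population: individuals numbered 1..100 in line order.
ordered_population = list(range(1, 101))

systematic = systematic_random_sample(ordered_population, k=10, seed=2)
```

Only the starting point is random; once it is chosen, the rest of the sample is fixed by the ordering.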
convenience sample
consists of individuals from the population who are easy to reach; can lead to bias
voluntary response sample
consists of people who choose to be in the sample by responding to a general invitation, sometimes called self-selected sample (can result in voluntary response bias)
undercoverage
occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample (can result in undercoverage bias)
nonresponse
occurs when an individual chosen for the sample can’t be contacted or refuses to participate (can result in nonresponse bias)
response bias
occurs when there is a consistent pattern of inaccurate responses to a survey (influenced by social pressures, wording of a question, etc.)
response variable
measures an outcome of a study
explanatory variable
helps explain or predict changes in a response variable
confounding
occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other (3 variables at play)
treatment
specific condition applied to the individuals in an experiment (if there are multiple explanatory variables, a treatment is a combination of specific values of these variables)
experimental units
objects/subjects to which a treatment is randomly assigned
placebo
a treatment that has no active ingredient but is otherwise like other treatments
factor
explanatory variable that is manipulated and may cause a change in the response variable (different values are called levels)
comparison/control group
used to provide a baseline for comparing the effects of other treatments; depending on the purpose of the experiment, the control group may be given an inactive treatment (placebo), an active treatment, or no treatment at all
-helps determine if it was in fact the explanatory variable that influenced the response variable
control
keeping other variables constant for all experimental units
-helps reduce variability in the response variable as much as possible and therefore will allow you to better determine if the explanatory variable is causing a difference in the response variable
placebo effect
describes the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive treatment
double-blind
neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving
single-blind
either the subject or the people who interact with them and measure the response variable don’t know which treatment a subject is receiving
random assignment (randomization)
treatments are assigned to experimental units (or vice versa) using a chance process
-creates groups that are as equal as possible across the many variables that you have no control over, so any observed differences in responses after treatment can be attributed to the explanatory variable
replication
giving each treatment to enough experimental units so that a difference in the effects of the treatments can be distinguished from chance variation due to the random assignment
-helps ensure that random assignment actually balances out the groups
4 principles of a well-designed experiment
comparison
randomization
replication
control
completely randomized design (of experiment)
the experimental units are assigned to the treatments completely at random
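A minimal sketch of a completely randomized design, assuming 20 hypothetical plants and two treatments. The units are shuffled by a chance process and then dealt out to the treatments in equal-sized groups:

```python
import random

def completely_randomized_design(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experimental units and treatments.
units = [f"plant_{i}" for i in range(1, 21)]
treatments = ["fertilizer", "no fertilizer"]

groups = completely_randomized_design(units, treatments, seed=7)
```

Dealing the shuffled units out in turn keeps the group sizes equal, which is a design choice of this sketch rather than a requirement of the definition.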
block
a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments
randomized block design
the random assignment of experimental units to treatments is carried out separately within each block (separation of treatments happens in a “mini experiment” within each block)
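A minimal sketch of a randomized block design, assuming hypothetical plants blocked by light exposure. The shuffle-and-deal step is repeated separately inside each block:

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # a separate chance process for this block
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: plants known in advance to differ by light exposure.
blocks = {
    "sunny": [f"sunny_{i}" for i in range(1, 9)],
    "shady": [f"shady_{i}" for i in range(1, 9)],
}
block_treatments = ["fertilizer", "no fertilizer"]

block_assignment = randomized_block_design(blocks, block_treatments, seed=11)
```

Because each block gets its own random assignment, both treatments appear equally often among sunny plants and among shady plants, so light exposure cannot be confounded with the treatment.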
matched pairs design
a design for comparing two treatments that uses blocks of size 2; in some, 2 very similar experimental units are paired and the two treatments are randomly assigned within each pair (in others, each experimental unit receives both treatments in a random order)
scope of inference
-random selection of individuals justifies inference about the population from which the individuals were chosen
-random assignment of individuals to groups in an experiment with statistically significant results justifies inference about cause and effect
statistically significant
when the observed difference in responses between the groups in an experiment is so large that it is unlikely to be explained by chance variation in the random assignment, the results are called statistically significant (ex: a difference that large would occur by chance less than 5% of the time)
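One way to estimate how often chance alone would produce a difference as large as the one observed is to redo the random assignment many times. This is a sketch of that idea (often called a randomization or permutation test), with made-up response values for two groups of 8; it is not necessarily the method used in the course notes:

```python
import random

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, reps=5000, seed=None):
    """Estimate how often re-shuffled groups differ at least as much as observed."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # simulate a fresh random assignment
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / reps

# Hypothetical responses (made-up numbers for illustration only).
treatment_group = [24, 27, 29, 31, 26, 30, 28, 32]
control_group = [20, 22, 19, 23, 21, 24, 18, 22]

observed_diff, estimated_p = randomization_test(
    treatment_group, control_group, seed=0)
```

If `estimated_p` falls below the chosen cutoff (such as 0.05), the observed difference would be called statistically significant.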