Ch. 10-12 Terms
Observational study
A study based on data in which no factors are manipulated
Retrospective study
Observational study where subjects are selected and their previous conditions/behaviors are determined
Not based on random samples; usually focuses on estimating differences between groups or associations between variables
Prospective study
Observational study where subjects are followed to observe future outcomes
No treatments are deliberately applied, so not an experiment
Focuses on estimating differences among groups that might appear as the groups are followed during the course of the study
Experiment
Manipulates factor levels to create treatments, RANDOMLY ASSIGNS subjects to these treatment levels, then compares responses of subject groups across treatment levels
Random assignment
To be valid, an experiment must assign experimental units to treatment groups at random
Factor
A variable whose values are manipulated by the experimenter
Experiments attempt to discover the effects of different factor levels on experimental units
Response variable
A variable whose values are compared across different treatments
In a randomized experiment, large response differences can be attributed to the effect of differences in treatment level
Experimental units
Individuals on whom an experiment is performed
Usually SUBJECTS or PARTICIPANTS
Level
The specific values the experimenter chooses for factors
Treatment
The process/intervention/other controlled circumstance applied to randomly assigned experimental units
Treatments are different levels of a single factor or combinations of levels of two or more factors
Principles of experimental design
CONTROL aspects of the experiment that we know may have an effect on the response, but that are not the factors being studied
RANDOMIZE subjects to treatments to even out effects we cannot control
REPLICATE over as many subjects as possible. Try to replicate with different parts of a population
BLOCK to reduce the effects of identifiable attributes of the subjects that may affect their responses but cannot be controlled
Completely randomized design
All experimental units have an equal chance of receiving any treatment
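A minimal sketch of a completely randomized design: shuffle the units, then deal them out to the treatments in turn. The function name, subject labels, and treatment names are hypothetical, chosen only for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    groups = {t: [] for t in treatments}
    # Deal the shuffled units out to the treatments in turn, like dealing cards.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical subjects and treatment names.
subjects = [f"subject_{i}" for i in range(1, 13)]
assignment = completely_randomized(subjects, ["placebo", "low dose", "high dose"], seed=1)
for treatment, group in assignment.items():
    print(treatment, group)
```

Passing a seed only makes the assignment reproducible; leave it out for a fresh random assignment.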
Statistically significant
When an observed difference is too large for us to think it could be caused by chance, it is statistically significant
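One way to judge "too large to be caused by chance" is to re-randomize: reshuffle the responses between the two groups many times and see how often chance alone produces a difference at least as large as the observed one. This is an illustrative sketch, not a method named in these notes; the function name and the response values are hypothetical.

```python
import random

def rerandomization_p_value(group_a, group_b, reps=10_000, seed=None):
    """Estimate how often chance alone gives a mean difference at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # pretend the group labels were assigned by chance
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point round-off in ties.
        if abs(mean_a - mean_b) >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / reps

# Hypothetical responses for a treatment group and a control group.
treated = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
print(rerandomization_p_value(treated, control, reps=2000, seed=0))
```

A small proportion suggests the observed difference would rarely arise from random assignment alone.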
Control group
The experimental units assigned to a baseline treatment level (either the default treatment or no treatment)
Their responses provide a basis for comparison
Blinding
Any individual associated with an experiment who is not aware of how subjects have been allocated to treatment groups is said to be blinded
Single blind, double blind
Two classes of individuals can affect the outcome of an experiment: those who can influence the results (subjects, technicians) and those who evaluate the results (judges, physicians)
When everyone in both classes is blinded, the experiment is double blind
When everyone in only one of the classes is blinded, the experiment is single blind
Placebo
Treatment known to have no effect, administered to one group so all groups experience same conditions
Only by comparing with a placebo can we be sure the observed effect of a treatment is not due simply to placebo effect
Blocking
When subgroups of experimental units differ in ways that may affect their responses to treatments, we isolate them by blocking
We can isolate the variability attributable to differences between blocks so we can see differences caused by treatment more clearly
Randomized block design
Subjects are randomly assigned to treatments only within blocks
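A minimal sketch of a randomized block design: group the units into blocks first, then randomize to treatments separately inside each block. The function name, the `block_of` callback, and the age-group example are hypothetical.

```python
import random

def randomized_block(units, block_of, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomization happens only inside the block
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical units: (id, age group), blocked on age group.
units = [(i, "old" if i < 6 else "young") for i in range(12)]
result = randomized_block(units, lambda u: u[1], ["drug", "placebo"], seed=3)
for unit, treatment in result.items():
    print(unit, treatment)
```

Each block ends up with its own balanced split, so differences between blocks cannot be mistaken for treatment effects.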
Matching
In a retrospective or prospective study, subjects who are similar in ways not under study may be matched and compared on variables of interest
Matching (like blocking) reduces unwanted variation
Confounding
When the levels of one factor are associated with the levels of another factor in such a way that their effects cannot be separated
Population
Entire group of individuals about whom we wish to learn
Sample
Representative subset of a population, examined in hopes of learning about the population
Sample survey
Study that asks questions of a sample drawn from some population in hopes of learning something about whole population
Polls taken to assess voter preference are common sample surveys
Bias
Any systematic failure of a sampling method to represent its population
Biased sampling methods over/underestimate parameters
Near impossible to recover from bias
Common examples
Relying on voluntary response
Undercoverage of population
Nonresponse bias
Response bias
Randomization
Each individual is given a fair/random chance of selection
Best defense against bias
Sample size
Number of individuals in a sample
Sample size, not the fraction of the population sampled, determines how well the sample represents the population
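A rough simulation can make the sample-size point concrete: draw many samples of the same size from a small and a large population and compare how much the sample means vary. This is a sketch under assumed conditions (a made-up population of uniform random values, hypothetical function name), not a result from these notes.

```python
import random
import statistics

def spread_of_sample_means(pop_size, n, draws=500, seed=None):
    """Standard deviation of the sample mean over repeated samples of size n."""
    rng = random.Random(seed)
    # Hypothetical population: pop_size values spread uniformly between 0 and 1.
    population = [rng.random() for _ in range(pop_size)]
    means = [statistics.mean(rng.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

# Same sample size, very different population sizes.
print(spread_of_sample_means(2_000, 100, seed=1))
print(spread_of_sample_means(50_000, 100, seed=1))
# Larger sample from the large population.
print(spread_of_sample_means(50_000, 400, seed=1))
```

The first two spreads should come out close to each other, while the third should be noticeably smaller.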
Census
Sample that is the whole population
Population parameter
Numerically valued attribute of a model for a population
Never really know true value of it, but we can estimate
Statistic/sample statistic
Statistics are values calculated for sampled data
Representative
Sample is representative if the statistics computed from it accurately reflect the corresponding population parameters
Simple random sample (SRS)
Simple random sample of sample size n is a sample in which each set of n elements in the population has an equal chance of being selected
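A minimal sketch of drawing an SRS with Python's standard library. `random.sample` selects without replacement, so every set of n individuals is equally likely. The function name and the frame of student IDs are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n individuals from the sampling frame without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)

# Hypothetical sampling frame of 500 student IDs.
frame = range(1, 501)
print(simple_random_sample(frame, 10, seed=7))
```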
Sampling frame
A list of individuals from whom the sample is drawn
Sampling variability
The natural tendency of randomly drawn samples to differ from one another
Sometimes called sampling error, but not really an error
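Sampling variability is easy to see by drawing several random samples from the same population and comparing a statistic across them. The sketch below assumes a made-up population (the integers 1 to 1000) and a hypothetical function name.

```python
import random
import statistics

def sample_means(population, n, draws, seed=None):
    """Mean of each of several independent random samples of size n."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(draws)]

# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))
for mean in sample_means(population, n=25, draws=5, seed=42):
    print(mean)
```

The means differ from draw to draw even though nothing went wrong in the sampling, which is why "error" is a misleading name.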
Stratified random sample
Sampling design where population is divided into subpopulations (strata). Individuals are drawn from each stratum (usually in representative proportion) to reduce variability
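A minimal sketch of a stratified random sample with proportional allocation: split the frame into strata, then take an SRS of the same fraction from each. The function name, the `stratum_of` callback, and the class-year data are hypothetical; taking at least one individual per stratum is an assumption of this sketch.

```python
import random

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Take a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for individual in frame:
        strata.setdefault(stratum_of(individual), []).append(individual)
    sample = []
    for members in strata.values():
        # Proportional allocation; at least one individual per stratum (assumption).
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 60 first-years and 40 sophomores, sampled at 10%.
frame = [("first-year", i) for i in range(60)] + [("sophomore", i) for i in range(40)]
print(stratified_sample(frame, lambda person: person[0], 0.10, seed=5))
```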
Cluster sample
Groups (clusters) are chosen at random to be sampled
Done as a matter of convenience, practicality, or cost
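A minimal sketch of a cluster sample: pick whole clusters at random, then include everyone in the chosen clusters. Including every member of a chosen cluster is an assumption of this sketch; the function name and dorm data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random; clusters maps a cluster name to its members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Assumption: every individual in a chosen cluster is included.
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: six dorms with five students each.
dorms = {f"dorm_{i}": [f"student_{i}_{j}" for j in range(5)] for i in range(6)}
print(cluster_sample(dorms, 2, seed=0))
```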
Multistage sample
Sampling designs that combine several sampling methods
Systematic sample
Sample drawn by selecting individuals systematically from a sampling frame
If no relationship between order of sampling frame and variables of interest, can be representative
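A minimal sketch of a systematic sample: take every k-th individual from the sampling frame. Starting from a randomly chosen position within the first k is an assumption added here; the function name and frame are hypothetical.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Select every step-th individual, beginning at a random position in the first step."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random starting point (assumption of this sketch)
    return list(frame)[start::step]

# Hypothetical frame of 100 numbered individuals, taking every 10th.
print(systematic_sample(range(100), 10, seed=2))
```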
Pilot survey
Small trial run of survey to check if questions are good
Reduces error caused by ambiguous questions
Voluntary response bias
Bias introduced to sample when individuals can choose on their own whether to participate in the sample
Samples based on voluntary response are always invalid
Convenience sample
Sample that consists of individuals who are conveniently available
Often not representative of the population
Undercoverage
Sampling design that biases sample because it gives some part of the population less representation than it actually has in the population
Nonresponse bias
Bias introduced when large fraction of those sampled don't respond
Those who do respond then are not likely to represent full population
Voluntary response bias is a form of this
Response bias
Anything in a survey design that influences responses
Typically from wording of questions
Random
Outcome is random if we know the possible values it can have, but not which particular value it takes. A random outcome is FREE of human influence
Generating random numbers
Random numbers are hard to generate, but several internet sites offer an unlimited supply of equally likely random values
Simulation
A simulation models a real-world situation by using random-digit outcomes to mimic the uncertainty of a response variable of interest
Trial
The sequence of several components representing events that we are pretending will take place
Simulation component
A component uses equally likely random digits to model simple random occurrences whose outcomes may not be equally likely
Response variable
Values of the response variable record the results of each trial with respect to what we were interested in
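The simulation terms fit together as in the sketch below. The scenario is hypothetical: each box contains prize A, B, or C with probabilities 20%, 30%, and 50%, and we want the average number of boxes needed to collect all three. Each component maps one equally likely digit to a prize, a trial repeats components until the set is complete, and the response variable is the number of boxes in that trial.

```python
import random

def component(rng):
    """One component: a single equally likely digit 0-9 mapped to an outcome."""
    digit = rng.randrange(10)
    if digit <= 1:
        return "A"  # digits 0-1 model a 20% chance
    if digit <= 4:
        return "B"  # digits 2-4 model a 30% chance
    return "C"      # digits 5-9 model a 50% chance

def trial(rng):
    """One trial: repeat components until all three prizes have appeared."""
    collected = set()
    boxes = 0
    while collected != {"A", "B", "C"}:
        collected.add(component(rng))
        boxes += 1
    return boxes  # value of the response variable for this trial

def simulate(trials, seed=None):
    """Run many trials and return the average value of the response variable."""
    rng = random.Random(seed)
    results = [trial(rng) for _ in range(trials)]
    return sum(results) / trials

print(simulate(2000, seed=1))
```

Note how equally likely digits still model outcomes that are not equally likely, because different numbers of digits are assigned to each outcome.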