Pairwise comparisons of more than two means
the design controls the variation within the groups while measuring the variation between them
Observational studies
- a statistical study is observational when it is conducted using pre-existing data and/or data collected without any particular design
- using data from product registrations and warranty cards
- observing consumer purchases in a store
Time elements of studies
- retrospective
- prospective
Retrospective
- studies an outcome in the present by examining historical records
- using last year's sales records to predict this year's
Prospective
- identifies subjects in advance and collects data as events unfold
- consumer expenditure survey
Experiment
recording the outcomes from the manipulation of attributes in the study. the act of measuring or collecting data on a subject
Factors
the variables being manipulated by being set to particular values called levels
Treatment
the combination of factor levels assigned to a subject
Response variable
the variable being measured or recorded
Principles of design
- control
- random assignment
- replication
- blocking
Control
- control the sources of variation other than the factors being manipulated by making conditions as similar as possible for all treatment groups
- a control treatment is a special treatment class designed to mark the baseline of the study, and the group that receives it is called the control group
Random assignment
random assignment of subjects to the different groups
Replication
- Repeated observations at each treatment are called replicates. If the number of replicates is the same for each treatment combination, the experiment is said to be balanced
- A second kind of replication is to repeat an entire experiment for a different group of subjects, under different circumstances, or at a different time
Blocking
- Group or block subjects together according to some factor that is uncontrollable but may affect the response. Such factors are called blocking factors, and their levels are called blocks.
- Examples of blocking factors: sex, ethnicity, or marital status.
- In effect, blocking an experiment into n blocks is equivalent to running n separate experiments
How the design works: volume increases by
- selecting factors
- choosing treatments
- determining sample size
How the design works: noise is reduced by
randomly assigning treatments to the experimental units
Blinding
the deliberate withholding of the treatment details from individuals who might affect the outcome
Two sources of unwanted bias
- those who might influence the results
- those who evaluate the results
Single blind experiment
one or the other of the two groups is blinded
Double blind experiment
both groups are blinded
Applying ___ treatment can alter a response
any
To separate the real treatment effects from imagined ones, use an ineffective control treatment that mimics the treatments being tested. such a fake treatment is called a
placebo
3 types of designs
- completely randomized has one factor with multiple levels
- randomized block has two factors, but the experiment is not replicated
- factorial design has two factors with replication
One factor anova experiment
- factor
- levels
- response variable
- control group
When each of the possible treatments is assigned to at least one subject at random, the design is called
completely randomized design
In the simplest completely randomized design, the subjects are assigned
at random to the treatments
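A minimal sketch of that random assignment, assuming 12 numbered subjects and three hypothetical treatment labels (none of these names come from the cards). Subjects are shuffled and then dealt out in turn, so the design is also balanced:

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn
    so that group sizes are as equal as possible."""
    rng = random.Random(seed)  # seed is only there to make a run repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        assignment[treatments[i % len(treatments)]].append(subject)
    return assignment

# Hypothetical example: 12 subjects, two treatments plus a control group
groups = completely_randomized(range(1, 13), ["A", "B", "control"], seed=42)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Which subject lands in which group depends only on the shuffle, not on any judgment by the experimenter.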
Confounded
- when the levels of one factor are associated with the levels of another factor
- when levels incorporate more than one factor
Lurking variables
- drive two other variables in such a way that a causal relationship is suggested between the two
You can apply anova to observational data if
the box plots show roughly equal spreads and symmetric, outlier free distributions
Apply anova with caution, as these studies are prone to the following problems
- observational studies are frequently unbalanced
- randomization is usually absent
- there is no control over lurking variables or confounding
- don't draw causal conclusions even when the F Statistic is significant
Multiple comparisons
- knowing that the means differ leads to the question of which ones are different and by how much
- methods that test these issues are called methods for multiple comparisons
Why don't we simply use a t-test for differences between means to test each pair of group means?
Each t-test is subject to a Type I error, and the chance of committing at least one such error increases as the number of tested pairs increases
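A rough illustration of how fast that chance grows. It assumes each pairwise test is run at alpha = 0.05 and that the tests are independent, which is a simplification (pairwise tests on shared groups are not truly independent); the function name is made up for this sketch:

```python
from math import comb

def effective_error_rate(n_groups, alpha=0.05):
    """Return (number of pairs, chance of at least one Type I error),
    assuming independent pairwise tests each run at level alpha."""
    pairs = comb(n_groups, 2)  # number of distinct pairs of group means
    return pairs, 1 - (1 - alpha) ** pairs

for k in (3, 4, 5):
    pairs, rate = effective_error_rate(k)
    print(f"{k} groups -> {pairs} pairs, effective error rate about {rate:.3f}")
```

With only 5 groups there are already 10 pairs, and under these assumptions the chance of at least one false positive is about 0.40 rather than 0.05.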
Blocking factors require
randomly assigning the subjects to the treatments within each block
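A sketch of randomizing within blocks, assuming each block holds several subjects. The block names, subject IDs, and treatment labels are hypothetical. The shuffle is done separately inside every block, so each treatment appears in each block:

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """For each block, shuffle its subjects and deal the treatments out in turn."""
    rng = random.Random(seed)
    plan = {}
    for block, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # randomization happens inside the block only
        plan[block] = {
            subject: treatments[i % len(treatments)]
            for i, subject in enumerate(shuffled)
        }
    return plan

# Hypothetical blocking factor with two levels (blocks), four subjects each
blocks = {"block 1": ["s1", "s2", "s3", "s4"], "block 2": ["s5", "s6", "s7", "s8"]}
plan = randomize_within_blocks(blocks, ["treatment", "control"], seed=1)
for block, assignment in plan.items():
    print(block, assignment)
```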
Randomized block design
- the response variable (Y) is the mean time to complete the course
- the main treatment (X) is the beverage
- the block is the subject (driver)
Full factorial design
contains treatments that represent all possible combinations of factors at different levels
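As a sketch, the full set of treatments can be listed by crossing every level of every factor. The two factors and their levels below are hypothetical (loosely based on the coupon and shelf-location idea), not taken from a specific study:

```python
from itertools import product

# Hypothetical factors and levels
factors = {
    "coupon": ["none", "10% off"],
    "shelf": ["top", "eye level", "bottom"],
}

# Every combination of one level per factor is one treatment
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
print(len(treatments), "treatments")  # 2 levels x 3 levels
```

A factor with 2 levels crossed with a factor with 3 levels gives 2 x 3 = 6 treatments.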
2 factors of full factorial design
- test for interaction first, as the effect of changing the level of one factor depends upon the level of the other factor
- unless an experiment incorporates a factorial design, you cannot see interactions, and this may be a serious omission
Test for interaction
H0: there is no interaction
Ha: there is interaction
What can go wrong?
- don't give up just because you can't run an experiment
- beware of confounding and lurking variables
- bad things can happen to well designed experiments. record as much info as possible regarding the circumstances of an experiment
- don't spend your entire budget on your first run. try a small pilot experiment first
- watch out for outliers
Pitfalls
- watch out for changing variances
- be wary of drawing causality conclusions from observational studies
- be wary of generalizing to situations other than the one at hand
- watch for multiple comparisons and use an appropriate method, such as the Tukey method
- be sure to fit an interaction term when it exists. if the interaction term is not significant, fit a simpler block design to test the main effects instead
Randomized block design
also known as two-factor ANOVA without replication
A block is a
- secondary factor that may also affect the response variable
- many times, the blocking factor may be categorical in nature (gender, job title, department, or area of location)
- other times, it may be ordinal (low, medium, high)
Blocking requires
that we randomly assign the subjects to the treatments within each block
Different letters in the connecting letters report mean
significantly different means
If two levels share the same letter
the means are NOT significantly different
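A small sketch of that reading rule. The report below is a made-up example, and the function name is hypothetical. Two levels count as significantly different only when they have no letter in common:

```python
def significantly_different(letters_a, letters_b):
    """Levels differ significantly only if they share no connecting letter."""
    return not (set(letters_a) & set(letters_b))

# Hypothetical connecting letters report: level -> letters
report = {"low": "A", "medium": "AB", "high": "B"}

print(significantly_different(report["low"], report["high"]))     # no shared letter
print(significantly_different(report["low"], report["medium"]))   # share A
print(significantly_different(report["medium"], report["high"]))  # share B
```

Here only "low" and "high" are significantly different; "medium" shares a letter with each of them.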
Summary
- Recognize observational studies and randomized comparative experiments
- Only well-designed experiments can allow us to reach causal conclusions
- Observe the Principles of Experimental Design
- Use the p-values under Post Hoc Analysis or the Tukey Method for identifying which treatments are different.
- Use the Pairwise Comparison Scatterplot if Interaction is found
Observational data
typically quick, easy, and inexpensive to gather
Drawbacks of observational data
- cannot create a control group
- cannot balance the sample sizes
- cannot prevent other factors from influencing the outcome when we simply observe things
The fact that a lot of data in business comes to us second-hand and not from primary sources can introduce
measurement issues
Example of primary observational data
recording breakfast cereal sales in a store in conjunction with coupons offered or the location in the aisles
Example of secondary observational data
gathered from the internet comparing General Mills, Kellogg's, and Quaker Oats sales
Retrospective
observational data coming to us from past studies
Prospective
actively obtaining data as the subjects choose items
ANOVA
- response variable must be quantitative
- makes pairwise comparisons of more than 2 means
- when comparing population means of a numeric variable for 3 or more groups based on sample data, ANOVA is the most appropriate procedure to use
Factors
attributes that contribute to the outcome
Treatment
combination of all factors and their levels
2 main differences between an experiment and other forms of investigation
- the deliberate manipulation of factors to create treatment effects
- random assignment of those treatments to the subjects
4 basic elements of experiments
- random assignment of the subjects to the treatments
- control is an essential element to any true experiment
- replication or repeated samples of the treatments within each group is necessary
- blocking
Blocking
involves grouping the subjects by a secondary factor
Effective error rate
- the chance of a Type I Error if ANOVA is not used to compare more than two means. It accounts for the number of pairs of means compared if using two-sample hypothesis tests.
- basically, ANOVA spreads the 0.05 significance level across all possible comparisons, which reduces the effective error rate that comes with using a 0.05 significance level on each separate test
Hypothesis for one-factor anova completely randomized design
H0: µ1 = µ2 = µ3
Ha: at least 2 means differ
What is being compared for the hypothesis for a one-factor anova completely randomized design?
variance within the samples with the variance between the sample means
BASICALLY, if the variance between sample means is sufficiently LARGER than the variance within the samples (a large F statistic with a small p-value), REJECT the null hypothesis because at least 2 means differ
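A sketch of that comparison, computing the one-factor F statistic by hand as the between-group mean square over the within-group mean square. The data are made up, and the sketch stops at F: turning it into a p-value needs the F distribution, which is assumed to come from statistical software:

```python
from statistics import mean

def one_way_f(groups):
    """F = (variation between group means) / (variation within groups)."""
    k = len(groups)                     # number of groups
    n = sum(len(g) for g in groups)     # total number of observations
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)   # between-groups mean square
    ms_within = ss_within / (n - k)     # within-groups mean square
    return ms_between / ms_within

# Hypothetical samples from three treatment groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(round(one_way_f(groups), 2))
```

For these numbers the between-group mean square is 13 and the within-group mean square is 1, so F is about 13: the group means are spread far apart relative to the scatter inside each group.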
Connecting letters report
levels not connected by the same letter are significantly different
Ordered differences report
protects you from inflating your effective error rate
Ordered differences report p-value significance
low p-value = there is a difference in population means for those two levels
Two factor anova randomized block design
- checking for differences in one of the factors (treatments) while controlling the other factor (blocks)
- 2 factor ANOVA design without replication
Two factor anova factorial design with replication
factorial treatment with interaction
Factorial treatment
- 2 or more factors having 2 or more levels
- this structure will allow you to see if there is interaction between the factors
Interaction between 2 factors indicates
- that the effect of changing the level of one factor depends upon the level of the other factor
- if there is interaction, you cannot interpret the individual factor p-values on their own because the effects of the factors cannot be separated
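One way to picture interaction in a 2 x 2 design is a "difference of differences" on the cell means. This is only a descriptive sketch with made-up cell means and a hypothetical function name; whether an interaction is statistically significant still has to come from the ANOVA interaction test:

```python
def interaction_contrast(cell_means):
    """cell_means[a][b] is the mean response at level a of factor A and
    level b of factor B (2 x 2 only). Returns the effect of B at the second
    level of A minus the effect of B at the first level of A."""
    (a1b1, a1b2), (a2b1, a2b2) = cell_means
    return (a2b2 - a2b1) - (a1b2 - a1b1)

# Hypothetical cell means
parallel = [[10, 12], [15, 17]]      # B adds 2 at both levels of A
non_parallel = [[10, 12], [15, 25]]  # B adds 2 at one level of A, 10 at the other

print(interaction_contrast(parallel))
print(interaction_contrast(non_parallel))
```

A contrast of zero matches parallel lines on an interaction plot; a contrast far from zero matches lines that diverge or cross.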
What is Analysis of Variance (ANOVA) used for?
To compare more than two means.
How does ANOVA control the variables?
It controls the variation within the groups while measuring the variation between the groups.
What is observational data, and how is it identified in time?
We get observational data by simply watching what respondents do or by obtaining it from secondary sources. Retrospective data is from the past (say, a year ago). Prospective data is collected in real time.
What two elements MUST an experiment have?
Random assignment of the subjects to the treatments and some form of control. It may also include Blocking and Replication.
Name 3 types of Designs.
Completely Randomized has just one factor, Randomized Block has two factors but without replication, and Factorial has two factors with replication. Only Factorial designs can test for interaction between the two variables.
Know how the proper hypothesis is set up for each design.
- Completely Randomized and Randomized Block: H0: µ1 = µ2 = µ3 = ...; Ha: at least one mean differs
- Factorial Design: H0: there is no interaction; Ha: there is interaction (use this as the first test in a factorial design)
What is a Response Variable?
The value being measured or recorded. It's a quantitative value.
What are Treatments?
All combinations of factors at different levels.
What is Volume, and how is it increased?
Information - choosing factors, treatments and sample sizes.
What is Noise, and how is it decreased?
Error - randomly assigning subjects
What is a balanced design?
Equal sample sizes in each group. This is preferred.
Define replicating the experiment.
Either having replicates within each group, or completely redoing the experiment at another time and place.
What is Blinding?
Withholding what type of treatment was given. Single blind means the subjects did not know what type of treatment they received. Double blind means the subjects and the analysts don't know. This keeps out bias.
What does a control group offer?
A baseline for the data.
Why is it important to use ANOVA to compare more than two means?
To keep the risk of a Type I Error = alpha; otherwise, the Effective Error Rate climbs quickly as the number of PAIRS compared with separate tests grows.
What are confounded variables?
When two factors have been conjoined and cannot be separated. E.g., Store & Product sales.
What is a Lurking variable?
A factor that was not included in the experiment but is driving the ones that are.
What is interaction, and how do you read the interaction plot?
When two variables work in combination to yield different levels of y. See Factorial examples in Ch.9.
Know why observational data is not preferred.
- It is often unbalanced
- lacks random assignment of the subjects to the treatments
- does not control for confounded or lurking variables
- cannot show causality
If there is no interaction in a factorial design, what is the next step?
Look at the p-values for the individual factors, and if they are less than alpha, look at their connecting letters report to decide which levels are different.