Chapter 9: Analysis of Variance

88 Terms

New cards

Pairwise comparisons of more than two means

the design controls the variation within the groups while measuring the variation between them

New cards

Observational studies

- a statistical study is observational when it is conducted using pre-existing data and/or data collected without any particular design
- using data from product registrations and warranty cards
- observing consumer purchases in a store

New cards

Time elements of studies

- retrospective
- prospective

New cards

Retrospective

- studies an outcome in the present by examining historical records
- using last year's sales records to predict this year's

New cards

Prospective

- identifies subjects in advance and collects data as events unfold
- consumer expenditure survey

New cards

Experiment

recording the outcomes from the manipulation of attributes in the study. the act of measuring or collecting data on a subject

New cards

Factors

the variables being manipulated by being set to particular values called levels

New cards

Treatment

the combination of factor levels assigned to a subject

New cards

Response variable

the variable being measured or recorded

New cards

Principles of design

- control
- random assignment
- replication
- blocking

New cards

Control

- control the sources of variation other than the factors being manipulated by making conditions as similar as possible for all treatment groups
- a control treatment is a special treatment class designed to mark the baseline of the study, and the group that receives it is called the control group

New cards

Random assignment

random assignment of subjects to the different groups
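
A minimal sketch of random assignment, assuming a balanced completely randomized design. The function name `assign_at_random`, the subject IDs, and the treatment labels are hypothetical, chosen only for illustration:

```python
import random

def assign_at_random(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subjects)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Dealing in rotation keeps group sizes as equal as possible
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i}" for i in range(1, 13)]   # 12 hypothetical subject IDs
groups = assign_at_random(subjects, ["A", "B", "C"], seed=1)
for treatment, members in groups.items():
    print(treatment, members)
```

With 12 subjects and 3 treatments, each group receives 4 subjects, so the sketch also produces a balanced design.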

New cards

Replication

- Repeated observations at each treatment are called replicates. If the number of replicates is the same for each treatment combination, the experiment is said to be balanced
- A second kind of replication is to repeat an entire experiment for a different group of subjects, under different circumstances, or at a different time

New cards

Blocking

- Group or block subjects together according to some factor that is uncontrollable but may affect the response. Such factors are called blocking factors, and their levels are called blocks.
- Examples of blocking factors: sex, ethnicity, or marital status.
- In effect, blocking an experiment into n blocks is equivalent to running n separate experiments

New cards

How the design works: volume increases by

- selecting factors
- choosing treatments
- determining sample size

New cards

How the design works: noise is reduced by

randomly assigning treatments to the experimental units

New cards

Blinding

the deliberate withholding of the treatment details from individuals who might affect the outcome

New cards

Two sources of unwanted bias

- those who might influence the results
- those who evaluate the results

New cards

Single blind experiment

one or the other group is blinded

New cards

Double blind experiment

both groups are blinded

New cards

Applying ___ treatment can alter a response

any

New cards

To separate the real treatment effects from imagined ones, use an ineffective control treatment that mimics the treatments being tested. such a fake treatment is called a

placebo

New cards

3 types of designs

- completely randomized has one factor with multiple levels
- randomized block has two factors, but the experiment is not replicated
- factorial design has two factors with replication

New cards

One factor anova experiment

- factor
- levels
- response variable
- control group

New cards

When each of the possible treatments is assigned to at least one subject at random, the design is called

completely randomized design

New cards

In the simplest completely randomized design, the subjects are assigned

at random to the treatments

New cards

Confounded

- when the levels of one factor are associated with the levels of another factor
- when levels incorporate more than one factor

New cards

Lurking variables

- drive two other variables in such a way that a causal relationship is suggested between the two

New cards

You can apply anova to observational data if

the box plots show roughly equal spreads and symmetric, outlier free distributions

New cards

Apply anova with caution, as these studies are prone to the following problems

- observational studies are frequently unbalanced
- randomization is usually absent
- there is no control over lurking variables or confounding
- don't draw causal conclusions even when the F Statistic is significant

New cards

Multiple comparisons

- knowing that the means differ leads to the question of which ones are different and by how much
- methods that test these issues are called methods for multiple comparisons

New cards

Why don't we simply use a t-test for differences between means to test each pair of group means?

Each t-test is subject to a Type I error, and the chance of committing at least one such error increases as the number of tested pairs increases

New cards

Blocking factors require

randomly assigning the subjects to the treatments within each block

New cards

Randomized block design

- the response variable (Y) is the mean time to complete the course
- the main treatment (X) is the beverage
- the block is the subject (driver)

New cards

Full factorial design

contains treatments that represent all possible combinations of factors at different levels
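
As an illustration, the treatments of a full factorial design can be listed by crossing every level of every factor. The factor names and levels below (a coupon factor and a shelf-position factor) are hypothetical:

```python
from itertools import product

# Hypothetical factors and their levels
factors = {
    "coupon": ["none", "50 cents off"],
    "shelf": ["top", "middle", "bottom"],
}

# Each treatment is one combination of factor levels
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
print(len(treatments), "treatments")
```

Two coupon levels crossed with three shelf levels give 2 x 3 = 6 treatments.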

New cards

2 factors of full factorial design

- test for interaction first, as the effect of changing the level of one factor depends upon the level of the other factor
- unless an experiment incorporates a factorial design, you cannot see interactions, and this may be a serious omission

New cards

Test for interaction

H0: there is no interaction
Ha: there is interaction

New cards

What can go wrong?

- don't give up just because you can't run an experiment
- beware of confounding and lurking variables
- bad things can happen to well designed experiments. record as much info as possible regarding the circumstances of an experiment
- don't spend your entire budget on your first run. try a small pilot experiment first
- watch out for outliers

New cards

Pitfalls

- watch out for changing variances
- be wary of drawing causality conclusions from observational studies
- be wary of generalizing to situations other than the one at hand
- watch for multiple comparisons and use an appropriate method, such as the Tukey method
- be sure to fit an interaction term when it exists. if the interaction term is not significant, fit a simpler block design to test the main effects instead

New cards

Randomized block design

also known as two-factors without replication

New cards

A block is a

- secondary factor that may also affect the response variable
- many times, the blocking factor may be categorical in nature (gender, job title, department, or area of location)
- other times, it may be ordinal (low, medium, high)

New cards

Blocking requires

that we randomly assign the subjects to the treatments within each block
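
A rough sketch of randomizing within blocks, assuming the blocks are already known. The function name `assign_within_blocks`, the block labels, and the subject IDs are hypothetical:

```python
import random

def assign_within_blocks(blocks, treatments, seed=None):
    """Randomly assign subjects to treatments separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)          # randomization happens per block
        for i, subject in enumerate(shuffled):
            treatment = treatments[i % len(treatments)]
            assignment[subject] = (block_name, treatment)
    return assignment

# Hypothetical blocks defined by an ordinal blocking factor
blocks = {
    "low": ["S1", "S2", "S3", "S4"],
    "medium": ["S5", "S6", "S7", "S8"],
    "high": ["S9", "S10", "S11", "S12"],
}
assignment = assign_within_blocks(blocks, ["A", "B"], seed=7)
for subject, (block, treatment) in assignment.items():
    print(subject, block, treatment)
```

Because the shuffle is done inside each block, every block contains every treatment, which is what lets the block-to-block variation be separated from the treatment effect.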

New cards

Different letters in the connecting letters report mean

significantly different means

New cards

If there are two of the same letter for a variable

the means are NOT significantly different

New cards

Summary

- Recognize observational studies and randomized comparative experiments
- Only well-designed experiments can allow us to reach causal conclusions
- Observe the Principles of Experimental Design
- Use the p-values under Post Hoc Analysis or the Tukey Method for identifying which treatments are different.
- Use the Pairwise Comparison Scatterplot if Interaction is found

New cards

Observational data

typically quick, easy, and inexpensive to gather

New cards

Drawbacks of observational data

- cannot create a control group
- cannot balance the sample sizes
- cannot prevent other factors from influencing the outcome when we simply observe things

New cards

The fact that a lot of data in business comes to us second-hand and not from primary sources can introduce

measurement issues

New cards

Example of primary observational data

recording breakfast cereal sales in a store in conjunction with coupons offered or the location in the aisles

New cards

Example of secondary observational data

gathered from the internet comparing General Mills, Kellogg's, and Quaker Oats sales

New cards

Retrospective

observational data coming to us from past studies

New cards

Prospective

actively obtaining data as the subjects choose items

New cards

ANOVA

- response variable must be quantitative
- makes pairwise comparisons of more than 2 means
- when comparing population means of a numeric variable for 3 or more groups based on sample data, ANOVA is the most appropriate procedure to use

New cards

Factors

attributes that contribute to the outcome

New cards

Treatment

combination of all factors and their levels

New cards

2 main differences between an experiment and other forms of investigation

- the deliberate manipulation of factors to create treatment effects
- random assignment of those treatments to the subjects

New cards

4 basic elements of experiments

- random assignment of the subjects and treatments
- control is an essential element to any true experiment
- replication or repeated samples of the treatments within each group is necessary
- blocking

New cards

Blocking

involves grouping the subjects by a secondary factor

New cards

Effective error rate

- the chance of a Type I Error if ANOVA is not used to compare more than two means. It accounts for the number of pairs of means compared if using two-sample hypothesis tests.
- basically, for ANOVA, spreading the significance level of 0.05 across all possible comparisons will reduce the effective error rate that comes with using a 0.05 significance level on separate tests
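
A quick sketch of how the effective error rate grows when every pair of means gets its own two-sample test. It assumes the tests are independent, which is a simplification (pairwise tests on the same groups are not truly independent), so the numbers are approximate. The function name `effective_error_rate` is made up for this example:

```python
from math import comb

def effective_error_rate(n_groups, alpha=0.05):
    """Return (number of pairs, chance of at least one Type I error)."""
    pairs = comb(n_groups, 2)            # number of pairwise comparisons
    rate = 1 - (1 - alpha) ** pairs      # assumes independent tests
    return pairs, rate

for k in (3, 4, 5):
    pairs, rate = effective_error_rate(k)
    print(f"{k} groups: {pairs} pairs, effective error rate about {rate:.3f}")
```

With 3 groups there are 3 pairs and the rate is already about 0.143, well above the nominal 0.05.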

New cards

Hypothesis for one-factor anova completely randomized design

H0: µ1 = µ2 = µ3
Ha: at least 2 means differ

New cards

What is being compared for the hypothesis for a one-factor anova completely randomized design?

variance within the samples with the variance between the sample means

BASICALLY, if the variance between sample means is sufficiently LARGER than the variance within the samples (a large F statistic with a small p-value), REJECT the null hypothesis because at least 2 means differ
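
The comparison can be sketched by computing the F statistic by hand as the ratio of the between-group mean square to the within-group mean square. The data are made up for illustration, and the function name `one_way_f` is hypothetical. Deciding whether F is large enough still requires the F distribution with (k - 1, n - k) degrees of freedom, which this sketch does not compute:

```python
def one_way_f(groups):
    """F statistic for a one-factor, completely randomized design."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation between the sample means
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation within the samples
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up response values for three treatment groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(groups)
print(f"F = {f_stat:.2f}")
```

For these values the between-group mean square is 13 and the within-group mean square is 1, so F = 13.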

New cards

Connecting letters report

levels not connected by the same letter are significantly different

New cards

Ordered differences report

protects you from inflating your effective error rate

New cards

Ordered differences report p-value significance

low p-value = there is a difference in population means for those two levels

New cards

Two factor anova randomized block design

- checking for differences in one of the factors (treatments) while controlling the other factor (blocks)
- 2 factor ANOVA design without replication

New cards

Two factor anova factorial design with replication

factorial treatment with interaction

New cards

Factorial treatment

- 2 or more factors having 2 or more levels
- this structure will allow you to see if there is interaction between the factors

New cards

Interaction between 2 factors indicates

- that the effect of changing the level of one factor depends upon the level of the other factor
- if there is interaction, you cannot interpret individual p-values because nothing is separate
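
One way to picture interaction in a 2 x 2 factorial is to check whether the effect of one factor changes across the levels of the other. The cell means below are invented for illustration, and the helper name `simple_effect` is hypothetical. A nonzero difference in sample means does not by itself show a significant interaction; the interaction test is still needed:

```python
# Made-up mean responses for each (coupon, shelf) treatment
cell_means = {
    ("none", "top"): 10.0,
    ("none", "bottom"): 12.0,
    ("coupon", "top"): 11.0,
    ("coupon", "bottom"): 19.0,
}

def simple_effect(cell_means, coupon_level):
    """Effect of moving from the top shelf to the bottom shelf at one coupon level."""
    return cell_means[(coupon_level, "bottom")] - cell_means[(coupon_level, "top")]

effect_without = simple_effect(cell_means, "none")
effect_with = simple_effect(cell_means, "coupon")

# If the shelf effect were the same at both coupon levels, this would be 0
interaction_contrast = effect_with - effect_without

print("shelf effect without coupon:", effect_without)
print("shelf effect with coupon:", effect_with)
print("difference of differences:", interaction_contrast)
```

Here the shelf effect is 2 without a coupon and 8 with one, so the effect of shelf position depends on the coupon level. On an interaction plot, this shows up as non-parallel lines.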

New cards

What is Analysis of Variance (ANOVA) used for?

To compare more than two means.

New cards

How does ANOVA control the variables?

It controls the variation within the groups while measuring the variation between the groups.

New cards

What is observational data, and how is it identified in time?

We get observational data by simply watching what respondents do or by obtaining it from secondary sources. Retrospective data is from the past (say, a year ago). Prospective data is in real time.

New cards

What two elements MUST an experiment have?

Random assignment of the subjects to the treatments and some form of control. It may also include Blocking and Replication.

New cards

Name 3 types of Designs.

- Completely Randomized has just one factor
- Randomized Block has two factors but without replication
- Factorial has two factors with replication

Only Factorial designs can test for interaction between the two variables.

New cards

Know how the proper hypothesis is set up for each design.

Completely Randomized + Randomized Block
Ho: µ1 = µ2 = µ3 ...
Ha: At least one mean differs (used with Completely Randomized or Randomized Block)

Factorial Design
Ho: There is no interaction
Ha: There is interaction (use as the first test in a Factorial design)

New cards

What is a Response Variable?

The value being measured or recorded. It's a quantitative value.

New cards

What are Treatments?

All combinations of factors at different levels.

New cards

What is Volume, and how is it increased?

Information - choosing factors, treatments and sample sizes.

New cards

What is Noise, and how is it decreased?

Error - randomly assigning subjects

New cards

What is a balanced design?

Equal sample sizes in each group. This is preferred.

New cards

Define replicating the experiment.

Either having replicates within each group, or completely redoing the experiment at another time and place.

New cards

What is Blinding?

Withholding what type of treatment was given. Single blind means the subjects did not know what type of treatment they received. Double blind means the subjects and the analysts don't know. This keeps out bias.

New cards

What does a control group offer?

A baseline for the data.

New cards

Why is it important to use ANOVA to compare more than two means?

To keep the risk of a Type I Error = alpha; otherwise, the Effective Error Rate grows rapidly with the number of PAIRS compared using separate two-sample tests.

New cards

What are confounded variables?

When two factors have been conjoined and cannot be separated. E.g., Store & Product sales.

New cards

What is a Lurking variable?

A factor that was not included in the experiment but is driving the ones that are.

New cards

What is interaction and know how to read the interaction plot?

When two variables work in combination to yield different levels of y. See Factorial examples in Ch.9.

New cards

Know why observational data is not preferred.

- It is often unbalanced
- It lacks random assignment of the subjects to the treatments
- It does not control for confounded or lurking variables
- It cannot show causality

New cards

If there is no interaction in a factorial design, what is the next step?

Look at the p-values for the individual factors, and if they are less than alpha, look at their connecting letters report to decide which levels are different.