AP Statistics Comprehensive Review

Contains U1, U2, and U3

61 Terms


What makes a study an Observational Study?

treatments were not assigned by the researchers; they only observe what the subjects already do
- ex: This is an observational study because the researchers did not assign the volunteers to watch TV or not watch TV.


What makes a study an Experiment?

treatments ARE randomly assigned
- ex: This is an experiment because the teacher randomly assigns which students will use the physical program and which students will use the virtual program.


How do you conduct a Simple Random Sample?

State how you will number the population
State the method used (random number generator)
State the range of numbers used and how many you will select. Also state that repeated numbers are skipped.
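
The steps above can be sketched in Python. This is a minimal illustration, not a required method: the `students` list, the sample size of 10, and the seed are hypothetical, and `random.sample` stands in for the random number generator because it never repeats a number.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Number the population 1..N and pick n distinct numbers at random."""
    rng = random.Random(seed)
    # Range of numbers used: 1 through N. random.sample never repeats a number.
    numbers = rng.sample(range(1, len(population) + 1), n)
    # The individuals whose numbers were drawn form the sample.
    return [population[i - 1] for i in numbers]

# Hypothetical population of 100 students.
students = [f"student_{i}" for i in range(1, 101)]
sample = simple_random_sample(students, 10, seed=1)
```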


When do you conduct a Stratified Random Sample?

Use stratification when a variable that is inherent to the subjects/units (e.g. age, gender) might influence the outcome of the study.


What are the advantages and disadvantages of a Stratified Random Sample?

Advantage:
- reduces variability in the response variable (state the response variable)
- often more representative of the population as each group is proportionally represented in the sample
Disadvantage:
- often time consuming to carry out


How do you conduct a Stratified Random Sample?

Group individuals into their strata
State how you will number the population (numbering may change for each stratum)
State the method used (random number generator)
State the range of numbers used and how many you will select from each stratum. Also state that repeated numbers are skipped.
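
A rough Python sketch of the same procedure follows. The roster, the grade-level strata, and the per-stratum counts (6 juniors and 4 seniors, proportional to a 60/40 split) are all made up for illustration.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group individuals into strata, then take an SRS inside each stratum."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for name, members in strata.items():
        # Separate random draw (no repeats) within each stratum.
        sample.extend(rng.sample(members, per_stratum[name]))
    return sample

# Hypothetical roster: 60 juniors and 40 seniors.
roster = [(f"s{i}", "junior" if i <= 60 else "senior") for i in range(1, 101)]
picked = stratified_sample(roster, lambda p: p[1], {"junior": 6, "senior": 4}, seed=2)
```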


When do you conduct a Cluster Random Sample?

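Use clustering when the population already falls into groups (often by location) that are convenient to sample. Clusters are formed based on convenience instead of the confounding variable.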


What are the advantages and disadvantages of a Cluster Random Sample?

Advantages:
- quicker to carry out than stratifying
Disadvantages:
- the sample may not be representative of the population


How do you conduct a Cluster Random Sample?

Group individuals into clusters based on proximity
- ex. If you’re taking a sample of a hotel, group the guests by their floor
State how you will number the clusters
State the method used (random number generator)
State the range of numbers used and how many clusters you will select. Also state that repeated numbers are skipped.
Sample every individual in the selected clusters
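
A possible sketch of the hotel example in Python, assuming the usual definition of a cluster sample in which whole clusters are chosen at random and everyone in a chosen cluster is sampled. The hotel layout (10 floors, 5 rooms each) is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then include everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # no repeated clusters
    members = [person for c in chosen for person in clusters[c]]
    return chosen, members

# Hypothetical hotel: floors 1-10 are the clusters, 5 rooms per floor.
hotel = {floor: [f"floor{floor}_room{r}" for r in range(1, 6)] for floor in range(1, 11)}
floors, guests = cluster_sample(hotel, 3, seed=3)
```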


When does a study have selection bias?

when the sample was poorly selected, with no randomness used to choose who is in it


What are the types of selection bias?

Convenience Sampling
- sampled based on what is convenient for the researcher
  - ex. sampling the first 30 people you see
Voluntary Response Sampling
- the people who respond are often similar to each other with respect to their views/opinions
  - ex. surveys online


When does a study have nonresponse bias?

there was random sampling
HOWEVER, the people selected for the study can’t be contacted or refuse to participate
- ex. mail people the survey, but don’t follow up if they don’t mail it back


When does a study have response bias?

participants were randomly selected AND the people responded
but the responses are quite inaccurate


How does response bias occur?

Leading questions: people know what answer you want so they tell you that answer
Confusing questions: people don’t understand the question
Awkwardness between the researcher & individual answering


What is the explanatory variable?

a variable whose levels are manipulated intentionally
- the independent variable
- EXPLAINS the response variable


What is the response variable?

a variable that is the outcome of a study; what you are measuring
- dependent variable
- is what happens in RESPONSE to the explanatory variable being manipulated


What is a confounding variable?

a variable that is related to the explanatory variable and possibly influences the response variable


What are the criteria of a well-designed experiment?

comparison: must compare two or more treatment groups
random assignment: experimental units/subjects must be randomly assigned to treatments
control: control potential confounding variables by keeping all other variables constant for all groups
replication: must have more than one experimental unit/subject in each treatment


What is the purpose of random assignment?

random assignment creates roughly equivalent groups (provide context)
allows for fair comparison between (provide context about the treatments)
differences in the response can then be attributed to the treatment (context) instead of untested variables


What is the statistical advantage of blocking by the confounding variable?

blocking separates natural chance variability in responses from the differences due to the confounding variable (state with context)
this makes it easier to determine if one treatment is better/worse than the other treatment (makes more sense with context)


Why is replication important?

there is more than one unit/subject in each treatment (provide context for both the unit and treatment)
this is important to show that the results aren’t due to random chance


What is a completely randomized designed experiment?

the most basic design
take a whole group of subjects/units and use a random method to assign them to treatment groups (typically of equal/similar size)
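
One way to picture this is the short Python sketch below. The 20 numbered students and the two program names are hypothetical; shuffling and then dealing subjects out in turn is just one of several valid random methods.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Dealing in turn keeps the group sizes equal or within one.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical: 20 students numbered 1-20, two programs.
groups = randomly_assign(range(1, 21), ["physical", "virtual"], seed=4)
```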


What is a randomized block designed experiment?

units/subjects are FIRST separated into blocks, and then random assignment to treatments is done within each block
Reason: separates natural chance variability in responses from differences due to the blocking variable. This makes it easier to determine if one treatment is really more effective than another.
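
A minimal sketch of block-then-randomize in Python. The golfers, the skill-level blocking variable, and the two club treatments are invented for illustration.

```python
import random

def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Split subjects into blocks first, then randomize within each block."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within this block only
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical golfers blocked by skill level (8 beginners, 8 advanced).
golfers = [(f"golfer_{i}", "beginner" if i <= 8 else "advanced") for i in range(1, 17)]
plan = randomized_block_assignment(golfers, lambda g: g[1], ["club_A", "club_B"], seed=5)
```

Each skill level ends up split evenly between the two treatments, so skill cannot favor one treatment over the other.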


What is a matched pair design?

Pairing where 2 similar subjects are paired together, then one subject in each pair is randomly assigned to each treatment
- ex. pair golfers with a similar skill level
Pairing where each subject does both treatments, in a random order


What can you generalize for an observational study?

can only generalize the findings of a study to the population from which the sample was selected


What can you generalize from a well-designed experiment?

results from the experiment can be generalized to others who are similar to the volunteers
results can be generalized to the general population when the subjects/units ARE randomly sampled from the population before randomly assigning to treatments


What is categorical data?

data is placed into one of several groups or categories
- ex. favorite color, car model, zip code
- appropriate graphs: bar graph, segmented bar graph, mosaic plot, pie chart


What is quantitative data?

data is numerical and it makes sense to average the values
- ex. height, length, age, # of siblings, # of pairs of shoes you own
- appropriate graphs: dotplot, stemplot, histogram, box plot


What are the characteristics of the mean?

\overline{x}=\frac{\sum x_{i}}{n}

add up all the values in the set, then divide by the number of values
heavily influenced by outliers


What are the characteristics of the median?

the middle value of an ordered distribution
not influenced by outliers
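
A quick check of how differently the mean and the median react to an outlier, using Python's standard `statistics` module. The two data sets are made up; they differ only in the last value.

```python
from statistics import mean, median

# Hypothetical data: identical except for one extreme value.
no_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

mean_before, mean_after = mean(no_outlier), mean(with_outlier)          # 3 -> 22
median_before, median_after = median(no_outlier), median(with_outlier)  # 3 -> 3
```

The mean jumps from 3 to 22 while the median stays at 3.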


What are the characteristics of range?

Max - Min
Highly influenced by outliers


What are the characteristics of the Interquartile Range (IQR)?

Q3 - Q1
measures how wide the middle 50% of the data is
not influenced by outliers, b/c only values toward the middle of the distribution are used in the calculation


How do you find outlier(s)?

Do the upper and lower fence test
- LF = Q1 - 1.5(IQR)
  - x < LF = outlier
- UF = Q3 + 1.5(IQR)
  - x > UF = outlier
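
The fence test can be sketched in Python as follows. One assumption to flag: quartiles here are the medians of the lower and upper halves of the ordered data (excluding the overall median when n is odd), which is a common textbook convention; calculators and software may use slightly different quartile rules. The `scores` list is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """min, Q1, median, Q3, max using the median-of-halves quartile rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def find_outliers(values):
    """Return the values outside the lower and upper fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data
summary = five_number_summary(scores)   # (2, 4.5, 7, 9.5, 30)
flagged = find_outliers(scores)         # [30]
```

Here IQR = 9.5 - 4.5 = 5, so LF = -3 and UF = 17, and only 30 falls outside the fences.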


What is the five-number summary?

min
Q1
median
Q3
max


What are the characteristics of standard deviation?

s=\sqrt{\frac{1}{n-1}\sum\left(x_{i}-\overline{x}\right)^2}

the typical (roughly average) distance of the values from the mean
since it’s related to the mean, it is heavily influenced by outliers
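
The formula above, written out step by step in Python and compared against the standard library's `statistics.stdev`, which also divides by n - 1. The `data` list is hypothetical.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: sqrt of sum of squared deviations over n - 1."""
    n = len(values)
    xbar = sum(values) / n
    squared_deviations = sum((x - xbar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
by_hand = sample_sd(data)
by_library = stdev(data)
```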


What does skewed left mean?

long tail on the left


What does skewed right mean?

long tail on the right


Characteristics of a stem plot

you can tell the shape of the distribution
easily find the middle
can find the unusual features
can easily find the spread


Characteristics of a histogram

can’t easily find the middle
can find the spread
can find unusual features
can find the shape


How do you describe/compare a data set?

CUSS + CONTEXT

center - median/mean (based on presence of outliers)
unusual features - gaps/outliers
spread - range/IQR/standard deviation (based on presence of outliers)
shape - unimodal/skewed/symmetric
context of the problem

NOTE: when comparing, use words like: greater than, less than, equivalent


How do you create a relative cumulative frequency graph?

make the graph from 0%-100%
take the values from the relative frequency and then add them in order
- rf1: 36%, rf2: 29%
- rcf1: 36%, rcf2: 65% (36% + 29%)
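
The running total can be sketched with `itertools.accumulate`. The first two relative frequencies are the ones from the card; the last two (20% and 15%) are made up so the total reaches 100%.

```python
from itertools import accumulate

# Relative frequencies in percent; 36 and 29 come from the card,
# 20 and 15 are hypothetical values added to complete the table.
relative_freq = [36, 29, 20, 15]

# Each cumulative value is the sum of all relative frequencies so far.
cumulative = list(accumulate(relative_freq))  # [36, 65, 85, 100]
```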


What is the important aspect to remember about relative cumulative frequency tables?

Whatever value is chosen, the cumulative percent counts everything up to that value
- ex. What percent of automobiles have a score less than 180?
  - 93%; the remaining 7% score 180 or more.


What are the characteristics of z-scores?

z=\frac{x_{i}-\overline{x}}{s}

measure of relative position
tells you how many standard deviations above or below the mean a value is
explanation: (value in context) is (z-score) standard deviations above/below the mean (context).
- when comparing z-scores, whoever has the highest generally did better
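
A small worked comparison in Python. The two test scores, class means, and standard deviations are hypothetical.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: an 85 on a math test (class mean 75, SD 5)
# versus a 90 on an English test (class mean 80, SD 10).
z_math = z_score(85, 75, 5)      # 2.0
z_english = z_score(90, 80, 10)  # 1.0
```

The raw English score is higher, but the math score sits 2 standard deviations above its mean compared with 1, so the math result is better relative to the class.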


What are the characteristics of a scatterplot?

shows the relationship between two quantitative variables


How do you describe the characteristics of a linear relationship?

DUST + CONTEXT

Direction - positive/negative
Unusual features - outliers (points that fall far from the overall pattern of the data)
Strength
- strong - data points fall close to the line
- weak - data points are very spread out around the line
Type - linear, nonlinear
context of the problem


What are the characteristics of the correlation coefficient?

r measures the strength and direction of a linear relationship between two quantitative variables
-1 ≤ r ≤ 1
- if r is near zero = linear relationship is weak
- if r is near 1 or -1 = linear relationship is strong
r has no units
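
For illustration, r can be computed as the sum of the products of the z-scores of x and y, divided by n - 1. The sketch below uses hypothetical study-hours data and also checks that changing units (hours to minutes) leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z-score of x) * (z-score of y), divided by n - 1."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    products = (((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

hours = [1, 2, 3, 4, 5]   # hypothetical hours studied
points = [2, 4, 5, 4, 5]  # hypothetical quiz points
r = correlation(hours, points)

# Same data with hours converted to minutes: r does not change.
r_minutes = correlation([h * 60 for h in hours], points)
```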


What are the basic characteristics of the least squares regression line (LSRL)?

\hat{y}=a+bx

“y hat” = predicted y value
the line of best fit that minimizes the sum of the squares of the residuals
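
A sketch of how a and b could be computed by hand in Python, using the standard least-squares formulas (slope = sum of (x - x̄)(y - ȳ) divided by sum of (x - x̄)², intercept = ȳ - b·x̄). The data are hypothetical.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx        # slope
    a = ybar - b * xbar  # intercept: the line passes through (xbar, ybar)
    return a, b

hours = [1, 2, 3, 4, 5]   # hypothetical x values
points = [2, 4, 5, 4, 5]  # hypothetical y values
a, b = lsrl(hours, points)  # roughly a = 2.2, b = 0.6
```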


What happens when you predict (“y hat”) with the LSRL?

Interpolating
- when the x-value is inside the domain of the data
Extrapolating
- when the x-value is outside the domain of the data (predictions here are less trustworthy)
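
A tiny Python sketch of the distinction. The line y_hat = 2.2 + 0.6x and the observed x-range of 1 to 5 are hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Return the predicted y and whether the prediction interpolates or extrapolates."""
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind

# Hypothetical line y_hat = 2.2 + 0.6x fitted to x-values from 1 to 5.
inside = predict(2.2, 0.6, 3, x_min=1, x_max=5)
outside = predict(2.2, 0.6, 10, x_min=1, x_max=5)
```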


How do you interpret the slope/“b” of the LSRL?

As (x in context) increases by 1 (unit), the predicted (y in context) increases/decreases by (b).


How do you interpret the intercept/“a” of the LSRL?

When (x in context) is zero, the predicted (y in context) is (a).


How do you interpret the correlation/ r?

You must address:

strength - moderate/weak/strong
type - linear/nonlinear
direction - positive/negative
context


How do you characterize the coefficient of determination/ r²?

calculated by squaring the correlation
Interpretation: About (r² written as a percent) of the variation in (y in context) is explained by the linear relationship with (x in context).
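
A short Python illustration of squaring the correlation, using hypothetical data.

```python
import math

def r_squared(xs, ys):
    """Square of the correlation between xs and ys."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r ** 2

hours = [1, 2, 3, 4, 5]   # hypothetical x values
points = [2, 4, 5, 4, 5]  # hypothetical y values
r2 = r_squared(hours, points)  # about 0.6
```

With these numbers, about 60% of the variation in points is explained by the linear relationship with hours.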


What are the characteristics of a residual?

\varepsilon=y-\hat{y}

residual = actual - predicted

the vertical distance between a point on a scatterplot and the LSRL (positive if the point is above the line, negative if below)
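
A small Python sketch of actual minus predicted. The data and the line y_hat = 2.2 + 0.6x (the least squares line for these particular points) are hypothetical.

```python
def residuals(xs, ys, a, b):
    """Actual y minus predicted y for each point, using y_hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

hours = [1, 2, 3, 4, 5]   # hypothetical x values
points = [2, 4, 5, 4, 5]  # hypothetical y values
res = residuals(hours, points, a=2.2, b=0.6)
# roughly [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least squares line the residuals add up to zero (apart from rounding).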


How do you interpret residual plots?

no obvious pattern = this model is appropriate
clear/obvious pattern = this model isn’t appropriate
- fanning (residuals spreading out or narrowing as x increases) is also a bad sign
