Whether you're looking to learn AP Statistics from start to finish or you just want to review, you've come to the right place. This video will cover all the content in AP Statistics. Now, it may seem impossible to learn an entire course in one video, but it's actually not that difficult. AP Statistics doesn't involve a lot of memorization, and a lot of the concepts are very similar, so just remembering one of them will make it a lot easier to remember all the rest. If you're just looking for one specific topic, here is the video framework to help you find it. Let's dive right into chapter number one.
At the start of each chapter, I will show you an overview that lets you know what each section covers and which sections are the most important and why. I will also show you the equivalent units in AP Stats and which sections cover them, as well as the approximate weights of those units on the AP exam.
Red means that the section is not part of the current chapter. Knowing the overview and the framework will help you know which topics are the most important as you learn AP Statistics in one video. If you would like to take a look at the overview, pause right here. But anyway, we're going to jump right into chapter 1, section one: statistical studies. Let's first define statistics. Statistics
deals with collecting and organizing data and using data from a sample to make generalizations to a population I
haven't defined sample and population for you yet but let's suppose that you wanted to answer a question involving
all people of an entire country it would be almost impossible to collect data from everyone because that would be so
many people so what you can instead do is collect data from a number of people in that country and then use that data
to make generalizations to all people of the entire country that's basically what
statistics is and a statistical study is the process of collecting data and then making
inferences in order to answer a question
so here are some key terms that will show up on the AP exam and also later in
this video a population is a large group of people or other things and this is
generally what you're trying to answer a question about a sample is a subset of a population
that is studied in a statistical study a statistical unit is a member of the
sample a population parameter is a number describing the population and keep in
mind that this has to describe a population if you want something to describe a sample that would be called a
descriptive statistic we're going to dive more into that in section three
and a subject is just a human unit so that's all the content from this
section that is needed for future sections the rest of the content in this section is still important so you should
still watch it however I would suggest not rewinding and re-watching too many times
the next topic is the types of statistical studies in an observational study the person conducting the study
observes characteristics or behaviors without trying to affect them or interfering in what they're studying so
a survey would be an example of an observational study. Let's suppose that you ask people how much water they drank yesterday; you're just asking them, and you're not in any way affecting how much water they drank yesterday. An experiment is different because the
person conducting the study takes a role in what's happening in the study so they do that by assigning units to different
treatments and observing the effects of each treatment. An example of an experiment is randomly telling people whether to drink lots of or little water and then observing their running speed in a race. The key word is randomly: you must assign people to treatments randomly,
because if you told the people who are already fast to drink lots of water and
the slower people to drink little water, and your results show that the people who drank lots of water have higher running speeds, you can't infer whether drinking lots of or little water makes any difference to running speed, because you
don't know if the faster speeds were due to those people drinking lots of water or because they were already faster
now we're getting into the types of variables you've seen a lot of this in previous math classes the explanatory or
independent variable is the variable that is adjusted an example would be how much water the racers drink this
variable is generally plotted on the x-axis the response or dependent variable is
the variable that is measured an example would be the running speed this is generally plotted on the y-axis
a confounding or lurking variable is a variable that is unaccounted for and
affects both the explanatory and response variables an example would be how much food is eaten because people
who eat more food will probably drink more water to go along with it and how much food you eat might affect your
running speed. It's important to account for confounding variables in experiments, because your goal in experiments like
these is probably to measure whether or not the response variable changes as the explanatory variable is adjusted and if
you find that there is a change but you haven't accounted for confounding variables then you don't know if those
changes are because of adjusting the explanatory variable itself or because there's a confounding variable that's
affecting both the explanatory and response variables so that brings us into the next topic
control which is accounting for confounding variables one common method of control is to have
a control group which is the group that receives no special treatments so let's suppose that you're testing whether or
not a certain medication has an effect you can randomly assign people to either
the control group which doesn't take the medicine and the treatment Group which does and you can measure whether or not
there tends to be a difference between the control group and the treatment group and with that you can determine
whether or not the medicine causes better results a control variable is a variable that is
held constant in order to prevent it from becoming a confounding variable having control variables can be
restrictive because all of your units must satisfy these control variables and
if they can't then they can't be part of your study so one alternative to having control
variables is to make them grouping variables and that gets us into the next topic of experimental designs the most
basic design is a completely randomized design in which units are assigned to treatments at random and not grouped
based on any characteristics in a randomized block design units with
similar characteristics are grouped into blocks and within each block units are assigned to treatments at random so
let's suppose that you have a study and you think that it's possible that gender
would be a confounding variable because males and females might respond differently to a certain treatment so if
you don't want to make gender a control variable because that would force all your units to be the same gender what
you can instead do is make gender a grouping variable and then within each
block you would determine whether the treatment group has a significantly different result from the control group
to reach your final results. A matched pairs design is a design in
which units with similar characteristics are paired and one unit in each pair is assigned to a treatment and the other is
assigned to the control group an example of a pairing variable is age so we can
suppose that somebody who's 25 years old is paired with somebody who's 26 years
old and a difference of one year is hardly anything so essentially for this
pair the age is effectively constant and within each pair you determine whether
or not there was a significantly different result for the unit that went to the treatment group and the unit that
went to the control group and by determining this for all pairs you can determine whether or not the treatment
has a significant effect the next topic is blinding a
double-blind experiment does not let the subjects know their treatment group and does not let the observers know which
treatment group the subjects are in not letting the subjects know their treatment group is useful because
otherwise the subject's expectations can influence the results for example
suppose a certain pill is falsely believed to improve running speed in races. In that case, if the people knew
whether or not they were given the treatment then the people who were given the treatment might falsely believe that
they have a better chance of doing well in the race and therefore have motivation and try harder in the race
and therefore finish faster but that was because of the subject's
expectations and not because of the treatment itself so not letting the subjects know their treatment group
prevents expectations from influencing the results. Not letting the observers know what treatment group the subjects are in prevents the observers' expectations from influencing the results. For example, if the observers knew which subjects were in the treatment group and which were in the
control group then they might have higher expectations for people in the treatment group to be able to finish
faster in a race, and therefore cheer for them more, which could lead to them finishing faster. But that was because of the observers' expectations instead of the treatment itself. A single-blind experiment is one of the
above but not both and the placebo is a pill or medicine that has no effect the experimenters
often give this to the control group and don't let them know that the medicine has no effect this ensures that the
subjects do not know their treatment group and a placebo can lead to something called the placebo effect where the
placebo which is supposed to be inactive actually causes an effect because of the
subject's expectations so now we're getting into correlation versus causation and this ties back to
the types of variables. So correlation is a positive or negative trend between two
variables for example shark attacks tend to increase as ice cream sales increase this is positive correlation because as
one increases the other increases if as one increases the other decreases then
this would be called negative correlation and causation means that adjusting one
variable causes changes in the other variable for example taking a medicine
speeds up recovery from a disease being able to distinguish between correlation
and causation is important because then you can reach the results of whether adjusting one variable would affect the
other so going back to the first example we can ask if we put a limit on how much
ice cream stores can sell would that reduce the number of shark attacks the answer is no because ice cream sales
increasing does not cause shark attacks to increase there is a confounding
variable and the confounding variable in this case is weather if it gets hotter
then people are more likely to go to the beach which would increase the number of shark attacks and if it's hotter people
would also be more likely to buy ice cream. So experiments allow you to
infer causation because they generally account for confounding variables and observational studies do not allow you
to infer causation however in some cases experiments are
not ethical for example many years ago there was a cholera
outbreak. Cholera is a disease, and some people hypothesized that it was spread through contaminated water. In this case, it would not be ethical to
conduct an experiment to prove or disprove that because then you would have to give some people contaminated
water and that's not ethical here are some miscellaneous terms replication is
being able to repeatedly receive consistent results in order to ensure the results are correct so in general
you want to ensure that your sample size is sufficiently large so that your results are consistent and not just
highly influenced by chance and this is consistent with the law of large numbers which states that the more
you repeat something the closer experimental results will be to expected results
A census is observing or surveying an entire population, and this means that
you do not need to do statistical inference because you've already answered the question about the population
so I know that section was pretty long but the rest of the sections in this chapter are pretty short this next
section will be on sampling and bias there are five sampling methods simple
random sampling systematic sampling stratified random sampling cluster sampling and convenience sampling
simple random sampling is selecting a completely random sample from the population simple random sampling has
the properties that every possible sample from the population is equally likely to be selected and every unit in
the population is equally likely to be sampled the pros of this are that it's
unbiased and most ideal for statistical inference because as you'll learn in statistical inference you consider all
possible samples from the population but the cons are that this requires the list of all population members because every
unit has a chance of being sampled so you must account for each and every unit and then an example of this would be
choosing nine manufactured items at random this is feasible because after an
item is manufactured it's in your hands so essentially you have the list of all population members and can choose any of
them systematic sampling is starting at a random point and then sampling every nth
unit after that starting point where you decide what n is the pros of this are that it's easier than simple random
sampling and it's usually a pretty good approximation of a simple random sample the con is if patterns exist for example
if every fourth unit in line shared a common characteristic then the results can be biased
stratified random sampling is dividing the population into several groups or
strata, based on common characteristics, and then sampling randomly from each group. The pros of this are that it addresses all groups and it's unbiased, because you sample randomly from each group. The con is that it requires a list of all population members, because every unit of every group has a chance
of being sampled cluster sampling is dividing all units into clusters and then choosing one or
more clusters to sample all units from the pros are that this is usually simple and can be unbiased especially if the
clusters are diverse. The con is that it's biased if the clusters chosen don't represent the population. One
common factor that makes cluster sampling easy is geographic location because it's easier to sample all 1000
people from one city instead of choosing a thousand cities to sample one person from each
Convenience sampling is sampling the units that are the most convenient to sample, for example your friends or
people whose phone numbers that you have this is the simplest method and is suitable for simple and small studies
but the con is that this is usually the most biased method so usually you should not use this method
You don't need to know the pros and cons for the AP exam; you just need to be able to identify when each
method is being used now we are getting into the types of bias sampling bias is when not all
members of the population are equally likely to be sampled this could be a problem even in effective sampling
strategies such as stratified random sampling, because if you sample the same number of people from each stratum but the strata are not of the same sizes, then people from smaller strata are more likely to be sampled. So you need to make sure that the number of people that you sample from each stratum is proportional to the size of that stratum. Undercoverage bias is when not all groups in the population are covered by
the sample this is generally not a problem in stratified random sampling for simple random sampling this is
generally not a problem for large sample sizes because large sample sizes are more likely to cover all the groups
response bias is when the responses to a survey are not accurate
And non-response bias is bias caused by people not responding to a survey. Voluntary response bias is caused by who chooses to respond to a voluntary survey: when you send out a voluntary survey, one thing that you might observe
is that the people who respond to it are generally the ones with the strongest opinions on the issue the reason is
because the other people might not think it's worth their time to fill out the survey so this is one thing that you
need to consider when sending out a voluntary survey now we are getting into section three
This section will mostly be review, but there will still be some new concepts. Let's start off by talking about histograms. On the horizontal axis there are several ranges, in this case zero to one, one to two, two to three, and so on. Several different things can go on the vertical axis; in this case you see frequency. That means that 14 car trips in the last month were between zero and one hours, five were between one and two,
and so on you can also have relative frequency which is number of observations in the
specified range divided by the total number of observations in this case there are 25 total observations 14 of
which were between 0 and 1 hours so dividing 14 by 25 gives 0.56 as a
relative frequency and you can do this for the rest of the ranges now this is something that you haven't
learned about in previous math classes: density histograms. The density is relative frequency divided by the bin width, or the size of the range. In this case, the density histogram looks the same as the relative frequency
histogram that you saw in the previous slide because the bin widths are all one
however if we change the bin width to 0.5 you see that the density histogram
will look different and one property that density histograms all have is that the total area of the
rectangles is always one a probability density function is a
smoothed out density histogram and this also always has the property that the area under the curve is one you can make
a probability density function using a histogram of any bin width, but in general it shouldn't be too small or too
large if it's too large then it won't be very accurate and if it's too small there will be a lot of spikes unless you
have a very high number of observations a Dot Plot has numbers along the
horizontal axis and above each number there is a certain number of dots that represents how many times that
observation shows up so far we've been dealing with data visualizations for numerical data where
all the observations are numbers. If we want to deal with categorical data, where all the observations are different categories, one data visualization that we can use is bar charts, where the
horizontal axis represents all the categories and then on the vertical axis you can have frequency or relative
frequency now if you had two categorical variables then you can represent the second one
using colors on the bar chart. You haven't seen mosaic plots in
previous math classes the way this works is the width of each bar is proportional
to the frequency of that category so you can see from the bar chart that approximately twice as many people pick
blue as red therefore the width of blue should be approximately twice the width
of red and then the total height of each bar will always reach one on the
vertical axis, and the vertical axis is relative frequency. You can interpret a relative frequency as: of the people who picked red, approximately 75 percent were male; of the people who picked yellow, approximately 50 percent were male; and so on. If you're confused by mosaic plots, don't worry, because they don't show up often on the AP exam, and when they do, they
usually explain it pretty well. A stem and leaf plot has stems on the left side and leaves on the right. Each leaf
represents the ones digit of one individual observation and the stems are the digits in front of the ones place
a scatter plot is a way of visualizing data for two numerical variables usually
the explanatory variable is on the x-axis and the response is on the Y and each point in the scatter plot
represents one observation a pie chart is a way of visualizing data
for categorical variables where the angle of each sector is proportional to
the frequency. Usually pie charts aren't used, because humans aren't that good at judging angles, so they're usually replaced with relative frequency bar charts. Just make sure that the vertical
axis starts at zero now we're getting into descriptive statistics which summarize the results
of a study and I've divided them into two groups here and you'll see why soon
You've learned about the descriptive statistics in the first group in previous math classes; they mostly deal with percentiles. The only new thing is that in AP Stats, outliers are not considered when determining the minimum and maximum. An outlier is a value that differs significantly from most other observations. There are many ways to decide what constitutes an outlier, but generally a value that is less than the lower quartile minus 1.5 times the interquartile range, or greater than the upper quartile plus 1.5 times the interquartile range, is considered an outlier.
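If you want to check this rule on a computer, here's a minimal sketch in Python (assuming NumPy is installed; the data values are made up for illustration):

```python
import numpy as np

data = np.array([2, 3, 3, 4, 5, 5, 6, 7, 8, 30])  # hypothetical observations

q1, q3 = np.percentile(data, [25, 75])  # lower and upper quartiles
iqr = q3 - q1                           # interquartile range

# The 1.5 * IQR rule: values outside these fences count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(lower_fence, upper_fence, outliers)  # here, 30 is flagged as an outlier
```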
So you've seen box plots in previous math classes, and this diagram shows where the outliers lie. Notice how the minimum and the maximum both do not lie in the outlier section. If you have outliers, they would be their own individual points outside of the box plot. This is how you'll see box plots in AP Stats, and this is different from previous math classes. Now into Group 2 descriptive statistics. The mean is computed by adding up all the observed values and dividing by the number of observed values. The variance is 1 divided by n minus 1, times the sum of the squares of the differences between the observed values and the mean. A high variance means that the observed values tend to fall far away from the mean, and the standard deviation is the square root of the variance. The range is the max minus the min, and this time it includes outliers.
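Here's a quick sketch of these Group 2 statistics in Python (NumPy assumed; the sample values are hypothetical). Note that ddof=1 gives the n minus 1 denominator from the variance formula above:

```python
import numpy as np

values = np.array([4.0, 7.0, 6.0, 9.0, 4.0])  # hypothetical sample

mean = values.mean()                       # sum of values divided by n
variance = values.var(ddof=1)              # divides by n - 1, matching the formula
std_dev = np.sqrt(variance)                # same as values.std(ddof=1)
value_range = values.max() - values.min()  # range includes outliers
print(mean, variance, std_dev, value_range)
```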
a descriptive statistic describes a sample while a population parameter describes a population you should
remember the variables in this table because we're going to be using them
what's the best way to describe a population the answer is you should use Group 1 population parameters when the
population is skewed because they're robust or less affected by outliers since population parameters should
represent the overall population they shouldn't be influenced by extreme individual values
this diagram right here shows what would happen in a right skewed population the
mode or the most frequently occurring value would be lower than the median and the mean because the highest point in
the distribution is on the left the median would be next and then the mean
would be the highest one because it is influenced by those extremely high values
and then use group two population parameters when the population is approximately symmetrical in this case
the mean and the median are pretty much the same now we're getting into the final section
of this chapter let's first talk about discrete and continuous variables discrete variables
only take on countable values examples would be the number of flowers in the backyard or the number of stars in the
Milky Way you can't have half of a flower or a star for example you can only have 0 1 2
3 4 and so on the distribution that you see here is an example of a binomial distribution which
we'll learn more about in chapter 3. continuous variables can take on any
value within a certain range an example would be the liters of gas in a tank The
Continuous distribution that you see here is an example of a sampling distribution and this is useful because
it allows us to compare a sample that we collected with all possible samples from the population, and that allows us to do
statistical inference which is what we'll be learning about in chapter 2.
distribution shapes are described by the skewness and the number of Peaks a
uniform distribution is roughly horizontal so for every x value the Y
value is approximately the same the distribution that's skewed left or right would have a longer left and right tail
respectively and a distribution can also be symmetrical
To describe the number of peaks, you can use the terms unimodal, bimodal, trimodal, and so on. A normal distribution is bell-curve shaped, so it is unimodal and
symmetrical the empirical rule is not something that you have to remember but it is pretty
useful it tells you approximately what percentage of values fall within a
certain number of standard deviations of the mean in a normal distribution
the Z statistic tells you how many standard deviations away a value is from the mean and it is negative if the value
is lower than the mean now we are getting into the most important part of this chapter using the
graphing calculator if you don't have a graphing calculator in the description there will be a link to a video that
will show you how you can access an online one, which is what I'll be using. So hit 2nd and then DISTR. The first important function that you'll need to know about is normalcdf, or the cumulative distribution function. This will give you the area under the curve of the normal distribution, which is very useful if you want to determine what proportion
of values fall within a certain range. So let's suppose we wanted to know what proportion of values fall within one standard deviation of the mean; that corresponds to a z statistic between negative one and one. So for lower, type negative one, and for upper, type one, and it gives approximately 0.68. You'll notice that this is consistent with the empirical rule.
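If you don't have a calculator handy, the same computation can be sketched in Python (assuming SciPy is installed); norm.cdf plays the role of normalcdf:

```python
from scipy.stats import norm

# Area under the standard normal curve between z = -1 and z = 1,
# i.e. normalcdf(lower=-1, upper=1) with mean 0 and sd 1
area = norm.cdf(1) - norm.cdf(-1)
print(round(area, 4))  # about 0.6827, matching the empirical rule
```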
We can also change the mean and standard deviation, so let's change the mean to 25 and the standard deviation to 5. Now suppose we want to know what proportion of values fall within two standard deviations of the mean. For lower we would type 15, which is two standard deviations below the mean, and for upper we would type 35, which is two standard deviations above the mean, and it returns approximately 0.95, which is also consistent with the empirical rule. Now suppose that we did not want two tails; we just want to know what proportion of values fall below 35, and we don't want a lower bound. In that case we would type in a really low number for the lower bound, say negative 10 to the power of 99, and then it would give approximately 0.977.
And if you wanted a right tail only, so you only wanted to know the proportion of values that fall above a certain value, then you would type a really high number, say 10 to the power of 99, for upper. Now, the second important function is invNorm. This function takes an area as input and returns the appropriate tail or tails such that the area above, below, or between the tails is equal to the area that you input.
So let's suppose that we want to find the median. In that case, the area below the left tail should be 0.5, and if we just set the mean to zero and the standard deviation to one, we'll find that the result is zero. So in the normal distribution, the median is always equal to the mean. Now let's suppose instead we wanted the 25th percentile; then we would type 0.25, and the result would be approximately negative 0.67. And if we instead wanted the area between two tails, let's suppose we wanted the area between the two tails to be 0.997; we get approximately negative three and three, and this is consistent with the empirical rule.
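The calculator's invNorm corresponds to SciPy's norm.ppf (the percent point function, i.e. the inverse CDF); here's a sketch of the three lookups above:

```python
from scipy.stats import norm

print(norm.ppf(0.5))    # median of the standard normal: 0.0
print(norm.ppf(0.25))   # 25th percentile: about -0.67

# A central area of 0.997 leaves 0.0015 in each tail
lower = norm.ppf(0.0015)
upper = norm.ppf(1 - 0.0015)
print(lower, upper)     # about -2.97 and 2.97, consistent with the empirical rule
```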
Now we're into our last topic for this chapter: describing distributions on FRQ
questions. If you're asked to describe or compare distributions, for full credit you need to describe the shape, center, and variability. You also need to provide context and, if asked, make comparisons.
So for the shape: for both regions A and B, the distribution of lead concentrations is skewed right. For center, you need to find the median or the mean; the median is easier. For region A, you can find that there are 50 samples in total, so you need to find where the 25th and 26th values from the left lie, and you can see that there are 20 samples between 0 and 50, and the 25th and 26th values are between 50 and 100. So the median lead concentration for region A is between 50 and 100, which is less than the median lead concentration for region B, which is between 100 and 150. For variability, you need to find the IQR or the range; the range is easier. For region A, the range is between 450 minus 50 and 500 minus 0, so the range of lead concentrations is between 400 and 500, which is greater than the range of lead concentrations for region B, which is between 250 and 350. So I satisfied the condition of providing context, because I used the term lead concentrations, and I also made comparisons by using greater than or less than. So here is the model solution and scoring.
now we're done with chapter one and into our review section one may have felt difficult because of the amount of
vocabulary terms but it's oftentimes easier if you can see all the vocab terms on one page which is listed right
here pause here if you need to look over the list of vocab terms
and then for section two here are all the sampling methods
and then here are the diagrams for the types of bias and for section three here
are all the data visualizations with the important ones in bold
here are examples of the two new data visualizations you learned today and below that are some descriptive
statistics and for section four you need to know the difference between discrete and
continuous and some important skills listed below
Now on to chapter 2, which is probably the most fun chapter of this video. Here is an overview; pause if you'd like to look over it.
and here is the AP stats framework chapter 2 Section 1 is probably going to
be the most difficult section of this video because you'll learn about two new major Concepts confidence intervals and
hypothesis testing but don't worry because future sections will also use these two concepts giving you an
opportunity to practice with them and when we do the review of chapter two it
will help you remember them even better so here's an example of a confidence
interval problem a principal wants to know what proportion of the 2000 students at his high school exercise at
least 150 minutes each week he randomly selects 100 students and asks them that question the proportion of students
answering yes is 0.7. Construct and interpret a 95% confidence interval for
the true proportion of students at the high school that exercise at least 150 minutes each week
to build confidence intervals we'll be using the sampling distribution which is the distribution of all possible samples
of a specified size from the population some important properties of this
distribution are that the mean is equal to the true population proportion, and it's approximately normal when the number of successes and failures are both at least 10. A 95% confidence interval means that we want 95% of all samples to generate a confidence interval that contains the true population parameter. And we know from the empirical rule that, in a normal distribution, 95% of all
samples fall within 1.96 standard deviations of the mean
so this gives us the procedure to make confidence intervals the confidence interval is the point estimate plus or
minus the margin of error, and the margin of error is the critical value, which is 1.96 for a 95% confidence interval, times
the standard deviation of the sampling distribution which can be given by the formula square root of P times 1 minus P
all over n now we don't know what p is that's the true population parameter but we can
estimate it by using p hat which is what proportion we get for our sample
So this is the formula for the confidence interval. The required conditions for constructing
the confidence interval are that the sample is random or the data is from a randomized experiment
the sampling distribution is normal and when you're sampling without replacement which means that after
somebody is sampled they can't be sampled again there's also the 10 percent rule which is that no more than
10 percent of the population is sampled the reason is because the sampling distribution is actually the sampling
distribution for all samples with replacement and when you're sampling without replacement the standard
deviation is actually less than the square root of P times 1 minus P Over N but if no more than 10 percent of the
population is sampled, the difference is negligible. So the 10% rule ensures that
the standard deviation is approximately accurate interpreting the confidence interval so
one way to interpret it is we are C percent which is usually 95 percent confident that the true population
proportion is between the lower bound and the upper bound and another correct way is about C percent of all samples of
the specified size from the population will produce a confidence interval containing the true population
proportion both of these interpretations are correct but the first one is more common because it actually reports the
bounds of our confidence interval here are some examples of incorrect interpretations and the red text shows
where it is incorrect an additional note is that for any
sample size, the maximum possible standard deviation of the sampling distribution is 0.5 over the square root of n (that is, the square root of 0.25/n), which occurs when p equals 0.5. So some questions on the AP exam may ask you what's the maximum possible standard deviation of the sampling distribution, and this is where this fact comes in handy.
so as you can see confidence intervals aren't that difficult there are really only three steps what might be difficult
is the specifics of each step fortunately many of the formulas that
you've seen show up on the formula sheet for the AP exam and you can use graphing calculators to construct confidence
intervals for you which I'll show you how to do later so now revisiting the problem our first
step is to check the required conditions so we do indeed have a random sample the
sampling distribution is approximately normal because 100 times 0.7 and 100
times 1 minus 0.7 are both at least 10 and the 10 percent rule is also
satisfied. Step two is to construct the confidence interval. Our point estimate was 0.7, and then plus or minus 1.96 times the square root of 0.7 times (1 minus 0.7) over 100. In the end we'll get approximately 0.610 to 0.790, and it's usually best to round to three decimal places. To interpret the confidence interval, the best interpretation is: we are 95% confident that the true proportion of students at the high school that exercise at least 150 minutes each week is between 0.610 and 0.790.
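Here's a sketch of that calculation in Python (SciPy assumed), following the point estimate plus or minus critical value times standard error recipe from above:

```python
from math import sqrt
from scipy.stats import norm

p_hat, n = 0.7, 100                 # sample proportion and sample size
z_star = norm.ppf(1 - 0.05 / 2)     # about 1.96 for 95% confidence
se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard deviation
margin = z_star * se
print(round(p_hat - margin, 3), round(p_hat + margin, 3))  # about 0.610, 0.790
```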
So now we're into hypothesis testing. We'll always have a null and an alternative hypothesis. The null hypothesis must contain the equal sign, and it's usually the equal sign itself. The alternative hypothesis could be not equal to, less than, or greater than. So with hypothesis testing, we collect
the sample and find the probability of seeing a sample at least as extreme as our sample, assuming that the null
hypothesis is true and if the probability is low it means that a result would be unusual if the null
hypothesis is true which means that the null hypothesis is probably false so we
reject the null hypothesis and accept the alternative hypothesis so there are four steps to hypothesis
testing the first step is to set up the test and check the required conditions in a word problem where the null and
alternative hypotheses aren't given you set them up yourself the required conditions are the same as
for confidence intervals the only difference is instead of P hat you use the P naught value that was given to you
in the null hypothesis since you're assuming that it's true the next step is to obtain a z statistic
and once again since you're assuming that the null hypothesis is true you use P naught to get the mean and standard
deviation of the sampling distribution and if you remember the Z statistic tells you how many standard deviations
from the mean a value is so this is the formula for the Z statistic
then we obtain a p-value using the Z distribution which has mean 0 and
standard deviation one if the alternative hypothesis is less than then the p-value is the area to the
left of your Z statistic if it's greater than then it's the area to the right
And if the alternative hypothesis is not equal to, then the p-value is the area to the left of negative absolute value of z, plus the area to the right of absolute value of z, which is also twice the area of the smaller tail of z, which in this case is the right tail because z is positive.
finally obtain the results the p-value means that if the null hypothesis is
true, then the probability of obtaining a test statistic at least as extreme as what was obtained is the p-value. The
p-value is not the probability that the null hypothesis is true that's a common mistake that people make
if the p-value is less than or equal to the level of significance which is usually 0.05 reject the null hypothesis
we have statistically convincing evidence that the alternative hypothesis is true otherwise fail to reject the
null hypothesis we do not have statistically convincing evidence
so here are the four steps of hypothesis testing now here's an example problem this is
the same problem as earlier but now the proportion is 0.6 and it's asking us is
there statistically convincing evidence at the level of alpha equals 0.05 that
more than half of all students at the high school exercise at least 150 minutes each week
So the first step is to set up the test and check the required conditions. The null hypothesis is p equals 0.5, and the alternative hypothesis is p greater than 0.5, since it's asking whether more than half. The required conditions are all satisfied. Then we obtain a z statistic, and
remember to use 0.5 and not 0.6 to determine the standard deviation of the
sampling distribution. So z equals 2, and then the p-value is the area to the right of z equals 2 under the normal distribution with mean 0 and standard deviation 1, and you can determine that the area is about 0.023. Since this is less than the level of 0.05, we reject the null hypothesis; we have statistically convincing evidence that more than half of all students at the high school exercise at least 150 minutes each week.
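As a sketch, the same z statistic and p-value can be computed in Python (SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

p_hat, p0, n = 0.6, 0.5, 100
se = sqrt(p0 * (1 - p0) / n)  # use p0, not p_hat, under the null hypothesis
z = (p_hat - p0) / se         # z = 2.0
p_value = 1 - norm.cdf(z)     # right tail, since Ha is "greater than"
print(z, round(p_value, 3))   # 2.0, about 0.023
```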
so there are two types of errors that you can make in hypothesis testing a type 1 error also known as a false
positive is rejecting a true null hypothesis a type 2 error also known as
a false negative, is failing to reject a false null hypothesis. The level of significance alpha is the probability of a type 1 error if the null hypothesis is true. Why? Because if the null hypothesis is true, then the null distribution is the same as the population's sampling distribution, and the rejection region, which is the region of all z statistics that would result in a p-value of less than alpha, has area equal to alpha. Therefore the probability of obtaining a z statistic in the rejection region, and therefore incorrectly rejecting the null hypothesis, is alpha. Power is 1 minus beta, and beta is the
probability of a type 2 error this seems confusing but one way to remember it is
if a test is more powerful then it's more likely to make discoveries by correctly rejecting false null
hypotheses therefore as power increases the probability of a type 2 error should
decrease. Beta decreases when any of the following occur; I recommend not memorizing this list, because it's pretty intuitive. You can determine the probability of a
type 2 error from a graph so suppose the null hypothesis is false
and the actual population distribution and null distribution are shown right here we would have a type 2 error if we
fail to reject the null hypothesis which means we obtain a test statistic that
does not fall in the rejection region and the probability of obtaining a test
statistic that does not fall in the rejection region from our population is
shaded in yellow so therefore the yellow area is the probability of a type 2
error now we're getting into the sampling distribution for difference of two proportions so in general if you
have two independent random variables X and Y, then the formulas for the mean and
standard deviation of the distribution ax plus b y are shown right here you do
not need to remember these formulas and if you just want x minus y then you
can get that the mean of that distribution is the mean of x minus the mean of Y and the standard deviation is
the square root of standard deviation x squared plus standard deviation y squared
and then now you can get the standard deviation of the sampling distribution
for p-hat 1 minus p-hat 2, and the standard error.
so this is a confidence interval for the difference of two proportions it's the point estimate P hat one minus P hat two
plus or minus the critical value times our standard error
the required conditions are mostly the same as for one sample just check them for both samples the only new condition
is that the two samples must be independent and the steps are the same as for one
sample so here's an example problem we're going back to the first problem we saw in
chapter two, and now we're asking: at a different high school with 1,500 students, the same question is asked to 30 random students, and the proportion of students answering yes is 0.6. Construct and interpret a 99% confidence interval for the difference in population proportions.
so all the required conditions are satisfied the two samples are independent because they come from
different populations knowing information about one sample is not going to tell you anything about the other
Then we construct the confidence interval. Our point estimate is 0.7 minus 0.6, which is 0.1, plus or minus the critical value, which can be determined using invNorm, times the square root of 0.7 times (1 minus 0.7) over 100 plus 0.6 times (1 minus 0.6) over 30. In the end you're going to get negative 0.159 to 0.359.
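Here's a minimal sketch of that two-proportion interval in Python (SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

p1, n1 = 0.7, 100
p2, n2 = 0.6, 30
z_star = norm.ppf(1 - 0.01 / 2)  # about 2.576 for 99% confidence
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = z_star * se
diff = p1 - p2
print(round(diff - margin, 3), round(diff + margin, 3))  # about -0.159, 0.359
```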
so now we interpret the confidence interval which is pretty simple
and now we're getting into hypothesis testing for difference of two proportions so we change the problem that we just
saw to is there statistically convincing evidence at the level of alpha equals 0.05 that the proportion of students who
exercise at least 150 minutes each week is different for the two schools.
so the null hypothesis is that P1 minus P2 equals zero in other words they're
the same and the alternative hypothesis is not equal to because the question was asking are they different
and in AP Statistics whenever you're tested on hypothesis testing for difference of two proportions
you will always have that the null hypothesis is that the difference is zero because if it's not then the
problem is significantly more difficult and you'll see why soon so could we still use the same formula
for standard error as we used in confidence intervals for a difference of two proportions the answer is no because
we must assume that the null hypothesis is true. But since p-hat 1 and p-hat 2 are likely not equal, if we use this formula it contradicts the null hypothesis. So we replace p-hat 1 and p-hat 2 with some common value. How do we get this common value? The answer is we can combine our two samples into one pooled sample. Since we know the proportion and sample size for each sample, we can determine the number of successes for each sample, and we add together the numbers of successes and the sample sizes to get our pooled sample. With our pooled sample we can get the pooled proportion, and this is the common value that we need to replace p-hat 1 and p-hat 2 with in our formula to get the formula for the standard error for hypothesis testing. Now in our formula we can make it slightly simpler by factoring out p-hat c times (1 minus p-hat c), and p-hat c is the total number of successes over the total sample size.
so the conditions for hypothesis testing for a difference of two proportions they're the same as for confidence
intervals but now to check normality you use P hat C
so here are the steps for hypothesis testing for difference in two proportions they're the same as for one
proportion so now revisiting the problem the first
step is we set up the test which we did earlier and we checked the conditions they're all satisfied
Then we obtain a z statistic, using the formula that we just saw for our standard error. We'll get that z is approximately 1.027. Then the p-value is 2 times normalcdf with lower bound 1.027 and upper bound 10 to the power of 99, and that will give you 0.304.
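Here's a sketch of the pooled two-proportion test in Python (SciPy assumed); the success counts come from the stated proportions and sample sizes:

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 70, 100  # 0.7 * 100 successes at the first school
x2, n2 = 18, 30   # 0.6 * 30 successes at the second school

p_c = (x1 + x2) / (n1 + n2)                     # pooled proportion
se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))  # pooled standard error
z = (x1 / n1 - x2 / n2) / se                    # about 1.027
p_value = 2 * (1 - norm.cdf(abs(z)))            # two-sided p-value
print(round(z, 3), round(p_value, 3))           # about 1.027, 0.304
```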
Since that is greater than the level of 0.05, we fail to reject the null hypothesis. We do not have statistically convincing evidence that the proportion of students who exercise at least 150 minutes each week is different for the two schools. And that's it for section one. If you're still struggling with confidence intervals and hypothesis testing, don't worry, because section 2 will focus a lot on those as well. The only new topic is t distributions.
So if you want to infer about the population mean for a numerical variable, then the sampling distribution for the population mean will have mean equal to the population mean for that variable, and standard deviation equal to the population standard deviation divided by the square root of the sample size. And since you don't know what the population standard deviation is, you can estimate the standard deviation of the sampling distribution using the standard error, which is the standard deviation of the sample divided by the square root of the sample size.
and there are two ways for the sampling distribution to be approximately normal the first is the central limit theorem
which states that if the sample size is sufficiently large usually at least 30
then the sampling distribution is approximately normal even if the population distribution is skewed the
second way is if the population distribution is approximately normal then the sampling distribution is
approximately normal and you can check this by seeing if your sample is
approximately normal and free from extreme outliers
so you use the T distribution for population mean when the population
standard deviation is unknown and the Z distribution when it's known but this is rare because the fact that you're
collecting a sample means that you don't know about the population so the T distribution for various
degrees of freedom which is one of its parameters is shown right here and if degrees of freedom equals infinity then
the T distribution becomes the same as the Z distribution so how do we generate the T distribution
Well, suppose we have a population with mean zero and standard deviation 15, and
we collect three samples of size nine the T statistic for each sample is
calculated by finding the difference between the sample mean and the actual mean divided by the standard error which
we get by using the standard deviation from our sample and the Z statistic is the sample mean minus the actual mean
divided by the standard deviation of the sampling distribution which is determined by using the standard
deviation of 15. so we can calculate T and Z statistics for each of these samples
And then if you collect more and more samples and plot the distributions of the t statistics and z statistics, then the t statistics
will form the t-distribution and the Z statistics will form the Z distribution so statisticians have done this with a
large number of samples, and that's how they formed the z and t distributions. So here are some comparisons between the z and t distributions: they're both bell-curve shaped, symmetric, and have a mean of zero, but the t distribution has a lower center and higher tails. The reason is that some samples will have a low standard deviation, resulting in a really
positive or really negative T statistic and the T distribution has a parameter
called degrees of freedom and as degrees of freedom increases the T distribution approaches the Z distribution
so here is the formula for confidence interval for population mean it's the point estimate which is the sample mean
plus or minus the margin of error, which is the critical t value times the standard error. To find the critical t value, use invT, which unlike invNorm does not allow you to choose a left, center, or right tail; it only accepts the left tail. To find the area of the left tail, it's just 0.5 times (1 minus the confidence level as a decimal), which comes out to 0.025 for a 95% confidence level, and the degrees of freedom is n minus 1. The required conditions are mostly the same as for proportions; the only
difference is to check normality you check if n is greater than or equal to 30 or the population distribution is
approximately normal so here is an example problem so we're going back to our high school but now
instead of finding the proportion of students that exercise at least 150 minutes each week we want to find the
mean exercise time so we obtain the sample of 30 students and get a mean of
140 minutes and a standard deviation of 70 minutes. Construct and interpret a 95%
confidence interval for the true average time that the students at the high school exercise each week
All right, so all the required conditions are satisfied; the sample size is 30, which is large enough for the central limit theorem to hold. Now constructing the confidence interval: the critical t value is negative invT with area 0.025 and degrees of freedom 29. Then plug in all the values that we obtained, and in the end we're going to get approximately 113.862 to 166.138.
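As a sketch, the same interval in Python (SciPy assumed), where t.ppf plays the role of invT:

```python
from math import sqrt
from scipy.stats import t

x_bar, s, n = 140, 70, 30
t_star = t.ppf(1 - 0.05 / 2, df=n - 1)  # same as -invT(0.025, 29), about 2.045
margin = t_star * s / sqrt(n)
print(round(x_bar - margin, 3), round(x_bar + margin, 3))  # about 113.862, 166.138
```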
Interpreting the confidence interval is pretty self-explanatory. For hypothesis testing, the steps are the same as for proportions, but now we're obtaining a t statistic instead of a z statistic.
so now we're asking is there statistically convincing evidence at the level of alpha equals 0.1 that the
average time that the students at the high school exercise each week is greater than two hours
So we set up the test. Two hours is the same as 120 minutes, so we're testing mu equals 120 minutes versus mu greater than 120 minutes, and the required conditions are satisfied. Then we obtain a t statistic: it's the difference between our sample mean and the hypothesized mean, divided by the standard error. It's approximately 1.565. Then, since the alternative hypothesis was greater than, we find the area of the right tail, which is approximately 0.064. And since 0.064 is less than the level of 0.1, we reject the null hypothesis.
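Here's the matching sketch in Python (SciPy assumed):

```python
from math import sqrt
from scipy.stats import t

x_bar, mu0, s, n = 140, 120, 70, 30
t_stat = (x_bar - mu0) / (s / sqrt(n))      # about 1.565
p_value = 1 - t.cdf(t_stat, df=n - 1)       # right tail, since Ha is mu > 120
print(round(t_stat, 3), round(p_value, 3))  # about 1.565, 0.064
```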
So now we're getting into difference of two means, and depending on how our data is structured, there are two ways to perform statistical inference for this. If the data is from a matched pairs sample, which if you remember means that everyone in one sample is paired with somebody in the other sample, then we can think of it as one sample of pairs, where each value is the difference of a pair in the matched pairs sample, and then proceed using the one-sample t interval or test. This sounds confusing, but I'll show you what it means on the next slide. And if the data is from two independent samples (they must be independent), then proceed using the two-sample t interval or test.
So if you have a matched pairs sample, then for each pair you compute the difference, and then with this sample of differences you can perform a one-sample t interval or test. For example, you can test whether the mean of the sample of differences is zero against the alternative that it is greater than zero.
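Here's a minimal sketch of that matched pairs test in Python (the paired data is made up, and the alternative keyword assumes SciPy 1.6 or newer):

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical matched pairs: each unit's response under treatment and control
treatment = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.5])
control   = np.array([11.8, 11.5, 12.2, 12.0, 11.6, 12.1])

diffs = treatment - control  # one sample of pair differences

# One-sample t test of H0: mean difference = 0 vs Ha: mean difference > 0
result = ttest_1samp(diffs, popmean=0, alternative='greater')
print(result.statistic, result.pvalue)
```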
Now for a two-sample t interval or test: the standard deviation of X minus Y is the square root of the standard deviation of X squared plus the standard deviation of Y squared.
so here is the confidence interval for difference of two means
and then make sure you check the required conditions for both samples and that they're independent
the degrees of freedom are quite complicated to determine and usually left to a graphing calculator to
determine but it's always between the smaller of N1 and N2 minus 1 and N1 plus
N2 minus two as a conservative estimate you can use the lower bound which is
smaller of N1 and N2 minus one for the degrees of freedom because that will
always produce wider confidence intervals than the actual degrees of freedom. So here's an example problem. We have the same problem, but now we have a different high school with 1,500 students, and we ask the same question to 50 students and get a mean of 100 minutes and a standard deviation of 40 minutes. Construct and interpret a 99% confidence interval for the true difference in average times that the students at the high schools exercise each week; you may use a conservative estimate for df. All right, so all of the required
conditions are satisfied, and now we construct the confidence interval. Our point estimate is 140 minus 100, which is 40. To determine the critical t value, the area is 0.005, since we want a 99% confidence level, and for the degrees of freedom, since 30 is smaller than 50, the degrees of freedom are estimated using 30 minus 1, which is 29. After that, in the end you're going to get approximately 1.476 to 78.524.
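Here's that two-sample interval with the conservative degrees of freedom as a Python sketch (SciPy assumed):

```python
from math import sqrt
from scipy.stats import t

x1, s1, n1 = 140, 70, 30
x2, s2, n2 = 100, 40, 50

se = sqrt(s1**2 / n1 + s2**2 / n2)
df = min(n1, n2) - 1                 # conservative estimate: 29
t_star = t.ppf(1 - 0.01 / 2, df=df)  # about 2.756 for 99% confidence
diff = x1 - x2
margin = t_star * se
print(round(diff - margin, 3), round(diff + margin, 3))  # about 1.476, 78.524
```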
Then we interpret the confidence interval. And here are the steps for hypothesis testing for a difference in two means. So now we're asking: is there statistically convincing evidence at the level of alpha equals 0.01 that the average time that students at the first school exercise is not the same as the average time that students at the second school exercise? All right, so we set up our test. Since it's asking whether they're the same or different, the null hypothesis is that the difference is zero, and the alternative is that it's not equal to zero.
Then we obtain a t statistic, which is our point estimate of 40 minus the hypothesized difference of zero, divided by the standard error. Then we obtain a p-value, and in the end we obtain the results: since the p-value of 0.008 is less than the level of 0.01, we reject the null hypothesis.
so now we're moving on to section 3 which is on linear regression and this section does not involve confidence
intervals and hypothesis testing but in the next section we will apply those two concepts to linear regression
So let's start off with describing bivariate relationships. They can be described as positive or negative (as one increases, what happens to the other?), linear or non-linear (is it straight?), and strong or weak (are the points packed closely together or not?). So for example, the first one is positive, because as one
increases the other also increases linear because the relation is pretty straight and strong because the points
are pretty close to the straight line of best fit if you want pause the video here and try to describe the other four
relationships all right here's the answer the second one is negative linear and
weak the third one is negative linear and strong
the fourth one is positive non-linear because the straight line does not best fit the points and rather the quadratic
relationship does better and strong because the points are pretty close to the quadratic relationship
and this last one is positive linear and still strong even though there are some
points that do not fit in for the most part the points are pretty close to the straight line
The next topic is residuals. So suppose you have a number of points and you fit the line y-hat (y-hat is the symbol for the predicted y value) equals x to the set of data. The residual of each point is its y-coordinate minus y-hat. So for example, for the point (4, 3), the residual is 3 minus 4, which is negative 1. You can determine the residual for
every point and you can square all of them to make all your values positive
and a residual plot shows the x coordinate of each point against its residual
Obviously a high squared residual is not good, because it means that your fitted line is less accurate. So what least squares linear regression aims to do is find the line that minimizes the sum of squares of residuals. The line is y-hat equals a plus b times x, and here are the formulas for a and b; you do not have to remember them, and you're not even going to be tested on them at all on the AP exam.
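If you're curious what this looks like in code, here's a minimal sketch in Python (SciPy assumed; the points are hypothetical). scipy.stats.linregress computes the least squares line and the correlation coefficient in one call:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical (x, y) points
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

fit = linregress(x, y)  # least squares line y-hat = a + b*x
print(fit.intercept, fit.slope)  # a and b
print(fit.rvalue)                # correlation coefficient r
print(fit.rvalue**2)             # coefficient of determination r squared

residuals = y - (fit.intercept + fit.slope * x)
print(residuals)  # least squares minimizes the sum of these, squared
```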
The correlation coefficient r represents how strong the linear relationship is, and you do not have to remember the formula for this, but you do
need to remember how to interpret it so R is always between negative one and one if it's negative one it means that
there's perfect negative correlation meaning that all the points fall perfectly on a line with negative slope
and then R equals one means perfect positive correlation R equals zero means that there's no
correlation at all and you can also have anything else in between negative one and one
the coefficient of determination which happens to be equal to r squared is the
total sum of squares minus residual sum of squares over total sum of squares so
the total sum of squares is the sum of squares of residuals when you do regression using only an intercept and
the residual sum of squares is the sum of squares of residuals using normal linear regression and the percentage
reduction is the coefficient of determination so for example if the total sum of squares is 40 and the residual sum of squares is 32 then the percentage reduction is 20 percent and r squared is equal to 0.2 the interpretation is that 20 percent of the variability in the dependent variable is accounted for by the regression model on the independent variable the important thing to remember
on this slide is the interpretation because it could be tested on the AP exam
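Here's that arithmetic as a quick check:

```python
tss = 40.0  # total sum of squares (intercept-only model)
rss = 32.0  # residual sum of squares (full linear model)

r_squared = (tss - rss) / tss  # proportional reduction in squared error
print(r_squared)  # 0.2, i.e. 20 percent of the variability is accounted for
```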
the root mean squared deviation is the square root of the sum of squares of residuals Over N minus two and this is
also known as standard deviation of residuals because it's the same as the standard deviation of the residuals when
the mean is zero which happens to be true of least squares linear regression the only difference is that the
denominator is n minus 2 instead of n minus 1 because you're estimating two parameters (the slope and intercept) with linear regression and the root mean squared deviation just gives us an idea of how large or small
the residuals are now let's suppose you have non-linear data how could you do regression on this
so for example for this graph what can you do well instead of creating a linear model
for Y versus X you can try Y versus x squared or square root of Y versus X or
you can also experiment with other things such as Y versus e to the power of X Y versus 1 over X and then find the
best model in the end so for example if you have a table of all the x and y coordinates of each point then you can turn it into a table of x squared versus y, or x versus y squared, and then do linear regression on these transformed tables.
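Here's a minimal sketch of that transformation idea in Python, with made-up roughly-quadratic data; the point is just that regressing y on x squared gives a better linear fit than regressing y on x:

```python
import numpy as np
from scipy import stats

# Made-up non-linear (roughly quadratic) data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 4.2, 8.8, 16.5, 24.7, 36.2])

linear = stats.linregress(x, y)           # y versus x
transformed = stats.linregress(x**2, y)   # y versus x squared

# The transformed model's r should be much closer to 1
print(linear.rvalue, transformed.rvalue)
```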
Influential points are points that, if removed, would change the linear model significantly. There are generally two types of influential points: outliers,
which have very extreme residuals and high leverage points which have a very unusual x value in the outlier example
shown here removing the outlier would make R closer to one because it improves the correlation, decrease the y-intercept because that point shifts the line upward, and decrease the root mean squared deviation because that point has a very
high residual in the high leverage Point example shown here removing that point
would make R farther from one this seems counterintuitive but the point is already very close to the least squares linear regression line so removing it would actually make the correlation worse, increase the root mean squared deviation because that point has a low residual, and increase the slope because that point drags the right side of the line downwards. And in the last example, which is both an outlier and a high leverage point, removing that point would make R closer to one, decrease the root mean squared deviation significantly,
increase the slope and significantly decrease the y-intercept if this slide seems confusing the most important thing
to remember are just the terms influential Point outlier and high leverage point
so here are some properties of the least squares regression line you generally don't have to remember these but looking
over them is still pretty useful now we're almost done with chapter two only one section left
so if you consider the slopes of all possible samples of a specified size from your population here are some
formulas for that sampling distribution you do not need to remember these formulas but you should remember the
interpretation of slope because it could be tested on the AP exam
here are the conditions for linear regression inference you can remember them with the acronym LINER you recognize two of these conditions already, independence and random sample, and the rest of the conditions are linearity, which means the relationship between x and y is linear; normality, which means for every x value the distribution of y is approximately normal; and equal variance, which means the variance of the residuals is the same for any value of x
so you can check these conditions using residual plots the linearity condition
is not satisfied if there is a clear pattern in your residual plot for
example in this residual plot it seems like it first decreases and then increases and indeed in the original
scatter plot the relationship is more quadratic than linear the equal variance condition is not
satisfied if the spread in y depends on the x value for example in this residual
plot you can see that for higher values of X the Y values are much more spread out
the normality condition is not satisfied if for a given x value the distribution
of Y isn't normal in this residual plot that is the case you can see for certain
X values the distribution of Y is skewed so here's the confidence interval for
slope of a regression model it is the point estimate which is the slope in the least squares linear regression model
plus or minus the margin of error which is the critical T value times the standard error and the critical T value
is determined with degrees of freedom equals n minus 2 not n minus 1.
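Here's how that interval could be computed in Python; the slope and standard error below are placeholders, since in practice you'd read them off the computer output:

```python
from scipy import stats

b = 0.85       # placeholder slope from computer output
se_b = 0.11    # placeholder standard error of the slope
n = 12         # sample size

# Critical t with df = n - 2 (not n - 1) for 95 percent confidence
t_star = stats.t.ppf(0.975, df=n - 2)

margin = t_star * se_b
print(b - margin, b + margin)  # the 95 percent confidence interval for the slope
```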
so here's a sample problem an athletic trainer wanted to investigate the relationship between an athlete's
resting heart rate and the heart rate after exercise from a sample of 12 athletes the athletic trainer recorded
each athlete's resting heart rate in beats per minute and heart rate in beats per minute after 5 minutes of moderate
exercise the results of a regression analysis of the data are shown in the following computer output assume the
conditions for inference are met which of the following represents a 95 percent confidence interval for the slope of the
population regression line so as you can see you don't have to determine the interval by yourself you will generally
be given computer output since the formulas are pretty complicated
so here's how you use the computer output the predictors on the left side are the y-intercept and the explanatory variable; make sure you do not use the y-intercept row. And here are the coefficients for the least squares regression line, and here is the standard error
and although it's not necessary in this problem this is the coefficient of determination
and now we can solve the problem we know two of the three values that we need to know and for the critical T value since
we want a confidence level of 95 percent and we have a sample size of 12 we can
use inverse t with area 0.025 and degrees of freedom equals 10 and in the
end you'll get this answer which is answer choice E so we can also do hypothesis testing for
the slope of a regression model and it's common to test beta equals zero against
beta is not equal to zero because beta equals zero means that the two variables are independent the predicted y value
does not depend on the x value so here's a sample problem it's the same
as what we just saw but now it's asking assuming the conditions for inference are met is there statistically
convincing evidence at the level of alpha equals 0.05 that heart rate after exercise is dependent on resting heart
rate all right so that's where these last two columns come in these two columns test
beta equals zero against beta is not equal to zero the T statistic which you can determine
by yourself is conveniently given and so is the p-value
and since the p-value of 0.03 is lower than the level of 0.05 we reject the
null hypothesis we have statistically convincing evidence that heart rate after exercise is dependent on resting
heart rate now I'll show you how to do statistical inference on the graphing calculator
so hit stat and go to test so this is the one sample t-test
this is the two sample t-test and I'll go in and show you how to do it you can enter either raw data or statistics, meaning the mean, standard deviation, and size of each sample, but
let's go back to data I'll show you how to do this so you need two lists of numbers and to
edit each list hit stat and then edit and as you can see I already entered
some values for two lists so let's go back if you want to change one of these lists
then hit second list and then choose the name of the correct list
and then frequency you can just leave those as one and this is the alternative hypothesis
and then Pooled: if you select Yes for this then it will combine your two samples into one in order to find a common standard error to use for the test so this will be done when you're assuming that the two samples come from populations with the same standard deviation but since this is usually not the case you usually leave this as No all right and then if you press
calculate it Returns the results including our p-value
all right now I'll show you how to do a one proportion Z test so here's the null proportion
and then this is the number of successes in our sample it's not P hat but rather
the number of successes in the sample and then this is the sample size
all right and then you can press calculate and it Returns the results including our p-value
all right and then you can also create confidence intervals for example this is the one sample T interval
this is a two sample T interval and one proportion and two proportion Z intervals
and to do linear regression tests or intervals those are right here so let's go in I'll show you how to do a
linear regression T interval all right so you need two lists of numbers
and then you can just leave all of these as is you don't have to enter anything for RegEQ
and then if you press enter it will return the results of the interval
now we're done with chapter 2 and into the review pause whenever you need so
first here are the steps for confidence intervals here are the steps for hypothesis
testing these are some useful formulas to remember if you can't remember all of
them most of them will show up on the formula sheet but still it's best to remember as
many as you can these are the errors you can make in hypothesis testing
here are the most important Concepts from section 3 which was on linear regression
these are the conditions for linear regression inference as well as how you can check them
and finally here is using computer output
so we're all done with the hardest chapter of this video and we're into our last chapter this chapter is on
probability so here's the overview
and here is the AP Statistics framework section one will be on probability rules
so here are some important notations a intersection B which is symbolized by a
symbol that looks like an upside down u means A and B P of a intersection b
means the probability that both events will occur. A union B, which is symbolized by the intersection symbol flipped upside down, means A or B; the probability of A union B means the probability that at least one of A and B will occur. A vertical bar means "given", and the probability of A given B means the
probability of a occurring given that B has occurred an a prime or a superscript c means the
complement of a and they represent a not occurring so the probability of that is
one minus the probability that a will occur events are independent if and only if
knowing information about one event doesn't tell you anything about the other events so if you have two events
it means that the probability of a equals the probability of a given B because knowing that b occurred doesn't
tell you anything about a and similarly probability of b equals probability of B
given a so here's the product rule the probability of A and B is equal to the
probability of a times the probability of B given a and you can also rearrange this to get
that the probability of a given b equals the probability of A and B divided by
the probability of B and this shows up on the equation sheet all right and if a and b are independent
then the probability of B given a can just become probability of B and you can simplify the product rule a little bit
so events are mutually exclusive if and only if at most one of them can occur so
if you have two events this means that the probability of a and b equals zero because you can't have both of them
occurring you can have at most one of them occurring and similarly probability of a given B and B given a are both
equal to zero so here's the sum rule the probability of a or b equals the probability of a
plus the probability of B minus the probability of A and B and the reason
you have to subtract probability of A and B is because it is counted twice but
if a and b are mutually exclusive then probability of A and B becomes zero so
probability of A or B just becomes probability of a plus probability of B
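Here's a quick numerical sanity check of the product and sum rules, with made-up probabilities:

```python
# Made-up probabilities for two events A and B (illustrative only)
p_a = 0.5
p_b = 0.3
p_b_given_a = 0.4

# Product rule: P(A and B) = P(A) * P(B | A)
p_a_and_b = p_a * p_b_given_a  # 0.2

# Sum rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_and_b, p_a_or_b)  # 0.2 and 0.6
```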
here are some optional formulas if a b c and so on are all mutually exclusive
then probability of a or b or c or so on is equal to the probability of a plus
probability of B plus probability of c and so on you can get this from the sum formula that you saw on the last slide
and if exactly one of B, C, and so on will occur then probability of a is equal to
the probability of A and B plus the probability of a and c and so on which is also equal to the probability of B
times the probability of a given B plus probability of c times the probability of a given C and so on
so you can remember the important Concepts from this section with the acronym Sigma comp
here are some optional practice problems the third one is somewhat difficult because it requires using one of the
optional formulas if you can't solve it don't worry it's just meant to show an example of how that formula can be used
but anyway pause the video here if you'd like to give these problems a try
all right here comes the solutions so for the first one the coin flip and dice roll are independent so the
probability of both events is just the product of the two individual probabilities, which is 1/12, and for the
second one a student can't be both a freshman and a sophomore so the events are mutually exclusive and then using the
sum rule you get that the probability of freshman or sophomore is equal to 0.3
plus 0.2 minus 0 which is 0.5 and for the third one you know that it will
either rain or not rain tomorrow so you can use the optional formula and the
probability that it doesn't rain is equal to one minus the probability that it will rain and then after that you can
plug in all the values and you'll get that The Final Answer is 0.45
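The problem's given values aren't reproduced here, but the structure is the optional formula from earlier; here's a sketch with assumed numbers that also land on 0.45:

```python
# Assumed inputs (illustrative, not necessarily the problem's actual numbers)
p_rain = 0.3
p_event_given_rain = 0.8      # probability the event happens if it rains
p_event_given_no_rain = 0.3   # probability the event happens if it doesn't

# P(not rain) = 1 - P(rain), then apply the optional (total probability) formula
p_no_rain = 1 - p_rain
p_event = p_rain * p_event_given_rain + p_no_rain * p_event_given_no_rain
print(p_event)  # 0.3*0.8 + 0.7*0.3 = 0.45
```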
all right so now we're moving into section two which is on tables
so tables can be either one way or two way which you'll see on the next slide and they can show either frequency or
relative frequency frequency just shows the total number of times each observation occurs for example it can
show the total number of freshmen in a school or it can show the total number
of students in a high school which is just the total number of freshmen plus number of sophomores plus Juniors plus
seniors all right and then relative frequency just shows the proportion so for example
the relative frequency of freshman is the number of freshmen divided by the overall total
all right and you can also make tables two-way so if you want to show two categorical variables for example you
want to add male or female then you can use a two-way table and once again it
can show either frequency or relative frequency. So conditional, marginal, and joint probability are just terms for A given B, A, and A and B, respectively. So as
an example of conditional probability the probability of female given freshmen
is the number of people who are both freshmen and female divided by the
number of Freshmen the probability of sophomore given male is the number of people who are both
sophomore and male divided by the number of people who are male and then for marginal probability you
just take a row or column total and divide it by the overall total so for
example the probability of sophomore is the number of people who are sophomore
divided by the total number of people and for joint it's just an individual
cell entry divided by the overall total so for example the probability of Junior
and female is the number of people who are both Junior and female divided by the overall total
and then there are similar rules for determining the probabilities in a relative frequency table
and the only new thing on this slide is the probability of a or b and that can
be determined by taking the row total plus the column total and subtracting
the cell entry this comes from the sum rule all right and you can also determine
whether two events are independent from a two-way table so for example are the
events being sophomore and being male independent well to determine that you can determine
whether probability of sophomore equals probability of sophomore given male or
probability of male equals probability of male given sophomore you only need to check one of these because if one of
these is true then the other is true as well all right so let's just check the first
one so the probability of sophomore is 450 divided by 1800 and the probability
of sophomore given male is 230 divided by 910 and these two are not equal so
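Here's that check as a quick computation, using the counts quoted above:

```python
total = 1800               # overall total
sophomores = 450           # row total for sophomores
males = 910                # column total for males
sophomore_and_male = 230   # cell entry

p_soph = sophomores / total                      # P(sophomore) = 0.25
p_soph_given_male = sophomore_and_male / males   # about 0.253
print(p_soph, p_soph_given_male)  # unequal, so the events are not independent
```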
Another useful function of tables is
organizing data to solve word problems so here's a sample problem that I'll walk you through and then I'll show you
an optional practice problem so here we go a certain test for a disease has a false negative rate of 10 percent and a false positive rate of one percent in a certain population two percent of
individuals currently have the disease. A: if somebody in the population tests positive for the disease, what is the probability they actually have it? B: what is the probability of a test giving a correct result?
so the false negative rate means that if you're actually positive then that is
the probability that you'll test negative and the false positive rate means that if you're actually negative
then that is the probability that you'll test positive so we're given the information that's
listed here and we can fill in a relative frequency table with that information so first the probability of
actually being positive is 0.02 and then since we know the overall total is one we can determine the column total
of actually negative and then we can figure out joint probabilities using the product rule
remember that the probability of A and B is equal to the probability of a times the probability of B given a
all right and then now since we know the column totals we can fill in these two cells I know the numbers are getting
kind of complicated but the important thing is that you understand the procedures and now we can fill in the rest of the
table so now we can solve part A which is asking for the probability of actually
being positive given that you test positive and that is the probability of being actually positive and testing
positive divided by the probability of testing positive, and the final answer you'll get is 0.647
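Here's part A as a computation:

```python
p_disease = 0.02
false_negative = 0.10  # P(test negative | actually positive)
false_positive = 0.01  # P(test positive | actually negative)

# Joint probabilities from the product rule
p_pos_and_disease = p_disease * (1 - false_negative)   # 0.018
p_pos_and_healthy = (1 - p_disease) * false_positive   # 0.0098

# P(disease | test positive) = P(disease and positive) / P(positive)
p_positive = p_pos_and_disease + p_pos_and_healthy
print(p_pos_and_disease / p_positive)  # about 0.647
```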
for Part B the probability of a test being correct is equal to the probability of actually being positive
and testing positive plus the probability of actually being negative and testing negative in the end you'll
get 0.9882 so here's the practice problem the
numbers here are a lot nicer and I'll tell you right now that you don't actually have to fill out the entire
table to solve the problem but anyway pause if you'd like to give this problem a try
all right here comes the solution so you're given the following information and you can use it to fill
out the relative frequency table so the probability of male is 0.5 the
probability of less than one year is 0.4, same with one to five years, and then the joint probability of having worked for more than five years and being male is 0.5 times 0.4, which is 0.2
and now since the overall total is one you can further fill out these cells
and now you can solve that the probability of being female and having worked more than five years is zero and
because it's zero the events are mutually exclusive now we're all done with section two
section three will be on the final type of hypothesis test the reason why this was not part of chapter two is because
it's applied to tables which you haven't learned about until just now
so there are three types of chi-square tests but really only two because two of them are almost identical at the end of
this chapter I'll show you how you can do these tests on a graphing calculator so first off is the chi-square goodness
of fit or GOF test which tests whether sample data for one categorical variable
came from a hypothesized distribution so here's an example problem Maria wants
to check whether a six-sided die is fair. She rolls the die 60 times and records the results below. Do the data provide statistically convincing evidence at the level of alpha equals 0.05 that the die is not fair?
the steps for chi-square tests are the same as for any other test but now it's just obtaining a chi-square statistic rather than a z or t statistic so here are the required conditions
random sample, the 10 percent rule, and a new condition called large expected counts; the reason is because the larger the expected counts are the more accurate the chi-square test is
so in AP Statistics generally how you check this is seeing whether all
expected counts are greater than or equal to five and the chi-square statistic is the sum
over all cell entries of The observed count minus the expected count squared divided by the expected count
so here is the chi-square distribution it is the distribution of all chi-square
statistics assuming that the null hypothesis is true so the most important characteristic is
that the degrees of freedom is the number of categories minus one and then some other characteristics are that it's
skewed right but less skewed for higher degrees of freedom the mean is equal to the degrees of freedom and the mode is
the degrees of freedom minus two and the p-value is determined by the area to the
right of the chi-square statistic so now let's go back to the problem
to solve this problem we first set up the hypotheses the null hypothesis is that the die is fair the alternative
hypothesis is that the die is not fair and if we assume that the dye is Fair since the die was rolled 60 times we
would expect to see each number 10 times so our expected count for each number rolled is 10.
now we check the conditions we do have a random sample we don't need to check the 10 percent rule because we're not
sampling without replacement and all the expected counts are 10 which is at least five and keep in mind that the observed
count of four that you saw doesn't matter because you only need to check the expected counts
all right now we obtain the chi-square statistic so we sum over the numbers one
through six The observed count minus the expected count squared divided by the
expected count and you're going to get 13.4 in the end
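Here's a sketch that reproduces this test; the observed counts below are illustrative (they include the observed count of 4 mentioned earlier, sum to 60, and give the same statistic of 13.4):

```python
from scipy import stats

observed = [4, 7, 18, 14, 10, 7]  # illustrative rolls, consistent with 13.4
expected = [10] * 6               # 60 rolls / 6 faces

chi2, p = stats.chisquare(observed, expected)  # df = 6 - 1 = 5 by default
print(chi2, p)  # 13.4 and about 0.020, so reject H0 at alpha = 0.05
```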
all right now we obtain a p-value since there were six categories the degrees of freedom is 5 and the right tail of the
chi-square distribution has area 0.020 and since that's less than the level of
0.05 we reject the null hypothesis we have statistically convincing evidence
that the die is not fair all right so now we're moving on to the
chi-square test of Independence which tests whether two categorical variables are independent so here's an example
problem you want to find whether there is an association between grade level and interest at your school you collect
a sample of 300 students and record the responses below do the data provide statistically convincing evidence at the
level of alpha equals 0.05 that there is an association between grade level and interest assume the random sample
condition and 10 percent rule are satisfied all right so the null hypothesis in a
test of Independence is always going to be that the two variables are independent and the alternative is that
they're associated the expected count for each cell assuming that the variables are
independent is the row total times the column total over the overall total you
can check that this satisfies the independence condition the degrees of freedom is the number of
rows minus one times the number of columns minus one the reason is because that is the number of cells that are
free to vary without changing any row or column total once you set that number of
cells the rest of the cells can be automatically determined so now let's go back to the problem
so the expected counts for each cell are shown right here for example the
expected count for freshman and stem is 135 times 75 divided by 300 which is
33.75 and all expected counts are at least five so the large expected counts
condition is satisfied all right so now you sum over all of the
cells The observed minus expected squared divided by the expected to get
the chi-square statistic and then use the chi-square distribution
with degrees of freedom equals six and you'll find that the area of the right tail is 0.275 so that's our
p-value and since it's greater than the level of 0.05 we fail to reject the null
hypothesis we do not have statistically convincing evidence that there is an association between grade level and
interest.
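With the full table, a computer can find the expected counts, the statistic, and the p-value in one call; the counts below are placeholders rather than the problem's actual table, chosen only so the freshman row total is 135 and the first column total is 75:

```python
import numpy as np
from scipy import stats

# Placeholder 4x3 table: rows are grade levels, columns are interests
observed = np.array([
    [30, 60, 45],   # freshman row sums to 135
    [20, 25, 25],
    [15, 20, 20],
    [10, 15, 15],
])

chi2, p, df, expected = stats.chi2_contingency(observed)
print(expected[0, 0])  # 135 * 75 / 300 = 33.75, matching the worked example
print(chi2, p, df)     # df = (4-1)*(3-1) = 6; compare p with alpha = 0.05
```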
The final type of chi-square test is the chi-square test of homogeneity, which tests whether the distribution of a categorical variable is the same for different populations or treatments, and
this is very similar to the test of Independence so an example of the two populations
could be two different schools and you want to see whether the interest distributions of those two schools are
the same or different so the way you go about solving this is almost the same as for a test of
Independence the only difference really is the hypotheses
all right so now let's find all of the expected counts
and all the expected counts are at least five so that condition is satisfied
and then now we obtain the chi-square statistic which will be 2.344
and the degrees of freedom is 2 and we can obtain that the p-value is
0.310 and since that is greater than the level of 0.05 we fail to reject the null
hypothesis we do not have statistically convincing evidence that the interest distributions are different for the two
schools all right so last section binomial and
geometric probability binomial probability involves repeating an action with the chance P of success n
times and determining the probability of exactly X successes so for example a
spinner has a 0.6 chance of landing on green if you spin the spinner five times what is the probability that it will
land on green exactly three times well let's start off with an easier problem suppose that now we have the order fixed
and the first three spins must land on green and the last two must not what is the probability of that happening
well because the spins are all independent the product rule tells us that we can just multiply the individual
probabilities together and we'll get 0.03456 however there are five choose three or
five factorial divided by three factorial divided by 2 factorial ways to choose which three spins land on green
so therefore we must multiply what we got by five choose three which is ten
and we'll get a final answer of 0.3456.
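You can verify both steps of this computation in Python:

```python
from math import comb
from scipy import stats

n, p, k = 5, 0.6, 3

# One fixed order: green, green, green, not green, not green
one_order = p**3 * (1 - p)**2    # 0.03456

# Multiply by the number of orders, 5 choose 3 = 10
print(comb(n, k) * one_order)    # 0.3456

# Same answer from the binomial distribution directly
print(stats.binom.pmf(k, n, p))  # 0.3456
```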
So in general this is the binomial probability formula and it will show up on the equation sheet all right and here's the binomial
distribution it has the number of successes on the x-axis and the
probability on the y-axis and this is an example of a discrete distribution
for any discrete distribution the mean is the sum over all X values of the x
value times the probability of that x value and the standard deviation is the square root of the sum over all X values
of the x value minus the mean squared times the probability of that x value
and for a binomial distribution the mean is NP and the standard deviation is
square root of n times P times 1 minus p and all of these equations will show up on the formula sheet
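As a quick check that the shortcut formulas agree with the general discrete ones, here's a sketch using the spinner's binomial distribution:

```python
import numpy as np
from scipy import stats

n, p = 5, 0.6
x = np.arange(n + 1)              # possible numbers of successes
probs = stats.binom.pmf(x, n, p)  # P(X = x) for each x

mean = np.sum(x * probs)                      # general discrete mean
sd = np.sqrt(np.sum((x - mean)**2 * probs))   # general discrete sd
print(mean, n * p)                            # both 3.0
print(sd, np.sqrt(n * p * (1 - p)))           # both about 1.095
```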
all right so now into geometric probability which involves repeating an action with the chance p of success
and determining the probability of the first success being on the X trial so
for example a spinner has a 0.3 chance of landing on red if you continuously
spin the spinner what is the probability that the spinner lands on red for the first time on the third spin
well in that case we need the first two spins to not be red and the third spin to be red and then we can multiply the individual probabilities together and we'll get 0.147 so in general here is the geometric
probability formula it's 1 minus P to the power of x minus 1 where X is the
trial that you want your first success to be on times p
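And here's the spinner example as a computation:

```python
from scipy import stats

p = 0.3  # chance of landing on red

# Probability the first red is on the third spin: (1 - p)^2 * p
print((1 - p)**2 * p)        # 0.147

# scipy's geometric distribution counts trials until the first success
print(stats.geom.pmf(3, p))  # 0.147
```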
all right and here is what the geometric distribution looks like and keep in mind that it keeps going all the way to
Infinity for the X values but we obviously can't show that many values here
all right and for the geometric distribution the mean is one over p and the standard deviation is the square root of 1 minus p, divided by p. Last topic for chapter three: using the
graphing calculator so first I'll show you how to do chi-square tests go to stat and keep in mind that on the
AP exam they won't often ask you to do an entire chi-square test because they try not to make the questions too time
consuming but it's still good to know how to do one on your graphing calculator so here is the chi-square goodness of fit
test so you enter two lists, one for the observed counts and one for the expected, as well as the degrees of freedom (number of categories minus one) and then if you select calculate then it will return the p-value for you
all right and then the chi-square test of Independence or homogeneity is right
here you need to enter two matrices to edit the matrices go to second Matrix
and then edit and it will allow you to change the dimensions as well as the entries so I
already entered some stuff so let's go back all right if you select calculate
then it will return the p-value for you so the perk of this is that you won't
have to calculate the chi-square statistic anymore and you can also use a graphing
geometric probability so go to Second distribution and then the first important function is
binomial PDF so you have to enter the trials
the probability of success and the x value and this will return the
probability of exactly X successes and then now if we go back
and use binomial CDF so enter the number of Trials
probability of success and x value then it will return the probability of less
than or equal to X successes and then similarly for geometric PDF and CDF
you enter the probability of success and the x value PDF will give you the probability of the
first success being on the X trial and then for geometric CDF
it will return the probability that the first success will be on the X
trial or any trial before that and now we're all done with chapter
three so here's the review first here is the sigma comp acronym for the first
section here are some important facts about tables
here are the steps and specifics for performing chi-square tests
and finally here are the concepts for binomial and geometric probability
so I hope this video was really helpful if you have questions always feel free
to ask Down Below in the comments if you enjoyed this video check out the rest of my channel where I have a lot of other
useful courses as well thanks for watching I hope to see you next time