
AP STATS

Whether you're looking to learn AP Statistics from start to finish or you just want to review, you've come to the right place: this video will cover all the content in AP Statistics. Now, it may seem impossible to learn an entire course in one video, but it's actually not that difficult. AP Statistics doesn't involve a lot of memorization, and a lot of the concepts are very similar, so just remembering one of them will make it a lot easier to remember all the rest. If you're just looking for one specific topic, here is the video framework to help you find it. Let's dive right into chapter number one.

At the start of each chapter, I will show you an overview that lets you know what each section covers and which sections are the most important and why. I will also show you the equivalent units in AP Stats and which sections cover them, as well as the approximate weights of those units on the AP exam; red means that the unit is not part of the current chapter. Knowing the overview and the framework will help you know which topics are the most important as you learn AP Statistics in one video. If you would like to take a look at the overview, pause right here. But anyway, we're going to jump right into chapter 1, section 1: statistical studies.

1.1: Statistical Studies

Let's first define statistics. Statistics deals with collecting and organizing data, and with using data from a sample to make generalizations to a population. I haven't defined sample and population for you yet, but let's suppose that you wanted to answer a question involving all people of an entire country. It would be almost impossible to collect data from everyone, because that would be so many people. So what you can instead do is collect data from a number of people in that country and then use that data to make generalizations to all people of the entire country. That's basically what statistics is, and a statistical study is the process of collecting data and then making inferences in order to answer a question.

So here are some key terms that will show up on the AP exam and also later in this video. A population is a large group of people or other things, and this is generally what you're trying to answer a question about. A sample is a subset of a population that is studied in a statistical study. A statistical unit is a member of the sample. A population parameter is a number describing the population, and keep in mind that this has to describe a population; if you want something to describe a sample, that would be called a descriptive statistic. We're going to dive more into that in section three. And a subject is just a human unit.

So that's all the content from this section that is needed for future sections. The rest of the content in this section is still important, so you should still watch it; however, I would suggest not rewinding and re-watching too many times.

The next topic is the types of statistical studies. In an observational study, the person conducting the study observes characteristics or behaviors without trying to affect them or interfering in what they're studying. So a survey would be an example of an observational study: let's suppose that you ask people how much water they drank yesterday; you're just asking them, and you're not in any way affecting how much water they drank yesterday. An experiment is different, because the person conducting the study takes a role in what's happening in the study. They do that by assigning units to different treatments and observing the effects of each treatment. An example of an experiment is randomly telling people whether to drink lots of or little water and then observing their running speed in a race. And the key word is randomly: you must assign people to treatments randomly, because if you told the people who are already fast to drink lots of water and the slower people to drink little water, and your results show that the people who drink lots of water have higher running speeds, you can't infer whether drinking lots of or little water makes any difference to running speed, because you don't know if the faster speeds were due to those people drinking lots of water or because they were already faster.

Now we're getting into the types of variables; you've seen a lot of this in previous math classes. The explanatory or independent variable is the variable that is adjusted; an example would be how much water the racers drink. This variable is generally plotted on the x-axis. The response or dependent variable is the variable that is measured; an example would be the running speed. This is generally plotted on the y-axis. A confounding or lurking variable is a variable that is unaccounted for and affects both the explanatory and response variables. An example would be how much food is eaten, because people who eat more food will probably drink more water to go along with it, and how much food you eat might affect your running speed. It's important to account for confounding variables in experiments, because your goal in experiments like these is probably to measure whether or not the response variable changes as the explanatory variable is adjusted, and if you find that there is a change but you haven't accounted for confounding variables, then you don't know if those changes are because of adjusting the explanatory variable itself or because there's a confounding variable that's affecting both the explanatory and response variables. So that brings us into the next topic:

control, which is accounting for confounding variables. One common method of control is to have a control group, which is the group that receives no special treatments. So let's suppose that you're testing whether or not a certain medication has an effect. You can randomly assign people to either the control group, which doesn't take the medicine, or the treatment group, which does, and you can measure whether or not there tends to be a difference between the control group and the treatment group; with that, you can determine whether or not the medicine causes better results. A control variable is a variable that is held constant in order to prevent it from becoming a confounding variable. Having control variables can be restrictive, because all of your units must satisfy these control variables, and if they can't, then they can't be part of your study. So one alternative to having control variables is to make them grouping variables, and that gets us into the next topic: experimental designs.

The most basic design is a completely randomized design, in which units are assigned to treatments at random and not grouped based on any characteristics. In a randomized block design, units with similar characteristics are grouped into blocks, and within each block, units are assigned to treatments at random. So let's suppose that you have a study and you think it's possible that gender would be a confounding variable, because males and females might respond differently to a certain treatment. If you don't want to make gender a control variable, because that would force all your units to be the same gender, what you can instead do is make gender a grouping variable, and then within each block you would determine whether the treatment group has a significantly different result from the control group to reach your final results.

A matched pairs design is a design in which units with similar characteristics are paired, and one unit in each pair is assigned to a treatment while the other is assigned to the control group. An example of a pairing variable is age: we can suppose that somebody who's 25 years old is paired with somebody who's 26 years old, and a difference of one year is hardly anything, so essentially, for this pair, the age is effectively constant. Within each pair, you determine whether or not there was a significantly different result between the unit that went to the treatment group and the unit that went to the control group, and by determining this for all pairs, you can determine whether or not the treatment has a significant effect. The next topic is blinding.

A double-blind experiment does not let the subjects know their treatment group, and does not let the observers know which treatment group the subjects are in. Not letting the subjects know their treatment group is useful because otherwise the subjects' expectations can influence the results. For example, suppose a certain pill is falsely believed to improve running speed in races. In that case, if the people knew whether or not they were given the treatment, then the people who were given the treatment might falsely believe that they have a better chance of doing well in the race, and therefore have motivation and try harder in the race, and therefore finish faster. But that was because of the subjects' expectations and not because of the treatment itself, so not letting the subjects know their treatment group prevents expectations from influencing the results. Not letting the observers know which treatment group the subjects are in prevents the observers' expectations from influencing the results. For example, if the observers knew which subjects were in the treatment group and which were in the control group, then they might have higher expectations for people in the treatment group to be able to finish faster in a race, and therefore cheer for them more, which could lead to them finishing faster; but that was because of the observers' expectations instead of the treatment itself. A single-blind experiment is one of the

above, but not both. A placebo is a pill or medicine that has no effect; the experimenters often give this to the control group and don't let them know that the medicine has no effect, which ensures that the subjects do not know their treatment group. A placebo can lead to something called the placebo effect, where the placebo, which is supposed to be inactive, actually causes an effect because of the subjects' expectations.

So now we're getting into correlation versus causation, and this ties back to the types of variables. Correlation is a positive or negative trend between two variables; for example, shark attacks tend to increase as ice cream sales increase. This is positive correlation, because as one increases, the other increases; if as one increases the other decreases, then this would be called negative correlation. Causation means that adjusting one variable causes changes in the other variable; for example, taking a medicine speeds up recovery from a disease. Being able to distinguish between correlation and causation is important, because then you can reach the results of whether adjusting one variable would affect the other. So going back to the first example, we can ask: if we put a limit on how much ice cream stores can sell, would that reduce the number of shark attacks? The answer is no, because ice cream sales increasing does not cause shark attacks to increase; there is a confounding variable, and the confounding variable in this case is weather. If it gets hotter, then people are more likely to go to the beach, which would increase the number of shark attacks, and if it's hotter, people would also be more likely to buy ice cream. So experiments allow you to

infer causation, because they generally account for confounding variables, while observational studies do not allow you to infer causation. However, in some cases experiments are not ethical. For example, many years ago there was a cholera outbreak; cholera is a disease, and some people hypothesized that it was spread through contaminated water. In this case it would not be ethical to conduct an experiment to prove or disprove that, because then you would have to give some people contaminated water, and that's not ethical. Here are some miscellaneous terms. Replication is

being able to repeatedly obtain consistent results in order to ensure the results are correct. So in general, you want to ensure that your sample size is sufficiently large so that your results are consistent and not just highly influenced by chance. This is consistent with the law of large numbers, which states that the more you repeat something, the closer the experimental results will be to the expected results.
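To see the law of large numbers in action, here's a minimal Python sketch (the fair-coin experiment and all numbers are my own illustration, not from the video): the running proportion of heads drifts toward the expected 0.5 as the number of flips grows.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # one fair coin flip; True counts as 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        # the experimental proportion gets closer to the expected 0.5
        print(f"after {n:>6} flips: proportion of heads = {heads / n:.4f}")
```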

A census is observing or surveying an entire population, and this means that you do not need to do statistical inference, because you've already answered the question about the population.

So I know that section was pretty long, but the rest of the sections in this chapter are pretty short. This next

1.2: Sampling & Bias

section will be on sampling and bias. There are five sampling methods: simple random sampling, systematic sampling, stratified random sampling, cluster sampling, and convenience sampling.

Simple random sampling is selecting a completely random sample from the population. Simple random sampling has the properties that every possible sample from the population is equally likely to be selected and every unit in the population is equally likely to be sampled. The pros are that it's unbiased and most ideal for statistical inference, because, as you'll learn in statistical inference, you consider all possible samples from the population. The con is that this requires the list of all population members: because every unit has a chance of being sampled, you must account for each and every unit. An example of this would be choosing nine manufactured items at random; this is feasible because after an item is manufactured, it's in your hands, so essentially you have the list of all population members and can choose any of them. Systematic sampling is starting at a random point and then sampling every nth

unit after that starting point, where you decide what n is. The pros are that it's easier than simple random sampling and it's usually a pretty good approximation of a simple random sample. The con is that if patterns exist, for example if every fourth unit in line shared a common characteristic, then the results can be biased.

Stratified random sampling is dividing the population into several groups, or strata, based on common characteristics and then sampling randomly from each group. The pros are that it addresses all groups and it's unbiased, because you sample randomly from each group. The con is that it requires a list of all population members, because every unit of every group has a chance of being sampled. Cluster sampling is dividing all units into clusters and then choosing one or more clusters to sample all units from. The pros are that this is usually simple and can be unbiased, especially if the clusters are diverse. The con is that it's biased if the clusters chosen don't represent the population. One common factor that makes cluster sampling easy is geographic location, because it's easier to sample all 1,000 people from one city instead of choosing a thousand cities to sample one person from each.

Convenience sampling is sampling the units that are the most convenient to sample, for example your friends or people whose phone numbers you have. This is the simplest method and is suitable for simple and small studies, but the con is that this is usually the most biased method, so usually you should not use this method.
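To make the first three methods concrete, here's a minimal Python sketch (the population of 20 numbered units and the two strata are made up for illustration):

```python
import random

random.seed(2)
population = list(range(1, 21))  # units numbered 1..20

# Simple random sampling: every sample of size 5 is equally likely.
srs = random.sample(population, 5)

# Systematic sampling: random starting point, then every 4th unit.
start = random.randrange(4)
systematic = population[start::4]

# Stratified random sampling: split into strata, sample randomly from each.
strata = {"units 1-10": population[:10], "units 11-20": population[10:]}
stratified = [u for group in strata.values() for u in random.sample(group, 2)]

print("SRS:       ", srs)
print("systematic:", systematic)
print("stratified:", stratified)
```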

You don't need to know the pros and cons for the AP exam; you just need to be able to identify whenever each method is being used. Now we are getting into the types of bias. Sampling bias is when not all members of the population are equally likely to be sampled. This can be a problem even in effective sampling strategies such as stratified random sampling, because if you sample the same number of people from each stratum but the strata are not the same size, then people from smaller strata are more likely to be sampled. So you need to make sure that the number of people you sample from each stratum is proportional to the size of that stratum. Undercoverage bias is when not all groups in the population are covered by the sample. This is generally not a problem in stratified random sampling, and for simple random sampling it is generally not a problem for large sample sizes, because large sample sizes are more likely to cover all the groups.

Response bias is when the responses to a survey are not accurate, and non-response bias is bias caused by people not responding to a survey. Voluntary response bias is caused by people not responding to a voluntary survey. When you send out a voluntary survey, one thing that you might observe is that the people who respond to it are generally the ones with the strongest opinions on the issue; the reason is that the other people might not think it's worth their time to fill out the survey. So this is one thing that you need to consider when sending out a voluntary survey. Now we are getting into section three.

1.3: Descriptive Statistics & Data Visualizations

This section will mostly be review, but there will still be some new concepts. Let's start off by talking about histograms. On the horizontal axis there are several ranges, in this case 0 to 1, 1 to 2, 2 to 3, and so on. Several different things can go on the vertical axis; in this case you see frequency, which means that 14 car trips in the last month were between zero and one hours, five were between one and two, and so on. You can also have relative frequency, which is the number of observations in the specified range divided by the total number of observations. In this case there are 25 total observations, 14 of which were between 0 and 1 hours, so dividing 14 by 25 gives 0.56 as the relative frequency, and you can do this for the rest of the ranges. Now, this is something that you haven't

learned about in previous math classes: density histograms. The density is the relative frequency divided by the bin width, or the size of the ranges. In this case the density histogram looks the same as the relative frequency histogram that you saw on the previous slide, because the bin widths are all one; however, if we change the bin width to 0.5, you see that the density histogram will look different. One property that all density histograms have is that the total area of the rectangles is always one. A probability density function is a smoothed-out density histogram, and this also always has the property that the area under the curve is one. You can make a probability density function using a histogram of any bin width, but in general it shouldn't be too small or too large: if it's too large, then it won't be very accurate, and if it's too small, there will be a lot of spikes unless you have a very high number of observations.
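Here's a minimal Python sketch of how frequency, relative frequency, and density are related (the 25 trip durations are made-up numbers standing in for the video's car-trip data), including a check that the density rectangles' total area is one:

```python
# Hypothetical car-trip durations in hours, standing in for the video's data.
trips = [0.2, 0.4, 0.5, 0.5, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 0.95,
         0.99, 1.1, 1.3, 1.5, 1.8, 1.9, 2.2, 2.6, 3.1, 3.4, 4.2, 4.8]
bin_width = 1
bins = [(lo, lo + bin_width) for lo in range(5)]

total_area = 0.0
for lo, hi in bins:
    freq = sum(lo <= t < hi for t in trips)   # frequency
    rel_freq = freq / len(trips)              # relative frequency
    density = rel_freq / bin_width            # density = rel. freq / bin width
    total_area += density * bin_width         # area of this rectangle
    print(f"[{lo}, {hi}): freq={freq:2d}  rel_freq={rel_freq:.2f}  density={density:.2f}")

print("total area of density rectangles:", total_area)  # always 1
```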

A dot plot has numbers along the horizontal axis, and above each number there is a certain number of dots that represents how many times that

observation shows up. So far, we've been dealing with data visualizations for numerical data, where all the observations are numbers. If we want to deal with categorical data, where all the observations are different categories, one data visualization that we can use is the bar chart, where the horizontal axis represents all the categories, and then on the vertical axis you can have frequency or relative frequency. Now, if you had two categorical variables, then you can represent the second one using colors on the bar chart. You haven't seen mosaic plots in

previous math classes. The way this works is that the width of each bar is proportional to the frequency of that category, so you can see from the bar chart that approximately twice as many people picked blue as red; therefore, the width of blue should be approximately twice the width of red. The total height of each bar will always reach one on the vertical axis, and the vertical axis is relative frequency. You can interpret a relative frequency as: of the people who picked red, approximately 75 percent were male; of the people who picked yellow, approximately 50 percent were male; and so on. If you're confused by mosaic plots, don't worry, because they don't show up often on the AP exam, and when they do, they usually explain it pretty well. A stem-and-leaf plot has stems on the left side and leaves on the right; each leaf

represents the ones digit of one individual observation, and the stems are the digits in front of the ones place. A scatter plot is a way of visualizing data for two numerical variables; usually the explanatory variable is on the x-axis and the response is on the y-axis, and each point in the scatter plot represents one observation. A pie chart is a way of visualizing data for categorical variables, where the angle of each sector is proportional to the frequency. Usually pie charts aren't used, because humans aren't that good at judging angles, so usually they're substituted with relative frequency bar charts; just make sure that the vertical axis starts at zero. Now we're getting into descriptive statistics, which summarize the results

of a study, and I've divided them into two groups here; you'll see why soon. You've learned about the descriptive statistics in the first group in previous math classes; they mostly deal with percentiles. The only new thing is that in AP Stats, outliers are not considered when determining the minimum and maximum. An outlier is a value that differs significantly from most other observations. There are many ways to decide what constitutes an outlier, but generally a value that is less than the lower quartile minus 1.5 times the interquartile range, or greater than the upper quartile plus 1.5 times the interquartile range, is considered an outlier.
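Here's a minimal Python sketch of the 1.5 × IQR rule (the data values are invented for illustration; I'm using statistics.quantiles, whose default quartile method may give slightly different fences than other tools):

```python
import statistics

data = [2, 3, 4, 4, 5, 5, 6, 7, 8, 30]  # hypothetical observations; 30 looks extreme

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: ({lower_fence}, {upper_fence}), outliers: {outliers}")
```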

You've seen box plots in previous math classes, and this diagram shows where the outliers lie; notice how the minimum and the maximum both do not lie in the outlier section. If you have outliers, they would be their own individual points outside of the box plot. This is how you'll see box plots in AP Stats, and this is different from previous math classes. Now into group 2 descriptive statistics. The mean is computed by adding up all

the observed values and dividing by the number of observed values. The variance is 1 divided by n minus 1, times the sum of the squares of the differences between the observed values and the mean; a high variance means that the observed values tend to fall far away from the mean. The standard deviation is the square root of the variance. The range is the max minus the min, and this time it includes outliers.
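Here's a minimal sketch of these group 2 statistics in Python (made-up data; note that statistics.variance uses the n minus 1 divisor the video describes):

```python
import statistics

data = [12, 15, 9, 20, 14, 11, 18]  # hypothetical observed values

mean = statistics.mean(data)              # sum of values / n
variance = statistics.variance(data)      # sum((x - mean)^2) / (n - 1)
std_dev = statistics.stdev(data)          # square root of the variance
value_range = max(data) - min(data)       # range; includes outliers

print(f"mean={mean:.3f}, variance={variance:.3f}, "
      f"std dev={std_dev:.3f}, range={value_range}")
```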

A descriptive statistic describes a sample, while a population parameter describes a population. You should remember the variables in this table, because we're going to be using them.

What's the best way to describe a population? The answer is that you should use group 1 population parameters when the population is skewed, because they're robust, or less affected by outliers; since population parameters should represent the overall population, they shouldn't be influenced by extreme individual values. This diagram right here shows what would happen in a right-skewed population: the mode, or the most frequently occurring value, would be lower than the median and the mean, because the highest point in the distribution is on the left; the median would be next; and the mean would be the highest, because it is influenced by those extremely high values. Then use group 2 population parameters when the population is approximately symmetrical; in this case the mean and the median are pretty much the same. Now we're getting into the final section

1.4: Distributions

of this chapter. Let's first talk about discrete and continuous variables. Discrete variables only take on countable values; examples would be the number of flowers in the backyard or the number of stars in the Milky Way. You can't have half of a flower or a star, for example; you can only have 0, 1, 2, 3, 4, and so on. The distribution that you see here is an example of a binomial distribution, which we'll learn more about in chapter 3. Continuous variables can take on any value within a certain range; an example would be the liters of gas in a tank. The continuous distribution that you see here is an example of a sampling distribution, and this is useful because it allows us to compare a sample that we collected with all possible samples from the population, and that allows us to do statistical inference, which is what we'll be learning about in chapter 2.

Distribution shapes are described by the skewness and the number of peaks. A uniform distribution is roughly horizontal, so for every x value the y value is approximately the same. A distribution that's skewed left or right would have a longer left or right tail, respectively, and a distribution can also be symmetrical. To describe the number of peaks, you can use the terms unimodal, bimodal, and so on. A normal distribution is bell-curve shaped, so it is unimodal and

symmetrical. The empirical rule is not something that you have to remember, but it is pretty useful: it tells you approximately what percentage of values fall within a certain number of standard deviations of the mean in a normal distribution (roughly 68% within one standard deviation, 95% within two, and 99.7% within three). The z statistic tells you how many standard deviations away a value is from the mean, and it is negative if the value is lower than the mean.
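As a quick sketch (the numbers are hypothetical), here's the z statistic computed in Python:

```python
def z_statistic(value, mean, std_dev):
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev

# Hypothetical example: a distribution with mean 25 and standard deviation 5.
print(z_statistic(35, 25, 5))   # 2.0 -> two standard deviations above the mean
print(z_statistic(20, 25, 5))   # -1.0 -> negative because it's below the mean
```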

Now we are getting into the most important part of this chapter: using the graphing calculator. If you don't have a graphing calculator, in the description there will be a link to a video that will show you how you can access an online one, which is what I'll be using. So hit 2nd and then DISTR.

The first important function that you'll need to know about is normalcdf, for cumulative distribution function. This will give you the area under the curve of the normal distribution, which is very useful if you want to determine what proportion of values fall within a certain range. So let's suppose we wanted to know what proportion of values fall within one standard deviation of the mean; that corresponds to a z statistic between negative one and one. So for lower, type negative one, and for upper, type 1, and it gives approximately 0.68, and you'll notice that this is consistent with the empirical rule. We can also change the mean and standard

deviation, so let's change the mean to 25 and the standard deviation to 5. Now suppose we want to know what proportion of values fall within two standard deviations of the mean. For lower we would type 15, which is two standard deviations below the mean, and for upper we would type 35, which is two standard deviations above the mean, and it returns approximately 0.95, which is also consistent with the empirical rule. Now suppose that we did not want two tails, so we just want to know what proportion of values fall below 35, and we don't want a lower bound; in that case we would type in a really low number for the lower bound, say negative 10 to the power of 99, and then it would give approximately 0.977. And if you wanted a right tail only, so you only wanted to know the proportion of values that fall above a certain value, then you would type a really high number, say 10 to the power of 99, for

upper. Now, the second important function is invNorm. This function takes an area as input and returns the proper tail or tails such that the area above, below, or between the tails is equal to the area that you input. So let's suppose that we want to find the median; in that case, the area below the left tail should be 0.5, and if we just set the mean to zero and the standard deviation to one, we'll find that the result is zero. So in the normal distribution, the median is always equal to the mean. Now let's suppose instead we wanted the 25th percentile; then we would type 0.25, and the result would be approximately negative 0.67. And if we instead wanted the area between two tails, let's suppose we wanted the area between the two tails to be 0.997: we get approximately negative three and three, and this is consistent with the empirical rule.
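If you'd rather check these calculator results in Python, here's a minimal sketch using scipy.stats.norm (scipy is my own tool choice; the video itself only uses a TI-style graphing calculator):

```python
from scipy.stats import norm

# normalcdf equivalents: area under the normal curve between two bounds.
print(norm.cdf(1) - norm.cdf(-1))         # within 1 SD of the mean -> ~0.68
print(norm.cdf(35, loc=25, scale=5)
      - norm.cdf(15, loc=25, scale=5))    # mean 25, SD 5, between 15 and 35 -> ~0.95
print(norm.cdf(35, loc=25, scale=5))      # left tail only, below 35 -> ~0.977

# invNorm equivalents: value whose left-tail area matches the input.
print(norm.ppf(0.5))                      # median -> 0.0 (equal to the mean)
print(norm.ppf(0.25))                     # 25th percentile -> ~ -0.674
print(norm.ppf((1 - 0.997) / 2))          # central area 0.997 -> ~ -2.97 (about ±3)
```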

We're into our last topic for this chapter: describing distributions on FRQ questions. If you're asked to describe or compare distributions, for full credit you need to describe the shape, center, and variability; you also need to provide context and, if asked, make comparisons. So for the shape: for both regions A and B, the distribution of lead concentrations is skewed right. For center, you need to find the median or the mean; the median is easier. For region A, you can find that there are 50 samples in total, so you need to find where the 25th and 26th values from the left lie, and you can see that there are 20 samples between 0 and 50 and that the 25th and 26th are between 50 and 100. So the median lead concentration for region A is between 50 and 100, which is less than the median lead concentration for region B, which is between 100 and 150. For variability, you need to find the IQR or the range; the range is easier. For region A, the range is between 450 minus 50 and 500 minus zero, so the range of lead concentrations is between 400 and 500, which is greater than the range of lead concentrations for region B, which is between 250 and 350. So I satisfied the condition of providing context, because I used the term lead concentrations, and I also made comparisons by using greater than or less than. So here is the model solution and scoring.

Chapter 1 Review

Now we're done with chapter one and into our review. Section one may have felt difficult because of the amount of vocabulary terms, but it's oftentimes easier if you can see all the vocab terms on one page, which is listed right here; pause here if you need to look over the list of vocab terms. Then for section two, here are all the sampling methods, and here are the diagrams for the types of bias. For section three, here are all the data visualizations, with the important ones in bold; here are examples of the two new data visualizations you learned today, and below that are some descriptive statistics. And for section four, you need to know the difference between discrete and continuous, and some important skills listed below.

Now we're moving on to chapter 2, which is probably the most

Chapter 2 (Statistical Inference) Intro

fun chapter of this video. Here is an overview; pause if you'd like to look over it. And here is the AP Stats framework. Chapter 2, section 1 is probably going to

2.1: Sample Proportion Inference

be the most difficult section of this video, because you'll learn about two new major concepts: confidence intervals and hypothesis testing. But don't worry, because future sections will also use these two concepts, giving you an opportunity to practice with them, and when we do the review of chapter two, it will help you remember them even better. So here's an example of a confidence interval problem: a principal wants to know what proportion of the 2,000 students at his high school exercise at least 150 minutes each week. He randomly selects 100 students and asks them that question, and the proportion of students answering yes is 0.7. Construct and interpret a 95% confidence interval for the true proportion of students at the high school that exercise at least 150 minutes each week.

To build confidence intervals, we'll be using the sampling distribution, which is the distribution of all possible samples of a specified size from the population. Some important properties of this distribution are that its mean is equal to the true population proportion, and that it's approximately normal when the numbers of successes and failures are both at least 10. A 95% confidence interval means that we want 95% of all samples to generate a confidence interval that contains the true population parameter, and we know from the empirical rule that in a normal distribution, 95% of all samples fall within 1.96 standard deviations of the mean. So this gives us the procedure to make confidence intervals: the confidence interval is the point estimate plus or minus the margin of error, and the margin of error is the critical value (which is 1.96 for a 95% confidence interval) times the standard deviation of the sampling distribution, which is given by the formula: the square root of p times (1 minus p), all over n. Now, we don't know what p is; that's the true population parameter. But we can estimate it using p-hat, which is the proportion we get for our sample. So this is the formula for the confidence interval. The required conditions for constructing

the confidence interval are that the sample is random (or the data is from a randomized experiment) and that the sampling distribution is normal. When you're sampling without replacement, which means that after somebody is sampled they can't be sampled again, there's also the 10 percent rule, which is that no more than 10 percent of the population is sampled. The reason is that the sampling distribution is actually the sampling distribution for all samples with replacement, and when you're sampling without replacement, the standard deviation is actually less than the square root of p times (1 minus p) over n; but if no more than 10 percent of the population is sampled, the difference is negligible. So the 10% rule ensures that the standard deviation is approximately accurate. Now, interpreting the confidence interval:

one way to interpret it is that we are C% (usually 95%) confident that the true population proportion is between the lower bound and the upper bound. Another correct way is: about C% of all samples of the specified size from the population will produce a confidence interval containing the true population proportion. Both of these interpretations are correct, but the first one is more common, because it actually reports the bounds of our confidence interval. Here are some examples of incorrect interpretations, and the red text shows where each one is incorrect. An additional note is that, for any sample size, the maximum possible standard deviation of the sampling distribution is 0.5 over the square root of n, which occurs when p equals 0.5. Some questions on the AP exam may ask you for the maximum possible standard deviation of the sampling distribution, so this is where this fact comes in handy.

So as you can see, confidence intervals aren't that difficult; there are really only three steps. What might be difficult is the specifics of each step. Fortunately, many of the formulas that you've seen show up on the formula sheet for the AP exam, and you can use graphing calculators to construct confidence intervals for you, which I'll show you how to do later. So now, revisiting the problem: our first

step is to check the required conditions. We do indeed have a random sample; the sampling distribution is approximately normal, because 100 times 0.7 and 100 times (1 minus 0.7) are both at least 10; and the 10 percent rule is also satisfied. Step two is to construct the confidence interval. Our point estimate was 0.7, plus or minus 1.96 times the square root of 0.7 times (1 minus 0.7) over 100; in the end we'll get approximately 0.610 to 0.790, and it's usually best to round to three decimal places. To interpret the confidence interval, the best interpretation is: we are 95% confident that the true proportion of students at the high school that exercise at least 150 minutes each week is between 0.610 and 0.790.
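Here's this one-proportion z interval as a minimal Python sketch (scipy is my choice for the critical value; the numbers are the ones from the problem):

```python
from math import sqrt
from scipy.stats import norm

p_hat, n, conf = 0.7, 100, 0.95

z_star = norm.ppf(1 - (1 - conf) / 2)   # critical value, ~1.96 for 95%
se = sqrt(p_hat * (1 - p_hat) / n)      # estimated SD of the sampling distribution
margin = z_star * se                    # margin of error

print(f"({p_hat - margin:.3f}, {p_hat + margin:.3f})")  # -> (0.610, 0.790)
```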

So now we're into hypothesis testing. We'll always have a null and an alternative hypothesis. The null hypothesis must contain the equal sign, and it's usually the equal sign itself; the alternative hypothesis can be not equal to, less than, or greater than. So with hypothesis testing, we collect

the sample and find the probability of seeing a sample at least as extreme as our sample, assuming that the null hypothesis is true. If the probability is low, it means that such a result would be unusual if the null hypothesis were true, which means that the null hypothesis is probably false; so we reject the null hypothesis and accept the alternative hypothesis. There are four steps to hypothesis

testing. The first step is to set up the test and check the required conditions; in a word problem where the null and alternative hypotheses aren't given, you set them up yourself. The required conditions are the same as for confidence intervals; the only difference is that instead of p-hat, you use the p₀ value that was given to you in the null hypothesis, since you're assuming that it's true. The next step is to obtain a z statistic, and once again, since you're assuming that the null hypothesis is true, you use p₀ to get the mean and standard deviation of the sampling distribution. If you remember, the z statistic tells you how many standard deviations from the mean a value is; so this is the formula for the z statistic.

Then we obtain a p-value using the z distribution, which has mean 0 and standard deviation 1. If the alternative hypothesis is less than, then the p-value is the area to the left of your z statistic; if it's greater than, then it's the area to the right; and if the alternative hypothesis is not equal to, then the p-value is the area to the left of negative absolute value of z, plus the area to the right of absolute value of z, which is also twice the area of the smaller tail of z — in this case the right tail, because z is positive.

Finally, obtain the results. The p-value means that if the null hypothesis is true, then the probability of obtaining a test statistic at least as extreme as what was obtained is the p-value. The p-value is not the probability that the null hypothesis is true; that's a common mistake that people make. If the p-value is less than or equal to the level of significance, which is usually 0.05, reject the null hypothesis: we have statistically convincing evidence that the alternative hypothesis is true. Otherwise, fail to reject the null hypothesis: we do not have statistically convincing evidence. So here are the four steps of hypothesis testing. Now here's an example problem: this is

the same problem as earlier, but now the proportion is 0.6, and it's asking us: is there statistically convincing evidence, at the level of alpha equals 0.05, that more than half of all students at the high school exercise at least 150 minutes each week? The first step is to set up the test and check the required conditions. The null hypothesis is p equals 0.5, since it's asking about half, and the alternative hypothesis is p greater than 0.5, since it's asking whether more than half; the required conditions are all satisfied. Then we obtain a z statistic, and remember to use 0.5 and not 0.6 to determine the standard deviation of the sampling distribution; we get z equals 2. Then the p-value is the area to the right of z equals 2 under the normal distribution with mean 0 and standard deviation 1, and you can determine that the area is approximately 0.023. Since this is less than the level of 0.05, we reject the null hypothesis: we have statistically convincing evidence that more than half of all students at the high school exercise at least 150 minutes each week.
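And here's the matching one-proportion z test as a short Python sketch (again using scipy for the tail area):

```python
from math import sqrt
from scipy.stats import norm

p0, p_hat, n = 0.5, 0.6, 100

se = sqrt(p0 * (1 - p0) / n)        # use p0, not p_hat, under the null hypothesis
z = (p_hat - p0) / se               # -> 2.0
p_value = 1 - norm.cdf(z)           # right tail, since the alternative is "greater than"

print(z, round(p_value, 3))         # 2.0, 0.023 -> reject H0 at alpha = 0.05
```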

There are two types of errors that you can make in hypothesis testing. A type 1 error, also known as a false positive, is rejecting a true null hypothesis; a type 2 error, also known as a false negative, is failing to reject a false null hypothesis. The level of significance, alpha, is the probability of a type 1 error if the null hypothesis is true. Why? Because if the null hypothesis is true, then the null distribution is the same as the population sampling distribution, and the rejection region, which is the region of all z statistics that would result in a p-value of less than alpha, has area equal to alpha; therefore, the probability of obtaining a z statistic in the rejection region, and therefore incorrectly rejecting the null hypothesis, is alpha. Power is 1 minus beta, and beta is the

probability of a type 2 error. This seems confusing, but one way to remember it is: if a test is more powerful, then it's more likely to make discoveries by correctly rejecting false null hypotheses; therefore, as power increases, the probability of a type 2 error should decrease. Beta decreases when any of the following occur; I recommend not memorizing this list, because it's pretty intuitive. You can determine the probability of a

type 2 error from a graph. So suppose the null hypothesis is false, and the actual population distribution and null distribution are shown right here. We would have a type 2 error if we fail to reject the null hypothesis, which means we obtain a test statistic that does not fall in the rejection region; and the probability of obtaining a test statistic that does not fall in the rejection region from our population is shaded in yellow. So the yellow area is the probability of a type 2 error.

Now we're getting into the sampling distribution for the difference of two proportions. In general, if you have two independent variables X and Y, then the formulas for the mean and standard deviation of the distribution of aX + bY are shown right here; you do not need to remember these formulas. If you just want X minus Y, you get that the mean of that distribution is the mean of X minus the mean of Y, and the standard deviation is the square root of (the standard deviation of X squared plus the standard deviation of Y squared). From that, you can get the standard deviation of the sampling distribution for p-hat 1 minus p-hat 2, and the standard error.

So this is the confidence interval for the difference of two proportions: it's the point estimate, p-hat 1 minus p-hat 2, plus or minus the critical value times the standard error. The required conditions are mostly the same as for one sample; just check them for both samples. The only new condition is that the two samples must be independent, and the steps are the same as for one

sample. So here's an example problem: we're going back to the first problem we saw in chapter two, and now, at a different high school with 1,500 students, the same question is asked of 30 random students; the proportion of students answering yes is 0.6. Construct and interpret a 99% confidence interval for the difference in population proportions. All the required conditions are satisfied; the two samples are independent because they come from different populations, so knowing information about one sample is not going to tell you anything about the other. Then we construct the confidence interval: our point estimate is 0.7 minus 0.6, which is 0.1, plus or minus the critical value, which can be determined using invNorm, times the square root of 0.7 times (1 minus 0.7) over 100, plus 0.6 times (1 minus 0.6) over 30. In the end you're going to get negative 0.159 to 0.359.

Then we interpret the confidence interval, which is pretty simple.
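Here's that two-proportion interval as a short Python sketch (scipy again for the critical value):

```python
from math import sqrt
from scipy.stats import norm

p1, n1 = 0.7, 100
p2, n2 = 0.6, 30

z_star = norm.ppf(1 - (1 - 0.99) / 2)                     # ~2.576 for 99%
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)        # standard error
est = p1 - p2

print(f"({est - z_star * se:.3f}, {est + z_star * se:.3f})")  # -> (-0.159, 0.359)
```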

Now we're getting into hypothesis testing for the difference of two proportions. We change the problem that we just saw to: is there statistically convincing evidence, at the level of alpha equals 0.05, that the proportion of students who exercise at least 150 minutes each week is different for the two schools? So the null hypothesis is that p1 minus p2 equals zero — in other words, they're the same — and the alternative hypothesis is not equal to, because the question was asking whether they're different. In AP Statistics, whenever you're tested on hypothesis testing for the difference of two proportions, you will always have that the null hypothesis is that the difference is zero, because if it's not, then the problem is significantly more difficult, and you'll see why soon. So could we still use the same formula

for the standard error as we used in confidence intervals for a difference of two proportions? The answer is no, because we must assume that the null hypothesis is true, but since p-hat 1 and p-hat 2 are likely not equal, using this formula would contradict the null hypothesis. So we replace p-hat 1 and p-hat 2 with some common value. How do we get this common value? The answer is that we can combine our two samples into one pooled sample. Since we know the proportion and sample size for each sample, we can determine the number of successes for each sample, and we add together the numbers of successes and the sample sizes to get our pooled sample; with the pooled sample, we can get the pooled proportion. This is the common value that we need to replace p-hat 1 and p-hat 2 with in our formula to get the formula for the standard error for hypothesis testing. Now we can make our formula slightly simpler by factoring out p-hat C times (1 minus p-hat C), where p-hat C is the total number of successes over the total sample size.

The conditions for hypothesis testing for a difference of two proportions are the same as for confidence intervals, but now, to check normality, you use p-hat C. And here are the steps for hypothesis testing for a difference of two proportions; they're the same as for one proportion. So now, revisiting the problem: the first step is to set up the test, which we did earlier, and check the conditions, which are all satisfied.

Then we obtain a z statistic using the formula that we just saw for our standard error, and we'll get that z is approximately 1.027. Then the p-value is 2 times normalcdf with lower bound 1.027 and upper bound 10 to the power of 99, which gives 0.304.
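Here's this pooled two-proportion z test as a Python sketch (the success counts are derived from the stated proportions and sample sizes):

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 70, 100   # successes: 0.7 * 100
x2, n2 = 18, 30    # successes: 0.6 * 30

p_hat_c = (x1 + x2) / (n1 + n2)                         # pooled proportion
se = sqrt(p_hat_c * (1 - p_hat_c) * (1 / n1 + 1 / n2))  # pooled standard error
z = (x1 / n1 - x2 / n2) / se                            # -> ~1.027

p_value = 2 * (1 - norm.cdf(abs(z)))                    # two-sided p-value
print(round(z, 3), round(p_value, 3))                   # 1.027, 0.304 -> fail to reject
```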

Since that is greater than the level of 0.05, we fail to reject the null hypothesis: we do not have statistically convincing evidence that the proportion of students who exercise at least 150 minutes each week is different for the two schools. And that's it for section one. If you're

2.2: Sample Mean Inference

still struggling with confidence intervals and hypothesis testing, don't worry, because section 2 will focus a lot on those as well; the only new topic is t distributions. If you want to infer about the population mean for a numerical variable, then the sampling distribution for the population mean will have mean equal to the population mean for that variable, and standard deviation equal to the population standard deviation divided by the square root of the sample size. Since you don't know what the population standard deviation is, you can estimate the standard deviation of the sampling distribution using the standard error, which is the standard deviation of the sample divided by the square root of the sample size.

There are two ways for the sampling distribution to be approximately normal. The first is the central limit theorem, which states that if the sample size is sufficiently large, usually at least 30, then the sampling distribution is approximately normal even if the population distribution is skewed. The second way is if the population distribution is approximately normal, then the sampling distribution is approximately normal, and you can check this by seeing if your sample is approximately normal and free from extreme outliers.

You use the t distribution for the population mean when the population standard deviation is unknown, and the z distribution when it's known; but this is rare, because the fact that you're collecting a sample means that you don't know about the population. The t distribution for various degrees of freedom, which is one of its parameters, is shown right here, and if the degrees of freedom equal infinity, then the t distribution becomes the same as the z distribution. So how do we generate the t distribution?

Well, suppose we have a population with mean zero and standard deviation 15, and we collect three samples of size nine. The t statistic for each sample is calculated by finding the difference between the sample mean and the actual mean, divided by the standard error, which we get by using the standard deviation from our sample; the z statistic is the sample mean minus the actual mean, divided by the standard deviation of the sampling distribution, which is determined by using the population standard deviation of 15. So we can calculate t and z statistics for each of these samples, and if you collect more and more samples and plot the t statistics and z statistics, then the t statistics will form the t distribution and the z statistics will form the z distribution. Statisticians have done this with a large number of samples, and that's how they formed the z and t distributions.
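Here's a minimal simulation sketch of that process in Python (population parameters are from the example; numpy is my own tool choice):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 0, 15, 9

t_stats, z_stats = [], []
for _ in range(100_000):
    sample = rng.normal(mu, sigma, n)
    se = sample.std(ddof=1) / np.sqrt(n)         # standard error from the sample's SD
    t_stats.append((sample.mean() - mu) / se)    # t statistic
    z_stats.append((sample.mean() - mu) / (sigma / np.sqrt(n)))  # z statistic

# The t statistics are more spread out (heavier tails) than the z statistics:
print(np.mean(np.abs(np.array(t_stats)) > 2))    # noticeably larger than...
print(np.mean(np.abs(np.array(z_stats)) > 2))    # ...this, which is ~0.046
```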

Here are some comparisons between the z and t distributions: they're both bell-curve shaped, symmetric, and have a mean of zero, but the t distribution has a lower center and higher tails. The reason is that some samples will have a low standard deviation, resulting in a really positive or really negative t statistic. The t distribution has a parameter called degrees of freedom, and as the degrees of freedom increase, the t distribution approaches the z distribution.

So here is the formula for the confidence interval for a population mean: it's the point estimate, which is the sample mean, plus or minus the margin of error, which is the critical t value times the standard error. To find the critical t value, use invT, which, unlike invNorm, does not allow you to choose a left, center, or right tail; it only accepts the left tail. The area of the left tail is just 0.5 times (1 minus the confidence level as a decimal), and you take the negative of the result to get the positive critical value; the degrees of freedom are n minus 1. The required conditions are mostly the same as for proportions; the only difference is that to check normality, you check whether n is greater than or equal to 30 or the population distribution is approximately normal. So here is an example problem: we're going back to our high school, but now,

instead of finding the proportion of students that exercise at least 150 minutes each week, we want to find the mean exercise time. We obtain a sample of 30 students and get a mean of 140 minutes and a standard deviation of 70 minutes; construct and interpret a 95% confidence interval for the true average time that the students at the high school exercise each week. All right, so all the required conditions are satisfied; the sample size is 30, which is large enough for the central limit theorem to hold. Now, constructing the confidence interval: the critical t value is negative invT with area 0.025 and degrees of freedom 29, and then we plug in all the values that we obtained. In the end, we're going to get approximately 113.862 to 166.138.
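Here's this t interval as a Python sketch (scipy's t distribution supplies the critical value):

```python
from math import sqrt
from scipy.stats import t

x_bar, s, n, conf = 140, 70, 30, 0.95

t_star = -t.ppf(0.5 * (1 - conf), df=n - 1)   # negative of the left-tail value, ~2.045
margin = t_star * s / sqrt(n)                 # critical value times standard error

print(f"({x_bar - margin:.3f}, {x_bar + margin:.3f})")  # -> (113.862, 166.138)
```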

Interpreting the confidence interval is pretty self-explanatory. And then, for hypothesis testing, the steps are the same as for proportions, but now we're obtaining a t statistic instead of a z statistic.

So now we're asking: is there statistically convincing evidence, at the level of alpha equals 0.1, that the average time that the students at the high school exercise each week is greater than two hours? We set up the test: two hours is the same as 120 minutes, so we're testing mu equals 120 minutes versus mu greater than 120 minutes, and the required conditions are satisfied. Then we obtain a t statistic: it's the difference between our sample mean and the hypothesized mean, divided by the standard error, which is approximately 1.565. Since the alternative hypothesis was greater than, we find the area of the right tail, which is approximately 0.064, and since 0.064 is less than the level of 0.1, we reject the null hypothesis.
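And the matching one-sample t test in Python:

```python
from math import sqrt
from scipy.stats import t

mu0, x_bar, s, n = 120, 140, 70, 30

t_stat = (x_bar - mu0) / (s / sqrt(n))      # -> ~1.565
p_value = 1 - t.cdf(t_stat, df=n - 1)       # right tail for "greater than"

print(round(t_stat, 3), round(p_value, 3))  # 1.565, 0.064 -> reject at alpha = 0.1
```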

Now we're getting into the difference of two means, and depending on how our data is structured, there are two ways to perform statistical inference for this. If the data is from a matched pairs sample, which, if you remember, means that everyone in one sample is paired with somebody in the other sample, then we can think of it as one sample of pairs, where each value is the difference of a pair in the matched pairs sample, and then proceed using the one-sample t interval or test; this sounds confusing, but I'll show you what it means on the next slide. If the data is from two independent samples — they must be independent — then proceed using the two-sample t interval or test.

So if you have a matched pairs sample, then for each pair you compute the difference, and with this sample of differences you can perform a one-sample t interval or test; for example, you can test whether the mean of the sample of differences is zero, against the alternative that it's greater than zero.
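Here's a small sketch of the matched pairs idea (the before/after numbers are invented; scipy's one-sample test is applied to the differences):

```python
from scipy.stats import ttest_1samp

# Hypothetical matched pairs: each subject measured under both conditions.
treatment = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2]
control   = [11.0, 9.5, 10.1, 11.2, 12.1, 9.4]

diffs = [a - b for a, b in zip(treatment, control)]  # one sample of differences

# One-sample t test of H0: mean difference = 0 vs Ha: mean difference > 0.
result = ttest_1samp(diffs, popmean=0, alternative="greater")
print(round(result.statistic, 3), round(result.pvalue, 3))
```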

Now, for a two-sample t interval or test, the standard deviation of X minus Y is the square root of (the standard deviation of X squared plus the standard deviation of Y squared).

So here is the confidence interval for the difference of two means; make sure you check the required conditions for both samples, and that the samples are independent. The degrees of freedom are quite complicated to determine and usually left to a graphing calculator, but they're always between (the smaller of n1 and n2) minus 1 and n1 plus n2 minus 2. As a conservative estimate, you can use the lower bound, which is (the smaller of n1 and n2) minus 1, for the degrees of freedom, because that will always produce wider confidence intervals than the actual degrees of freedom. So here's an example problem: we have the

same problem, but now we have a different high school with 1,500 students, and we ask the same question to 50 students and get a mean of 100 minutes and a standard deviation of 40 minutes. Construct and interpret a 99% confidence interval for the true difference in average times that the students at the high schools exercise each week; you may use a conservative estimate for the degrees of freedom. All right, so all of the required conditions are satisfied, and now we construct the confidence interval. Our point estimate is 140 minus 100, which is 40; to determine the critical t value, the area is 0.005, since we want a 99% confidence level, and the degrees of freedom, since 30 is smaller than 50, are estimated using 30 minus 1, which is 29. After that, in the end, you're going to get approximately 1.476 to 78.524.
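Here's the two-sample interval with the conservative degrees of freedom in Python:

```python
from math import sqrt
from scipy.stats import t

x1, s1, n1 = 140, 70, 30
x2, s2, n2 = 100, 40, 50

df = min(n1, n2) - 1                  # conservative estimate: 29
t_star = -t.ppf(0.005, df=df)         # ~2.756 for a 99% interval
se = sqrt(s1**2 / n1 + s2**2 / n2)    # SE of the difference of means

est = x1 - x2
print(f"({est - t_star * se:.3f}, {est + t_star * se:.3f})")  # -> (1.476, 78.524)
```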

Then we interpret the confidence interval. And here are the steps for hypothesis testing for a difference of two means. So now we're asking: is there statistically convincing evidence, at the level of alpha equals 0.01, that the average time that students at the first school exercise is not the same as the average time that students at the second school exercise? We set up our test: since it's asking whether they're the same or different, the null hypothesis is that the difference is zero, and the alternative is that it's not equal to zero. Then we obtain a t statistic, which is our point estimate of 40, minus the hypothesized difference of zero, divided by the standard error; then we obtain a p-value, and in the end we obtain the results: since the p-value of 0.008 is less than the level of 0.01, we reject the null hypothesis.

2.3: Linear Regression Fundamentals

So now we're moving on to section 3, which is on linear regression. This section does not involve confidence intervals and hypothesis testing, but in the next section we will apply those two concepts to linear regression. Let's start off with describing bivariate relationships. They can be described as positive or negative (as one increases, what happens to the other?), linear or non-linear (is it straight?), and strong or weak (are the points packed closely together or not?). So for example, the first one is positive, because as one increases, the other also increases; linear, because the relationship is pretty straight; and strong, because the points are pretty close to the straight line of best fit. If you want, pause the video here and try to describe the other four

relationships. All right, here's the answer: the second one is negative, linear, and weak; the third one is negative, linear, and strong; the fourth one is positive, non-linear, because the straight line does not best fit the points and the quadratic relationship does better, and strong, because the points are pretty close to the quadratic relationship; and this last one is positive, linear, and still strong, because even though there are some points that do not fit in, for the most part the points are pretty close to the straight line.

The next topic is residuals. Suppose you have a number of points, and you fit the line y-hat = x (y-hat is the symbol for the predicted y value) to the set of data. The residual of each point is its y-coordinate minus y-hat; so for example, for the point (4, 3), the residual is 3 minus 4, which is negative one. You can determine the residual for every point, and you can square all of them to make all your values positive. A residual plot shows the x-coordinate of each point against its residual.

Obviously a high squared residual is not good, because it means that your fitted line is less accurate. So what least squares linear regression aims to do is find the line that minimizes the sum of squared residuals. The line is y-hat = a + bx, and here are the formulas for a and b. You do not have to remember them; you're not even going to be tested on them at all on the AP exam.
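As a quick illustration (not from the video), here's the standard least squares computation in Python; the data points are made up.

```python
# Sketch: least squares slope and intercept from the usual formulas
# b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  a = ybar - b*xbar.
import numpy as np

def least_squares(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    return a, b  # y-hat = a + b*x

a, b = least_squares([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])  # made-up points
```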

So the correlation coefficient r represents how strong the linear relationship is. You do not have to remember the formula for this either, but you do need to remember how to interpret it. r is always between negative one and one. If it's negative one, there's perfect negative correlation, meaning that all the points fall perfectly on a line with negative slope; r equals one means perfect positive correlation; r equals zero means there's no correlation at all; and you can also have anything else in between negative one and one.

The coefficient of determination, which happens to be equal to r squared, is the total sum of squares minus the residual sum of squares, over the total sum of squares. The total sum of squares is the sum of squared residuals when you do regression using only an intercept, and the residual sum of squares is the sum of squared residuals using normal linear regression; the percentage reduction between them is the coefficient of determination. So for example, if the total sum of squares is 40 and the residual sum of squares is 32, then the percentage reduction is 20%, and r squared is equal to 0.2. The interpretation is that 20% of the variability in the dependent variable is accounted for by the regression model on the independent variable. The important thing to remember on this slide is the interpretation, because it could be tested on the AP exam.
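The slide's arithmetic is easy to verify directly:

```python
# Sketch: r squared as the percentage reduction in sum of squares,
# using the numbers from the slide.
tss = 40   # sum of squared residuals, intercept-only model
rss = 32   # sum of squared residuals, fitted line
r_squared = (tss - rss) / tss
print(r_squared)  # 0.2, i.e. a 20% reduction
```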

The root mean squared deviation is the square root of the sum of squared residuals over n minus 2, and it is also known as the standard deviation of the residuals, because it's the same as the standard deviation of the residuals when their mean is zero, which happens to be true of least squares linear regression. The only difference is that the denominator is n minus 2 instead of n minus 1, because you're estimating two parameters (slope and intercept) with linear regression. The root mean squared deviation just gives us an idea of how large or small the residuals are.
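Here's that definition as a small helper (a sketch, continuing the made-up least_squares example above):

```python
# Sketch: root mean squared deviation of the residuals, denominator n - 2.
import numpy as np

def rmsd(x, y, a, b):
    residuals = np.asarray(y, float) - (a + b * np.asarray(x, float))
    return np.sqrt(np.sum(residuals ** 2) / (len(residuals) - 2))
```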

Now let's suppose you have non-linear data. How could you do regression on it? For example, for this graph, what can you do? Well, instead of creating a linear model for y versus x, you can try y versus x squared, or the square root of y versus x, or you can also experiment with other things, such as y versus e to the power of x, or y versus 1 over x, and then find the best model in the end. For example, if you have a table of the x and y coordinates of each point, you can turn it into a table of x squared versus y, or x versus y squared, and then do linear regression on those tables.
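A minimal sketch of that idea, with made-up data that's roughly quadratic; the transformation with the higher r squared wins.

```python
# Sketch: compare correlations before and after transforming x to x^2.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up non-linear data
y = np.array([1.2, 4.1, 8.8, 16.3, 24.9])  # roughly y = x^2

r_linear = np.corrcoef(x, y)[0, 1]          # y versus x
r_quadratic = np.corrcoef(x ** 2, y)[0, 1]  # y versus x^2
print(r_linear ** 2, r_quadratic ** 2)      # the x^2 model fits better
```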

Influential points are points that, if removed, would change the linear model significantly. There are generally two types of influential points: outliers, which have very extreme residuals, and high leverage points, which have a very unusual x value. In the outlier example shown here, removing the outlier would make r closer to one, because it improves the correlation; decrease the y-intercept, because that point shifts the line upward; and decrease the root mean squared deviation, because that point has a very high residual. In the high leverage point example shown here, removing that point would make r farther from one. This seems counterintuitive, but the point is already very close to the least squares linear regression line, so removing it would actually make the correlation worse. It would also increase the root mean squared deviation, because that point has a low residual, and increase the slope, because that point drags the right side of the line downwards. In the last example, which is both an outlier and a high leverage point, removing that point would make r closer to one, decrease the root mean squared deviation significantly, increase the slope, and significantly decrease the y-intercept. If this slide seems confusing, the most important things to remember are just the terms influential point, outlier, and high leverage point.

Here are some properties of the least squares regression line. You generally don't have to remember these, but looking over them is still pretty useful. Now we're almost done with chapter two; only one section left.

2.4: Linear Regression Inference

So, if you consider the slopes of all possible samples of a specified size from your population, here are some formulas for that sampling distribution. You do not need to remember these formulas, but you should remember the interpretation of the slope, because it could be tested on the AP exam.

Here are the conditions for linear regression inference; you can remember them with the acronym LINER. You recognize two of these conditions already: independence and random sample. The rest of the conditions are linearity, which means the relationship between x and y is linear; normality, which means that for every x value, the distribution of y is approximately normal; and equal variance, which means the variance of the residuals is the same for any value of x.

You can check these conditions using residual plots. The linearity condition is not satisfied if there is a clear pattern in your residual plot; for example, in this residual plot it seems like the residuals first decrease and then increase, and indeed, in the original scatter plot the relationship is more quadratic than linear. The equal variance condition is not satisfied if the spread in y depends on the x value; for example, in this residual plot you can see that for higher values of x, the y values are much more spread out. The normality condition is not satisfied if, for a given x value, the distribution of y isn't normal; in this residual plot that is the case, since you can see that for certain x values the distribution of y is skewed.

So here's the confidence interval for the slope of a regression model. It is the point estimate, which is the slope in the least squares linear regression model, plus or minus the margin of error, which is the critical t value times the standard error. The critical t value is determined with degrees of freedom equal to n minus 2, not n minus 1.
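As a sketch, here's that interval in Python, where slope and se_slope would be read off regression output:

```python
# Sketch: confidence interval for a regression slope, df = n - 2.
from scipy import stats

def slope_ci(slope, se_slope, n, conf=0.95):
    t_star = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
    return (slope - t_star * se_slope, slope + t_star * se_slope)
```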

So here's a sample problem. An athletic trainer wanted to investigate the relationship between an athlete's resting heart rate and the heart rate after exercise. From a sample of 12 athletes, the athletic trainer recorded each athlete's resting heart rate and heart rate after 5 minutes of moderate exercise, both in beats per minute. The results of a regression analysis of the data are shown in the following computer output. Assume the conditions for inference are met. Which of the following represents a 95% confidence interval for the slope of the population regression line? As you can see, you don't have to determine the interval by yourself; you will generally be given computer output, since the formulas are pretty complicated.

So here's how you use the computer output. The predictors on the left side are the y-intercept and the explanatory variable; make sure you do not use the y-intercept row. Here are the coefficients for the least squares regression line, and here is the standard error. Although it's not necessary in this problem, this is the coefficient of determination. Now we can solve the problem. We know two of the three values that we need, and for the critical t value, since we want a confidence level of 95% and we have a sample size of 12, we can use inverse t with area 0.025 and degrees of freedom equal to 10. In the end you'll get this answer, which is answer choice E.
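For reference, the critical value itself (a one-liner, assuming scipy):

```python
# Sketch: critical t value for 95% confidence with n = 12, so df = 10.
from scipy import stats
print(stats.t.ppf(0.975, df=10))  # about 2.228
```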

So we can also do hypothesis testing for the slope of a regression model, and it's common to test beta equals zero against beta not equal to zero, because beta equals zero means that the two variables are not linearly related: the predicted y value does not depend on the x value. So here's a sample problem. It's the same as what we just saw, but now it's asking: assuming the conditions for inference are met, is there statistically convincing evidence, at the level of alpha equals 0.05, that heart rate after exercise is dependent on resting heart rate? That's where these last two columns come in: these two columns test beta equals zero against beta not equal to zero. The t statistic, which you could determine by yourself, is conveniently given, and so is the p-value. Since the p-value of 0.03 is lower than the level of 0.05, we reject the null hypothesis; we have statistically convincing evidence that heart rate after exercise is dependent on resting heart rate.
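And the matching test, as a sketch that takes the slope and standard error from computer output:

```python
# Sketch: two-sided test of beta = 0 for a regression slope, df = n - 2.
from scipy import stats

def slope_test(slope, se_slope, n):
    t = slope / se_slope               # (b - 0) / SE(b)
    p = 2 * stats.t.sf(abs(t), n - 2)  # two-sided p-value
    return t, p
```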

Chapter 2: Statistical Inference on Graphing Calculator

Now I'll show you how to do statistical inference on the graphing calculator. Hit STAT and go to TESTS. This is the one-sample t test, and this is the two-sample t test; I'll go in and show you how to do it. You can enter either raw data or statistics, meaning the mean, standard deviation, and size of each sample, but let's go back to Data and I'll show you how to do this. You need two lists of numbers, and to edit each list, hit STAT and then Edit. As you can see, I already entered some values for two lists, so let's go back. If you want to change one of these lists, hit 2nd LIST and then choose the name of the correct list. For the frequencies, you can just leave those as 1, and this is the alternative hypothesis. Then there's Pooled: if you select Yes for this, it will combine your two samples into one in order to find a common standard error to use for the test. You would do that when you're assuming that the two samples come from populations with the same standard deviation, but since this is usually not the case, you usually leave this as No. Then if you press Calculate, it returns the results, including our p-value.

Now I'll show you how to do a one-proportion z test. Here's the null proportion, then this is the number of successes in our sample (it's not p-hat, but rather the number of successes in the sample), and then this is the sample size. Then you can press Calculate, and it returns the results, including our p-value. You can also create confidence intervals: for example, this is the one-sample t interval, this is the two-sample t interval, and here are the one-proportion and two-proportion z intervals. And to do linear regression t tests or intervals, those are right here, so let's go in and I'll show you how to do a linear regression t interval. You need two lists of numbers, and then you can just leave all of these as is; you don't have to enter anything for RegEQ. Then if you press ENTER, it will return the results of the interval.
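If you'd rather not use a calculator, here's a hedged sketch of software stand-ins for two of these commands, with made-up data; scipy's two-sample t test with equal_var=False corresponds to leaving Pooled as No.

```python
# Sketch: software stand-ins for 2-SampTTest and 1-PropZTest.
import numpy as np
from scipy import stats

list1 = [12, 15, 14, 10, 13]   # made-up raw data, like the calculator lists
list2 = [9, 11, 8, 12, 10]
print(stats.ttest_ind(list1, list2, equal_var=False))  # Pooled: No

# One-proportion z test: x successes out of n, null proportion p0.
x, n, p0 = 45, 100, 0.5
z = (x / n - p0) / np.sqrt(p0 * (1 - p0) / n)
print(z, 2 * stats.norm.sf(abs(z)))  # two-sided p-value
```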

Chapter 2 Review

Now we're done with chapter 2 and into the review; pause whenever you need. First, here are the steps for confidence intervals. Here are the steps for hypothesis testing. These are some useful formulas; if you can't remember all of them, most of them will show up on the formula sheet, but it's still best to remember as many as you can. These are the errors you can make in hypothesis testing. Here are the most important concepts from section 3, which was on linear regression. These are the conditions for linear regression inference, as well as how you can check them. And finally, here is using computer output.

So we're all done with the hardest chapter of this video, and we're into our last chapter. This chapter is on

Chapter 3 (Probability) Intro

probability. So here's the overview, and here is the AP Statistics framework; section one will be on probability rules.

3.1: Probability Rules

So here are some important notations. A intersection B, written with a symbol that looks like an upside-down U, means A and B; P(A intersection B) means the probability that both events will occur. A union B, written with the intersection symbol flipped upside down, means A or B; P(A union B) means the probability that at least one of A and B will occur. A vertical bar means "given," and P(A given B) means the probability of A occurring given that B has occurred. An A with a prime or a superscript c means the complement of A, which represents A not occurring, so its probability is one minus the probability that A will occur.

Events are independent if and only if knowing information about one event doesn't tell you anything about the other events. So if you have two events, independence means that P(A) equals P(A given B), because knowing that B occurred doesn't tell you anything about A, and similarly P(B) equals P(B given A).

Here's the product rule: the probability of A and B is equal to the probability of A times the probability of B given A. You can also rearrange this to get that P(A given B) equals P(A and B) divided by P(B), and this shows up on the equation sheet. And if A and B are independent, then P(B given A) just becomes P(B), and you can simplify the product rule a little bit.

Events are mutually exclusive if and only if at most one of them can occur. So if you have two events, this means that P(A and B) equals zero, because you can't have both of them occurring; you can have at most one of them occurring. Similarly, P(A given B) and P(B given A) are both equal to zero.

Here's the sum rule: the probability of A or B equals the probability of A, plus the probability of B, minus the probability of A and B. The reason you have to subtract P(A and B) is because it is counted twice. If A and B are mutually exclusive, then P(A and B) becomes zero, so P(A or B) just becomes P(A) plus P(B).

Here are some optional formulas. If A, B, C, and so on are all mutually exclusive, then P(A or B or C or ...) is equal to P(A) plus P(B) plus P(C) and so on; you can get this from the sum rule that you saw on the last slide. And if exactly one of B, C, and so on will occur, then P(A) is equal to P(A and B) plus P(A and C) and so on, which is also equal to P(B) times P(A given B), plus P(C) times P(A given C), and so on.

You can remember the important concepts from this section with the acronym Sigma Comp. Here are some optional practice problems. The third one is somewhat difficult, because it requires using one of the optional formulas; if you can't solve it, don't worry, it's just meant to show an example of how that formula can be used. But anyway, pause the video here if you'd like to give these problems a try.

All right, here come the solutions. For the first one, the coin flip and dice roll are independent, so the probability of both events is just the product of the two individual probabilities, which is 1/12. For the second one, a student can't be both a freshman and a sophomore, so the events are mutually exclusive, and then using the sum rule you get that the probability of freshman or sophomore is equal to 0.3 plus 0.2 minus 0, which is 0.5. And for the third one, you know that it will either rain or not rain tomorrow, so you can use the optional formula, and the probability that it doesn't rain is equal to one minus the probability that it will rain. After that, you can plug in all the values, and you'll get that the final answer is 0.45.
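Here's a quick check of the first two answers in Python, plus the optional formula as a function (the third problem's specific inputs aren't restated on this slide, so they're left as parameters):

```python
# Sketch: the product rule, the sum rule, and the optional (total
# probability) formula from this section.
p_heads, p_six = 1 / 2, 1 / 6
print(p_heads * p_six)       # independent events: 1/12

p_fresh, p_soph = 0.3, 0.2
print(p_fresh + p_soph - 0)  # mutually exclusive, so P(A and B) = 0: 0.5

def total_probability(p_b, p_a_given_b, p_a_given_not_b):
    # P(A) = P(B)P(A|B) + P(B')P(A|B'), with P(B') = 1 - P(B)
    return p_b * p_a_given_b + (1 - p_b) * p_a_given_not_b
```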

3.2: Tables

All right, so now we're moving into section two, which is on tables. Tables can be either one-way or two-way, which you'll see on the next slide, and they can show either frequency or relative frequency. Frequency just shows the total number of times each observation occurs; for example, it can show the total number of freshmen in a school, or it can show the total number of students in a high school, which is just the number of freshmen, plus sophomores, plus juniors, plus seniors. Relative frequency just shows the proportion; for example, the relative frequency of freshmen is the number of freshmen divided by the overall total.

You can also make tables two-way. If you want to show two categorical variables, for example if you want to add male or female, then you can use a two-way table, and once again it can show either frequency or relative frequency.

Conditional, marginal, and joint probability are just terms for P(A given B), P(A), and P(A and B), respectively. As an example of conditional probability, the probability of female given freshman is the number of people who are both freshmen and female, divided by the number of freshmen; the probability of sophomore given male is the number of people who are both sophomore and male, divided by the number of people who are male. For marginal probability, you just take a row or column total and divide it by the overall total; for example, the probability of sophomore is the number of people who are sophomores divided by the total number of people. And for joint probability, it's just an individual cell entry divided by the overall total; for example, the probability of junior and female is the number of people who are both junior and female, divided by the overall total.
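A small sketch of those three kinds of probabilities on a made-up two-way frequency table (the counts are not the video's):

```python
# Sketch: marginal, joint, and conditional probabilities from a
# two-way table of counts (rows = grade, columns = male/female).
import numpy as np

table = np.array([[120, 130],   # freshman
                  [110, 115],   # sophomore
                  [105, 110],   # junior
                  [100, 110]])  # senior
total = table.sum()

p_sophomore = table[1].sum() / total                     # marginal
p_junior_and_female = table[2, 1] / total                # joint
p_female_given_freshman = table[0, 1] / table[0].sum()   # conditional
print(p_sophomore, p_junior_and_female, p_female_given_freshman)
```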

There are similar rules for determining the probabilities in a relative frequency table. The only new thing on this slide is the probability of A or B, which can be determined by taking the row total, plus the column total, minus the cell entry; this comes from the sum rule.

You can also determine whether two events are independent from a two-way table. For example, are the events being a sophomore and being male independent? To determine that, you can check whether the probability of sophomore equals the probability of sophomore given male, or whether the probability of male equals the probability of male given sophomore. You only need to check one of these, because if one of them is true, then the other is true as well. So let's just check the first one: the probability of sophomore is 450 divided by 1800, and the probability of sophomore given male is 230 divided by 910. These two are not equal, so the events are not independent.
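The check with the slide's numbers:

```python
# Sketch: the independence check from the example counts.
p_soph = 450 / 1800                 # 0.25
p_soph_given_male = 230 / 910       # about 0.253
print(p_soph == p_soph_given_male)  # False, so not independent
```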

Another useful function of tables is organizing data to solve word problems. Here's a sample problem that I'll walk you through, and then I'll show you an optional practice problem. Here we go: a certain test for a disease has a false negative rate of 10% and a false positive rate of 1%, and in a certain population, 2% of individuals currently have the disease. (a) If somebody in the population tests positive for the disease, what is the probability they actually have it? (b) What is the probability of a test giving a correct result?

The false negative rate means that if you're actually positive, that is the probability that you'll test negative, and the false positive rate means that if you're actually negative, that is the probability that you'll test positive. So we're given the information listed here, and we can fill in a relative frequency table with it. First, the probability of actually being positive is 0.02, and since we know the overall total is one, we can determine the column total for actually negative. Then we can figure out the joint probabilities using the product rule; remember that the probability of A and B is equal to the probability of A times the probability of B given A. Now, since we know the column totals, we can fill in these two cells. I know the numbers are getting kind of complicated, but the important thing is that you understand the procedures. And now we can fill in the rest of the table.

Now we can solve part (a), which is asking for the probability of actually being positive given that you test positive. That is the probability of being actually positive and testing positive, divided by the probability of testing positive, and the final answer you'll get is 0.647. For part (b), the probability of a test being correct is equal to the probability of actually being positive and testing positive, plus the probability of actually being negative and testing negative; in the end you'll get 0.9882.
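Since all of this problem's numbers are given, here's the whole thing worked numerically as a sketch:

```python
# Sketch: the disease-test problem via the product rule.
p_disease = 0.02
p_neg_given_disease = 0.10    # false negative rate
p_pos_given_healthy = 0.01    # false positive rate

p_pos_and_disease = p_disease * (1 - p_neg_given_disease)   # 0.018
p_pos_and_healthy = (1 - p_disease) * p_pos_given_healthy   # 0.0098

# (a) P(disease | positive) = P(disease and positive) / P(positive)
print(p_pos_and_disease / (p_pos_and_disease + p_pos_and_healthy))  # ~0.647
# (b) P(correct) = P(true positive) + P(true negative)
print(p_pos_and_disease + (1 - p_disease) * (1 - p_pos_given_healthy))  # 0.9882
```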

So here's the practice problem. The numbers here are a lot nicer, and I'll tell you right now that you don't actually have to fill out the entire table to solve the problem. But anyway, pause if you'd like to give this problem a try.

All right, here comes the solution. You're given the following information, and you can use it to fill out the relative frequency table: the probability of male is 0.5, the probability of less than one year is 0.4, same with one to five years, and the joint probability of having worked more than five years and being male is 0.5 times 0.4, which is 0.2. Now, since the overall total is one, you can further fill out these cells, and you can solve that the probability of being female and having worked more than five years is zero. Because it's zero, those two events are mutually exclusive. Now we're all done with section two.

3.3: Chi-Square Tests

Section three will be on the final type of hypothesis test. The reason this was not part of chapter two is that it's applied to tables, which you hadn't learned about until just now.

There are three types of chi-square tests, but really only two, because two of them are almost identical. At the end of this chapter, I'll show you how you can do these tests on a graphing calculator. First off is the chi-square goodness of fit (GOF) test, which tests whether sample data for one categorical variable came from a hypothesized distribution. Here's an example problem: Maria wants to check whether a six-sided die is fair. If she rolls the die 60 times and records the results below, do the data provide statistically convincing evidence, at the level of alpha equals 0.05, that the die is not fair?

The steps for chi-square tests are the same as for any other test, but now we obtain a chi-square statistic rather than a z or t statistic. Here are the required conditions: random sample, the 10% rule, and a new condition called large expected counts. The reason is that the larger the expected counts are, the more accurate the chi-square test is; in AP Statistics, generally how you check this is by seeing whether all expected counts are greater than or equal to five. The chi-square statistic is the sum, over all cell entries, of the observed count minus the expected count, squared, divided by the expected count.

Here is the chi-square distribution; it is the distribution of all chi-square statistics, assuming that the null hypothesis is true. The most important characteristic is that the degrees of freedom is the number of categories minus one. Some other characteristics are that it's skewed right, but less skewed for higher degrees of freedom; the mean is equal to the degrees of freedom; and the mode is the degrees of freedom minus two. The p-value is determined by the area to the right of the chi-square statistic. So now let's go back to the problem.

To solve this problem, we first set up the hypotheses: the null hypothesis is that the die is fair, and the alternative hypothesis is that the die is not fair. If we assume that the die is fair, then since the die was rolled 60 times, we would expect to see each number 10 times, so our expected count for each number rolled is 10.

Now we check the conditions. We do have a random sample; we don't need to check the 10% rule, because we're not sampling without replacement; and all the expected counts are 10, which is at least five. Keep in mind that the observed count of four that you saw doesn't matter, because you only need to check the expected counts.

Now we obtain the chi-square statistic: we sum, over the numbers one through six, the observed count minus the expected count, squared, divided by the expected count, and you're going to get 13.4 in the end. Now we obtain a p-value: since there were six categories, the degrees of freedom is 5, and the right tail of the chi-square distribution has area 0.020. Since that's less than the level of 0.05, we reject the null hypothesis; we have statistically convincing evidence that the die is not fair.
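Here's the same test in scipy as a sketch. Only the observed count of 4 is mentioned on the slide, so the other five observed counts below are placeholders chosen to sum to 60 and reproduce the statistic of 13.4.

```python
# Sketch: chi-square goodness of fit test for the die example.
from scipy import stats

observed = [4, 18, 5, 12, 12, 9]        # placeholder rolls (sum to 60)
expected = [10, 10, 10, 10, 10, 10]     # fair die, 60 rolls
print(stats.chisquare(observed, expected))  # statistic 13.4, p about 0.020
```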

All right, so now we're moving on to the chi-square test of independence, which tests whether two categorical variables are independent. Here's an example problem: you want to find out whether there is an association between grade level and interest at your school. You collect a sample of 300 students and record the responses below. Do the data provide statistically convincing evidence, at the level of alpha equals 0.05, that there is an association between grade level and interest? Assume the random sample condition and 10% rule are satisfied.

The null hypothesis in a test of independence is always going to be that the two variables are independent, and the alternative is that they're associated. The expected count for each cell, assuming that the variables are independent, is the row total times the column total over the overall total; you can check that these are exactly the counts independence would predict. The degrees of freedom is the number of rows minus one, times the number of columns minus one. The reason is that this is the number of cells that are free to vary without changing any row or column total; once you set that many cells, the rest of the cells are automatically determined. So now let's go back to the problem.

The expected counts for each cell are shown right here; for example, the expected count for freshman and STEM is 135 times 75 divided by 300, which is 33.75. All expected counts are at least five, so the large expected counts condition is satisfied. Now you sum, over all of the cells, the observed minus expected, squared, divided by the expected, to get the chi-square statistic. Then, using the chi-square distribution with degrees of freedom equal to six, you'll find that the area of the right tail is 0.275, so that's our p-value. Since it's greater than the level of 0.05, we fail to reject the null hypothesis; we do not have statistically convincing evidence that there is an association between grade level and interest.
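In scipy, the whole test is one call; the 4-by-3 table below is a placeholder, since the video's full table isn't restated here.

```python
# Sketch: chi-square test of independence (or homogeneity) from a
# matrix of observed counts.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[40, 35, 60],    # placeholder grade-by-interest counts
                     [30, 25, 40],
                     [20, 25, 30],
                     [10, 15, 20]])
chi2, p, df, expected = chi2_contingency(observed)
print(chi2, p, df)   # df = (4 - 1) * (3 - 1) = 6, matching the example
```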

The final type of chi-square test is the chi-square test of homogeneity, which tests whether the distribution of a categorical variable is the same for different populations or treatments; it is very similar to the test of independence. An example of the two populations could be two different schools, where you want to see whether the interest distributions of those two schools are the same or different. The way you go about solving this is almost the same as for a test of independence; the only real difference is the hypotheses.

So now let's find all of the expected counts. All the expected counts are at least five, so that condition is satisfied. Then we obtain the chi-square statistic, which will be 2.344, and the degrees of freedom is two, so we can obtain that the p-value is 0.310. Since that is greater than the level of 0.05, we fail to reject the null hypothesis; we do not have statistically convincing evidence that the interest distributions are different for the two schools. All right, so the last section is on binomial and

3.4: Binomial & Geometric Probability

geometric probability. Binomial probability involves repeating an action, with chance p of success, n times, and determining the probability of exactly x successes. For example: a spinner has a 0.6 chance of landing on green; if you spin the spinner five times, what is the probability that it will land on green exactly three times?

Let's start off with an easier problem. Suppose that the order is fixed: the first three spins must land on green and the last two must not. What is the probability of that happening? Because the spins are all independent, the product rule tells us that we can just multiply the individual probabilities together, and we'll get 0.03456. However, there are five choose three, or 5 factorial divided by 3 factorial divided by 2 factorial, ways to choose which three spins land on green, so we must multiply what we got by five choose three, which is ten, and we'll get a final answer of 0.3456. In general, this is the binomial probability formula, and it will show up on the equation sheet.
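The spinner example, checked both by the formula and with scipy:

```python
# Sketch: P(exactly 3 greens in 5 spins) with p = 0.6.
from math import comb
from scipy import stats

p, n, x = 0.6, 5, 3
print(comb(n, x) * p**x * (1 - p)**(n - x))  # 10 * 0.03456 = 0.3456
print(stats.binom.pmf(x, n, p))              # same answer via the pmf
```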

All right, and here's the binomial distribution: it has the number of successes on the x-axis and the probability on the y-axis, and it is an example of a discrete distribution. For any discrete distribution, the mean is the sum, over all x values, of the x value times the probability of that x value, and the standard deviation is the square root of the sum, over all x values, of the x value minus the mean, squared, times the probability of that x value. For a binomial distribution, the mean is np, and the standard deviation is the square root of n times p times (1 minus p). All of these equations will show up on the formula sheet.
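A sketch confirming that the general discrete-distribution formulas reduce to the binomial shortcuts:

```python
# Sketch: mean and sd of a discrete distribution, checked against
# np and sqrt(np(1 - p)) for a binomial.
import numpy as np
from scipy import stats

n, p = 5, 0.6
xs = np.arange(n + 1)
probs = stats.binom.pmf(xs, n, p)

mean = np.sum(xs * probs)                        # 3.0, which is np
sd = np.sqrt(np.sum((xs - mean) ** 2 * probs))   # ~1.095, sqrt(np(1-p))
print(mean, sd)
```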

All right, so now on to geometric probability, which involves repeating an action with chance p of success, and determining the probability of the first success being on the x-th trial. For example: a spinner has a 0.3 chance of landing on red; if you continuously spin the spinner, what is the probability that the spinner lands on red for the first time on the third spin? In that case, we need the first two spins to not be red and the third spin to be red, and then we can multiply the individual probabilities together and we'll get 0.147. In general, here is the geometric probability formula: it's (1 minus p) to the power of (x minus 1), where x is the trial that you want your first success to be on, times p.

Here is what the geometric distribution looks like; keep in mind that it keeps going all the way to infinity for the x values, but we obviously can't show that many values here. For a geometric distribution, the mean is 1 over p, and the standard deviation is the square root of (1 minus p), over p.
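The spinner example again, by hand and with scipy:

```python
# Sketch: P(first red on spin 3) with p = 0.3.
from scipy import stats

p = 0.3
print((1 - p) ** 2 * p)      # 0.7 * 0.7 * 0.3 = 0.147
print(stats.geom.pmf(3, p))  # same answer via the geometric pmf
```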

Chapter 3: Chi-Square, Binom, and Geom on Graphing Calculator

The last topic for chapter three is using the graphing calculator. First I'll show you how to do chi-square tests: go to STAT and then TESTS. Keep in mind that on the AP exam, they won't often ask you to do an entire chi-square test, because they try not to make the questions too time-consuming, but it's still good to know how to do one on your graphing calculator. Here is the chi-square goodness of fit test: you enter two lists, one for the observed counts and one for the expected, as well as the degrees of freedom (the number of categories minus one), and then if you select Calculate, it will return the p-value for you. The chi-square test of independence or homogeneity is right here: you need to enter two matrices, and to edit the matrices, go to 2nd MATRIX and then Edit, which will allow you to change the dimensions as well as the entries. I already entered some values, so let's go back. If you select Calculate, it will return the p-value for you; the perk of this is that you won't have to calculate the chi-square statistic by hand. You can also use a graphing calculator to help you with binomial or

geometric probability. Go to 2nd DISTR. The first important function is binompdf: you enter the number of trials, the probability of success, and the x value, and it will return the probability of exactly x successes. If we go back and use binomcdf instead, entering the number of trials, the probability of success, and the x value, it will return the probability of less than or equal to x successes. It works similarly for geometpdf and geometcdf, where you enter the probability of success and the x value: geometpdf will give you the probability of the first success being on the x-th trial, and geometcdf will return the probability that the first success will be on the x-th trial or any trial before that.
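For reference, here are software stand-ins for those four commands as a sketch, using the examples from this section:

```python
# Sketch: scipy equivalents of binompdf/binomcdf and geometpdf/geometcdf.
from scipy import stats

print(stats.binom.pmf(3, 5, 0.6))   # like binompdf(5, 0.6, 3)
print(stats.binom.cdf(3, 5, 0.6))   # like binomcdf(5, 0.6, 3): at most 3
print(stats.geom.pmf(3, 0.3))       # like geometpdf(0.3, 3)
print(stats.geom.cdf(3, 0.3))       # like geometcdf(0.3, 3): by spin 3
```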

Chapter 3 Review

And now we're all done with chapter three, so here's the review. First, here is the Sigma Comp acronym from the first section. Here are some important facts about tables. Here are the steps and specifics for performing chi-square tests. And finally, here are the concepts for binomial and geometric probability.

So I hope this video was really helpful. If you have questions, always feel free

Outro

to ask down below in the comments. If you enjoyed this video, check out the rest of my channel, where I have a lot of other useful courses as well. Thanks for watching, and I hope to see you next time!