AP Statistics: Ultimate Study Guide

Unit 1: Normal Distribution

The most you can say is the distribution is approx. normal

  • (68-95-99.7 rule)

  • The empirical rule applies only to normal distributions

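  • smaller standard deviation = values are similar (close to the mean)

  • larger standard deviation = values are different/varied

  • (a z-score gives the number of standard deviations a value lies from the mean)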

Do!

  1. label axis (units of measurement)

  2. identify values

  3. shade area of interest

  4. perform calculations

  5. z-score gives you the answer!

Calculator Functions

  1. 2nd VARS

  2. normalcdf

  3. enter (lower, upper, mean, standard deviation) to calculate the cumulative probability for a normal distribution.

  4. Gives you percentage in decimal form (0.0668 = 6.68%)
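The calculator steps above can be mirrored in a few lines of Python. This is only a sketch: the function name normalcdf is borrowed from the calculator, the numbers (mean 100, standard deviation 10, upper bound 85) are made up for illustration, and a very negative lower bound stands in for negative infinity.

```python
import math

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a normal curve between lower and upper."""
    def cdf(x):
        # Convert x to a z-score, then use the error function
        # to get the cumulative area to the left of x.
        z = (x - mean) / sd
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(upper) - cdf(lower)

# Hypothetical example: area below 85 when mean = 100 and sd = 10 (z = -1.5).
p = normalcdf(-1e99, 85, 100, 10)
print(round(p, 4))  # 0.0668, i.e. about 6.68%
```

The result is a proportion in decimal form, just like the calculator output.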

Normal Curve

  • symmetrical about the mean (mean = median = mode)

  • never touches the x-axis

  • total area under curve = 1

  • the shape of the graph is influenced by

    • mean (horizontal shift: a larger mean moves the curve to the right)

    • standard deviation (width of the graph)

Binomial Distributions

~ a type of discrete probability distribution ~

Bernoulli sequence: a sequence of trials that each have only 2 outcomes (coin flip)

  • identical trials

  • must be independent

If X is a random variable with a binomial distribution

  • P(X = x) is the probability of x successes in n trials

Order is not important

Mean & Standard Deviation

The values of n & p change the graph shape

n: as n increases, more bars make the graph closer to a bell-shaped curve

p: as p increases (n constant), the graph's tail is stretched more to the left

  • if p < 0.5, the tail is on the right (skewed right)
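The probability of x successes in n independent trials, along with the mean and standard deviation, can be sketched in Python using the standard binomial formulas. These notes do not write the formulas out, so treat them as an added assumption: P(X = x) = C(n, x) p^x (1 - p)^(n - x), mean = np, standard deviation = sqrt(np(1 - p)). The name binompdf copies the calculator function, and n = 10, p = 0.5 are made-up values.

```python
import math

def binompdf(n, p, x):
    """P(X = x): exactly x successes in n independent trials."""
    # math.comb counts the ways to arrange x successes among n trials
    # (order is not important).
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_mean_sd(n, p):
    """Mean and standard deviation of a binomial random variable."""
    return n * p, math.sqrt(n * p * (1 - p))

# Hypothetical example: 10 coin flips, probability of exactly 3 heads.
prob = binompdf(10, 0.5, 3)
mean, sd = binomial_mean_sd(10, 0.5)
print(round(prob, 4), mean, round(sd, 3))
```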

  1. A response variable measures an outcome of a study (y); the explanatory variable predicts it

  2. An explanatory variable may help predict or explain changes in a response variable (x)

Scatterplot: shows the relationship between 2 quantitative variables measured for the same individuals

  • each individual appears as a point

Correlation: r gives the direction and measures the strength of the linear association between 2 quantitative variables

  • if asked for r and you only have r², take the square root to find r (give r the same sign as the slope)

    The linear association between x and y is _____ (strength) and ______ (direction).

Regression line: models how a response variable (y) changes as an explanatory variable (x) changes

  • ŷ = a + bx

Residuals: distance between actual and predicted

residual = actual y - predicted y (y - ŷ)

The actual (y-context) was (residual) above/below the predicted (y-context) for (x-context).
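A small sketch of how the regression line and its residuals could be computed by hand. The data are made up, and the slope and intercept use the usual least-squares formulas (b = Sxy / Sxx, a = mean of y minus b times mean of x), which these notes do not spell out.

```python
import math

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation
    return a, b, r

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = lsrl(xs, ys)
# residual = actual y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2), round(r, 3))
print([round(e, 2) for e in residuals])
```

A positive residual means the actual y was above the predicted y; a negative residual means it was below.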

High Leverage Points

  • influential points have the most significant impact on the regression line (slope, y-intercept, r)

  • an outlier can also be a high-leverage point

    • not all outliers are high-leverage

  • outliers lie far from the rest of the data in the y-direction but within the same range of x-values

  • not all outliers are influential

    • they can be influential though

  • if both it can be an outlier

To determine whether a point is an outlier, consider:

  • leverage

  • x-values

  • residual distance from the regression line

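s of the residuals (standard deviation of the residuals): measures the size of a typical residual

The actual (y-context) is typically about s away from the (y-context) predicted by the LSRL

  • LSRL: the least-squares regression line

r²% of the variation in (y-context) is explained by the linear relationship with (x-context); for a transformed model, use the transformed variable, e.g. log(y-context)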
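Both summary numbers can be computed from the residuals. The sketch below assumes the usual formulas, which are not written out in these notes: s = sqrt(sum of squared residuals / (n - 2)) and r² = 1 - (sum of squared residuals) / (total variation in y). The actual and predicted values are made up.

```python
import math

def residual_sd_and_r_squared(actual, predicted):
    """Return (s, r_squared) for a fitted least-squares line."""
    n = len(actual)
    mean_y = sum(actual) / n
    # Sum of squared residuals (unexplained variation).
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total variation of y about its mean.
    sst = sum((y - mean_y) ** 2 for y in actual)
    s = math.sqrt(sse / (n - 2))
    r_squared = 1 - sse / sst
    return s, r_squared

# Made-up values: predictions come from the line y-hat = 2.2 + 0.6x at x = 1..5.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

s, r_squared = residual_sd_and_r_squared(actual, predicted)
print(round(s, 3), round(r_squared, 2))
```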

When a scatterplot shows a curved relationship between 2 quantitative variables, transform one or both variables to create a linear association

  • choose a model whose residual plot has the most random scatter (no curve)

  • if more than one model produces a randomly scattered residual plot, choose the model with the largest coefficient of determination (r²)

Simple Random Sample

  • relies on a selection method that gives each individual (and every possible group of n individuals) an equal chance of being selected

  • based on probability and random selection

  • more likely to be representative of the total population (free of bias)

  1. Label (give each individual a number)

  2. randomize (use a number generator)

  3. select (choose individuals that correspond)

Calculator

  • math

  • prob

  • randInt(lower, upper, n)

For a population of 30 students, choose 5 random students

  • 29

  • 28

  • 5

  • 16

  • 13
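The label, randomize, select routine can also be sketched in Python. The function name and the seed parameter are made up for illustration; random.sample draws without replacement, so no student can be picked twice.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Label individuals 1..population_size and pick sample_size of them."""
    rng = random.Random(seed)               # seed only makes the example repeatable
    labels = range(1, population_size + 1)  # step 1: label
    return rng.sample(labels, sample_size)  # steps 2-3: randomize and select

# 30 students, choose 5 at random.
chosen = simple_random_sample(30, 5, seed=1)
print(chosen)
```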

Choose SRS with a table

  1. label

  2. randomize

  3. select

more Biased

Stratified Random Sampling

  • for surveys: the total population is divided into groups (strata)

  • individuals within a group share similar characteristics (take an SRS from each group)

  • used by researchers when trying to evaluate data

  1. define the groups

  2. define the sample size (ratio)

  3. randomly select from each group

  4. review results
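A minimal sketch of the four steps, assuming each stratum is sampled in the same proportion (one common choice for the ratio, not the only one). The group names, sizes, and the 20% fraction are invented.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take an SRS of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Sample size per group follows the chosen ratio (at least 1).
        k = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata: 20 juniors (labels 1-20) and 10 seniors (labels 21-30).
strata = {
    "juniors": list(range(1, 21)),
    "seniors": list(range(21, 31)),
}
picked = stratified_sample(strata, 0.2, seed=1)
print(picked)
```

Every stratum is guaranteed to appear in the sample, which an SRS of the whole population cannot promise.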

Low Bias & Low Variability

Low bias: your estimates are centered on the true population value

Low variability: your estimates are close to each other (close to their mean)

If your sample has both, it most likely represents the population, and generalizations to the population are most likely to be accurate

Advantages

  • symmetric demographics

  • fair method

  • improves efficiency

  • accurate data

Disadvantages

  • requires prior knowledge of the population

  • may not be representative

  • more complicated

  • selection bias

Strata: groups of individuals in a population who share a characteristic

  • a stratified random sample is selected by choosing an SRS from each stratum

Sampling variability: the statistic calculated from a sample will vary as the random sampling is repeated

  • will decrease as the sample size increases
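A quick simulation can illustrate this. The sketch below builds a made-up population, takes many random samples of two different sizes, and compares how much the sample means spread out. The population, sample sizes, and number of repetitions are all arbitrary choices.

```python
import random
import statistics

def spread_of_sample_means(population, n, repetitions, rng):
    """Standard deviation of the sample mean over repeated random samples."""
    means = [statistics.mean(rng.sample(population, n))
             for _ in range(repetitions)]
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.randint(0, 100) for _ in range(10_000)]

small = spread_of_sample_means(population, 5, 500, rng)
large = spread_of_sample_means(population, 50, 500, rng)
print(round(small, 2), round(large, 2))
```

The spread for n = 50 should come out clearly smaller than the spread for n = 5.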

Cluster: a group of individuals that are located near each other

  • cluster sampling divides the population into groups, or clusters

    • randomly selected clusters make up your total sample group for a study

  • useful when surveying a large population with natural groupings

    • might be biased

  1. define the population and cluster size

  2. generate your clusters

  3. randomly select clusters

  4. collect data

  5. analyze and interpret data
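Steps 1 to 3 could look like this in Python. The homeroom names and student labels are invented; the key idea is that whole clusters are chosen at random and everyone in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters; keep every individual in them."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical clusters: four homerooms of students.
clusters = {
    "homeroom A": ["A1", "A2", "A3"],
    "homeroom B": ["B1", "B2", "B3"],
    "homeroom C": ["C1", "C2", "C3"],
    "homeroom D": ["D1", "D2", "D3"],
}
selected = cluster_sample(clusters, 2, seed=1)
print(selected)
```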

Advantages

  • cost-effective

  • efficiency

  • speed

  • natural grouping

Disadvantages

  • sampling bias

  • complexity

Groups are not similar

  • sample all from some groups

Systematic random sampling: members of a large population are selected according to a random starting point and a fixed periodic interval

  • the sampling interval is calculated by dividing the population size by the desired sample size

  1. confirm population total

  2. determine sample size

  3. determine sampling interval

  4. select a random start point

  5. keep adding the sampling interval until the desired sample size is reached
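The five steps translate almost directly into code. This sketch assumes individuals are labeled 1 through the population size and that the interval is rounded down to a whole number; the function name is made up.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start, then every interval-th label after it."""
    rng = random.Random(seed)
    interval = population_size // sample_size  # step 3: sampling interval
    start = rng.randint(1, interval)           # step 4: random start point
    # Step 5: keep adding the interval until the sample is full.
    return [start + i * interval for i in range(sample_size)]

# 30 students, sample of 5: the interval is 6.
picked = systematic_sample(30, 5, seed=1)
print(picked)
```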

Cluster sampling

  • population: the population is divided into clusters/groups

  • sampling unit: clusters are selected randomly, and everyone in the selected clusters is surveyed

  • homogeneity within the sampling unit: high homogeneity within each selected cluster

  • complexity: fewer stages of the sampling method involved

Systematic sampling

  • population: the population is divided into groups

  • sampling unit: every nth unit in the population is selected for surveying

  • homogeneity within the sampling unit: assumes homogeneity within selected intervals

  • complexity: simple to implement with one-stage sampling

Stratified sampling

  • population: the population is divided into strata or subgroups

  • sampling unit: individuals within each stratum are randomly chosen for surveying

  • homogeneity within the sampling unit: lower homogeneity within each stratum/subgroup

  • complexity: more stages of sampling involved

Simple random sampling

  • population: the whole population is considered

  • sampling unit: individuals are randomly selected from the population for surveying

  • homogeneity within the sampling unit: assumed homogeneity across the entire population

  • complexity: simple to implement with one-stage sampling

Undercoverage occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample

  • a survey of households (excludes homeless people, prisoners, students in dormitories)

Response bias occurs when there is a systematic pattern of inaccurate answers to a survey question.

Confounding occurs when 2 variables are associated in a way that their effects on a response variable cannot be distinguished from each other

  • possible different variable (3rd variable)

Treatment is a condition applied to individuals in an experiment

  • A placebo is a treatment that has no active ingredient

Factor is an explanatory variable that is manipulated and may cause a change in the response variable

  • the different values of a factor are called levels

A control group is used to provide a baseline for comparing the effects of other treatments

  • 2 different groups, 1 with treatment and 1 without (the control group)

The placebo effect describes the fact that some subjects will respond favorably to any treatment.

Single-blind: either the subjects or the people who interact with them and measure the response variable don’t know which treatment a subject is receiving

  • double-blind: neither the subjects nor the people who interact with them know

Block is a group of experimental units that are similar in some way

  • in a randomized block design, the random assignment of treatments is carried out within each block

Matched Pairs Design is a common experimental design for comparing 2 treatments that uses blocks of size 2.

Replication of Trials

  • Observation: select people

  • Experiment: random assignment, treatment, causation

    • comparison

    • random assignment

    • control

    • replication

Randomized block design

  • you divide your participants into subgroups

    • within the blocks they have similarities
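One way to picture a randomized block design is to shuffle the units inside each block and then deal out the treatments. The block names, unit labels, and treatment names below are invented, and dealing treatments in rotation after a shuffle is just one reasonable way to randomize.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = units[:]   # copy so the original list is untouched
        rng.shuffle(shuffled) # random order within this block only
        for i, unit in enumerate(shuffled):
            # Deal treatments in rotation so each block is balanced.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical blocks: participants grouped by grade level.
blocks = {
    "juniors": ["J1", "J2", "J3", "J4"],
    "seniors": ["S1", "S2", "S3", "S4"],
}
treatments = ["new method", "control"]
assignment = randomized_block_assignment(blocks, treatments, seed=1)
print(assignment)
```

Because the shuffle happens inside each block, every block ends up with both treatments represented.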

Statistical significance

  • helps judge whether results are likely due to the factor of interest rather than chance

    • (chance or not?) (lucky or unlucky)

Scope of inference

  • random selection

    • allows inference about the population

  • random assignment

    • allows inference about cause and effect

A random sample will enable us to generalize our conclusions to the population from which we have sampled

  • when can we conclude causation?

    • when the treatments were randomly assigned

Probability

  • the probability of any outcome of a random process is a number between 0 and 1 that describes the proportion of times that outcome would occur in a very long series of trials

  • an outcome that never occurs has a probability of zero

  • an outcome that happens on every trial has a probability of 1

  • an outcome that happens 1/2 of the time has a probability of 1/2

The law of large numbers says that if we observe more and more trials of any random process, the proportion of times that a specific outcome occurs approaches its probability.

After many, many (trials in context), the proportion of times that (outcome A in context) will occur is about P(A).
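The law of large numbers is easy to see in a simulation. This sketch repeats a made-up random process with success probability 0.5 and reports the observed proportion of successes for a few trial counts; the function name and the seed are arbitrary.

```python
import random

def observed_proportion(p, trials, seed=0):
    """Proportion of successes in a number of simulated trials."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # rng.random() is uniform on [0, 1), so this succeeds with probability p.
        if rng.random() < p:
            successes += 1
    return successes / trials

for trials in (10, 1_000, 100_000):
    print(trials, observed_proportion(0.5, trials))
```

With more trials the printed proportion should settle closer to 0.5, although any single short run can still be off.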

Simulation

  • describe how to set up (use a random process)

  • identify what you're recording

  • perform many trials

  • use results to answer question

A sample space is the set of all possible outcomes of an experiment or simulation.

Mutually Exclusive Events

  • 2 or more events that cannot occur at the same time

    • disjoint events: P(A or B) = P(A) + P(B)

RULES

  • every probability is between zero and one

  • the probabilities of all outcomes add up to one

  • the probability that an event does not occur is one minus the probability that it does: P(not A) = 1 - P(A)

Venn diagrams

  • represented by 2 circles that overlap to show a relationship

General Addition Rule

  • if A and B are 2 events resulting from the same random process, P(A or B) = P(A) + P(B) - P(A and B)

Intersection (∩) (and)

  • all outcomes that are common to both events

Union (∪) (or)

  • all outcomes that are in either event or in both

Conditional probability

  • the probability that one event happens given that another event is known to have happened: P(A | B) = P(A and B) / P(B)
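These rules can be checked on a small example. The sketch below uses one roll of a fair die with two made-up events, A = "even" and B = "4 or higher", and assumes the standard formulas P(A or B) = P(A) + P(B) - P(A and B) and P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair six-sided die
A = {2, 4, 6}                    # hypothetical event: roll is even
B = {4, 5, 6}                    # hypothetical event: roll is 4 or higher

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a_and_b = prob(A & B)                    # intersection: {4, 6}
p_a_or_b = prob(A) + prob(B) - p_a_and_b   # general addition rule
p_a_given_b = p_a_and_b / prob(B)          # conditional probability

print(p_a_and_b, p_a_or_b, p_a_given_b)  # 1/3 2/3 2/3
```

Counting the union {2, 4, 5, 6} directly gives 4/6, which matches the addition rule.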

A tree diagram shows the sample space of a random process including multiple stages.

  • and helps calculate probabilities

A random variable takes numerical values that describe the outcomes of a random process.

A probability distribution gives the possible values and their probabilities

  • to be a valid probability model

    • each probability is between zero and one

    • sum of all probabilities = 1 (100%)

Discrete random variable

  • takes a countable set of possible values with gaps between them on a number line (often whole numbers only)

Histogram of probability distribution (no gaps)

  • values of a random variable

  • probability

  • one bar per x-value

The mean or expected value of a random variable is its average value over many trials of the same random process

  • to find it, multiply each possible value by its probability, then add the products

If many, many (individuals in context) are randomly selected, the average (random variable in context) would be about _____ (units).

The median of a discrete random variable is the midpoint of its distribution

  • can be found from a cumulative probability distribution for the random variable

The standard deviation of a discrete random variable measures how much its values typically vary from the mean.

If many, many (individuals in context) are randomly selected, the (random variable in context) will typically vary from the mean of x by about (standard deviation) (units).
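The mean and standard deviation of a discrete random variable can be computed from its probability distribution. The sketch below uses a made-up distribution (the number of heads in 2 coin flips) and the usual formulas: mean = sum of value times probability, variance = sum of squared deviation from the mean times probability.

```python
import math

def mean_and_sd(values, probs):
    """Mean (expected value) and standard deviation of a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    # Multiply each value by its probability, then add the products.
    mu = sum(x * p for x, p in zip(values, probs))
    # Weight each squared deviation from the mean by its probability.
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(variance)

# Hypothetical distribution: number of heads in 2 fair coin flips.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

mu, sd = mean_and_sd(values, probs)
print(mu, round(sd, 4))
```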

Transforming probability distributions

  • adding/subtracting a constant

    • adds/subtracts the constant to measures of center (mean or median)

    • doesn’t change variability

    • doesn’t change the shape of the probability distribution

  • multiplying/dividing by a constant b

    • multiplies/divides measures of center by b

    • multiplies/divides measures of variability by |b|

    • doesn’t change the shape of the distribution

Effect of a linear transformation (Y = a + bX) on a random variable

  • Y has the same shape as the probability distribution of X (if b > 0)
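The effect on center and variability can be verified numerically. This sketch assumes the transformation has the form Y = a + bX, with the standard results mean of Y = a + b(mean of X) and SD of Y = |b|(SD of X). The distribution and the constants a = 10, b = -3 are made up.

```python
import math

def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(variance)

# Hypothetical distribution of X and constants for Y = a + bX.
x_values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]
a, b = 10, -3

# Each value is transformed; the probabilities stay the same.
y_values = [a + b * x for x in x_values]

mean_x, sd_x = mean_and_sd(x_values, probs)
mean_y, sd_y = mean_and_sd(y_values, probs)

print(mean_y, a + b * mean_x)                 # the two means agree
print(round(sd_y, 4), round(abs(b) * sd_x, 4))  # the two SDs agree
```

Adding a shifts only the center, while multiplying by b rescales both center and spread; a negative b also flips the direction of any skew.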