Cumulative AP Exam Study Guide

80 Terms

Statistics

The science of collecting, analyzing, and drawing conclusions from data

Descriptive Statistics

Methods of organizing and summarizing data

Inferential Statistics

Making generalizations from a sample to the population

Population

An entire collection of individuals or objects

Sample

A subset of the population selected for study

Variable

Any characteristic whose value can vary from one individual to another

Data

Observations recorded on one or more variables

Categorical Variables (Qualitative)

Characteristics that place each individual into one of several groups or categories

Numerical Variables (Quantitative)

Numerical measurements or counts for which arithmetic, such as averaging, makes sense

Discrete Variables

Listable sets (counts)

Continuous Variables

Any value over an interval of values (measurements)

Univariate

One variable

Bivariate

Two variables

Multivariate

Many variables

Symmetrical Distribution

Data in which both sides of the center have roughly the same shape and size (“Bell Curve”)

Uniform Distribution

Every class has an equal frequency (number) → “a rectangle”

Skewed Distribution

One side (tail) is longer than the other side. The skewness is in the direction that the tail points (left or right)

Bimodal Distribution

Two classes have large frequencies (peaks) separated by lower-frequency classes between them (“double hump camel”)

How to describe numerical graphs

Shape - overall type (symmetrical, skewed right, skewed left, uniform, or bimodal)
Outliers - plus other unusual features such as gaps and clusters
Center - middle of the data (mean, median, and mode)
Spread - refers to variability (range, standard deviation, and IQR)

Everything must be described in the context of the data and the situation of the graph. When comparing two distributions, you MUST use comparative language (e.g., “greater than,” “less than”)!

Parameter

A numerical value describing a population (typically unknown)

Statistic

A value calculated from a sample (or samples), used to estimate a population parameter

Measures of Center

Median - the middle value of the data (50th percentile) when the data are in numerical order. If there is an even number of observations, average the two middle values.
Mean - μ is for a population (parameter) and x̄ (x-bar) is for a sample (statistic).
Mode - the value that occurs most often in the data. There can be more than one mode, or no mode at all if every value occurs only once.
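A minimal Python sketch of the three measures of center, using the standard-library `statistics` module. The data values are hypothetical and chosen only so the even-n median rule is visible.

```python
import statistics

# Hypothetical sample data (n = 6, so the median averages the two middle values)
data = [2, 3, 3, 5, 7, 10]

sample_mean = statistics.mean(data)      # x-bar = sum of values / n
sample_median = statistics.median(data)  # (3 + 5) / 2
modes = statistics.multimode(data)       # every most-frequent value, as a list

print(sample_mean, sample_median, modes)  # 5 4.0 [3]
```

`multimode` returns a list, so it also covers the "more than one mode" case; note it returns every value when all values occur once, whereas the card treats that case as "no mode."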

Variability

Allows statisticians to distinguish between usual and unusual occurrences

Measures of Spread (variability)

Range - a single value (Max-Min)
IQR - interquartile range (Q3 – Q1)
Standard deviation - σ for population (parameter) & s for sample (statistic); measures the typical distance of observations from the mean. The sample variance divides the sum of squared deviations by df = n-1
*Sum of the deviations from the mean is always zero!
Variance - standard deviation squared
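A short sketch of range, sample variance, and sample standard deviation computed by hand, assuming a small hypothetical data set. It also checks that the deviations from the mean sum to zero.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

data_range = max(data) - min(data)           # Max - Min
x_bar = statistics.mean(data)
deviations = [x - x_bar for x in data]       # these always sum to 0

# Sample variance divides by df = n - 1 (statistics.variance does the same)
s_squared = sum(d ** 2 for d in deviations) / (len(data) - 1)
s = s_squared ** 0.5                         # standard deviation = sqrt(variance)

print(data_range, sum(deviations), round(s, 3))  # 7 0 2.138
```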

Resistant

Not strongly affected by outliers

Median
IQR

Non-Resistant

Strongly affected by outliers

Mean
Range
Variance
Standard Deviation
Correlation Coefficient (r)
Least Squares Regression Line (LSRL)
Coefficient of Determination (r²)
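To see the difference between resistant and non-resistant measures, here is a small sketch with hypothetical data: adding one extreme value barely moves the median but drags the mean far to the right.

```python
import statistics

data = [10, 12, 13, 15, 16]
with_outlier = data + [100]  # hypothetical extreme value

for label, values in [("original", data), ("with outlier", with_outlier)]:
    # median is resistant; mean is non-resistant
    print(label, round(statistics.mean(values), 2), statistics.median(values))
# original 13.2 13
# with outlier 27.67 14.0
```

The outlier also creates a right skew, so the mean ends up larger than the median.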

Comparison of mean & median based on graph type

The mean is generally pulled away from the median in the direction of the skew (toward the longer tail)

Symmetrical

The mean and the median are approximately equal

Skewed Right

The mean is usually larger than the median

Skewed Left

The mean is usually smaller than the median

Trimmed Mean

Remove a set percentage of observations from both the top and the bottom of the ordered data, then average the remaining values. This can eliminate the effect of outliers.
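A minimal sketch of a trimmed mean. It assumes the percentage is removed from EACH end and that the number of removed observations is rounded down; conventions vary, so treat `trimmed_mean` as a hypothetical helper, not a standard function.

```python
import statistics

def trimmed_mean(values, percent):
    """Drop `percent`% of observations from EACH end of the ordered data, then average."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # observations removed per end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

data = [1, 5, 6, 6, 7, 7, 8, 8, 9, 50]  # hypothetical data with extreme values at both ends
print(statistics.mean(data), trimmed_mean(data, 10))  # 10.7 7
```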

Linear Transformations (Mean)

The mean is changed by both adding (subtracting) and multiplying (dividing) by a constant

Linear Transformations (Standard Deviation)

The standard deviation is changed by multiplication (division) ONLY; adding or subtracting a constant does not change it
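A quick numerical check of both linear-transformation rules, assuming hypothetical data and the transformation new = a + b · old with a = 10, b = 3.

```python
import statistics

data = [4, 8, 6, 2]  # hypothetical measurements
a, b = 10, 3         # new value = a + b * old value

transformed = [a + b * x for x in data]

# Mean: shifted by a AND scaled by b
print(statistics.mean(transformed), a + b * statistics.mean(data))  # 25 25
# Standard deviation: scaled by |b| only; adding a has no effect
print(statistics.stdev(transformed), abs(b) * statistics.stdev(data))
```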

Combining Means

Just add or subtract the two (or more) means

Combining Variances

Always add the variances, even when the random variables are subtracted. X & Y MUST be independent
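A worked example of combining independent random variables, with hypothetical means and standard deviations for X and Y. It shows that for X - Y the means subtract but the variances still add, and that standard deviations themselves do not add.

```python
import math

# Hypothetical independent random variables X and Y
mu_x, sd_x = 50, 6
mu_y, sd_y = 30, 8

mu_diff = mu_x - mu_y             # means: subtract when subtracting
var_diff = sd_x ** 2 + sd_y ** 2  # variances: ADD, even for X - Y (independence required)
sd_diff = math.sqrt(var_diff)     # take the square root only at the end

print(mu_diff, var_diff, sd_diff)  # 20 100 10.0
```

Adding the SDs directly (6 + 8 = 14) would give the wrong answer; only the variances combine additively.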

Z-Score

A standardized score, z = (x - μ) / σ. It tells you how many standard deviations an observation is above or below the mean. Standardizing a normal distribution produces the standard normal curve, with μ = 0 & σ = 1.
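The z-score formula as a tiny Python function. The exam mean and SD in the usage lines are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exam: mean 75, standard deviation 5
print(z_score(85, 75, 5))  # 2.0  (2 SDs above the mean)
print(z_score(70, 75, 5))  # -1.0 (1 SD below the mean)
```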

Normal Curve

Bell shaped and symmetrical curve.

As σ increases, the curve flattens and spreads out.
As σ decreases, the curve becomes taller and narrower.

Empirical Rule

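(68-95-99.7) For a normal curve centered at μ, this rule gives the approximate percentage of observations within 1σ, 2σ, and 3σ of the mean.

About 68% of the observations fall between μ - 1σ and μ + 1σ
About 95% of the observations fall between μ - 2σ and μ + 2σ
About 99.7% of the observations fall between μ - 3σ and μ + 3σ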
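The 68-95-99.7 values are rounded areas under the standard normal curve. This sketch computes the exact areas with `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area between -k and +k standard deviations
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```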

Boxplots

Used for medium or large sets of numerical data. A boxplot does not show the original observations.

Always use modified boxplots, where the fences are 1.5 IQRs beyond the ends of the box (Q1 & Q3). Points outside the fences are considered outliers.
Whiskers extend to the smallest & largest observations within the fences.
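A minimal sketch of the 1.5 × IQR fence rule used in modified boxplots. The data are hypothetical, and Q1 = 20 and Q3 = 30 are the medians of the lower and upper halves of that data.

```python
def outliers_by_fences(data, q1, q3):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, [x for x in data if x < lower or x > upper]

# Hypothetical data; Q1 = 20 and Q3 = 30
data = [5, 19, 21, 23, 25, 27, 29, 31, 48]
print(outliers_by_fences(data, 20, 30))  # (5.0, 45.0, [48])
```

Here the lower whisker would reach 5 (it sits exactly on the fence, so it is not an outlier), and the upper whisker stops at 31, the largest value inside the upper fence.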

5-Number Summary

Minimum
Q1 (1st Quartile – 25th Percentile)
Median
Q3 (3rd Quartile – 75th Percentile)
Maximum
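A sketch of the 5-number summary using one common hand method: Q1 and Q3 are the medians of the lower and upper halves, with the overall median excluded from both halves when n is odd. Other software (for example `statistics.quantiles`) may use a different quartile convention and give slightly different values.

```python
import statistics

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) using medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2 :]  # values above the median position
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

print(five_number_summary([5, 19, 21, 23, 25, 27, 29, 31, 48]))  # (5, 20.0, 25, 30.0, 48)
```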

Sample Space

The collection of all possible outcomes.

Event

Any collection (subset) of outcomes from the sample space.

Complement

All outcomes not in the event.

Union

A or B: all outcomes in A, in B, or in both (everything inside either circle).

Intersection

A and B: the outcomes in both A and B (the overlap in the middle of the two circles).

Mutually Exclusive (Disjoint)

A and B have no intersection. They cannot happen at the same time.

Independent

Knowing that one event occurred does not change the probability of the other: P(B | A) = P(B)

Experimental Probability

The number of successes in an experiment divided by the total number of trials.

Law of Large Numbers

As an experiment is repeated the experimental probability gets closer and closer to the true (theoretical) probability. The difference between the two probabilities will approach “0”.
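A small simulation illustrating experimental probability and the Law of Large Numbers, assuming a fair coin (true P(heads) = 0.5) and a fixed random seed so runs are reproducible. The exact printed proportions depend on the seed, but the gap from 0.5 tends to shrink as n grows (not necessarily at every single step).

```python
import random

random.seed(1)  # fixed seed so the run is reproducible
true_p = 0.5    # theoretical P(heads) for a fair coin

for n in (10, 100, 1_000, 100_000):
    heads = sum(random.random() < true_p for _ in range(n))
    experimental_p = heads / n  # successes / total trials
    print(n, experimental_p, round(abs(experimental_p - true_p), 4))
```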

Probability Rules

All probabilities satisfy 0 ≤ P ≤ 1.
Probability of sample space is 1.
P (at least 1 or more) = 1 – P (none)
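A worked example of the "at least one" rule, assuming four independent rolls of a fair die. Exact fractions avoid rounding error.

```python
from fractions import Fraction

# P(at least one 6 in 4 rolls) = 1 - P(no 6 in 4 rolls)
p_no_six_one_roll = Fraction(5, 6)
p_none = p_no_six_one_roll ** 4  # rolls are independent, so multiply
p_at_least_one = 1 - p_none

print(p_at_least_one, round(float(p_at_least_one), 4))  # 671/1296 0.5177
```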

Complement of a Probability

P(A) + P(not A) = 1, so P(not A) = 1 - P(A)

Addition of Probabilities

P(A or B) = P(A) + P(B) – P(A & B)
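A worked example of the addition rule, assuming one card is drawn from a standard 52-card deck with A = "heart" and B = "king". The overlap (the king of hearts) is subtracted so it is not counted twice.

```python
from fractions import Fraction

p_a = Fraction(13, 52)        # P(heart)
p_b = Fraction(4, 52)         # P(king)
p_a_and_b = Fraction(1, 52)   # P(king of hearts), counted in both A and B

p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)  # 4/13
```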

Multiplication of Probabilities

P(A & B) = P(A) · P(B)

If A & B are independent
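A check of the multiplication rule for independent events, assuming a hypothetical experiment: flip a fair coin and roll a fair die. Counting outcomes directly gives the same answer as multiplying P(heads) · P(six).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", range(1, 7)))  # 12 equally likely (coin, die) pairs
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)

# Count the outcomes where both events happen
count = sum(1 for coin, die in outcomes if coin == "H" and die == 6)
p_both = Fraction(count, len(outcomes))

print(p_both, p_heads * p_six)  # 1/12 1/12
```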

Conditional Probability

The probability of an event given that another event has occurred: P(B | A) = P(A & B) / P(A)
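A worked conditional probability from two-way table counts. All counts are hypothetical (200 students, 80 who play a sport, 30 who play a sport AND are in band).

```python
from fractions import Fraction

total = 200
sport = 80            # students who play a sport
sport_and_band = 30   # students who play a sport AND are in band

p_sport = Fraction(sport, total)
p_sport_and_band = Fraction(sport_and_band, total)

# P(band | sport) = P(band & sport) / P(sport)
p_band_given_sport = p_sport_and_band / p_sport
print(p_band_given_sport)  # 3/8
```

With table counts this is the same as restricting attention to the "sport" row: 30 / 80.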

Correlation Coefficient (r)

A quantitative assessment of the strength and direction of a linear relationship (use ρ (rho) for the population parameter).

r describes the strength and direction of the linear association between x & y, with -1 ≤ r ≤ 1.
r = 0 → no linear correlation

0 < |r| < 0.5 → weak
0.5 ≤ |r| < 0.8 → moderate
0.8 ≤ |r| ≤ 1 → strong
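A sketch of computing r as the average product of z-scores, r = (1 / (n - 1)) · Σ z_x · z_y, on hypothetical bivariate data.

```python
import statistics

# Hypothetical bivariate data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# r = sum of (z_x * z_y) divided by n - 1
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 3))  # 0.775
```

By the cut-offs on this card, r ≈ 0.775 would be a moderate, positive linear association.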

Least Squares Regression Line (LSRL)

A line of mathematical best fit that minimizes the sum of the squared deviations (residuals) from the line. Used with bivariate data.

x is independent, the explanatory variable & y is dependent, the response variable
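A minimal least-squares fit on the same hypothetical data used for r above, using b = Sxy / Sxx and a = ȳ - b·x̄ (equivalently b = r · s_y / s_x).

```python
import statistics

x = [1, 2, 3, 4, 5]  # explanatory variable (hypothetical)
y = [2, 4, 5, 4, 5]  # response variable (hypothetical)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b = s_xy / s_xx       # slope
a = y_bar - b * x_bar # intercept: the LSRL passes through (x-bar, y-bar)

print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```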

Residuals (error)

The vertical difference between an observed point and the LSRL: residual = y - ŷ (observed - predicted). The residuals always sum to “0”.
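A sketch computing residuals (observed - predicted), assuming the hypothetical data and the LSRL ŷ = 2.2 + 0.6x from the least-squares example. Their sum is zero up to floating-point rounding.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # LSRL for this hypothetical data: y-hat = 2.2 + 0.6x

predicted = [a + b * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]  # observed - predicted

print([round(e, 2) for e in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(sum(residuals))                    # 0 apart from tiny rounding error
```

These (x, residual) pairs are exactly what a residual plot displays.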

Residual Plot

Scatterplot of (x (or ŷ), residual). No pattern suggests that a linear model is appropriate.

Coefficient of Determination (r²)

Gives the proportion of the variation in y (response) that is explained by the LSRL of y on x. Never use the adjusted r².

Approximately (r² × 100)% of the variation in y can be explained by the LSRL of x and y.
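A sketch of r² as 1 - (unexplained variation / total variation), assuming the same hypothetical data and LSRL ŷ = 2.2 + 0.6x. The result matches r² for r ≈ 0.775.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # LSRL for this hypothetical data

y_bar = sum(y) / len(y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # variation left unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)                      # total variation in y

r_squared = 1 - sse / sst
print(round(r_squared, 2))  # 0.6
```

In context: about 60% of the variation in y is explained by the LSRL on x.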

Slope (b)

For each one-unit increase in x, the predicted value of y increases (or decreases) by the slope, b.

Extrapolation

The LSRL should not be used to predict values outside the range of the original x data; such predictions are unreliable.

Influential Points

Points that, if removed, significantly change the LSRL.

Outliers

Points with large residuals.

Census

A complete count of the population.

Why not use a census?
- Expensive
- Often impossible or impractical to do
- If sampling is destructive, a census would destroy the entire population

Sampling Frame

A list of everyone in the population.

SRS (Simple Random Sample)

Units are chosen so that each unit has an equal chance of being selected and every set of n units has an equal chance of being the sample.

Advantages: easy and unbiased.
Disadvantages: large σ² and must know population.
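A minimal SRS sketch with Python's `random.Random.sample`, which draws without replacement so every set of k units is equally likely. The sampling frame of 500 student labels and the seed are hypothetical.

```python
import random

population = [f"student_{i}" for i in range(1, 501)]  # hypothetical sampling frame of 500

rng = random.Random(2024)           # seeded for reproducibility
srs = rng.sample(population, k=25)  # 25 distinct units, no repeats
print(srs[:5])
```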

Stratified

Divide the population into homogeneous groups called strata, then take an SRS from each stratum.

Advantages: more precise than an SRS and cost reduced if strata already available.
Disadvantages: difficult to divide into groups, more complex formulas & must know population.
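A stratified sampling sketch: a separate SRS is taken from each stratum. The strata (grade levels), their sizes, and the equal allocation of 10 per stratum are all hypothetical choices; proportional allocation is another common option.

```python
import random

rng = random.Random(7)
# Hypothetical strata (homogeneous groups) within the population
strata = {
    "freshmen": [f"fr_{i}" for i in range(120)],
    "sophomores": [f"so_{i}" for i in range(100)],
    "juniors": [f"jr_{i}" for i in range(90)],
    "seniors": [f"sr_{i}" for i in range(90)],
}

# Take a separate SRS of 10 from EACH stratum
sample = {name: rng.sample(members, k=10) for name, members in strata.items()}
print({name: len(chosen) for name, chosen in sample.items()})
```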

Systematic

Randomly choose a starting point, then select units using a fixed systematic rule (e.g., every 50th unit).

Advantages: unbiased, the sample is evenly distributed across population & don’t need to know population.
Disadvantages: a large σ² and can be confounded by trends.
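A systematic sampling sketch: pick a random start among the first k units, then take every k-th unit. The frame of 1,000 units and `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, k, rng):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)  # random starting index 0 .. k-1
    return frame[start::k]

frame = list(range(1, 1001))  # hypothetical list of 1,000 units
chosen = systematic_sample(frame, 50, random.Random(3))
print(len(chosen), chosen[:3])
```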

Cluster Sample

Based on location. Randomly select one or more locations (clusters) and sample ALL individuals at each selected location.

Advantages: cost is reduced, is unbiased & don’t need to know population.

Disadvantages: May not be representative of population and has complex formulas.
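A cluster sampling sketch: take an SRS of clusters, then include everyone in each chosen cluster. The clusters here (20 hypothetical homerooms of 25 students) stand in for "locations."

```python
import random

rng = random.Random(11)
# Hypothetical clusters: homerooms, each a list of its students
clusters = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(25)] for r in range(1, 21)}

picked_rooms = rng.sample(sorted(clusters), k=3)  # SRS of clusters
# Take EVERYONE in each chosen cluster
sample = [student for room in picked_rooms for student in clusters[room]]
print(picked_rooms, len(sample))
```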

Random Digit Table

Each entry is equally likely and each digit is independent of the rest.

Random # Generator

Calculator or computer program

Bias

Error that systematically favors a certain outcome. Bias concerns the center of a sampling distribution: if it is centered over the true parameter, the statistic is considered unbiased.

Voluntary Response

People choose themselves to participate.

Convenience Sampling

Sample people who are easy to reach, friendly, or comfortable to ask.

Undercoverage

Some group(s) are left out of the selection process.
