AP Statistics keyterms

Statistics

The science of collecting, analyzing, and drawing conclusions from data.

Descriptive - methods of organizing and summarizing data

Inferential - making generalizations from a sample to the population

Population

An entire collection of individuals or objects

Sample

A subset of the population selected for the study

Variable

Any characteristic whose value may change from one individual or object to another

Data

Observations on a single variable or on multiple variables

Variables

categorical, numerical, univariate, bivariate, multivariate

Categorical (Qualitative)

Places individuals into groups or categories based on basic characteristics

Numerical (Quantitative)

measurements or observations of numerical data.

Discrete- listable sets (counts)

Continuous- any value over an interval of values (measurements)

Univariate

One variable

Bivariate

Two variables

Multivariate

More than two variables

Types of distributions

symmetrical, uniform, skewed, bimodal

Symmetrical

Data in which both sides of the distribution have roughly the same shape and size (e.g., a "bell curve")

Uniform

Every class has the same frequency (count); the histogram looks like "a rectangle"

Skewed

one side (tail) is longer than the other side. The skewness is in the direction that the tail points (left or right)

Bimodal

Two classes have large frequencies (two peaks), separated by at least one class with a lower frequency between them. "Double-hump camel"

How to describe numerical graphs - S.O.C.S

Shape, Outliers, Center, Spread

Shape

overall type (symmetrical, skewed right or left, uniform, or bimodal)

Statistic (e.g., x̄)

A value calculated from a sample (or samples) and used to describe or estimate something about a population.

Measures of Center

Median, Mean, Mode

Mean

μ is for a population (parameter) and x̄ is for a sample (statistic)

Variability

allows statisticians to distinguish between usual and unusual occurrences.

Resistant

Not strongly affected by outliers

Examples: median and IQR

Non-resistant

Mean, Range, Variance, Standard Deviation, Correlation Coefficient (r), Least Squares Regression Line (LSRL), and Coefficient of Determination (r^2)
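
You can see the difference between resistant and non-resistant measures by adding one outlier to a small data set. This sketch uses Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 6]      # hypothetical data set
with_outlier = data + [60]        # same data plus one extreme value

# Median (resistant): barely moves when the outlier is added
print(statistics.median(data), statistics.median(with_outlier))  # -> 4 4.5

# Mean (non-resistant): pulled strongly toward the outlier
print(statistics.mean(data), statistics.mean(with_outlier))      # -> 4 11
```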

Trimmed Mean

Uses a set percentage to remove observations from both the top and the bottom of the ordered data before averaging. This can eliminate outliers.
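
Here is a minimal Python sketch of a trimmed mean. The helper name `trimmed_mean` is hypothetical. It assumes the number of observations removed from each end is rounded down, but textbooks and software may round differently.

```python
import statistics

def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the ordered values from EACH end.

    Assumption: the count dropped per end is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)   # observations to drop from each end
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

scores = [1, 70, 72, 75, 78, 80, 81, 85, 88, 100]   # hypothetical data
print(trimmed_mean(scores, 10))   # drops 1 and 100 -> 78.625
print(trimmed_mean(scores, 0))    # ordinary mean -> 73
```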

Z-score

A standardized score. It tells you how many standard deviations an observation is above (positive z) or below (negative z) the mean. Standardizing a normal distribution produces the standard normal curve, which has μ = 0 and σ = 1.

z = (x - μ)/σ
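
The subtraction has to happen before the division. A small Python helper shows this. The name `z_score` and the test values are illustrative assumptions.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma   # subtract first, then divide

# Hypothetical test: mean 70, standard deviation 8
print(z_score(82, 70, 8))   # -> 1.5  (1.5 SDs above the mean)
print(z_score(62, 70, 8))   # -> -1.0 (1 SD below the mean)
```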

5-Number Summary

Minimum, Q1, Median, Q3, Maximum
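
This Python sketch computes the 5-number summary. It assumes the quartile rule commonly taught in AP Statistics: Q1 and Q3 are the medians of the lower and upper halves, and the overall median is left out of both halves when n is odd. Other software may use a different rule and report slightly different quartiles. The function name is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # values above the median position
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))   # -> (1, 3, 7, 11, 13)
```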

Probability rules

Sample Space, Event, Complement, Union, Intersection, Mutually Exclusive, Independent, Experimental Probability, Law of Large Numbers

Sample Space

The collection of all possible outcomes

Event

Any collection (subset) of outcomes from the sample space

Complement

all outcomes not in the event

Union

A or B: all the outcomes in either circle (or in both). A ∪ B

Intersection

A and B: the outcomes in both events (where the circles overlap). A ∩ B

Mutually Exclusive (Disjoint)

A and B have no intersection. They cannot happen at the same time.

Independent

Knowing that one event occurred does not change the probability of the other: P(B | A) = P(B)

Experimental Probability

The number of successes in an experiment divided by the total number of trials.

Law of Large Numbers

As an experiment is repeated, the experimental probability gets closer and closer to the true (theoretical) probability. The difference between the two probabilities approaches "0".
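
A simulation illustrates the Law of Large Numbers. This Python sketch flips a simulated fair coin and prints how far the experimental probability is from 0.5 as the number of trials grows. The function name and seed are arbitrary choices. The gap tends to shrink, though not necessarily at every step.

```python
import random

def experimental_probability(p_true, trials, seed=0):
    """Run `trials` simulated independent trials, each a success with
    probability p_true, and return successes / trials."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    successes = sum(rng.random() < p_true for _ in range(trials))
    return successes / trials

for n in (10, 100, 10_000, 100_000):
    p_hat = experimental_probability(0.5, n)
    print(n, p_hat, abs(p_hat - 0.5))   # trials, estimate, distance from 0.5
```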

Correlation Coefficient - (r)

is a quantitative assessment of the strength and direction of a linear relationship.
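
One standard formula for r averages the products of z-scores: r = (1/(n - 1)) Σ z_x z_y, using the sample standard deviations. Here is a Python sketch with made-up data. The function name is hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Sample correlation r = (1/(n-1)) * sum of z_x * z_y."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)   # sample SDs
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]            # hypothetical explanatory data
score = [52, 60, 61, 70, 77]       # hypothetical response data
print(round(correlation(hours, score), 3))   # -> 0.981 (strong, positive)
```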

Least Squares Regression Line (LSRL)

The line of mathematical best fit: it minimizes the sum of the squared deviations (residuals) from the line. Used with bivariate data.

Residuals (error)

The vertical difference between an observed point and the LSRL (residual = observed y - predicted y). The residuals from an LSRL always sum to "0".
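
This Python sketch fits the LSRL with b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄, which is equivalent to b = r·s_y/s_x. It then checks that the residuals add up to 0. The data and the function name are hypothetical.

```python
import statistics

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = y_bar - b * x_bar     # intercept: line passes through (x-bar, y-bar)
    return a, b

hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
a, b = lsrl(hours, score)
residuals = [y - (a + b * x) for x, y in zip(hours, score)]   # observed - predicted

print(a, b)                        # -> 46.0 6.0
print(residuals, sum(residuals))   # -> [0.0, 2.0, -3.0, 0.0, 1.0] 0.0
```

With other data the sum may differ from 0 by a tiny floating-point rounding error.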

Residual Plot

A scatterplot of the residuals. No pattern indicates that a linear model is appropriate.

Coefficient of Determination (r^2)

Gives the proportion of the variation in y (response) that is explained by the linear relationship of (x, y). Never use the adjusted r^2.
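
You can also compute r^2 as 1 - SSE/SST. Here SST = Σ(y - ȳ)² is the total variation in y, and SSE = Σ(y - ŷ)² is the variation the LSRL leaves unexplained. For an LSRL this equals the square of r. Here is a self-contained Python sketch using made-up data.

```python
import statistics

hours = [1, 2, 3, 4, 5]          # hypothetical data
score = [52, 60, 61, 70, 77]

x_bar, y_bar = statistics.mean(hours), statistics.mean(score)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, score))
     / sum((x - x_bar) ** 2 for x in hours))      # LSRL slope
a = y_bar - b * x_bar                               # LSRL intercept

sst = sum((y - y_bar) ** 2 for y in score)                        # total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, score))   # unexplained variation
r_squared = 1 - sse / sst
print(round(r_squared, 3))   # -> 0.963, about 96.3% of the variation explained
```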

Interpretations

Slope (b)

For each one-unit increase in x, the predicted y increases (or decreases) by the slope amount.

Correlation coefficient (r)

There is a [strength], [direction], linear association between x and y.

Coefficient of determination (r^2)

Approximately (r^2 × 100)% of the variation in y can be explained by the LSRL of y on x.

Extrapolation

Using the LSRL to predict values outside the range of the original data. Such predictions should not be trusted.

Influential Points

are points that if removed significantly change the LSRL.

Outliers (residuals)

are points with large residuals

Sampling Frame

is a list of everyone in the population.

Types of Sampling Designs

SRS, Stratified, Systematic, Cluster Sample

SRS (Simple Random Sample)

Units are chosen so that each unit has an equal chance of being selected and every set of n units has an equal chance of being the sample.

Advantages: easy and unbiased

Disadvantages: large σ² (variance) and must know (have a list of) the whole population
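
In Python, `random.sample` draws without replacement so that every set of k units is equally likely, which matches the SRS definition. The sampling frame below is hypothetical. The fixed seed only makes the example repeatable.

```python
import random

# Hypothetical sampling frame: a numbered list of 500 students
frame = [f"student_{i:03d}" for i in range(1, 501)]

rng = random.Random(2024)         # fixed seed for a repeatable example
srs = rng.sample(frame, k=25)     # SRS of size 25, drawn without replacement
print(srs[:5])
```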

Stratified

Divide the population into homogeneous groups called strata, then take an SRS from each stratum

Advantages: more precise than an SRS and cost reduced if strata already available.

Disadvantages: difficult to divide into groups, more complex formulas & must know population
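
A stratified sample is a separate SRS inside each stratum. This Python sketch uses hypothetical strata (grade levels) and takes the same number of students from each one. Proportional allocation, where each stratum is sampled in proportion to its size, is another common choice.

```python
import random

# Hypothetical population already divided into strata (grade levels)
strata = {
    "9th":  [f"9-{i}" for i in range(120)],
    "10th": [f"10-{i}" for i in range(110)],
    "11th": [f"11-{i}" for i in range(100)],
    "12th": [f"12-{i}" for i in range(90)],
}

rng = random.Random(1)   # fixed seed for a repeatable example
# Separate SRS of 10 from EACH stratum; together they form the stratified sample
sample = {name: rng.sample(members, k=10) for name, members in strata.items()}
print({name: chosen[:3] for name, chosen in sample.items()})
```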

Systematic

use a systematic approach (every 50th) after choosing randomly where to begin.

Advantages: unbiased, the sample is evenly distributed across population & don't need to know population

Disadvantages: a large σ² and can be confounded by trends
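
Here is a sketch of systematic sampling in Python. It chooses a random start among the first k units, then takes every k-th unit. The function name, the frame of 1,000 units, and k = 50 are illustrative assumptions.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)      # random starting position 0 .. k-1
    return frame[start::k]

frame = list(range(1, 1001))      # hypothetical list of 1,000 units
sample = systematic_sample(frame, 50, random.Random(7))
print(len(sample), sample[:3])    # 20 units, evenly spread through the list
```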

Cluster Sample

Based on location: randomly select one or more locations (clusters) and sample ALL individuals at each selected location

Advantages: cost is reduced, is unbiased and don't need to know population

Disadvantages: May not be representative of population and has complex formulas.
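
Here is a Python sketch of a cluster sample. It treats hypothetical homerooms as the clusters, randomly chooses a few whole clusters, and then includes every individual in them.

```python
import random

# Hypothetical clusters: 20 homerooms (rooms 101-120) of 25 students each
homerooms = {room: [f"{room}-s{j}" for j in range(25)] for room in range(101, 121)}

rng = random.Random(3)   # fixed seed for a repeatable example
chosen_rooms = rng.sample(list(homerooms), k=3)   # randomly choose whole clusters
# Sample ALL students in each chosen cluster
sample = [student for room in chosen_rooms for student in homerooms[room]]
print(chosen_rooms, len(sample))   # 3 rooms, 75 students
```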

Random Digit Table

each entry is equally likely and each digit is independent of the rest

Random # Generator

Calculator or computer program

Bias

Error that favors a certain outcome. It has to do with the center of the sampling distribution: if the distribution is centered over the true parameter, the statistic is considered unbiased.

Sources of Bias

Voluntary Response, Convenience Sampling, Undercoverage, Non-response, Response, Wording of the Questions

Voluntary Response

People choose themselves to participate

Convenience Sampling

Asking people who are easy to reach, friendly, or comfortable to ask

Undercoverage

some group(s) are left out of the selection process.

Non-response

someone cannot or does not want to be contacted or participate.

Response

False answers, which can be caused by a variety of things

Wording of Questions

leading questions

Types of Experimental Designs

Observational study, experiment, experimental unit, factor, level, response variable, treatment, control group, placebo, blinding, double blinding.

Observational study

Observe outcomes without giving a treatment

Experiment

actively imposes a treatment on the subjects

Experimental unit

single individual or object that receives a treatment

Factor

Is the explanatory variable, what is being tested.

Level

a specific value for the factor

Response Variable

What you are measuring with the experiment

Treatment

experimental condition applied to each unit

Control Group

A group used as a baseline for comparing the effectiveness of the factor; it does NOT have to receive a placebo

Placebo

a treatment with no active ingredients (provides control)

Blinding

a method used so that the subjects are unaware of the treatment (who gets a placebo or the real treatment).

Double Blinding

neither the subjects nor the evaluators know which treatment is being given.

Principles

Control, Replication, Randomization

Control

Keep all extraneous variables (those not being studied) constant

Replication

uses many subjects to quantify the natural variation in the response

Randomization

uses chance to assign the subjects to the treatments.
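
You can sketch random assignment in Python by shuffling the subjects and dealing them out to the treatments in turn. This gives a completely randomized design with nearly equal group sizes. The function name, subjects, and treatment labels are hypothetical.

```python
import random

def randomly_assign(subjects, treatments, rng=random):
    """Shuffle subjects, then deal them to treatments in turn
    (group sizes differ by at most one)."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)                 # chance decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"subject_{i}" for i in range(1, 21)]   # hypothetical 20 volunteers
groups = randomly_assign(subjects, ["new drug", "placebo"], random.Random(11))
print({t: len(g) for t, g in groups.items()})   # -> {'new drug': 10, 'placebo': 10}
```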

How to establish cause and effect

Use a well-designed, well-controlled experiment

Experimental Designs

Completely Randomized, Randomized Block, Matched Pairs, Confounding Variables, Randomization, Blocking

Completely Randomized

All units are randomly assigned to the treatments

Randomized Block

Units are first grouped into blocks of similar units, then randomly assigned to treatments within each block; this reduces variation

Matched Pairs

Units are matched up in pairs by similar characteristics and then randomly assigned: once one member of a pair receives a certain treatment, the other member automatically receives the second treatment. OR each individual receives both treatments in random order (before/after or pretest/post-test). Assignment is dependent.

Confounding Variables

Variables whose effect on the response cannot be separated from the effects of the factor being tested. This happens in observational studies. Random assignment to treatments is what protects an experiment against confounding variables.

Randomization (Designs)

reduces bias by spreading extraneous variables to all groups in the experiment

Blocking

Helps reduce variability. Another way to reduce variability is to increase the sample size

Random variable

A numerical value that depends on the outcome of an experiment

Discrete

A random variable whose values are counts (a listable set)

Continuous

A random variable whose values are measurements (any value over an interval)

Discrete Probability Distributions

gives values and probabilities associated with each possible x.

Calculator shortcut: 1-Var Stats L1, L2 (values in L1, probabilities in L2)
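
The mean and standard deviation of a discrete random variable come from μ = Σ x·P(x) and σ = sqrt(Σ (x - μ)²·P(x)). These are the same two quantities the calculator shortcut reports. Here is a Python sketch of the arithmetic. The helper name and the example distribution are assumptions.

```python
import math

def discrete_mean_sd(values, probs):
    """mu = sum(x * p); sigma = sqrt(sum((x - mu)^2 * p))."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in zip(values, probs))
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))
    return mu, sigma

# Hypothetical example: X = number of heads in two fair coin flips
mu, sigma = discrete_mean_sd([0, 1, 2], [0.25, 0.5, 0.25])
print(mu, sigma)   # mu = 1, sigma = sqrt(0.5), about 0.707
```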

Fair game

A fair game is one in which all pay-ins equal all pay-outs, so the expected net gain for a player is 0

Special discrete distributions

binomial distributions and geometric distributions

Binomial distribution

Properties - two mutually exclusive outcomes, fixed number of trials (n), each trial is independent, the probability (p) of success is the same for all trials.

Random variable- is the number of successes out of the fixed # of trials. Starts at X = 0 and is finite.

μX = np and σX = sqrt(npq), where q = 1 - p

Calculator: binompdf(n, p, x) = single outcome P(X = x)

binomcdf(n, p, x) = cumulative outcome P(X ≤ x)

1 - binomcdf(n, p, (x - 1)) = cumulative outcome P(X ≥ x)
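
You can check the binomial formula P(X = x) = C(n, x) p^x q^(n - x) by hand. This Python sketch reproduces what the calculator's pdf and cdf commands compute. The helper names and the values n = 10, p = 0.3 are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_pdf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): sum of single-outcome probabilities for 0 through x."""
    return sum(binom_pdf(n, p, k) for k in range(x + 1))

n, p = 10, 0.3                      # hypothetical: 10 trials, P(success) = 0.3
print(binom_pdf(n, p, 3))           # P(X = 3)
print(binom_cdf(n, p, 3))           # P(X <= 3)
print(1 - binom_cdf(n, p, 3 - 1))   # P(X >= 3)
print(n * p, (n * p * (1 - p)) ** 0.5)   # mean np and SD sqrt(npq)
```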

Geometric Distributions

Properties - two mutually exclusive outcomes, each trial is independent, probability (p) of success is the same for all trials. (NOT a fixed number of trials)

Random Variable - the trial on which the FIRST success occurs. Starts at 1 and is infinite

Calculator: geometpdf(p, a) = single outcome P(X = a)

geometcdf(p, a) = cumulative outcomes P(X ≤ a)

1 - geometcdf(p, (a - 1)) = cumulative outcome P(X ≥ a)
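
For a geometric random variable, P(X = a) = q^(a - 1) p, and P(X ≤ a) = 1 - q^a (at least one success in the first a trials). Here is a Python sketch with hypothetical helper names and p = 0.2. The last line uses the standard fact that the mean of a geometric distribution is 1/p.

```python
def geom_pdf(p, a):
    """P(X = a): first success on trial a (a = 1, 2, 3, ...)."""
    return (1 - p) ** (a - 1) * p

def geom_cdf(p, a):
    """P(X <= a): at least one success in the first a trials."""
    return 1 - (1 - p) ** a

p = 0.2                          # hypothetical success probability
print(geom_pdf(p, 3))            # P(X = 3) = 0.8^2 * 0.2
print(geom_cdf(p, 3))            # P(X <= 3)
print(1 - geom_cdf(p, 3 - 1))    # P(X >= 3) = 0.8^2
print(1 / p)                     # mean number of trials until the first success
```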

Continuous Random Variable

Numerical values that fall within a range or interval (measurements). Use density curves, where the area under the curve always = 1. To find probabilities, find the area under the curve.

Unusual Density Curves - any shape (triangles, etc.)

Uniform Distributions - uniformly (evenly) distributed, shape of a rectangle.

Normal Distributions - symmetrical, unimodal, bell-shaped curves defined by the parameters μ and σ.

Calculator: normalpdf - used for graphing only

normalcdf(lower bound, upper bound, μ, σ) - finds probability (area between the bounds)

invNorm(p) - gives the z-score with area p to its left, OR invNorm(p, μ, σ) - gives the x-value with area p to its left
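
Python's standard library has `statistics.NormalDist` (Python 3.8+). Its `cdf` and `inv_cdf` methods play the same roles as the calculator's normalcdf and invNorm. The heights distribution below is a made-up example.

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=2.5)   # hypothetical: mean 64 in, SD 2.5 in

# Like normalcdf(lower, upper, mu, sigma): area between two x-values
print(heights.cdf(67) - heights.cdf(60))

# Like invNorm(p): z-score with area p to its LEFT under the standard normal
print(NormalDist().inv_cdf(0.90))   # about 1.28

# Like invNorm(p, mu, sigma): x-value with area p to its left
print(heights.inv_cdf(0.90))        # about 67.2
```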

To assess Normality

Use Graphs - dotplots, boxplots, histograms, or normal probability plot.

Distribution

All of the possible values of a random variable and how often (with what probability) they occur

Sampling Distribution

The distribution of all possible values of a statistic over all possible samples of the same size. Use normalcdf to calculate probabilities (when the sampling distribution is approximately normal), and be sure to use the correct SD.

Standard error

estimate of the standard deviation of the statistic

Central Limit Theorem

When n is sufficiently large (rule of thumb: n ≥ 30), the sampling distribution of the sample mean x̄ is approximately normal, even if the population distribution is not normal
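
A simulation can show the Central Limit Theorem at work. This Python sketch draws many samples of size n = 40 from a strongly right-skewed population and examines the sample means. For illustration it assumes an exponential population with mean 1 and standard deviation 1.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1),
    which is strongly skewed to the right."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 40
means = [sample_mean(n) for _ in range(5_000)]   # simulated sampling distribution of x-bar
se = 1 / n ** 0.5                                # sigma / sqrt(n) for this population

print(statistics.fmean(means))   # close to the population mean, 1
print(statistics.stdev(means))   # close to 1 / sqrt(40), about 0.158
# If x-bar is roughly normal, about 68% of the means fall within one SD of 1
print(sum(abs(m - 1) < se for m in means) / len(means))
```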