AP Statistics Key Terms


139 Terms

1

Statistics

The science of collecting, analyzing, and drawing conclusions from data.

Descriptive - methods of organizing and summarizing data

Inferential - making generalizations from a sample to the population

2

Population

An entire collection of individuals or objects

3

Sample

A subset of the population selected for the study

4

Variable

Any characteristic whose value may change from one individual or object to another

5

Data

Observations on one or more variables

6

Variables

categorical, numerical, univariate, bivariate, multivariate

7

Categorical (Qualitative)

Basic characteristics that place an individual into a group or category

8

Numerical (Quantitative)

Measurements or observations that take numerical values.

Discrete - listable sets of values (counts)

Continuous - any value over an interval of values (measurements)

9

Univariate

One variable

10

Bivariate

Two variables

11

Multivariate

many variables

12

Types of distributions

symmetrical, uniform, skewed, bimodal

13

Symmetrical

A distribution whose two sides are roughly the same shape and size, for example a "bell curve"

14

Uniform

Every class has an equal frequency (count), so the graph looks like "a rectangle"

15

Skewed

one side (tail) is longer than the other side. The skewness is in the direction that the tail points (left or right)

16

Bimodal

Two classes have large frequencies and are separated by at least one class with a lower frequency between them. "Double-hump camel"

17

How to describe numerical graphs - S.O.C.S

Shape, Outliers, Center, Spread

18

Shape

overall type (symmetrical, skewed right or left, uniform, or bimodal)

19

Statistic (x̄ and similar values)

A value calculated from a sample(s), used to describe or estimate something about a population.

20

Measures of Center

Median, Mean, Mode

21

Mean

The arithmetic average. μ is for a population (parameter) and x̄ is for a sample (statistic)

22

Variability

allows statisticians to distinguish between usual and unusual occurrences.

23

Resistant

Not affected by outliers

Examples: Median and IQR

24

Non-resistant

Mean, Range, Variance, Standard Deviation, Correlation Coefficient (r), Least Squares Regression Line (LSRL) and Coefficient of Determination (r^2)

25

Trimmed Mean

Remove a fixed percentage of observations from both the top and the bottom of the ordered data, then average what is left. This possibly eliminates outliers.
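
Worked example: a minimal Python sketch of a trimmed mean, assuming the percentage is removed from each end of the ordered data. The function name `trimmed_mean` and the sample scores are illustrative, not part of the original card.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` % of the observations from EACH end."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)  # observations dropped per end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trim percentage removes every observation")
    return sum(kept) / len(kept)


scores = [1, 2, 3, 4, 100]            # 100 is an outlier
plain = sum(scores) / len(scores)     # ordinary mean, pulled up by the outlier
trimmed = trimmed_mean(scores, 20)    # drops 1 and 100, averages [2, 3, 4]
print(plain, trimmed)
```

The ordinary mean moves far toward the outlier while the trimmed mean stays near the bulk of the data.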

26

Z-score

is a standardized score. This tells you how many standard deviations from the mean an observation is. Standardized values have μ = 0 & σ = 1; when the original distribution is normal, the z-scores follow the standard normal curve.

z = (x - μ) / σ
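
Worked example: a short Python sketch that standardizes a small data set with z = (x - μ) / σ, treating the data as a whole population. The function name `z_scores` and the data values are illustrative assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (x - mu) / sigma for every x, using the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mu = 5, sigma = 2
z = z_scores(data)
print(z)
print(mean(z), pstdev(z))         # standardized values have mean 0 and SD 1
```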

27

5-Number Summary

Minimum, Q1, Median, Q3, Maximum
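
Worked example: a Python sketch of the 5-number summary and the IQR. It assumes the "median of each half" quartile rule (the median itself is left out of both halves when n is odd); software and textbooks may use slightly different quartile rules. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # values above the median position
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, med, q3, maximum, iqr)
```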

28

Probability rules

Sample Space, Event, Complement, Union, Intersection, Mutually Exclusive, Independent, Experimental Probability, Law of Large Numbers

29

Sample Space

is the collection of all possible outcomes

30

Event

any collection (subset) of outcomes from the sample space

31

Complement

all outcomes not in the event

32

Union

A or B: all the outcomes in either circle of the Venn diagram. Written A ∪ B

33

Intersection

A and B: the outcomes in the overlap (middle) of A and B. Written A ∩ B

34

Mutually Exclusive (Disjoint)

A and B have no intersection. They cannot happen at the same time.

35

Independent

Knowing that one event has occurred does not change the probability of the other
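
Worked example: a Python sketch of complement, union, intersection, mutually exclusive events, and independence, using one roll of a fair die as the sample space. The events A, B, and C and the helper `prob` are illustrative assumptions, and the independence check uses the multiplication rule P(A and B) = P(A) × P(B).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die


def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}   # roll is even
B = {1, 2}      # roll is at most 2
C = {1, 3}      # hypothetical event that shares no outcomes with A

complement_A = sample_space - A     # all outcomes not in A
union = A | B                       # A or B
intersection = A & B                # A and B

mutually_exclusive = len(A & C) == 0
independent = prob(A & B) == prob(A) * prob(B)

print(prob(complement_A), prob(union), prob(intersection))
print(mutually_exclusive, independent)
```

Here A and B are independent (1/6 = 1/2 × 1/3) even though they overlap, while A and C are mutually exclusive.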

36

Experimental Probability

is the number of successes in an experiment divided by the total number of trials in the experiment.

37

Law of Large Numbers

as an experiment is repeated, the experimental probability gets closer and closer to the true (theoretical) probability. The difference between the two probabilities will approach 0.
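
Worked example: a small Python simulation of the Law of Large Numbers, assuming a fair coin (theoretical probability 0.5). The function name `experimental_probability` and the seed are illustrative; exact printed values depend on the random generator.

```python
import random


def experimental_probability(n_flips, rng):
    """Proportion of heads in n_flips simulated flips of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(0)   # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    p_hat = experimental_probability(n, rng)
    # the gap from 0.5 tends to shrink as the number of flips grows
    print(n, p_hat, abs(p_hat - 0.5))
```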

38

Correlation Coefficient - (r)

is a quantitative assessment of the strength and direction of a linear relationship.

39

Least Squares Regression Line (LSRL)

is a line of mathematical best fit. Minimizes the sum of the squared deviations (residuals) from the line. Used with bivariate data.

40

Residuals (error)

is the vertical difference between a point and the LSRL (observed y minus predicted y). All residuals sum to 0.

41

Residual Plot

a scatterplot of the residuals. No pattern indicates a linear relationship

42

Coefficient of Determination (r^2)

gives the proportion of variation in y (response) that is explained by the linear relationship between x and y. Never use the adjusted r^2.
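
Worked example: a Python sketch that computes the slope, intercept, r, r^2, and residuals of a least squares regression line from the standard sums of squares. The function name `lsrl` and the five data points are illustrative assumptions.

```python
from math import sqrt


def lsrl(xs, ys):
    """Return (slope, intercept, r) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b, a, r = lsrl(xs, ys)

# residual = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.4f}, r^2 = {r * r:.2f}")
print("sum of residuals:", round(sum(residuals), 10))
```

For this data the line is approximately y-hat = 2.2 + 0.6x, so about 60% of the variation in y is explained by the line, and the residuals sum to 0 up to rounding.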

43

Interpretations

Slope (b)

For each one-unit increase in x, the predicted y will increase/decrease by the slope amount.

Correlation coefficient (r)

There is a (strength), (direction), linear association between x and y.

Coefficient of determination (r^2)

Approximately r^2% of the variation in y can be explained by the LSRL of x and y.

44

Extrapolation

Using the LSRL to predict values outside the range of the original data. The LSRL should not be used this way, because the linear pattern may not hold there.

45

Influential Points

are points that if removed significantly change the LSRL.

46

Outliers (residuals)

are points with large residuals

47

Sampling Frame

is a list of every individual in the population, from which the sample is drawn.

48

Types of Sampling Designs

SRS, Stratified, Systematic, Cluster Sample

49

SRS (Simple Random Sample)

Units are chosen so that each unit has an equal chance of being selected and every set of n units has an equal chance of being the sample.

Advantages: easy and unbiased

Disadvantages: large variance (σ²) and must know the population

50

Stratified

divide the population into homogeneous groups called strata, then take an SRS from each stratum

Advantages: more precise than an SRS and cost reduced if strata already available.

Disadvantages: difficult to divide into groups, more complex formulas & must know population

51

Systematic

use a systematic approach (every 50th) after choosing randomly where to begin.

Advantages: unbiased, the sample is evenly distributed across population & don't need to know population

Disadvantages: a large variance (σ²) and can be confounded by trends

52

Cluster Sample

based on location. Select a random location and sample ALL at that location

Advantages: cost is reduced, is unbiased and don't need to know population

Disadvantages: May not be representative of population and has complex formulas.
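
Worked example: a Python sketch of the four sampling designs on a made-up population of 100 students in 4 homerooms. The population, sample sizes, and seed are hypothetical; homerooms stand in for strata in one design and for clusters (locations) in another purely for illustration.

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable

# Hypothetical population: 100 students spread over 4 homerooms
population = [{"id": i, "homeroom": i % 4} for i in range(100)]

# SRS: every set of 12 students is equally likely to be chosen
srs = rng.sample(population, 12)

# Stratified: split into homerooms (strata), then take an SRS of 3 from each
strata = {}
for person in population:
    strata.setdefault(person["homeroom"], []).append(person)
stratified = [p for group in strata.values() for p in rng.sample(group, 3)]

# Systematic: pick a random starting point, then take every 10th student
step = 10
start = rng.randrange(step)
systematic = population[start::step]

# Cluster: pick one homeroom at random and sample ALL students in it
chosen_homeroom = rng.choice(sorted(strata))
cluster = strata[chosen_homeroom]

print(len(srs), len(stratified), len(systematic), len(cluster))
```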

53

Random Digit Table

each entry is equally likely and each digit is independent of the rest

54

Random # Generator

Calculator or computer program

55

Bias

Systematic error that favors a certain outcome. Bias has to do with the center of a sampling distribution: if the distribution is centered over the true parameter, the statistic is considered unbiased.

56

Sources of Bias

Voluntary Response, Convenience Sampling, Undercoverage, Non-response, Response, Wording of the Questions

57

Voluntary Response

People choose themselves to participate

58

Convenience Sampling

ask people who are easy to reach, friendly, or comfortable to ask

59

Undercoverage

some group(s) are left out of the selection process.

60

Non-response

someone cannot or does not want to be contacted or participate.

61

Response

false answers, which can be caused by a variety of things

62

Wording of Questions

leading questions

63

Types of Experimental Designs

Observational study, experiment, experimental unit, factor, level, response variable, treatment, control group, placebo, blinding, double blinding.

64

Observational study

observe outcomes without giving a treatment

65

Experiment

actively imposes a treatment on the subjects

66

Experimental unit

single individual or object that receives a treatment

67

Factor

Is the explanatory variable, what is being tested.

68

Level

a specific value for the factor

69

Response Variable

What you are measuring with the experiment

70

Treatment

experimental condition applied to each unit

71

Control Group

a group used as a baseline for judging the effectiveness of the factor - does NOT have to receive a placebo

72

Placebo

a treatment with no active ingredients (provides control)

73

Blinding

a method used so that the subjects are unaware of the treatment (who gets a placebo or the real treatment).

74

Double Blinding

neither the subjects nor the evaluators know which treatment is being given.

75

Principles

Control, Replication, Randomization

76

Control

Keep all extraneous variables (those not being tested) constant

77

Replication

uses many subjects to quantify the natural variation in the response

78

Randomization

uses chance to assign the subjects to the treatments.

79

How to create proper cause and effect

Only a well-designed, well-controlled experiment can establish cause and effect

80

Experimental Designs

Completely Randomized, Randomized Block, Matched Pairs, Confounding Variables, Randomization, Blocking

81

Completely Randomized

all units are randomly allocated among the treatments

82

Randomized Block

units are blocked and then randomly assigned in each block - reduces variation

83

Matched Pairs

Units are matched up by characteristics and then randomly assigned. Once one member of a pair receives a certain treatment, the other member automatically receives the second treatment. OR each individual does both treatments in random order (before/after or pretest/post-test). Assignment is dependent.

84

Confounding Variables

are variables whose effect on the response cannot be separated from the effects of the factor being tested. Confounding happens in observational studies; when treatments are randomly assigned, extraneous variables are spread across the groups, so they should NOT be confounded with the treatment.

85

Randomization (Designs)

reduces bias by spreading extraneous variables to all groups in the experiment

86

Blocking

helps reduce variability. Another way to reduce variability is to increase sample size
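
Worked example: a Python sketch of random assignment in a completely randomized design and in a randomized block design. The subject labels, the two treatments, and the age-group blocking variable are hypothetical, as are the function names.

```python
import random


def completely_randomized(units, treatments, rng):
    """Shuffle the units, then deal them out evenly among the treatments."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    k = len(treatments)
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}


def randomized_block(units, block_of, treatments, rng):
    """Group units into blocks first, then randomize within each block."""
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    return {name: completely_randomized(members, treatments, rng)
            for name, members in blocks.items()}


def age_group(subject):
    """Hypothetical blocking variable: subjects S1-S6 are 'younger'."""
    return "younger" if int(subject[1:]) <= 6 else "older"


rng = random.Random(7)   # fixed seed so the run is repeatable
subjects = [f"S{i}" for i in range(1, 13)]
treatments = ["drug", "placebo"]

crd = completely_randomized(subjects, treatments, rng)
rbd = randomized_block(subjects, age_group, treatments, rng)
print(crd)
print(rbd)
```

In the blocked design each age group contributes the same number of subjects to each treatment, which is how blocking removes that source of variation from the comparison.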

87

Random variable

a numerical value that depends on the outcome of an experiment

88

Discrete

a count of a random variable

89

Continuous

a measure of a random variable

90

Discrete Probability Distributions

gives values and probabilities associated with each possible x.

Calculator shortcut: 1-Var Stats L1, L2 (x-values in L1, probabilities in L2)

91

Fair game

a fair game is one in which all pay-ins equal all pay-outs in the long run, so the expected net winnings are 0
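
Worked example: a Python sketch that finds the mean (expected value) and standard deviation of a discrete probability distribution and uses the mean to check whether a game is fair. The game is hypothetical: pay $1 to roll a die and receive $6 on a six, so the net winnings are +5 or -1. The function name `mean_and_sd` is illustrative.

```python
from fractions import Fraction
from math import sqrt


def mean_and_sd(distribution):
    """distribution maps each value x to its probability P(X = x)."""
    mu = sum(x * p for x, p in distribution.items())
    variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
    return mu, sqrt(variance)


# Hypothetical game: net winnings and their probabilities
net_winnings = {5: Fraction(1, 6), -1: Fraction(5, 6)}
assert sum(net_winnings.values()) == 1   # probabilities must total 1

mu, sigma = mean_and_sd(net_winnings)
is_fair = (mu == 0)   # fair game: expected net winnings are zero
print(mu, sigma, is_fair)
```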

92

Special discrete distributions

binomial distributions and geometric distributions

93

Binomial distribution

Properties - two mutually exclusive outcomes, a fixed number of trials (n), each trial is independent, and the probability (p) of success is the same for all trials.

Random variable - the number of successes out of the fixed number of trials. Starts at X = 0 and is finite.

μx = np, σx = sqrt(npq), where q = 1 - p

Calculator: binompdf(n, p, x) = single outcome P(X = x)

binomcdf(n, p, x) = cumulative outcome P(X ≤ x)

1 - binomcdf(n, p, x - 1) = cumulative outcome P(X ≥ x)
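
Worked example: a Python sketch of the binomial calculations, written from the formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). The function names `binom_pmf` and `binom_cdf` are illustrative stand-ins for the calculator commands, and the cumulative function is taken to mean P(X ≤ x).

```python
from math import comb, sqrt


def binom_pmf(n, p, x):
    """P(X = x): exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(n, p, x):
    """P(X <= x): at most x successes in n independent trials."""
    return sum(binom_pmf(n, p, k) for k in range(x + 1))


n, p = 10, 0.5                         # e.g. 10 flips of a fair coin
exactly_5 = binom_pmf(n, p, 5)         # P(X = 5)
at_most_5 = binom_cdf(n, p, 5)         # P(X <= 5)
at_least_5 = 1 - binom_cdf(n, p, 4)    # P(X >= 5), complement of P(X <= 4)

mu = n * p
sigma = sqrt(n * p * (1 - p))
print(exactly_5, at_most_5, at_least_5, mu, sigma)
```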

94

Geometric Distributions

Properties - two mutually exclusive outcomes, each trial is independent, and the probability (p) of success is the same for all trials. (NOT a fixed number of trials)

Random Variable - the trial on which the FIRST success occurs. Starts at 1 and is infinite

Calculator: geometpdf(p, a) = single outcome P(X = a)

geometcdf(p, a) = cumulative outcome P(X ≤ a)

1 - geometcdf(p, a - 1) = cumulative outcome P(X ≥ a)
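
Worked example: a Python sketch of the geometric calculations, using P(X = a) = (1 - p)^(a - 1) p for the first success on trial a. The function names `geom_pmf` and `geom_cdf` are illustrative stand-ins for the calculator commands, and the cumulative function is taken to mean P(X ≤ a).

```python
def geom_pmf(p, a):
    """P(X = a): the first success happens on trial a."""
    return (1 - p) ** (a - 1) * p


def geom_cdf(p, a):
    """P(X <= a): the first success happens on or before trial a."""
    return 1 - (1 - p) ** a


p = 0.5                                   # e.g. flipping until the first head
first_on_third = geom_pmf(p, 3)           # P(X = 3)
by_third = geom_cdf(p, 3)                 # P(X <= 3)
third_or_later = 1 - geom_cdf(p, 2)       # P(X >= 3), complement of P(X <= 2)
print(first_on_third, by_third, third_or_later)
```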

95

Continuous Random Variable

Numerical values that fall within an interval (measurements). Use density curves, where the area under the curve always equals 1. To find probabilities, find the area under the curve.

Unusual Density Curves - any shape (triangles, etc.)

Uniform Distributions - uniformly (evenly) distributed, shape of a rectangle.

Normal Distributions - symmetrical, unimodal, bell-shaped curves defined by the parameters μ and σ.

Calculator: normalpdf - used for graphing only

normalcdf(lower bound, upper bound, μ, σ) - finds probability

invNorm(p) - gives z-score OR invNorm(p, μ, σ) - gives x-value (p is the area to the left)
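
Worked example: a Python sketch of the normal calculations using `statistics.NormalDist` from the standard library. The wrapper names `normal_cdf` and `inv_norm` are illustrative; they mirror the calculator commands but are not the calculator's own functions, and the μ = 100, σ = 15 values are made up.

```python
from statistics import NormalDist


def normal_cdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(p, mu=0.0, sigma=1.0):
    """Value with area p to its LEFT under the normal curve."""
    return NormalDist(mu, sigma).inv_cdf(p)


within_one_sd = normal_cdf(-1, 1)             # roughly 0.68
z_975 = inv_norm(0.975)                       # roughly 1.96
x_90th = inv_norm(0.90, mu=100, sigma=15)     # 90th percentile on a 100/15 scale
print(round(within_one_sd, 4), round(z_975, 4), round(x_90th, 2))
```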

96

To assess Normality

Use Graphs - dotplots, boxplots, histograms, or normal probability plot.

97

Distribution

is all of the values of a random variable

98

Sampling Distribution

The sampling distribution of a statistic is the distribution of the values the statistic takes over all possible samples of the same size. Use normalcdf to calculate probabilities, and be sure to use the correct SD (for a sample mean, σ/sqrt(n))

99

Standard error

estimate of the standard deviation of the statistic

100

Central Limit Theorem

when n is sufficiently large (rule of thumb: n > 30), the sampling distribution of the sample mean is approximately normal even if the population distribution is not normal
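
Worked example: a small Python simulation of the Central Limit Theorem for sample means. It assumes a strongly right-skewed population (exponential with μ = 1 and σ = 1); the sample size, number of samples, and seed are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2024)   # fixed seed so the run is repeatable
n = 40                      # size of each sample
num_samples = 5000          # number of repeated samples

# Each entry is the mean of one sample of size n from the skewed population
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

center = mean(sample_means)    # should land near the population mean, 1
spread = stdev(sample_means)   # should land near sigma / sqrt(n)
print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.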