AP Test Review - Full Year Vocabulary


112 Terms

1
New cards

Inference conditions for means

Random: random sample or randomized experiment

10% condition: Population > 10n (if sampling w/o replacement)

Normal/Large Sample (any ONE of the following):

1. Population distribution is Normal (given in problem)

2. Graphs of the sample data show no strong skew or outliers, so it is plausible that the data came from an approximately Normal population.

3. n ≥ 30 (CLT says the sampling distribution of sample means will be approx. Normal)

2
New cards

sampling distribution

the distribution of values taken by the statistic in all possible samples of the same size from the same population.

3
New cards

distribution of a sample

The distribution of values of the variable for the individuals included in a sample

4
New cards

population distribution

The distribution of values of the variable for all individuals in the population.

5
New cards

Categorical data

data falls into groups or categories. Ex: Favorite car, hair color, etc.

6
New cards

Display categorical data

bar chart, pie chart

7
New cards

Quantitative Data

Data that takes on numerical values (makes sense to average) Ex: height, weight, time, etc.

8
New cards

Display Quantitative Data

Histogram, Dotplot, Boxplot, Stemplot

9
New cards

Marginal Distribution

in a contingency table, the distribution of ONE of the variables alone. (Always out of the total sample)

10
New cards

conditional distribution

describes the values of that variable among individuals who have a specific value of another variable. (Always out of a subgroup of the total sample)
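A minimal sketch of the difference between a marginal and a conditional distribution, using a made-up two-way table (the grade levels, the yes/no answers, and all counts are hypothetical):

```python
# Hypothetical two-way table: grade level vs. answer to a yes/no survey question.
table = {
    "freshman": {"yes": 30, "no": 20},
    "senior": {"yes": 10, "no": 40},
}

total = sum(sum(row.values()) for row in table.values())  # whole sample

# Marginal: proportion answering "yes" out of the TOTAL sample.
marginal_yes = sum(row["yes"] for row in table.values()) / total

# Conditional: proportion answering "yes" out of the SUBGROUP of seniors.
seniors = sum(table["senior"].values())
conditional_yes_given_senior = table["senior"]["yes"] / seniors
```

The marginal proportion divides by all 100 individuals, while the conditional proportion divides by only the 50 seniors.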

11
New cards

SOCS

shape, outliers, center, spread AND add context!

12
New cards

Comparing Distributions

Address: Shape, Outliers, Center, Spread

in context!

YOU MUST USE comparison phrases like "is greater than" or "is less than" for Center & Spread

13
New cards

Outlier Rule

Upper Bound = Q3 + 1.5(IQR)

Lower Bound = Q1 - 1.5(IQR)
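The rule above can be sketched in code as follows. This assumes quartiles are found as the medians of the lower and upper halves of the sorted data (calculators and software may use slightly different quartile methods); the data set is made up for illustration.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])

def outlier_bounds(values):
    """Return (lower bound, upper bound) from the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up data
low, high = outlier_bounds(scores)
outliers = [x for x in scores if x < low or x > high]
```

Any value below the lower bound or above the upper bound is flagged as an outlier.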

14
New cards

Interpret Standard Deviation

Measures spread by giving the "typical" or "average" distance that the observations (context) are away from their (context) mean

15
New cards

How does shape affect measures of center?

In general,

Skewed Left (Mean < Median)

Skewed Right (Mean > Median)

Fairly Symmetric (Mean ≈ Median)

16
New cards

Percentile

The percentage of data points that lie at or below the value of interest.
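As a rough illustration of this definition, a percentile can be computed by counting the observations at or below the value of interest (the function name and the quiz scores are hypothetical):

```python
def percentile_of(value, data):
    """Percent of observations that are at or below `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

quiz_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up data
pct = percentile_of(80, quiz_scores)  # 6 of the 10 scores are <= 80
```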

17
New cards

z-score formula

z = (x - μ)/σ

18
New cards

z-score interpretation

The z-score represents the number of standard deviations above/below the mean that a particular point lies within a distribution.
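A short sketch of the z-score calculation, with made-up numbers (a score of 130 in a distribution with mean 100 and standard deviation 15):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(130, 100, 15)  # 130 is 2 standard deviations above the mean
```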

19
New cards

linear transformation

when you multiply, divide, add, or subtract a constant from each score in a distribution.

Shape - Stays the same!

Center - Impacted by all operations (+, -, x, /)

Spread - Impacted only by x & /
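One way to check these rules is to transform a small made-up data set and compare the mean and standard deviation before and after (the scores and constants are hypothetical):

```python
from statistics import mean, pstdev

scores = [60, 70, 80, 90, 100]  # made-up data

shifted = [x + 5 for x in scores]  # add a constant
scaled = [x * 2 for x in scores]   # multiply by a constant

# Adding 5 moves the center by 5 but leaves the spread alone.
center_shift = mean(shifted) - mean(scores)
spread_after_shift = pstdev(shifted)

# Multiplying by 2 doubles both the center and the spread.
center_ratio = mean(scaled) / mean(scores)
spread_ratio = pstdev(scaled) / pstdev(scores)
```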

20
New cards

standard normal distribution

A normal distribution of z-scores with a mean of 0 and a standard deviation of 1.

21
New cards

normalcdf

Calculator command to find the area under a normal curve, given the following values:

normalcdf(lower bound, upper bound, mean, standard deviation)

22
New cards

invNorm

Calculator command to find the value corresponding to a given area to the left of that value, given the following values:

invNorm(area to the left, mean, standard deviation)
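For readers without the calculator at hand, the two commands can be approximated with Python's standard-library `statistics.NormalDist`. The wrapper names `normalcdf` and `inv_norm` below are my own and simply mirror the calculator's argument order; this is a sketch, not the calculator's implementation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

within_one_sd = normalcdf(-1, 1)  # roughly 0.68 for the standard Normal
cutoff = inv_norm(0.975)          # roughly 1.96 for the standard Normal
```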

23
New cards

SRS (Simple Random Sample)

every individual in the population has an equal chance of being selected, and every group of n individuals has an equal chance of being the sample

24
New cards

Sampling Techniques

1. SRS

2. Stratified

3. Cluster

4. Census

5. Convenience

6. Voluntary Response

7. Systematic

25
New cards

stratified random sample

a sampling design in which the population is divided into several subpopulations, and random samples are then drawn from each stratum. (SOME from ALL)

26
New cards

Cluster Random Sample

Divide the population into a large number of clusters. Randomly select a certain number of clusters and sample ALL subjects in each cluster. (ALL from SOME)
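The difference between an SRS, a stratified sample, and a cluster sample can be sketched with Python's `random` module. The population of four homerooms with 25 students each is entirely hypothetical, and the same rooms are used as strata and as clusters only to keep the sketch short.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 4 homerooms of 25 students each.
population = {
    f"room{r}": [f"room{r}-student{s}" for s in range(25)] for r in range(1, 5)
}
everyone = [s for room in population.values() for s in room]

# SRS: every group of 20 students is equally likely to be the sample.
srs = rng.sample(everyone, 20)

# Stratified: SOME students (5) from ALL rooms.
stratified = [s for room in population.values() for s in rng.sample(room, 5)]

# Cluster: ALL students from SOME rooms (2 randomly chosen rooms).
chosen_rooms = rng.sample(sorted(population), 2)
cluster = [s for name in chosen_rooms for s in population[name]]
```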

27
New cards

Census

An attempt to contact/sample all members of the population.

28
New cards

convenience sample

only members of the population who are easily accessible are selected

29
New cards

voluntary response sample

People decide whether to join a sample based on an open invitation (Ex: on-line polls, telephone calls, etc.)

30
New cards

Advantage of stratified random sampling

Stratified random sampling guarantees that each of the strata will be represented. It will produce less variable/more precise information than an SRS of the same size.

31
New cards

Bias

A sampling method is biased if it consistently produces estimates that are too small or too large.

32
New cards

experiment

A research method in which an investigator imposes a treatment upon the experimental units.

33
New cards

observational study

observes individuals and measures variables of interest but does not attempt to influence the responses or impose a treatment.

34
New cards

Experiment vs. Observational Study

An experiment can conclude a cause-and-effect relationship between explanatory and response variables.

An observational study can only conclude an association between explanatory and response variables.

35
New cards

confounding variable

two variables are confounded if it cannot be determined which variable is causing the change in the response variable.

36
New cards

control group

the group that does not receive the experimental treatment. An experiment DOES need comparison, but DOES NOT need a control group in order to compare.

37
New cards

Blinding

a technique where the subjects do not know whether they are receiving a treatment or a placebo.

If both the subject and the people interacting with the subject don't know which treatment is being received/given, then the study is double blind.

38
New cards

Experimental Designs

1. Completely Randomized Design.

2. Randomized Block Design.

3. Matched Pairs

39
New cards

completely randomized design

all experimental units have an equal chance of receiving any treatment

40
New cards

randomized block design

Start by forming blocks consisting of individuals that are similar in some way that is important to the response. Random assignment of treatments is then carried out separately within each block.

41
New cards

matched pairs design

The design of a study where experimental units are naturally paired by a common characteristic, or with themselves in a before-after type of study.

42
New cards

Benefit of Blocking

Blocking helps account for the variability in the response variable (context) that is caused by the blocking variable (context).

43
New cards

scope of inference

1. We can generalize our inference to the entire population when individuals are randomly selected.

2. Inferences about cause and effect are possible when an experiment is performed.

44
New cards

interpret probability

the probability of an event is the proportion of times the event would occur in a very large number of repetitions. (Probability is a long-term relative frequency.)

45
New cards

Law of Large Numbers

if we observe more and more repetitions of any chance process, the proportion of times that a specific outcome occurs approaches a single value.

46
New cards

Conducting a simulation

State: Ask a question about some chance process.

Plan: Describe how to use a random device to simulate one trial of the process and indicate what will be recorded at the end of each trial.

Do: Do many trials.

Conclude: Answer the question of interest.
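The four steps above can be sketched for a made-up question: "What is the probability of getting at least one 6 in four rolls of a fair die?" The question, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random

def one_trial(rng):
    """Plan: roll a fair die 4 times; record whether at least one 6 appears."""
    return any(rng.randint(1, 6) == 6 for _ in range(4))

def simulate(trials, seed=0):
    """Do: run many trials and return the proportion of successes."""
    rng = random.Random(seed)
    hits = sum(one_trial(rng) for _ in range(trials))
    return hits / trials

estimate = simulate(20_000)   # Conclude: estimated probability
exact = 1 - (5 / 6) ** 4      # about 0.518, shown only for comparison
```

With many trials the estimate should land close to the exact value, which is the Law of Large Numbers at work.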

47
New cards

Two Events are Independent If...

P(A)*P(B) = P(A and B)

OR

P(B) = P(B|A)

OR

P(A) = P(A|B)

Meaning: Knowing that Event A has occurred (or not occurred) doesn't change the probability that event B occurs.
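A small check of the multiplication test for independence, using one roll of a fair die. The events A, B, and C are hypothetical examples, and exact fractions are used to avoid rounding problems.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def independent(a, b):
    """True when P(A) * P(B) equals P(A and B)."""
    return prob(a) * prob(b) == prob(a & b)

A = {2, 4, 6}      # roll is even
B = {1, 2, 3, 4}   # roll is at most 4
C = {1, 2, 3}      # roll is at most 3

a_and_b_independent = independent(A, B)  # 1/2 * 2/3 == 1/3
a_and_c_independent = independent(A, C)  # 1/2 * 1/2 != 1/6
```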

48
New cards

Two events are mutually exclusive if

P(A and B) = 0

Events A and B are mutually exclusive if they share no outcomes.

49
New cards

Interpreting Expected Value/Mean

If we were to repeat the chance process (context) many times, the average value of _____ (context) would be about _______.

50
New cards

Mean of a Discrete Random Variable (expected value)

multiply each possible value by its probability, then add all the products
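A minimal sketch of this calculation for a made-up probability distribution (the values and probabilities are hypothetical; fractions keep the arithmetic exact):

```python
from fractions import Fraction

# Hypothetical distribution: value -> probability
distribution = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}

# Multiply each value by its probability, then add the products.
expected_value = sum(x * p for x, p in distribution.items())
```

Here the expected value works out to 7/10, even though 0.7 is not a value the variable can actually take.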

51
New cards

combining random variables: finding mean

Add the means for a sum, or subtract the means for a difference. (This works whether or not the variables are independent.)

52
New cards

combining random variables: finding standard deviation

ADD the VARIANCES of the independent variables (for both sums AND differences). Then take the square root of the total.
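A quick numeric sketch with two hypothetical independent random variables X and Y (the means and standard deviations are made up):

```python
import math

mean_x, sd_x = 10.0, 3.0  # hypothetical X
mean_y, sd_y = 6.0, 4.0   # hypothetical Y, independent of X

mean_sum = mean_x + mean_y    # mean of X + Y
mean_diff = mean_x - mean_y   # mean of X - Y

# Variances add for both the sum and the difference; standard deviations do not.
sd_sum = math.sqrt(sd_x**2 + sd_y**2)
sd_diff = math.sqrt(sd_x**2 + sd_y**2)
```

Note that the standard deviation of the difference is 5, not 3 - 4 or 3 + 4.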

53
New cards

binomial setting

BINS:

1. Binary: everything is a success or failure.

2. Independent trials.

3. A fixed Number of observations

4. The probability of Success is the same for each observation.

54
New cards

geometric setting

1. Binary: everything is a success or failure.

2. Independent trials.

3. Observe trials UNTIL a success.

4. The probability of Success is the same for each observation.

55
New cards

Mean and Standard Deviation of a Binomial Random Variable

Mean = np

Standard deviation = √(np(1-p))
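These formulas can be checked with a short sketch. The values n = 20 and p = 0.25 are made up, and the helper `binomial_pmf` is a hypothetical name for the usual binomial probability formula.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.25  # hypothetical setting

mean = n * p                     # expected number of successes
sd = math.sqrt(n * p * (1 - p))  # typical distance from the mean

# Sanity check: the probabilities of all possible counts add up to 1.
total_probability = sum(binomial_pmf(k, n, p) for k in range(n + 1))
```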

56
New cards

Parameter vs. Statistic

Parameter: a measure (mean/proportion etc.) of a POPULATION

Statistic: a measure (mean/proportion etc.) of SAMPLE

57
New cards

sampling distribution

the distribution of values taken by the statistic in all possible samples of the same size from the same population. Does NOT vary in repeated sampling.

58
New cards

distribution of a sample

The distribution of values of the variable for the individuals included in ONE sample. This distribution varies in repeated sampling.

59
New cards

population distribution

The distribution of values of the variable for all individuals in the population. Does NOT vary in repeated sampling.

60
New cards

sampling distribution of the sample mean

A probability distribution of all possible sample means of a given sample size. Shape, center and spread can be found by satisfying the following conditions:

10% condition - lets us use the standard deviation formula (Formula Sheet)

N > 10n

Normal/Large Sample - determines if sampling distribution is approximately Normal.

n ≥ 30 OR population distribution is Normal

61
New cards

sampling distribution of sample proportions

A probability distribution of all possible sample proportions of a given sample size. Shape, center and spread can be found by satisfying the following conditions:

10% condition - lets us use the standard deviation formula (Formula Sheet)

N > 10n

Large Counts - determines if sampling distribution is approximately Normal.

np ≥ 10 AND n(1-p) ≥ 10
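A rough sketch of these checks for a sample proportion. The function name and the example numbers are hypothetical, the Large Counts check uses the "at least 10" cutoff, and the standard deviation is the usual formula √(p(1-p)/n).

```python
import math

def p_hat_sampling_distribution(n, p, population_size):
    """Return (mean, sd, ten_percent_ok, large_counts_ok) for the sample proportion."""
    ten_percent_ok = population_size > 10 * n            # N > 10n
    large_counts_ok = n * p >= 10 and n * (1 - p) >= 10  # expected successes and failures
    mean = p                                             # center of the sampling distribution
    sd = math.sqrt(p * (1 - p) / n)                      # valid when the 10% condition holds
    return mean, sd, ten_percent_ok, large_counts_ok

# Made-up example: sample of 100 from a population of 5000 with p = 0.3
mean, sd, ten_percent_ok, large_counts_ok = p_hat_sampling_distribution(100, 0.3, 5000)
```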

62
New cards

Central Limit Theorem (CLT)

Says that when n is large (n ≥ 30), the sampling distribution of the sample mean is approximately Normal, even if the population distribution is not Normal
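The CLT can be illustrated with a small simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1) and a sample size of 40; both choices, along with the seed and the number of samples, are arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 40                   # sample size (large enough for the CLT guideline)

# Take 2000 samples of size n from a skewed population and record each sample mean.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(2000)
]

center = mean(sample_means)   # should be near the population mean, 1
spread = stdev(sample_means)  # should be near 1 / sqrt(40), about 0.158
```

A histogram of `sample_means` would look roughly Normal even though the population itself is skewed.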

63
New cards

unbiased estimator

A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated. (means and proportions are unbiased estimators).

64
New cards

correlation coefficient

A number that describes the strength and direction of a linear relationship. (from -1 to +1)

65
New cards

explanatory variable

A variable that helps explain or influences changes in a response variable.

66
New cards

response variable

a variable that measures an outcome or result of a study

67
New cards

DUFS

direction, unusual features, form, strength

68
New cards

influential point

An extreme value whose removal would drastically change the LSRL, correlation and/or coefficient of determination

69
New cards

linear regression

A method of finding the best model for a linear relationship between the explanatory and response variable.

70
New cards

negative association

as x increases, y decreases

71
New cards

positive association

as x increases, y increases

72
New cards

correlation coefficient = 0

no LINEAR association

73
New cards

Outlier

A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

74
New cards

Scatterplot

a graphical depiction of the relationship between two quantitative variables

75
New cards

coefficient of determination

The percent of the variation in the values of y that can be explained by the least-squares regression line of y on x.

76
New cards

lurking variable

a variable that is not among the explanatory or response variables in a study but that may influence the response variable

77
New cards

extrapolation

Using a model to make a prediction outside the range of data used to create the model in the first place.

78
New cards

slope

the change in the response variable (y) for every one unit of change to the explanatory variable (x)

79
New cards

y-intercept

the value of the response variable (y) when the explanatory variable (x) is 0.

80
New cards

Equation of a line

y = a + bx

81
New cards

LSRL

a unique best-fit line that is found by making the squares of the residuals as small as possible

82
New cards

y hat

predicted value of y

83
New cards

residual

prediction error

84
New cards

Formula for Residual

Residual = Actual Value - Predicted Value
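A minimal sketch that fits a least-squares line in the form ŷ = a + bx and then computes each residual as actual minus predicted. The data are made up, and the slope formula used (sum of cross-deviations divided by sum of squared x-deviations) is the standard one rather than anything specific to this card set.

```python
from statistics import mean

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up response values

a, b = lsrl(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # actual - predicted
```

For a least-squares line the residuals always sum to zero, up to rounding.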

85
New cards

residual plot

a scatterplot of the regression residuals against the explanatory variable

86
New cards

standard deviation of residuals

This value gives the approximate size of a "typical" or "average" prediction error (residual).

87
New cards

Parameter

measures a characteristic of a POPULATION (mean, proportion, etc.)

88
New cards

Statistic

measures a characteristic of a SAMPLE (mean, proportion, etc.)

89
New cards

Central Limit Theorem (CLT)

Says that when n is large, the sampling distribution of the sample mean is approximately Normal

90
New cards

unbiased estimator

A statistic used to estimate a parameter is an unbiased estimator if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

91
New cards

4-Step process to inference procedures (confidence intervals and significance tests)

State, Plan, Do & Conclude

92
New cards

Interpret Confidence Interval

we are ___% confident that the interval from ___ to ___ captures the actual value of the [population parameter in context]

93
New cards

Interpret Confidence Level

If we take many samples of the same size from the population, about ___% of them will result in an interval that captures the parameter (in context)

94
New cards

Standard Error vs Margin of Error

The standard error of a statistic estimates how far the value of the statistic typically differs from the true value of the parameter. (calculating standard deviation from sample data)

The margin of error estimates how far we expect the parameter to differ from the statistic, at most. (the +/- on our confidence interval)
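To make the distinction concrete, here is a sketch of a one-sample z interval for a proportion, where the margin of error is the critical value times the standard error. The function name and the sample of 60 successes out of 100 are hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper, standard_error, margin_of_error) for a proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)      # SD estimated from sample data
    margin_of_error = z_star * standard_error                # the +/- part of the interval
    return p_hat - margin_of_error, p_hat + margin_of_error, standard_error, margin_of_error

lower, upper, se, me = one_prop_z_interval(60, 100)
```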

95
New cards

What factors affect the Margin of Error?

The margin of error decreases when:

1. The sample size increases

2. The confidence level decreases

96
New cards

Finding the Sample Size (For a given margin of error)

Means - Use z* and assume we know the population SD

Proportions - Use given "p-hat" or if unknown, use p = 0.5
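Both cases can be sketched by solving the margin-of-error formula for n and rounding up. The function names and example numbers are hypothetical; the formulas assumed are ME = z*·σ/√n for means and ME = z*·√(p(1-p)/n) for proportions.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_mean(margin, sigma, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin (sigma assumed known)."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)

def sample_size_for_proportion(margin, p_guess=0.5, confidence=0.95):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin; p = 0.5 is the conservative guess."""
    return math.ceil((z_star(confidence) / margin) ** 2 * p_guess * (1 - p_guess))

n_mean = sample_size_for_mean(margin=2, sigma=15)
n_prop = sample_size_for_proportion(margin=0.03)
```

Always round up, since rounding down would give a margin of error slightly larger than requested.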

97
New cards

inference conditions for proportions

Random: random sample or randomized experiment

10% condition: Population > 10n (if sampling w/o replacement)

Large Counts: np ≥ 10 and n(1-p) ≥ 10

98
New cards

Interpret P-Value

The probability, assuming the null hypothesis is true, of getting a statistic as extreme as or more extreme than the one observed, by chance alone (IN CONTEXT)
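As one illustration, a P-value for a two-sided one-proportion z test can be computed as the Normal tail area beyond the observed test statistic. The function name, the null value of 0.5, and the sample of 60 successes out of 100 are all hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0):
    """Return (z, two-sided P-value) for testing H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # uses p0 because H0 is assumed true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # area in both tails
    return z, p_value

z, p_value = one_prop_z_test(60, 100, 0.5)
```

Here the P-value is about 0.046: if the true proportion really were 0.5, a sample proportion at least as far from 0.5 as 0.6 would occur in roughly 4.6% of samples of this size.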

99
New cards

Assess a claim from a confidence interval

**ONLY works for two-sided tests

1. If the null hypothesis is in the interval, then it is a plausible value and should NOT be rejected.

2. If the null hypothesis is NOT in the interval, then it is not a plausible value and should be rejected.

100
New cards

Type I error

Rejecting the null hypothesis when it is actually true

(Finding convincing evidence for the alternative hypothesis when it is NOT true)

P(Type I Error) = Alpha (significance level)