values in a data set added up and divided by the number of included values
2
median
middle value when data set is placed in ascending order
3
standard deviation
measure that quantifies how much the values in a data set vary (spread out) around the mean
4
variance
standard deviation squared
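A minimal sketch of the summary measures defined so far, using Python's standard `statistics` module. The data set is hypothetical, and the population versions (`pstdev`, `pvariance`) are an assumption here; the cards do not say whether the sample or population formula is meant.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = statistics.mean(data)            # sum of the values / number of values
median = statistics.median(data)        # middle value of the sorted data
sd = statistics.pstdev(data)            # population standard deviation (assumed)
variance = statistics.pvariance(data)   # standard deviation squared

print(mean, median, sd, variance)
```

For sample data, `statistics.stdev` and `statistics.variance` divide by n - 1 instead of n.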
5
quantitative
numerical values - values that can be averaged
6
qualitative/categorical
values that are generally words or grouped numbers - cannot be averaged
7
range
the difference between the highest and lowest values in a data set
8
first quartile
middle value between the minimum and the median (the median of the bottom half)
9
third quartile
middle value between the median and the maximum (the median of the top half)
10
Interquartile range (IQR)
the third quartile minus the first quartile (Q3 - Q1)
11
outliers
extreme values that fall more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile
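A short sketch of the quartile, IQR, and outlier definitions above. It takes the quartiles as the medians of the bottom and top halves (leaving the median itself out when the count is odd), which is one common convention; the function names and the data are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values; the middle value is excluded when n is odd.
    """
    s = sorted(values)
    half = len(s) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])

def outliers(values):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 3, 4, 5, 5, 6, 7, 8, 30]  # hypothetical data set
q1, q3 = quartiles(data)
data_range = max(data) - min(data)   # range: highest minus lowest value
print(q1, q3, q3 - q1, data_range, outliers(data))
```

Other quartile conventions (for example the interpolating methods in `statistics.quantiles`) can give slightly different values on small data sets.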
12
resistant
resists the effects of outliers (e.g. median, IQR)
13
nonresistant
influenced by the existence of outliers (e.g. mean, standard deviation, range)
14
boxplot
displays the general distribution of data using the five-number summary (minimum, first quartile, median, third quartile, maximum)
15
dotplot
each value represented by a dot above a number line - good for seeing where individual values fall in a small data set
16
histogram
displays quantitative data grouped into bins of the same width; the height of each bar shows the frequency of values in that bin
17
bar chart
similar to histogram, but used for categorical variables - bars don’t touch and x-axis values are not continuous
18
stem-and-leaf plot
displays all but the last digit of each individual value as a stem, and last digit is the leaf - key must be included
19
statistical inference
methods for answering specific questions about a population from sample data, with a stated level of confidence in the conclusion
20
population
entire group of individuals to which the data is being generalized
21
sample
part of the group that is being studied
22
simple random sample
every possible sample of size *n* has the same chance of being selected
23
probability sample
each member of the population has a known chance, greater than zero, of being selected
24
stratified random sample
dividing a population into groups of similar members (strata) and then choosing an SRS within each group to form the full sample
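A rough sketch contrasting a simple random sample with a stratified random sample, using the standard `random` module. The population, the stratum names, and the helper functions are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of the same size inside each stratum, then combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))  # hypothetical ID numbers 1..100
strata = {
    "freshmen": list(range(1, 51)),   # hypothetical stratum
    "seniors": list(range(51, 101)),  # hypothetical stratum
}

srs = simple_random_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
print(srs)
print(stratified)
```

The stratified version guarantees five members from each group, which an SRS of the whole population does not.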
25
multistage sample design
sampling in stages, e.g. selecting *t* counties, then *x* townships within each county, then *y* blocks within each township, and finally *z* households within each block
26
cluster random sample
total population is divided into groups (clusters), a random sample of the groups is selected, and every member of the selected groups is included
27
bias
flaw in the design of a study that systematically favors certain outcomes
28
voluntary response sample
sample that consists of people who choose themselves by responding to a general appeal
29
nonresponse
individual chosen for the sample can’t be contacted or refuses to cooperate
30
confounding variables
two variables whose effects on a response variable cannot be distinguished from each other
31
convenience sample
sample made from groups that are easiest to reach
32
response bias
when a respondent lies about sensitive information or telescopes the timing of an event
33
observational study
data collector observes and measures variables of interest, but does not attempt to influence the responses
34
statistically significant
an observed effect too large to attribute plausibly to chance
35
experiment
study that deliberately imposes treatments on experimental units; the most effective way to show a cause-and-effect relationship between two or more variables
36
double-blind experiment
experiment where neither the subject nor the data collector knows which treatment is being applied to the subject
37
matched pairs
special case of randomized block design used when the experiment has only two treatment conditions
38
blocking
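grouping similar experimental units before randomly assigning treatments within each group, which reduces variability and allows more specific, separate conclusions for each block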
39
experimental units
members on which an experiment is done
40
subjects
experimental units that are human beings
41
treatment
condition applied to a member or group
42
factor
an explanatory variable in an experiment (an experiment may have several)
43
level
specific value of a factor
44
placebo
dummy treatment with no active ingredient that can still produce an effect (the placebo effect)
45
control group
group receiving a sham treatment (or no treatment), used as a baseline for comparison with the treatment groups
46
randomization
use of chance to divide experimental units into groups
47
principles of experimental design
1) Control - provides a basis for comparison
2) Randomization - fair choice of experimental units/subjects
3) Replication - need to ensure that results continue to tell the same story
48
hidden bias
occurs when the experimenter does not treat all the subjects the exact same way
49
Median formula
position of the median in the ordered data set: (*N* + 1)/2
50
Standardized Score (z-score)
the number of standard deviations a value is from the mean of its respective data set
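As a small illustration, the z-score can be computed by subtracting the mean and dividing by the standard deviation. The function name and the numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical example: a score of 85 on a test with mean 70 and sd 10
z = z_score(85, 70, 10)
print(z)  # 1.5
```

A negative result means the value is below the mean.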
51
normal distribution
bell-shaped curve centered at the mean of a data set, with about 68%, 95%, and 99.7% of values falling within 1, 2, and 3 standard deviations of the mean
52
types of distribution problems
1. raw value → formula → z-score → normalcdf → percentile
2. percentile → invNorm → z-score → algebra → raw value
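Both problem types can be sketched with `statistics.NormalDist` from the Python standard library, where `cdf` plays the role of normalcdf (area to the left) and `inv_cdf` plays the role of invNorm. The mean, standard deviation, and target values are hypothetical.

```python
from statistics import NormalDist

mean, sd = 70, 10             # hypothetical population mean and standard deviation
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Type 1: raw value -> z-score -> percentile
x = 85
z = (x - mean) / sd
percentile = standard_normal.cdf(z)          # area to the left of z

# Type 2: percentile -> z-score -> raw value
target = 0.90
z_target = standard_normal.inv_cdf(target)   # z-score with 90% of the area to its left
raw_value = mean + z_target * sd             # solve z = (x - mean) / sd for x

print(round(percentile, 4), round(z_target, 4), round(raw_value, 2))
```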
53
symmetric, normal shape
bell-shaped and symmetric about the mean; mean, median, and mode are approximately the same
54
symmetric, but not normal
mean and median are the same, mode may be different
55
skewed left
values drag out to the left (smaller numbers)
56
skewed right
values drag out to the right (larger numbers)
57
Best descriptive statistics when distribution is symmetric
mean and standard deviation
58
Best descriptive statistics when distribution is skewed
median and IQR
59
statistic
value that describes a sample (e.g. sample mean, sample standard deviation)
60
parameter
value that describes a population (e.g. population mean, population standard deviation)
61
sampling distribution
the sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population
62
steps to create a sampling distribution
1) Take a large number of samples of the same size from the same population
2) Calculate the p-hat or x-bar for each sample
3) Make a histogram of these values
4) Examine the distribution displayed in the histogram for overall pattern (shape), center, and spread
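The four steps above can be sketched as a small simulation. The right-skewed population, the sample sizes, and the helper names are all hypothetical choices made for illustration; the histogram is drawn as plain text so the sketch needs only the standard library.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population with mean near 50
population = [rng.expovariate(1 / 50) for _ in range(10_000)]

def sample_means(population, n, num_samples, rng):
    """Steps 1 and 2: draw many samples of size n and record each x-bar."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(num_samples)]

def text_histogram(values, bins=10, width=40):
    """Step 3: print a rough text histogram and return the bin counts."""
    lo, hi = min(values), max(values)
    bin_width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / bin_width), bins - 1)
        counts[index] += 1
    for i, count in enumerate(counts):
        bar = "#" * round(width * count / max(counts))
        print(f"{lo + i * bin_width:8.1f} | {bar}")
    return counts

means_small = sample_means(population, 5, 1000, rng)
means_large = sample_means(population, 40, 1000, rng)
counts = text_histogram(means_large)

# Step 4: compare center and spread of the two sampling distributions
print(statistics.mean(population))
print(statistics.mean(means_small), statistics.stdev(means_small))
print(statistics.mean(means_large), statistics.stdev(means_large))
```

Both sets of sample means center near the population mean, and the larger sample size gives the narrower, more bell-shaped histogram.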
63
Bias versus Unbiased
A statistic is unbiased if the mean of its sampling distribution equals the population parameter. If samples are collected randomly, the sample means center on the mean of the population -- this is considered unbiased
64
variability
across the many samples that make up a sampling distribution, the bigger the size of each sample, the closer the sample means tend to be to the population mean (bigger sample = less variability)
65
Central Limit Theorem
The sampling distribution of the sample means from any population (regardless of its shape) will be approximately normal, provided the size of each individual sample is large enough (generally 30+)
66
Sample means
The mean of the x-bars (sample means) equals the population mean, so x-bar is an unbiased estimator of the population mean
67
Sampling distribution requirements
1) SRS
2) *n* is greater than or equal to 30 (sample size of each individual sample is *n*)
68
Sample proportions
mean of the sampling distribution of p-hat is p (therefore p-hat is an unbiased estimator of p)
69
Sample proportion requirements
1) SRS
2) np and n(1-p) are both greater than or equal to 10
70
1 proportion z-test
testing a hypothesis regarding the proportion of a single population -- looking for evidence to reject Ho and statistically support Ha
71
2 prop. z-test
testing a hypothesis regarding the equivalence of the proportions of two populations -- determining if the evidence shows statistically a difference or higher/lower value between the two proportions
72
1 & 2 prop z-test: step 1
hypotheses; state the null and alternative hypotheses and define the parameter(s)
73
1 prop. z-test: step 2
type and conditions
A) one-proportion z-test
B) conditions (1. SRS, 2. successes and failures each greater than or equal to 10)
74
2 prop z-test: step 2
type and conditions
A) two-proportion z-test
B) conditions (1. SRS, 2. successes and failures in each sample greater than or equal to 5, 3. fair to believe the two populations are independent of each other)
75
1 & 2 prop. z-test: step 3
calculations; z-score, p-value
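A hedged sketch of the step 3 calculations using only the standard library. The function names and the counts are hypothetical, and the two-proportion version assumes the usual pooled standard error under a null hypothesis of equal proportions.

```python
from math import sqrt
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """Normal-curve p-value for a z statistic."""
    cdf = NormalDist().cdf
    if alternative == "less":
        return cdf(z)
    if alternative == "greater":
        return 1 - cdf(z)
    return 2 * (1 - cdf(abs(z)))

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """z statistic and p-value for Ho: p = p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses p0 because Ho is assumed true
    z = (p_hat - p0) / standard_error
    return z, p_value(z, alternative)

def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """z statistic and p-value for Ho: p1 = p2, using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / standard_error
    return z, p_value(z, alternative)

# Hypothetical data: 60 successes in 100 trials, tested against p0 = 0.5
z1, p1 = one_prop_z_test(60, 100, 0.5)
# Hypothetical data: 60/100 successes in group 1 versus 45/100 in group 2
z2, p2 = two_prop_z_test(60, 100, 45, 100)
print(round(z1, 3), round(p1, 4))
print(round(z2, 3), round(p2, 4))
```

The p-value is then compared to the significance level in step 4.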
76
1 & 2 prop. z-test: step 4
conclusion; “based on our evidence \[p-value compared to significance level\], we \[reject/fail to reject\] the null hypothesis, so there \[is/isn’t\] significant evidence to support the alternative hypothesis \[in context\].”
77
1 prop. z-interval
using sample proportion to estimate a range of values that are likely to contain the population proportion
78
2 prop. z-interval
using our sample proportions to estimate a range of values that are likely to contain the difference in population proportions
79
1 & 2 prop. z-interval: step 1
defining in a sentence the population proportion/difference in proportions that we are hoping to estimate (e.g. “estimate the true proportion”)
80
1 prop. z-interval: step 2
type and conditions
A) one proportion z-interval
B) conditions (1. SRS, 2. successes and failures each greater than or equal to 10)
81
2 prop. z-interval: step 2
type and conditions
A) two proportion z-interval
B) conditions (1. SRS, 2. successes and failures in each sample greater than or equal to 5, 3. fair to believe the two populations are independent of each other)
82
1 & 2 prop. z-interval: step 3
calculation; (calculator or formula)
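For the formula route, a minimal sketch with the standard library follows. The function names and counts are hypothetical; the critical value z* comes from `NormalDist().inv_cdf`, and the two-proportion interval uses the unpooled standard error.

```python
from math import sqrt
from statistics import NormalDist

def critical_z(confidence):
    """z* that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

def one_prop_z_interval(successes, n, confidence=0.95):
    """Interval p-hat +/- z* x sqrt(p-hat(1 - p-hat) / n)."""
    p_hat = successes / n
    margin = critical_z(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = critical_z(confidence) * standard_error
    return (p1 - p2) - margin, (p1 - p2) + margin

# Hypothetical data: 60 successes in 100 trials
low1, high1 = one_prop_z_interval(60, 100)
# Hypothetical data: 60/100 successes versus 45/100
low2, high2 = two_prop_z_interval(60, 100, 45, 100)
print(round(low1, 3), round(high1, 3))
print(round(low2, 3), round(high2, 3))
```

A higher confidence level gives a larger z* and therefore a wider interval.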
83
1 & 2 prop. z-interval: step 4
interpretation; “we are \_\_% confident that our interval (\_\_, \_\_) contains the true proportion/difference in proportions of \[parameter of interest\].”
84
Type I Error
Rejecting Ho when Ho is true
85
Type II Error
Failing to reject Ho when Ha is true
86
Power
The probability of correctly rejecting Ho when Ha is true (1 minus the probability of a Type II error)
87
How can you increase power?
* Increase *n* (the best option)
* Increase *α*
* Move Ho and Ha further apart
* Decrease *σ*
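To see each of these effects numerically, here is a rough sketch of the power of a one-sided z-test for a mean with known σ. That test is an assumption chosen because its power has a simple closed form; it is not one of the procedures on these cards, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of the test Ho: mu = mu0 versus Ha: mu > mu0 when the true mean is mu_a."""
    standard_error = sigma / sqrt(n)
    # Reject Ho when the sample mean exceeds this cutoff
    critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Power = chance the sample mean exceeds the cutoff when mu_a is the true mean
    return 1 - NormalDist(mu_a, standard_error).cdf(critical_mean)

baseline = power_one_sided_z(mu0=100, mu_a=105, sigma=15, n=25)
bigger_n = power_one_sided_z(mu0=100, mu_a=105, sigma=15, n=100)
bigger_alpha = power_one_sided_z(mu0=100, mu_a=105, sigma=15, n=25, alpha=0.10)
further_apart = power_one_sided_z(mu0=100, mu_a=110, sigma=15, n=25)
smaller_sigma = power_one_sided_z(mu0=100, mu_a=105, sigma=10, n=25)

for label, value in [("baseline", baseline), ("bigger n", bigger_n),
                     ("bigger alpha", bigger_alpha), ("further apart", further_apart),
                     ("smaller sigma", smaller_sigma)]:
    print(f"{label:14s} {value:.3f}")
```

Each change raises the power above the baseline, matching the four options listed on the card.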
88
Calculator function: x → z → %
normalcdf
89
Calculator function: % → z → x
invNorm
90
1 sample t-test
Testing a hypothesis regarding the mean of a single population -- looking for evidence to reject Ho and statistically support Ha
91
2 sample t-test
Testing a hypothesis regarding the equivalence of the means of two populations -- determining if the evidence shows statistically a difference or higher/lower value between the two means
92
1/2 sample t-test: step 1
null and alternative hypothesis, define the parameter
93
1 sample t-test: step 2
Types and Conditions:
A) 1-sample t-test
B) Conditions (1. SRS, 2. Normality)
94
2 sample t-test: step 2
Types and conditions:
A) 2-sample t-test
B) Conditions (1. SRSs, 2. Normality, 3. Independence between the two samples)
95
1/2 sample t-test: step 3
Calculations:
* test statistic
* degrees of freedom
* p-value
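A minimal sketch of the test statistic and degrees of freedom, using only the standard library. The function names and data are hypothetical, and the two-sample version assumes the unpooled (Welch) form with its approximate degrees of freedom. The standard library has no t-distribution, so the p-value is left to a t-table or a calculator's t-test function.

```python
import statistics
from math import sqrt

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for Ho: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1)
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1

def two_sample_t(sample1, sample2):
    """Unpooled (Welch) t statistic and approximate degrees of freedom for Ho: mu1 = mu2."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group_a = [12, 14, 16, 18, 20]  # hypothetical sample
group_b = [10, 11, 12, 13, 14]  # hypothetical sample

t1, df1 = one_sample_t(group_a, mu0=13)
t2, df2 = two_sample_t(group_a, group_b)
print(round(t1, 3), df1)
print(round(t2, 3), round(df2, 2))
```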
96
1/2 sample t-test: step 4
conclusion; “based on our evidence \[p-value compared to significance level\], we \[reject/fail to reject\] the null hypothesis, so there \[is/isn’t\] significant evidence to support the alternative hypothesis \[in context\].”
97
1 sample t-interval
Using our sample mean to estimate a range of values that are likely to contain the population mean
98
2 sample t-interval
using our sample means to estimate a range of values that are likely to contain the difference in population means
99
1/2 sample t-interval: step 1
Defining the parameter we are estimating (“estimate the true mean/difference”)