descriptive statistics
the process of describing/summarizing sample/population data collected
inferential statistics
the process of making predictions (inferences) about population parameters using sample data
population
the entire group of subjects of interest in a statistical study
sample
a subset of the population of interest from which data are actually collected
variables
characteristics that vary among subjects
quantitative
numerical values that represent different magnitudes of the variable
qualitative
values that are categorical without a specific order or magnitude
nominal data
measurements that are categorical/qualitative and unordered (no category is greater than or smaller than any other)
ordinal data
measurements that are categorical/qualitative and ordered (however, there is no particular defined distance between the levels of data)
discrete variables
values that form a set of separate numbers
continuous variables
values that can form an infinite continuum of possible real number values
n
sample size
simple random sample
method for creating a sample population in which each possible sample within that population has the same probability of being selected
sampling frame
list of all subjects in a population
random numbers
numbers generated by a computer to facilitate the selection of random samples in SRS
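A minimal sketch of drawing a simple random sample with computer-generated random numbers; the sampling frame of 100 subjects and the sample size of 10 are hypothetical:

```python
import random

# Hypothetical sampling frame: a list of all subjects in the population
sampling_frame = [f"subject_{i}" for i in range(1, 101)]  # 100 subjects

n = 10  # desired sample size

# random.sample gives each possible sample of size n the same chance of being
# selected, which is the defining property of a simple random sample
srs = random.sample(sampling_frame, n)
print(srs)
```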
sample survey
method of collecting data by posing questions to the subjects in a sample
experiment
a study in which one or more conditions are systematically changed or assigned to subjects and the resulting outcome is measured
treatments
different conditions within an experiment
observational studies
study of a population/sample without any manipulation of conditions
sampling error
how much a statistic differs from the parameter it predicts, because samples vary from one another and from the larger population
sampling bias
occurs when a sample is collected in such a way that some members of the intended population have a lower or higher probability of being sampled than others, which can distort the data collected
nonprobability sampling
methods for which it is not possible to determine the probabilities of the possible samples (e.g., volunteer sampling)
selection bias
bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed
undercoverage
when the sample selected for a study lacks representation from some groups in the population
response bias
wide range of tendencies for participants to respond inaccurately or falsely to questions
nonresponse bias
inability to gather data from certain subjects within a sample, either because they refuse to participate or because they are unreachable
systematic random sampling
process of selecting subjects by choosing one subject at random from the first k names in a sampling frame and then selecting every kth subject listed after it, where k is roughly the population size divided by the sample size (see the sketch below)
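A rough sketch of systematic random sampling, assuming a hypothetical frame of 1,000 subjects and a target sample of 100, so the skip interval k is 10:

```python
import random

# Hypothetical sampling frame of N = 1000 subjects; target sample size n = 100
sampling_frame = list(range(1, 1001))
n = 100
k = len(sampling_frame) // n              # skip interval: k = N / n = 10

start = random.randint(0, k - 1)          # random start among the first k names
systematic_sample = sampling_frame[start::k]  # then every kth subject after it
print(len(systematic_sample))             # 100
```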
stratified random sampling
process of selecting subjects by dividing the population into separate groups called strata and then selecting a simple random sample from each stratum
proportional stratified random sampling
occurs when the sampled strata proportions are the same as those in the entire population
disproportional stratified random sampling
occurs when the sampled strata proportions differ from the population proportions
cluster sampling
process for selecting subjects in which the population is divided into a large number of clusters and a simple random sample of the clusters is selected
multistage sampling
process for selecting samples in which multiple sampling methods are utilized
frequency distribution
list of the possible values of a variable together with the number of observations at each value
relative frequency distribution
a frequency distribution that reports proportions or percentages instead of counts
histogram
frequency distribution for quantitative variables segmented by intervals
stem-and-leaf plots
observations presented with their leading digit (stem) and final digit (leaf)
population distribution
frequency distributions for populations
sample data distribution
frequency distribution for samples
symmetrical distribution types
U-shaped and bell-shaped
skewed distributions
when the extreme ends of data frequencies form “tails” that elongate the shape
mean (average)
sum of the observations divided by the # of observations
y-bar
sample mean
properties of the mean
(1) highly influenced by outliers
(2) pulled in the direction of the longer tail of a skewed distribution
(3) the “point of balance” on a number line when an equal weight is at each observation point
weighted average
the combined mean of two data sets with sample sizes n1 and n2 and sample means y1 and y2: (n1y1 + n2y2) / (n1 + n2)
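A quick numeric check of the combined-average formula, with hypothetical sample sizes and means:

```python
# Hypothetical samples: n1 = 20 observations with mean 4.0, n2 = 30 with mean 6.0
n1, ybar1 = 20, 4.0
n2, ybar2 = 30, 6.0

combined_mean = (n1 * ybar1 + n2 * ybar2) / (n1 + n2)
print(combined_mean)  # (80 + 180) / 50 = 5.2
```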
median
the observation that falls in the middle of the ordered sample
properties of the median
(1) valid for quantitative and ordinal data
(2) for symmetric distributions, median and mean are the same
(3) in skewed distributions, it lies less far out in the tail than the mean does
(4) insensitive to the distances of the observations from the middle (only uses order of the data)
(5) not affected by outliers
best case use for mean
highly discrete data (only a few distinct values); distributions that are close to symmetric or only mildly skewed
best case use for median
highly skewed data
mode
the value that occurs most frequently
bimodal distribution
when two distinct clusters of data occur within a data distribution
range
the difference between the largest and smallest observations within a data set
standard deviation
a measure of the amount of variation of the values of a variable about its mean; found by taking the square root of the sum of squared deviations from the sample mean y-bar divided by n - 1; denoted by s
variance
standard deviation squared; denoted by s-squared
sum of squares
the sum of all calculated deviations squared; the larger the deviations, the larger the sum of squares and the larger s tends to be
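A small sketch that computes the sum of squares, variance, and standard deviation from the definitions above for a made-up sample, and compares the result with Python's standard library:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up sample
n = len(data)
ybar = sum(data) / n                       # sample mean (y-bar) = 5.0

sum_of_squares = sum((y - ybar) ** 2 for y in data)   # 32.0
variance = sum_of_squares / (n - 1)        # s squared
s = variance ** 0.5                        # sample standard deviation

print(ybar, sum_of_squares, variance, round(s, 3))
print(statistics.stdev(data))              # same s, from the standard library
```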
properties of standard deviation
(1) s always greater than or equal to 0
(2) s = 0 when all the observations have the same value
(3) the greater the variability about the mean, the larger is the value of s
(4) if every observation is multiplied by a constant c, s is multiplied by |c| (rescaling the data rescales s)
percentiles
the pth percentile is the point such that p% of the observations fall at or below it and (100 - p)% fall above it
lower quartile
25th percentile
median (second quartile)
50th percentile
upper quartile
75th percentile
IQR (interquartile range)
difference between the upper and lower quartiles; describes the spread of the middle half of the observations (increases as variability increases); not as sensitive as the standard deviation is to outliers; for bell-shaped distributions, the IQR is approximately (4/3)s
boxplot
graph display that captures the center (median) and variability (quartiles); the whiskers extend to the minimum and maximum values that are not outliers, and outliers are marked separately
outlier
falls more than 1.5(IQR) above the upper quartile or more than 1.5(IQR) below the lower quartile
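A sketch of the quartile, IQR, and 1.5(IQR) outlier rule for a made-up sample; note that software packages use slightly different quartile conventions, so exact quartile values can vary:

```python
import statistics

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]        # made-up sample with one large value

q1, q2, q3 = statistics.quantiles(data, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

print(q1, q2, q3, iqr)
print(outliers)  # 30 is flagged as an outlier here
```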
z-score
the # of standard deviations that an observation falls from the mean
association
exists between two variables if certain values of one variable tend to go with certain values of the other
bivariate analysis
an analysis of association between two variables (usually explanatory and response variables)
explanatory variable
the variable that defines groups (independent variable)
response variable
the outcome variable (dependent variable)
contingency table
displays the # of subjects observed at different combinations of possible outcomes for the two variables; illustrates contingency between explanatory variable and outcome
scatterplot
graph that plots two quantitative variables against each other, using one point to represent each observation
correlation
describes the strength of association between variables in terms of how closely the data follows a straight-line trend
regression analysis
analysis method that provides a straight-line formula for predicting the value of y given a value of x
μ
population mean; average of the observations for the entire population
σ
population standard deviation; describes the variability of those observations about the population mean
probability
the proportion of times that the outcome would occur in a very long sequence of observations
probability of A not occurring
P(not A) = 1 - P(A)
probability of A or B
P(A or B) = P(A) + P(B) - P(A and B); when A and B are disjoint (cannot occur together), this reduces to P(A) + P(B)
probability of A and B
P(A and B) = P(A) x P(B given A); contains a conditional probability
probability of A and B (independent)
P(A and B) = P(A) x P(B)
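A small enumeration over two fair dice that checks the probability rules above; the events A and B are chosen only for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice; every one of the 36 outcomes is equally likely
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o):  # event A: the first die shows 6
    return o[0] == 6

def B(o):  # event B: the two dice sum to at least 10
    return o[0] + o[1] >= 10

print(prob(lambda o: not A(o)) == 1 - prob(A))  # complement rule: P(not A) = 1 - P(A)

# general "or" rule: P(A or B) = P(A) + P(B) - P(A and B)
print(prob(lambda o: A(o) or B(o))
      == prob(A) + prob(B) - prob(lambda o: A(o) and B(o)))

# multiplication rule: P(A and B) = P(A) x P(B given A)
p_b_given_a = prob(lambda o: A(o) and B(o)) / prob(A)
print(prob(lambda o: A(o) and B(o)) == prob(A) * p_b_given_a)
```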
histogram
graphic display for probability distribution where the probability of a value is represented by the height of a bar
mean of a probability distribution (discrete)
the sum of each possible value of the variable times its probability of occurrence
E(y)
expected value of y; also known as the mean of a probability distribution
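A minimal example of the expected value E(y) as the sum of y times P(y), for a hypothetical discrete distribution:

```python
# Hypothetical discrete probability distribution for y
distribution = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}

assert abs(sum(distribution.values()) - 1.0) < 1e-9  # probabilities sum to 1

expected_value = sum(y * p for y, p in distribution.items())
print(expected_value)  # E(y) = 0*0.1 + 1*0.2 + 2*0.4 + 3*0.3 = 1.9
```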
normal probability distribution
symmetric, bell-shaped, and characterized by its mean (μ) and standard deviation (σ); 0.68 of observations fall within 1 standard deviation, 0.95 within 2 SDs, and 0.997 within 3 SDs
Empirical Rule
for bell-shaped histograms, about 68% of the data fall within 1 SD of the mean, 95% falls within 2 SDs of the mean, and 99.7% of data falls within 3 SDs of the mean
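A quick check of the Empirical Rule percentages against exact standard normal probabilities, using the identity P(|Z| <= k) = erf(k / sqrt(2)) from the standard library:

```python
from math import erf, sqrt

# probability that a normal observation falls within k standard deviations of the mean
for k in (1, 2, 3):
    within_k = erf(k / sqrt(2))
    print(k, round(within_k, 4))
# prints roughly 0.6827, 0.9545, 0.9973 -- the 68-95-99.7 percentages
```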
z-score
represents the # of SDs that observed value y falls from the mean: z = (y - μ) / σ
standard normal distribution
a normal distribution with the mean μ = 0 and the SD σ = 1
covariance
the average of the cross products of deviations from the population means for two jointly distributed random variables, weighted by the joint probabilities for pairs of values
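A sketch of covariance as the average cross product of deviations from the means, using made-up paired data; dividing by n gives the population version and by n - 1 the sample version:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up paired data
y = [2.0, 1.0, 4.0, 3.0, 5.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

cross_products = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
cov_population = cross_products / n        # divide by n for the population version
cov_sample = cross_products / (n - 1)      # divide by n - 1 for the sample version

print(cov_population, cov_sample)          # 1.6 and 2.0 for this data
```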
sampling distribution of a statistic
the probability distribution that specifies probabilities for the possible values that statistic can take (i.e. sample proportion or sample mean)
sample mean y in relation to population mean μ
fluctuates from sample to sample (the sample mean y varies in value from sample to sample, with its sampling distribution centered at μ)
standard error
the standard deviation of the sampling distribution of the sample mean y; describes how y varies from sample to sample; denoted by σ(y) and, for random sampling, equal to σ/√n
as n increases, the standard error
decreases
as n increases, the sampling distribution gets
narrower (the sample statistic, such as a sample proportion, tends to fall closer to the population parameter; there is less probability of an extreme result)
Central Limit Theorem
for random sampling with a large size n, the sampling distribution of the sample mean y is approximately a normal distribution; for most cases, n of 30 is sufficient
implication of Central Limit Theorem
the bell shape of the sampling distribution applies no matter the shape of the population distribution
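A simulation sketch of the Central Limit Theorem: repeated samples of size 30 are drawn from a skewed (exponential) population with mean 1 and σ = 1, and the resulting sample means are roughly bell-shaped with standard deviation near σ/√n. The population choice and seed are arbitrary:

```python
import random
import statistics
from math import sqrt

random.seed(0)

n = 30             # sample size
repeats = 10_000   # number of simulated random samples

# each entry is the mean of one random sample of size n from the skewed population
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

print(round(statistics.fmean(sample_means), 3))   # close to the population mean, 1
print(round(statistics.stdev(sample_means), 3))   # close to sigma / sqrt(n)
print(round(1 / sqrt(n), 3))                      # theoretical standard error, about 0.183
# A histogram of sample_means would look roughly bell-shaped even though the population is skewed.
```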