variable
holds information about the same characteristic for many subjects
categorical variable
a variable that places each individual into one of several groups or categories
quantitative variable
a variable that takes numerical values for which arithmetic, such as finding an average, makes sense
frequency table
lists the categories for a categorical variable and displays the counts for each category
relative frequency table
lists the categories for a categorical variable and displays the percentage for each category
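A minimal Python sketch of both tables, assuming a hypothetical list of category labels. `frequency_table` and `relative_frequency_table` are illustrative names, not functions from any particular library.

```python
from collections import Counter

# Hypothetical categorical data: favorite color reported by 8 students
colors = ["red", "blue", "blue", "green", "red", "blue", "green", "blue"]

def frequency_table(data):
    """Illustrative helper: count how many individuals fall in each category."""
    return dict(Counter(data))

def relative_frequency_table(data):
    """Illustrative helper: each category's count as a percentage of the total."""
    n = len(data)
    return {category: 100 * count / n for category, count in Counter(data).items()}

print(frequency_table(colors))           # {'red': 2, 'blue': 4, 'green': 2}
print(relative_frequency_table(colors))  # {'red': 25.0, 'blue': 50.0, 'green': 25.0}
```

The relative frequencies always add to 100%, which is a quick check on hand calculations.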
distribution
describes how a quantitative variable behaves; a description generally includes shape, center, spread, & unusual features.
bar chart
a display for categorical data that uses bar height to represent counts or percentages for each category
histogram
a display for quantitative data that uses adjacent bars to represent counts or percentages of values falling in each interval
stemplot
a display for quantitative data that uses place values (stems and leaves) to represent the distribution
dotplot
a display for either kind of data that uses a dot to represent each individual in the data set
measures of center
use the mean for symmetric distributions and the median for all other distribution shapes
measures of spread
use the standard deviation for symmetric distributions and the IQR for all other distribution shapes
uniform
a distribution whose values occur with roughly equal frequency across its range, giving a flat shape
symmetric
a distribution whose shape is unimodal and each side is roughly a mirror image of the other
left skewed
a distribution that has a concentration of data on the upper end and the tail on the left
right skewed
a distribution with a concentration of data on the lower end and the tail on the right
outliers
values that fall outside the overall pattern of the data
mean
the average of the data values
median
the value in the center of an ordered data set
range
the maximum data value minus the minimum data value
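A small Python sketch of the mean, median, and range computed by hand on hypothetical scores. The function names are illustrative; `data_range` avoids shadowing Python's built-in `range`.

```python
# Hypothetical quiz scores
scores = [7, 3, 9, 5, 6, 12]

def mean(data):
    """Average: the sum of the values divided by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle of the ordered data (average of the two middle values when n is even)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def data_range(data):
    """Maximum minus minimum."""
    return max(data) - min(data)

print(mean(scores))        # 7.0
print(median(scores))      # 6.5
print(data_range(scores))  # 9
```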
1st quartile
the value with 25% of the data falling below it in an ordered list
3rd quartile
the value with 75% of the data falling below it in an ordered list
IQR
the third quartile minus the first quartile
percentile
the value in the data below which a given percentage of the observations fall
5 number summary
includes the minimum, first quartile, median, third quartile, & the maximum
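One common hand method for the quartiles takes the median of the lower and upper halves of the ordered data, leaving the overall median out of both halves when n is odd. The sketch below assumes that convention; statistical software may interpolate differently, so its quartiles can differ slightly. The data and function names are hypothetical.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """Illustrative helper: (min, Q1, median, Q3, max) by the median-of-halves method.

    When n is odd, the overall median is left out of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data set with n = 9
data = [1, 3, 4, 6, 7, 9, 10, 12, 15]
summary = five_number_summary(data)
q1, q3 = summary[1], summary[3]
print(summary)   # (1, 3.5, 7, 11.0, 15)
print(q3 - q1)   # 7.5 (the IQR)
```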
modified boxplot
a display for quantitative data that graphs the five-number summary on an axis and plots outliers individually if they exist
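The card does not say how outliers are identified; a widely used convention is the 1.5 × IQR rule, which flags any value more than 1.5 IQRs below Q1 or above Q3. A sketch assuming that rule, with quartiles for the hypothetical data found by the median-of-halves method (Q1 = 9.5, Q3 = 21). The function names are illustrative.

```python
def iqr_outlier_fences(q1, q3):
    """Illustrative helper: lower and upper fences from the 1.5 x IQR rule (assumed convention)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(data, q1, q3):
    """Values a modified boxplot would plot individually as outliers under that rule."""
    low, high = iqr_outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Hypothetical data; its quartiles by the median-of-halves method are 9.5 and 21
data = [2, 9, 10, 12, 15, 18, 20, 22, 40]
print(iqr_outlier_fences(9.5, 21))   # (-7.75, 38.25)
print(flag_outliers(data, 9.5, 21))  # [40]
```

The whiskers would then extend only to 2 and 22, the most extreme values inside the fences.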
variance
the square of the standard deviation; a measure of spread
resistant
describes a statistic that is not strongly affected by extreme values; the median is more resistant than the mean, and the standard deviation is strongly affected by extreme values
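A quick Python check of resistance on hypothetical data: replacing the largest value with an extreme one leaves the median unchanged but pulls the mean far upward and inflates the standard deviation.

```python
from statistics import mean, median, stdev

incomes = [30, 32, 35, 38, 40]        # hypothetical values, in thousands
with_extreme = incomes[:-1] + [400]   # replace the largest value with an extreme one

for label, data in [("original", incomes), ("with extreme value", with_extreme)]:
    # The median stays at 35; the mean and standard deviation jump
    print(label, "mean:", mean(data), "median:", median(data), "sd:", round(stdev(data), 2))
```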
Conditional Distribution
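the distribution of one variable among only the individuals who have a particular value of the other variable; in a two-way table, it is found from the counts within a single row (or column)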
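A sketch of row-wise conditional distributions for a hypothetical two-way table (the categories and counts are made up for illustration); each row's counts are divided by that row's total. `conditional_distribution` is an illustrative name.

```python
# Hypothetical two-way table: rows = grade level, columns = preferred study mode
table = {
    "juniors": {"flashcards": 30, "practice tests": 10},
    "seniors": {"flashcards": 15, "practice tests": 35},
}

def conditional_distribution(row):
    """Illustrative helper: percent of each column category within a single row."""
    total = sum(row.values())
    return {category: 100 * count / total for category, count in row.items()}

for group, row in table.items():
    print(group, conditional_distribution(row))
# juniors {'flashcards': 75.0, 'practice tests': 25.0}
# seniors {'flashcards': 30.0, 'practice tests': 70.0}
```

Comparing these row percentages, rather than raw counts, is what a segmented bar graph displays.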
Pie Graph
used to show parts of a whole
Segmented Bar Graph
used to compare the distribution of a categorical variable in each of several groups
Box and whisker plot
shows the variability of a data set using quartiles
Measures of center
Mean - not resistant to extreme values
Median - resistant to extreme values
Measures of spread
Range - not resistant to extreme values
Standard Deviation - not resistant to extreme values
IQR - resistant to extreme values
SOCS
S - Shape
O - Outliers
C - Center
S - Spread
Standard Deviation
a computed measure of how much scores vary around the mean score
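A sketch of the sample variance and standard deviation, assuming the usual sample formulas that divide by n - 1 (as Python's `statistics.variance` and `statistics.stdev` do). The data and function names are hypothetical.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # squared deviations sum to 32, so this is 32 / 7
print(sample_sd(data))
```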
percentile
the pth percentile is the value with p% of the observations at or below it
cumulative relative frequency graph (ogive)
a graph used to examine location within a distribution; it groups observations into equal-width classes and shows the accumulating percentage of observations as you move through the classes in increasing order
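A sketch of both ideas on hypothetical data, assuming the percentile counts observations at or below a value and that each class includes its lower boundary but not its upper one. `percentile_rank` and `cumulative_relative_frequencies` are illustrative names.

```python
from bisect import bisect_right

def percentile_rank(data, value):
    """Illustrative helper: percent of observations at or below value."""
    ordered = sorted(data)
    return 100 * bisect_right(ordered, value) / len(ordered)

def cumulative_relative_frequencies(data, start, width, num_classes):
    """Illustrative helper: (upper class boundary, cumulative %) for equal-width classes."""
    n = len(data)
    result = []
    for i in range(1, num_classes + 1):
        upper = start + i * width
        count = sum(1 for x in data if x < upper)  # observations in this class or earlier ones
        result.append((upper, 100 * count / n))
    return result

ages = [41, 45, 47, 52, 54, 55, 57, 60, 61, 64]  # hypothetical data
print(percentile_rank(ages, 55))                   # 60.0
print(cumulative_relative_frequencies(ages, 40, 10, 3))
# [(50, 30.0), (60, 70.0), (70, 100.0)]
```

Plotting the cumulative percentages against the upper class boundaries gives the ogive.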
z-score
a measure of how many standard deviations from the mean an observation falls, & in what direction
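The z-score is (observation - mean) / standard deviation. A minimal sketch with a hypothetical mean of 80 and standard deviation of 5; `z_score` is an illustrative name.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean (negative means below it)."""
    return (x - mean) / sd

# Hypothetical test scores: mean 80, standard deviation 5
print(z_score(90, 80, 5))  # 2.0  -> two standard deviations above the mean
print(z_score(72, 80, 5))  # -1.6 -> 1.6 standard deviations below the mean
```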
transformations
converting the original observations to another scale, can affect shape, center, & spread of a distribution
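A sketch of a linear transformation on hypothetical temperatures, converting Celsius to Fahrenheit with F = 32 + 1.8C. A linear transformation with a positive multiplier changes center and spread in predictable ways but leaves the shape unchanged; nonlinear transformations such as logarithms can change shape as well.

```python
from statistics import mean, stdev

temps_c = [10, 15, 20, 25, 30]   # hypothetical temperatures in degrees Celsius

# Linear transformation to Fahrenheit: F = 32 + 1.8 * C
temps_f = [32 + 1.8 * c for c in temps_c]

# Multiplying by 1.8 scales both center and spread by 1.8;
# adding 32 shifts the center but leaves the spread unchanged.
print(mean(temps_c), mean(temps_f))
print(stdev(temps_c), stdev(temps_f))
```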
density curve
a curve that is always on or above the horizontal axis & has area exactly 1 underneath it; it describes the overall pattern of a distribution
mean of a density curve
the balance point: the point at which the curve would balance if it were made of solid material
median of a density curve
the equal-areas point which divides the area under the curve in half
µ
notation for the mean of a density curve
σ
notation for the standard deviation of a density curve
normal curve
symmetric, single-peaked, & bell-shaped
68-95-99.7 rule
in a normal distribution, 68% of values fall within 1σ of the mean, 95% fall within 2σ of the mean, & 99.7% fall within 3σ of the mean
standard normal distribution
the normal distribution with a mean of 0 & a standard deviation of 1
standard normal table (table A)
a table of areas under the standard normal curve; the table entry for each z value is the area under the curve to the left of z
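Python's standard library includes `statistics.NormalDist` (Python 3.8+), whose `cdf` method returns the same left-tail area that Table A lists. A sketch that reads a few table entries and checks the 68-95-99.7 rule:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Table A entry: area under the curve to the left of a z value
print(round(std_normal.cdf(1.0), 4))   # 0.8413
print(round(std_normal.cdf(-1.5), 4))  # 0.0668

# Checking the 68-95-99.7 rule: area within k standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```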
"C" normality plot
right skewed distribution
backwards "C" normality plot
left skewed distribution
linear normality plot
normal distribution
"S" normality plot
uniform distribution
explanatory variable
the variable we think explains or predicts changes in the response variable (x)
response variable
measures an outcome of a study (y)
scatterplot
shows the relationship between two quantitative variables; one variable on the horizontal axis & the other on the vertical axis
direction, form, strength
characteristics used to look at the overall pattern of a scatterplot
outlier
a striking departure from the pattern; an individual value that falls outside the overall pattern
positive association
when above-average values of one variable tend to accompany above-average values of the other
negative association
when above-average values of one variable tend to accompany below-average values of the other
correlation
measures the strength of the linear relationship between two quantitative variables (r)
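One standard formula computes r as the sum of the products of the z-scores of x and y, divided by n - 1. A sketch on hypothetical study-time data; `correlation` is an illustrative name.

```python
import math

def correlation(xs, ys):
    """r = (sum of z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical hours studied (explanatory, x)
scores = [60, 65, 70, 80, 85]  # hypothetical exam scores (response, y)
print(round(correlation(hours, scores), 3))  # 0.991
```

A value this close to 1 indicates a strong positive linear relationship; it says nothing about causation.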
regression line
a line that describes how a response variable (y) changes as an explanatory variable (x) changes; used to predict the value of y for a given x
predicted value
the value of the response variable predicted by the regression line for a given value of x (y hat)
slope
the amount by which y is predicted to change when x increases by 1 (b)
y-intercept
the predicted value of y when x = 0 (a)
extrapolation
the use of a regression line for predictions far outside the interval of values of the explanatory variable (x) used to obtain the line; such predictions are often not accurate
least-squares regression line
line of best fit; makes the sum of the squared residuals as small as possible
residual
the difference between an observed y & the predicted y (y - y hat)
residual plot
a scatterplot of the residuals against the explanatory variable; helps us assess whether a linear model is appropriate
standard deviation of the residuals
gives the approximate size of a typical prediction error (s)
coefficient of determination
the fraction of the variation in values of y that is accounted for by the LSRL of y on x (r-sq)
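A sketch tying the regression cards together on hypothetical data: the slope b, intercept a, predicted values, residuals, the standard deviation of the residuals s (dividing by n - 2), and r-sq computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. `least_squares_line` is an illustrative name.

```python
import math

def least_squares_line(xs, ys):
    """Illustrative helper: (a, b) for y hat = a + b*x, minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope (equivalently r * s_y / s_x)
    a = y_bar - b * x_bar    # the line passes through (x bar, y bar)
    return a, b

hours = [1, 2, 3, 4, 5]        # hypothetical explanatory values (x)
scores = [60, 65, 70, 80, 85]  # hypothetical response values (y)
a, b = least_squares_line(hours, scores)

predicted = [a + b * x for x in hours]                          # y hat values
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]  # y - y hat

n = len(hours)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # typical prediction error
y_bar = sum(scores) / n
r_sq = 1 - sum(e ** 2 for e in residuals) / sum((y - y_bar) ** 2 for y in scores)

print(a, b)  # 52.5 6.5
```

Here the residuals sum to zero, as they always do for a least-squares line, and r-sq equals the square of the correlation for the same data.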
association vs causation
association does not imply causation; a strong association between two variables is not enough to draw conclusions about cause & effect; for causation, we need a well designed experiment
transforming data
applying simple transformations to the data, such as taking logarithms, to straighten a nonlinear pattern
power model
y = ax^b, where x is the base (to achieve linearity, take the log of both x & y)
exponential model
y = ab^x, where x is the exponent (to achieve linearity, take the log of y only)
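A sketch of fitting both models by taking logs and then running a least-squares line, assuming base-10 logs (natural logs work the same way). The data are generated from known models so the fitted constants can be checked; the function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys regressed on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_exponential(xs, ys):
    """Illustrative helper for y = a * b**x: regress log(y) on x, then undo the logs."""
    intercept, slope = fit_line(xs, [math.log10(y) for y in ys])
    return 10 ** intercept, 10 ** slope

def fit_power(xs, ys):
    """Illustrative helper for y = a * x**b: regress log(y) on log(x); the slope is b."""
    intercept, slope = fit_line([math.log10(x) for x in xs], [math.log10(y) for y in ys])
    return 10 ** intercept, slope

xs = [1, 2, 3, 4]  # hypothetical x values
print(fit_exponential(xs, [3 * 2 ** x for x in xs]))  # approximately (3, 2)
print(fit_power(xs, [5 * x ** 3 for x in xs]))        # approximately (5, 3)
```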