descriptive methods
the different methods for organizing and summarizing collected data, including tabular methods, graphical methods, and numerical methods
categorical/qualitative variables
variables that place the individuals being studied into one of several groups or categories
numerical/quantitative variables
variables that have outcomes that can be analysed using arithmetic operations
univariate data
data with only one measurement on each object
bivariate data
data with two measurements on each object
frequency
the number of times that an observation occurs, usually denoted with f
relative frequency
the ratio of the frequency (f) to the total number of observations (n), usually denoted by rf, and rf = f/n
cumulative frequency
the number of observations less than or equal to a specified value, usually denoted by cf
frequency distribution table
a table giving all possible values of a variable and their frequencies
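To make f, rf = f/n, and cf concrete, here is a minimal Python sketch that builds a frequency distribution table from a list of raw observations. The function name `frequency_table` and the scores are made up for illustration, and it assumes the values can be sorted so that cf counts the observations less than or equal to each value.

```python
from collections import Counter

def frequency_table(data):
    """Return (value, f, rf, cf) rows for each distinct value, in sorted order."""
    n = len(data)
    counts = Counter(data)          # f: number of times each value occurs
    rows = []
    cf = 0
    for value in sorted(counts):
        f = counts[value]
        cf += f                     # cf: observations <= this value
        rows.append((value, f, f / n, cf))   # rf = f / n
    return rows

scores = [3, 1, 2, 3, 3, 2, 5]      # made-up observations
print("value\tf\trf\tcf")
for value, f, rf, cf in frequency_table(scores):
    print(f"{value}\t{f}\t{rf:.3f}\t{cf}")
```

The last cf always equals n, and the rf column always sums to 1.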
center of a distribution
the “typical” or central data point, measured in several ways, including mean, median, and mode
spread of a distribution
how far the data points are from the center, measured through the range, standard deviation, or variance
shape of a distribution
describes where most of the data lie and how they are spread around the center; can be symmetric or skewed
symmetric distribution
when the left half of the distribution is approximately a mirror image of the right half, meaning that the data is spread out in the same way on both sides, with the same amount of data on both sides of the center
skewed distribution
when there are extreme values in only one direction that cause one side to have a longer tail, being right-skewed if the tail is on the right, and left-skewed if the tail is on the left
outliers
an observation that is surprisingly different from the rest of the data
stem in stemplot
the left-most part of each observation
leaf in stemplot
the remaining part of each observation, excluding the left-most part
percentage frequency/relative frequency
the frequency of an observation relative to the size of the whole sample (f/n), or that ratio multiplied by 100 when given as a percentage
population
the entire group of individuals or things
sample
the part of the population that is studied
mean
the average value in a data set. nonresistant and affected by extreme or outlier measurements. for a population, denoted by μ, and for a sample, denoted by x̄
median
the point that divides the measurements in half. resistant and not affected by extreme or outlier measurements, better to use for skewed data or data sets with outliers. sometimes denoted as M
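To see resistance in action, this short Python sketch compares the mean and median of some made-up measurements before and after one extreme value is added. It uses only the standard `statistics` module.

```python
from statistics import mean, median

values = [30, 32, 35, 38, 40]        # made-up measurements
with_outlier = values + [400]        # add one extreme measurement

print("without outlier:", mean(values), median(values))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

The mean jumps from 35 to about 95.8, while the median only moves from 35 to 36.5.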
range
the difference between the largest and smallest measurement in a data set, not reliable as it depends on the two extreme measurements
interquartile range (IQR)
the range of the middle 50% of the data, or the difference between the third and first quartiles. resistant and not affected by extreme or outlier measurements
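A minimal Python sketch of the range and IQR. Quartile conventions differ between textbooks and software; this one assumes the common "median of each half" rule (the overall median is left out of both halves when n is odd), and the function names are hypothetical. Replacing the most extreme value shows that the range reacts while the IQR does not.

```python
from statistics import median

def quartiles(data):
    """Q1, median, Q3 by the 'median of each half' rule; needs at least 2 values."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]   # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

def data_range(data):
    """Largest measurement minus smallest measurement."""
    return max(data) - min(data)

def iqr(data):
    """Third quartile minus first quartile."""
    q1, _, q3 = quartiles(data)
    return q3 - q1

values = [1, 3, 4, 6, 7, 9, 12, 15, 40]     # made-up data; 40 is extreme
print(data_range(values), iqr(values))
less_extreme = values[:-1] + [16]            # replace 40 with 16
print(data_range(less_extreme), iqr(less_extreme))
```

The range drops from 39 to 15, but the IQR stays at 10 in both cases.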
standard deviation
a measure of variation that takes every measurement into account. nonresistant and affected by extreme or outlier measurements
variance
the square of the standard deviation
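A minimal Python sketch of variance and standard deviation. It assumes the sample versions, which divide by n − 1; the population versions divide by N instead (`statistics.pvariance` and `statistics.pstdev`). The helper names are hypothetical, and the results are checked against the standard `statistics` module.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]                            # made-up data, mean 5
print(sample_variance(data), sample_sd(data))
print(statistics.variance(data), statistics.stdev(data))   # library check
```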
percentiles
the division of a set of values into 100 equal parts
quartiles
the division of a set of values into four equal parts by using the 25th, 50th, and 75th percentiles
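Python's standard `statistics.quantiles` (Python 3.8+) can return these cut points directly: `n=100` gives the 99 percentiles and `n=4` the three quartiles. The `method="inclusive"` choice is an assumption, since textbooks and software interpolate between data values in slightly different ways; with it, the 25th, 50th, and 75th percentiles coincide with the quartiles.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # made-up data

# 99 cut points split the sorted data into 100 equal parts (percentiles)
percentiles = quantiles(data, n=100, method="inclusive")
# 3 cut points split it into 4 equal parts (quartiles)
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)
# percentiles[k - 1] is the k-th percentile
print(percentiles[24], percentiles[49], percentiles[74])
```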
standardized scores/z-scores
(Observed measurement - mean) / standard deviation
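The formula translates directly into code. This Python sketch standardizes every value of a made-up data set, assuming the sample standard deviation is used; the function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev

def z_scores(data):
    """(observed measurement - mean) / standard deviation, for every value."""
    xbar = mean(data)
    s = stdev(data)                 # sample standard deviation
    return [(x - xbar) / s for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]     # made-up data
z = z_scores(data)
print([round(v, 2) for v in z])
```

A positive z-score lies above the mean and a negative one below it; the standardized values always have mean 0 and standard deviation 1.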
linear regression
a model to measure the strength of the relationship between two quantitative variables with a linear relation
Pearson’s correlation coefficient
a numerical summary measure, always between -1 and 1, of the linear dependence between two variables. the further it is from 0, the stronger the linear relationship
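A minimal Python sketch of Pearson's r, assuming the usual definition r = Sxy / √(Sxx · Syy), where Sxx, Syy, and Sxy are sums of squared and cross-multiplied deviations from the means. The function name and data are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired measurements (x, y)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]               # made-up paired data
ys = [2, 4, 5, 4, 5]
print(pearson_r(xs, ys))
```

A perfect increasing line gives r = 1 and a perfect decreasing line gives r = -1.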
scatterplot
a graphical summary measure used to describe the nature, degree, and direction of the relation between two variables x and y, where (x, y) gives a pair of measurements
linear regression model equation
Y = α + βX where Y is the response variable, X is the explanatory variable, α is the y-intercept, and β is the slope
predicted value of y
ŷ = a + bx
least-squares regression line
a line that minimizes the sum of the squares of the residuals, otherwise known as the line of best fit. the line will always pass through the point (x̄, ȳ) and will always have the slope b = r(s_y/s_x), where s_x and s_y are the standard deviations of x and y
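The slope and intercept can be computed straight from these facts: b = r(s_y/s_x), and a = ȳ − b·x̄ so that the line passes through (x̄, ȳ). A minimal Python sketch with made-up data; the function name is hypothetical.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, with b = r * (s_y / s_x)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    # r as the average product of standardized x and y values (divided by n - 1)
    r = sum((x - xbar) / sx * (y - ybar) / sy for x, y in zip(xs, ys)) / (n - 1)
    b = r * sy / sx
    a = ybar - b * xbar     # forces the line through (x-bar, y-bar)
    return a, b

xs = [1, 2, 3, 4, 5]        # made-up paired data
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```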
coefficient of determination
measures the proportion (often quoted as a percentage) of the variation in Y-values explained by the linear relation between X- and Y-values. denoted by R², which is equal to the square of the correlation coefficient. always between 0 and 1.
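One way to compute R² is 1 − SSE/SST: one minus the ratio of the unexplained variation (squared residuals around the fitted line) to the total variation (squared deviations of Y around ȳ). For simple linear regression this matches r². A minimal Python sketch with made-up data and a hypothetical function name:

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line, as 1 - SSE/SST."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                   # least-squares slope
    a = ybar - b * xbar             # least-squares intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # unexplained variation
    sst = sum((y - ybar) ** 2 for y in ys)                      # total variation
    return 1 - sse / sst

xs = [1, 2, 3, 4, 5]                # made-up paired data
ys = [2, 4, 5, 4, 5]
print(r_squared(xs, ys))            # about 0.6, i.e. 60% of the variation explained
```

Here r = 6/√60, so r² = 36/60 = 0.6, the same value.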
random error
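the amount by which an observed value of Y differs from the value given by the true regression line α + βX, denoted by ε; it is estimated by the residual (observed value minus predicted value ŷ)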
influential observation
an observation that strongly affects a statistic
residual plot
a plot of residuals versus the predicted values of Y
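The points for a residual plot are pairs (ŷ, y − ŷ). This Python sketch computes them for made-up data, assuming the fitted line ŷ = 2.2 + 0.6x (the least-squares line for these five points); the function name is hypothetical, and any plotting library could draw the resulting pairs.

```python
def residual_points(xs, ys, a, b):
    """Pairs (predicted y, residual) for the line y-hat = a + b*x."""
    points = []
    for x, y in zip(xs, ys):
        y_hat = a + b * x
        points.append((y_hat, y - y_hat))   # residual = observed - predicted
    return points

xs = [1, 2, 3, 4, 5]                        # made-up paired data
ys = [2, 4, 5, 4, 5]
for y_hat, resid in residual_points(xs, ys, a=2.2, b=0.6):
    print(f"{y_hat:.1f}\t{resid:+.1f}")
```

For a least-squares line the residuals sum to zero; a curved or fanning pattern in the plot suggests the linear model or constant spread is not appropriate.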
transformation
a change of scale (such as a log, square root, reciprocal, or square) applied to one or both variables so that their relationship takes a linear form
log transformation
Z = ln(Y) used to linearize the regression model when the relationship between Y and X suggests a model with a consistently increasing slope
square root transformation
Z = √Y = Y^(1/2) used to linearize the regression model when the spread of observations increases with the mean
reciprocal transformation
Z = 1/Y used to minimize the effect of large values of X
square transformation
Z = Y² used when the slope of the relation consistently decreases as the independent variable increases
power transformation
ln(Y) and ln(X) used if the relation between the dependent and independent variables is modeled by Y = aX^b, since taking logs gives ln(Y) = ln(a) + b·ln(X), a linear relation between ln(Y) and ln(X)
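A minimal Python sketch of the power transformation: fit an ordinary least-squares line to (ln X, ln Y), then convert back with a = e^(intercept) and b = slope. It assumes all X and Y values are positive; the function name and the data (generated exactly from Y = 3X²) are made up for illustration.

```python
import math

def fit_power_model(xs, ys):
    """Fit Y = a * X**b by least squares on (ln X, ln Y); xs and ys must be positive."""
    u = [math.log(x) for x in xs]           # transformed explanatory variable
    v = [math.log(y) for y in ys]           # transformed response variable
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    b = (sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
         / sum((ui - ubar) ** 2 for ui in u))
    ln_a = vbar - b * ubar                  # intercept on the log scale
    return math.exp(ln_a), b

xs = [1, 2, 3, 4]
ys = [3, 12, 27, 48]                        # exactly Y = 3 * X**2
a, b = fit_power_model(xs, ys)
print(f"Y = {a:.3f} * X^{b:.3f}")
```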
contingency table
a table of data classified by r categories of one classification criterion (the rows) and c categories of a second classification criterion (the columns)
marginal frequency
the frequency with which each category of one variable occurs overall, given by the row or column totals in the margins of a contingency table
conditional relative frequency
the relative frequency of one category given the other category has occurred
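A minimal Python sketch of marginal and conditional relative frequencies for a 2 × 2 contingency table. The categories and counts are made up, the table is stored as a nested dictionary (rows, then columns) purely for illustration, and the function names are hypothetical.

```python
# made-up counts: rows are one categorical variable, columns another
table = {
    "grade 10": {"tea": 30, "coffee": 20},
    "grade 11": {"tea": 15, "coffee": 35},
}

def marginal_frequencies(table):
    """Row and column totals (the margins of the contingency table)."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, count in cols.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals

def conditional_given_row(table, row):
    """Relative frequency of each column category, given that the row category occurred."""
    total = sum(table[row].values())
    return {col: count / total for col, count in table[row].items()}

rows, cols = marginal_frequencies(table)
print(rows, cols)
for row in table:
    print(row, conditional_given_row(table, row))
```

The conditional distributions differ (60% tea in one row, 30% in the other), which is the kind of difference that suggests an association between the two variables.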
association
a measurement of relation between two categorical variables