54 Terms
1
Categorical Variable
Variable that represents categories that place data into groups
2
Quantitative Variable
Variable for which the numbers act as numerical values with known units
3
Distribution
The possible values of a variable and how often each value occurs
4
Frequency Table
Lists the categories for a categorical variable and displays the count for each category
5
Relative Frequency Table
Lists the categories for a categorical variable and displays the proportion/percentage for each category
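As a minimal sketch of the two tables above, the following Python builds a frequency table and a relative frequency table for one categorical variable. The `colors` data is made up purely for illustration.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made up for illustration).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency table: the count for each category.
freq = Counter(colors)

# Relative frequency table: the proportion for each category.
n = len(colors)
rel_freq = {category: count / n for category, count in freq.items()}

# Print both tables side by side: category, count, proportion.
for category in freq:
    print(f"{category:<6} {freq[category]:>3} {rel_freq[category]:.2f}")
```

The proportions always sum to 1 (or 100% if reported as percentages), which is a quick check on any relative frequency table.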
6
Bar Chart
Display where bars represent the count of each category for a categorical variable
7
Relative Frequency Bar Chart
Display where bars represent the proportion/percentage of each category for a categorical variable
8
Segmented Bar Chart (Stacked Bar Chart)
Display where one bar represents a "whole" that is proportionally divided by each category for a categorical variable
9
Pie Chart
Display where one circle represents a "whole" that is proportionally divided by each category for a categorical variable
10
Comparative Display
Display (of any type) that is used to directly compare two or more distributions at once
11
Stem & Leaf Plot
Display that shows both the distribution and the individual data values for a quantitative variable as shared "stems" with individual "leaves"
12
Dot Plot
Display where a dot is graphed on a single axis for each data value, stacking repeated values, thus showing the distribution of a quantitative variable
13
Histogram
Display where bars represent the count of values falling into intervals ("bins") for a quantitative variable, showing its distribution
14
Relative Frequency Histogram
Display where bars represent the proportion/percentage of values falling into intervals ("bins") for a quantitative variable, showing its distribution
15
Cumulative Relative Frequency Plot
Display where a line shows the percentage of observations that are less than or equal to particular values for a quantitative variable
16
Percentile
The nth percentile is the value that falls above n% of the data (for example, the 90th percentile is above 90% of the data, marking off the top 10% of the data)
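The two cards above are closely linked: a percentile can be read off a cumulative relative frequency plot. The sketch below assumes made-up `scores` and uses one simple "at or below" convention for percentiles; textbooks and software differ on the exact rule, so the function names and the convention here are illustrative only.

```python
# Hypothetical quantitative data, made up for illustration.
scores = [52, 61, 61, 70, 74, 78, 83, 85, 90, 97]

def cumulative_relative_frequency(values, x):
    """Proportion of observations that are less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)

# Points (x, cumulative proportion) that a cumulative relative
# frequency plot would connect with a line.
plot_points = [(x, cumulative_relative_frequency(scores, x))
               for x in sorted(set(scores))]

def percentile_value(values, p):
    """Smallest data value with at least p% of the data at or below it.

    This is one simple convention (an assumption of this sketch).
    """
    for x in sorted(set(values)):
        if cumulative_relative_frequency(values, x) >= p / 100:
            return x

print(plot_points)
print(percentile_value(scores, 90))
```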
17
Context
Identifies what is being described/compared/analyzed
18
Shape
Describes the "look" of the distribution
19
Mode(s)
The most commonly occurring value(s) in a distribution, seen as hump(s) in displays. Can be unimodal (one mode), bimodal (two modes), or multimodal (three or more)
20
Uniform
A distribution that is roughly flat in shape, meaning there is no consequential mode
21
Symmetric
A distribution whose left & right halves from the center are approximately the same
22
Skewed
A distribution that is not symmetric and has a longer tail on one side. The direction of the skew is the side the tail is on (tail on left = skewed left, tail on right = skewed right)
23
Center
A value that attempts to summarize the entire distribution with a single number
24
Mean
The arithmetic average of a distribution. Sum of all the data values divided by the number of data values.
25
Median
The middle value of a distribution, where half the data is above and half of the data is below this value (50th percentile)
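A short sketch of the two measures of center above, using Python's standard `statistics` module. The `data` list is made up for illustration.

```python
import statistics

# Hypothetical data values, made up for illustration.
data = [4, 8, 15, 16, 23, 42]

# Mean: sum of all the data values divided by the number of data values.
mean = sum(data) / len(data)

# Median: the middle value of the sorted data; with an even number of
# values, statistics.median averages the two middle values (15 and 16).
median = statistics.median(data)

print(mean, median)
```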
26
Spread
Describes how tightly the data is clustered around the center
27
Standard Deviation
Roughly, the typical distance a data value is from the mean; formally, the square root of the average squared deviation from the mean
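One common way to compute a standard deviation is sketched below. It assumes the sample formula, which divides by n - 1; the population version divides by n instead. The `data` list is made up for illustration.

```python
import math
import statistics

# Hypothetical data values, made up for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample standard deviation: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

# Population standard deviation: divide by n instead.
population_sd = math.sqrt(sum(squared_deviations) / n)

# The standard library gives the same results.
print(sample_sd, statistics.stdev(data))
print(population_sd, statistics.pstdev(data))
```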
28
Quartile
One of three values (Q1, median, Q3) that divide a data set into four equal parts
29
1st Quartile (Lower)
The median of the lower half of the distribution (25th percentile), known as Q1
30
3rd Quartile (Upper)
The median of the upper half of the distribution (75th percentile), known as Q3
31
Interquartile Range (IQR)
The difference between the third and first quartiles (Q3 - Q1), which is the range spanned by the middle 50% of the data
32
Range
The difference between the maximum and minimum values in a data set
33
Outlier
A data value that falls outside of the overall pattern of the rest of the data, specifically more than 1.5 × IQR below Q1 or above Q3 (these cutoffs form your fences)
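The quartile, IQR, and outlier cards can be tied together in one sketch. It assumes the "median of each half" method for quartiles, leaving the overall median out of both halves when the number of values is odd; software packages often use other conventions and may give slightly different quartiles. The `data` list and the helper name `quartiles` are made up for illustration.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]    # lower half (excludes the middle value if n is odd)
    upper = s[-half:]   # upper half (excludes the middle value if n is odd)
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

# Hypothetical data values, made up for illustration.
data = [1, 3, 4, 5, 6, 7, 8, 9, 30]

q1, med, q3 = quartiles(data)
iqr = q3 - q1

# Fences: 1.5 * IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value beyond a fence is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, med, q3, iqr, lower_fence, upper_fence, outliers)
```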
34
Resistant
A calculated summary statistic is resistant if outliers have little to no effect on it...for example, medians and IQRs are resistant while means, standard deviations, and ranges are not
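A small sketch of resistance: replace the largest value of a made-up data set with an extreme one and compare the summaries before and after. Both lists are hypothetical.

```python
import statistics

# Hypothetical data, and the same data with its largest value
# replaced by an extreme value.
original = [10, 12, 13, 15, 17]
with_outlier = [10, 12, 13, 15, 170]

# The median is resistant: it does not move.
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean and the range are not resistant: both change a lot.
mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
range_before = max(original) - min(original)
range_after = max(with_outlier) - min(with_outlier)

print(median_before, median_after)
print(mean_before, mean_after)
print(range_before, range_after)
```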
35
5-Number Summary
Reports the minimum, Q1, median, Q3, and maximum of a distribution
36
Boxplot
Display that shows the 5-Number Summary as a central box, whiskers, and outliers, effectively dividing the data into quartiles
37
Shifting
Adding or subtracting a constant to every data value, which adds or subtracts that same constant to all measures of position and leaves measures of spread unchanged
38
Rescaling
Multiplying or dividing every data value by a constant, which multiplies or divides all measures of position by that same constant and all measures of spread by its absolute value
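The effects of shifting and rescaling can be checked numerically. The sketch below uses made-up data, a shift of +10, and a scale factor of 3; the constants are arbitrary choices for illustration.

```python
import statistics

# Hypothetical data values, made up for illustration.
data = [1, 2, 3, 4, 5]

shifted = [x + 10 for x in data]   # shifting: add a constant
rescaled = [x * 3 for x in data]   # rescaling: multiply by a constant

def summary(values):
    """Return (mean, median, standard deviation, range) of the values."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values),
            max(values) - min(values))

# Shifting moves the mean and median by 10; the standard deviation
# and range are unchanged.
print(summary(data))
print(summary(shifted))

# Rescaling multiplies the mean, median, standard deviation, and range by 3.
print(summary(rescaled))
```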
39
Standardized Values
Values for which the units have been systematically eliminated, allowing for comparison, even if the original variables had different scales and/or units
40
z-Score
Standardized value that identifies how many standard deviations a value is from the mean; z-scores don't change a distribution's shape, but force the mean to 0 and standard deviation to 1
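A z-score is computed as z = (value - mean) / standard deviation. The sketch below standardizes a made-up data set and confirms that the resulting z-scores have mean 0 and standard deviation 1; it assumes the sample standard deviation is used throughout.

```python
import statistics

# Hypothetical data values, made up for illustration.
data = [60, 70, 80, 90, 100]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (an assumption)

# Standardize: subtract the mean (shift), then divide by the
# standard deviation (rescale).
z_scores = [(x - mean) / sd for x in data]

print(z_scores)
print(statistics.mean(z_scores), statistics.stdev(z_scores))
```

Because standardizing is just a shift followed by a rescale, the shape of the distribution is unchanged.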
41
Scatterplot
Display that shows the relationship between two quantitative variables measured for the same subjects on an x-y coordinate plane
42
Association
Relationship between two quantitative variables, described by SDFOC (Strength, Direction, Form, Outliers, Context)
43
Strength
Describes how well/closely the data follows the identified pattern of a scatterplot
44
Direction
A positive direction means that one variable increases as the other increases...a negative direction means that one variable decreases as the other increases
45
Form
Describes the overall shape of a scatterplot; we focus on linear vs. non-linear
46
Outliers
Points that fall outside of the overall pattern of a scatterplot
47
Context
Identifies the two variables for which an association is being described
48
Explanatory Variable
The variable that is thought to explain or predict the response variable (x-axis)
49
Response Variable
The variable that is thought to be explained/predicted by the explanatory variable (y-axis)
50
Correlation Coefficient (r)
The number that describes both the direction and strength of the linear association between two quantitative variables, from -1 to 1, where -1 is perfectly negative linear and 1 is perfectly positive linear
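One standard way to compute r is to average the products of the paired z-scores, dividing by n - 1. The sketch below does this by hand on made-up `x` and `y` values; the helper name `correlation` is hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r: sum of z_x * z_y divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical paired data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(r)

# Points that lie exactly on an increasing line give r = 1 (up to rounding).
print(correlation(x, [2 * v + 1 for v in x]))
```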
51
Linear Model via Least Squares Regression Line of Best Fit
A linear equation that is used to simplify and represent an association, found via the line that minimizes the sum of the squared residuals
52
Predicted Value (ŷ)
The predicted y-value found for each x-value by substituting that x into the linear model producing the points (x, ŷ)...a "hat" in Statistics means that a value is predicted
53
Residual
The difference between an observed data value and the predicted value from the model...Residual = observed - predicted = y - ŷ
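The last three cards can be sketched together: fit a least squares line, compute the predicted values ŷ, and then the residuals y - ŷ. The sketch uses the standard formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄; the data and the helper name `least_squares_line` are made up for illustration.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical paired data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = least_squares_line(x, y)

# Predicted values: substitute each x into the linear model.
predicted = [intercept + slope * xi for xi in x]

# Residual = observed - predicted = y - y-hat.
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

print(intercept, slope)
print(predicted)
print(residuals)
```

For a least squares line the residuals sum to zero (up to rounding), which is a handy check on the arithmetic.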
54
Coefficient of Determination (R^2)
The square of the correlation coefficient, which gives the percentage of the variability of y that is accounted for by the least squares regression on x, from 0% to 100%. Provides an overall measure of how strong the regression is in linearly relating y to x.
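As a hedged check of this definition, the sketch below computes R^2 as 1 - (sum of squared residuals) / (total sum of squares about ȳ) and compares it with the square of r. The data and the helper name `r_squared` are made up for illustration.

```python
import statistics

def r_squared(xs, ys):
    """R^2 = 1 - SSE / SST for the least squares line of ys on xs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # SSE: variability left over after the regression (squared residuals).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: total variability of y about its mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical paired data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y)

# Correlation coefficient r, computed from the same sums.
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
r = s_xy / (s_xx * s_yy) ** 0.5

print(r2, r ** 2)
```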