Individual in statistics
The person, animal, or object described by a set of data.
Variable
A characteristic that can take different values for different individuals.
Categorical variable
A variable that places individuals into groups or categories.
Quantitative variable
A variable that takes numerical values for which arithmetic, such as averaging, makes sense.
Discrete quantitative variable
A numerical variable with separated countable values, often integers with gaps.
Continuous quantitative variable
A numerical variable that can take any value in an interval.
Frequency table
A table showing the number of times each category or value occurs.
Relative frequency table
A table showing the proportion or percentage of observations in each category.
Cumulative frequency
The running total of counts up to a certain category or value.
Cumulative relative frequency
The running total of proportions or percentages up to a certain value.
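The four kinds of frequency table defined above can be built from the same raw counts. The sketch below is only an illustration: the `siblings` data set is hypothetical, and the variable names are chosen for this example.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 0, 1, 3, 2, 1, 0]

counts = Counter(siblings)
values = sorted(counts)            # distinct values in increasing order
total = sum(counts.values())       # number of observations

frequency = [counts[v] for v in values]                 # count per value
relative = [f / total for f in frequency]               # proportion per value
cumulative = list(accumulate(frequency))                # running total of counts
cumulative_relative = [c / total for c in cumulative]   # running total of proportions

for row in zip(values, frequency, relative, cumulative, cumulative_relative):
    print(row)
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.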
Bar graph
A graph for categorical data using separated bars to compare counts or percentages.
Pie chart
A circular graph showing parts of a whole, but it can be misleading if area or angles are distorted.
Rule for all data displays
Include a title, labels, appropriate scales, and avoid distorting size or area.
Why distorted graphs are dangerous
They can make differences look bigger or smaller than they actually are.
Histogram
A graph for quantitative data that groups values into intervals and shows frequency.
Why histograms are useful
They show the overall shape of large quantitative data sets.
Limitation of histograms
Individual data values are not visible.
Dotplot
A graph that shows each data value as a dot above a number line.
Strength of dotplots
They show individual data points clearly.
Weakness of dotplots
They become messy with large data sets.
Stemplot
A display that separates data into stems and leaves while preserving individual values.
When stemplots work best
For small to moderate data sets with values that can be neatly split.
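A stemplot can be produced as plain text. The sketch below assumes non-negative whole numbers, uses the tens digit (and above) as the stem and the ones digit as the leaf, and keeps empty stems so that gaps stay visible. The function name `stemplot` and the sample data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a text stemplot for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)   # stem = tens and above, leaf = ones digit
    lines = []
    # Include every stem between the smallest and largest so gaps are shown.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical data set.
for line in stemplot([12, 15, 21, 23, 23, 38, 40]):
    print(line)
```

Every original value can be read back from the display, which is the property that distinguishes a stemplot from a histogram.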
CUSS for describing a distribution
Center, Unusual features, Shape, Spread.
Center of a distribution
The typical or middle value, often measured by mean or median.
Mean
The arithmetic average of a data set.
Median
The middle value when the data are ordered; with an even number of values, the mean of the two middle values.
When mean is preferred
When the distribution is roughly symmetric and has no strong outliers.
When median is preferred
When the distribution is skewed or has outliers.
Resistant statistic
A statistic not strongly affected by outliers.
Median as resistant
The median is resistant because extreme values do not greatly change its position.
IQR as resistant
IQR is resistant because it uses the middle 50% of the data.
Nonresistant statistic
A statistic strongly affected by outliers.
Mean as nonresistant
The mean is pulled toward extreme values because every value contributes to the sum.
Standard deviation as nonresistant
Standard deviation increases when outliers create larger distances from the mean.
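The difference between resistant and nonresistant statistics is easy to see by adding one extreme value to a small data set. This is a minimal sketch with hypothetical numbers, using Python's standard `statistics` module.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added.
data = [2, 4, 5, 7, 9]
with_outlier = data + [100]

mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(data)
median_after = statistics.median(with_outlier)

sd_before = statistics.stdev(data)           # sample standard deviation
sd_after = statistics.stdev(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("sd:    ", sd_before, "->", sd_after)
```

In this example the median moves by only 1, while the mean and standard deviation both change substantially.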
Shape of a distribution
The overall pattern, including symmetry, skew, clusters, and modality.
Symmetric distribution
A distribution where the left and right sides are roughly mirror images.
Skewed right
A distribution with a long tail to the right.
Skewed left
A distribution with a long tail to the left.
Direction of skew
The direction of the tail, not the side with most of the data.
Unimodal
A distribution with one clear peak.
Bimodal
A distribution with two clear peaks.
Multimodal
A distribution with more than two peaks.
Uniform distribution
A distribution where values occur with roughly equal frequency.
Unusual features
Gaps, clusters, or outliers that stand out from the general pattern.
Gap
A region in the distribution with no observations.
Cluster
A group of observations concentrated near one another.
Outlier
A data value that falls far from the rest of the data.
IQR
The interquartile range, calculated as Q3 minus Q1.
Q1
The first quartile, or the median of the lower half of the data.
Q3
The third quartile, or the median of the upper half of the data.
Five-number summary
Minimum, Q1, median, Q3, maximum.
Boxplot
A graph showing the five-number summary and possible outliers.
1.5 IQR rule
Outliers are below Q1 − 1.5(IQR) or above Q3 + 1.5(IQR).
Standard deviation
The typical distance of data values from the mean.
Sample standard deviation
Usually written as s, used when describing a sample.
Population standard deviation
Usually written as σ, used when describing a population.
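The two standard deviations differ in their divisor: the sample version divides the sum of squared deviations by n − 1, while the population version divides by n. Python's `statistics` module provides both as `stdev` and `pstdev`. The data below are hypothetical.

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)  # population standard deviation (divides by n)

print("s =", s)
print("sigma =", sigma)
```

Because of the smaller divisor, s is always at least as large as σ computed from the same values.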
Parameter
A number that describes a population.
Statistic
A number that describes a sample.
Why parameters are often unknown
We usually cannot measure every individual in the population.
Why statistics vary
Different random samples usually produce different values.
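The contrast between a fixed parameter and a varying statistic can be simulated. In this sketch the population (the integers 1 through 100), the sample size, and the number of samples are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: the integers 1 through 100.
population = list(range(1, 101))
parameter = statistics.mean(population)  # population mean, a parameter

# Draw 20 random samples of size 10; each sample mean is a statistic.
sample_means = [
    statistics.mean(random.sample(population, 10))
    for _ in range(20)
]

print("parameter:", parameter)
print("smallest sample mean:", min(sample_means))
print("largest sample mean:", max(sample_means))
```

The parameter is a single fixed number, while the sample means differ from sample to sample.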
Z-score
A standardized value showing how many standard deviations a value is from the mean.
Z-score formula
z = (x − mean) / standard deviation.
Meaning of positive z-score
The value is above the mean.
Meaning of negative z-score
The value is below the mean.
Meaning of z = 0
The value equals the mean.
Why z-scores have no units
They measure distance in standard deviations, not original units.
Why z-scores are useful
They allow comparison of values from different distributions.
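The z-score formula makes values from different distributions comparable. The sketch below uses two hypothetical tests with made-up means and standard deviations; the function name `z_score` is chosen for this example.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical Test A: score 85, mean 75, standard deviation 8.
z_a = z_score(85, 75, 8)
# Hypothetical Test B: score 28, mean 21, standard deviation 5.
z_b = z_score(28, 21, 5)

print(z_a, z_b)
```

Here the Test B score has the larger z-score, so it is the more impressive result relative to its own distribution, even though the raw number is smaller.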
Effect of adding a constant to data
Measures of center and position (mean, median, quartiles) shift by that constant, but measures of spread (range, IQR, standard deviation) do not change.
Effect of subtracting a constant from data
Measures of position decrease by that constant, but spread stays the same.
Effect of multiplying data by a positive constant
Measures of position and spread are multiplied by that constant.
Effect of dividing data by a positive constant
Measures of position and spread are divided by that constant.
Effect of changing units from inches to feet
Every value is divided by 12, so measures of position and spread are all divided by 12.
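The transformation rules above can be checked numerically. This sketch uses hypothetical heights in inches, adds a constant of 2, and then converts the original heights to feet by dividing by 12.

```python
import statistics

# Hypothetical heights in inches.
heights_in = [60, 63, 66, 69, 72]

shifted = [h + 2 for h in heights_in]      # add a constant
heights_ft = [h / 12 for h in heights_in]  # change units: inches to feet

mean_in, sd_in = statistics.mean(heights_in), statistics.stdev(heights_in)
mean_shifted, sd_shifted = statistics.mean(shifted), statistics.stdev(shifted)
mean_ft, sd_ft = statistics.mean(heights_ft), statistics.stdev(heights_ft)

print("original:", mean_in, sd_in)
print("shifted: ", mean_shifted, sd_shifted)
print("in feet: ", mean_ft, sd_ft)
```

Adding 2 raises the mean by 2 and leaves the standard deviation alone; dividing by 12 divides both the mean and the standard deviation by 12.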
Association between categorical variables
A relationship where the distribution of one variable changes depending on the other variable.
No association in categorical data
Conditional distributions are approximately the same across groups.
Two-way table
A table showing counts for two categorical variables.
Marginal distribution
The distribution of one categorical variable using the row or column totals.
Conditional distribution
The distribution of one variable among only individuals in a specific category of another variable.
Joint relative frequency
A single cell count of a two-way table divided by the grand total.
Marginal relative frequency
A row or column total divided by the grand total.
Conditional relative frequency
A cell count divided by its row or column total, depending on the condition.
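Joint, marginal, and conditional relative frequencies all come from the same two-way table and differ only in what is divided by what. The table below is hypothetical (class year by job status), as are all variable names.

```python
# Hypothetical counts: rows are class year, columns are job status.
table = {
    "junior": {"job": 20, "no job": 30},
    "senior": {"job": 35, "no job": 15},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
grand_total = sum(row_totals.values())

# Joint relative frequency: cell count / grand total.
joint = {
    (row, col): count / grand_total
    for row, cells in table.items()
    for col, count in cells.items()
}

# Marginal relative frequency: row total / grand total.
marginal_rows = {row: total / grand_total for row, total in row_totals.items()}

# Conditional relative frequency: cell count / its row total.
conditional_given_row = {
    row: {col: count / row_totals[row] for col, count in cells.items()}
    for row, cells in table.items()
}

print(joint[("junior", "job")])
print(marginal_rows["junior"])
print(conditional_given_row["junior"]["job"], conditional_given_row["senior"]["job"])
```

In this made-up table 40% of juniors and 70% of seniors have a job. Because the conditional distributions differ across class years, the two variables are associated.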
Segmented bar graph
A graph showing conditional distributions within categories.
Side-by-side bar graph
A graph comparing categorical distributions across groups.
Best graph for comparing conditional distributions
A segmented bar graph or side-by-side bar graph.
Explanatory variable
The variable that may explain or predict changes in another variable.
Response variable
The variable being measured or predicted.
Scatterplot
A graph showing the relationship between two quantitative variables.
What to describe in a scatterplot
Direction, form, strength, and unusual points.
Positive association
As x increases, y tends to increase.
Negative association
As x increases, y tends to decrease.
Linear form
The pattern in a scatterplot is roughly a straight line.
Nonlinear form
The pattern in a scatterplot curves or does not follow a straight line.
Strong association
Points fall close to a clear pattern.
Weak association
Points are widely scattered around the pattern.
Correlation r
A number measuring the direction and strength of the linear relationship between two quantitative variables.
Range of r
−1 ≤ r ≤ 1.
r close to 1
Strong positive linear association.
r close to −1
Strong negative linear association.
r close to 0
Weak or no linear association.
What r does not measure
Whether the form is linear, whether unusual points are present, or whether one variable causes the other.
Correlation has no units
r is standardized, so it does not use the original units of x or y.
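One standard way to compute r is to average the products of the z-scores of x and y, dividing by n − 1. The sketch below follows that formula with hypothetical data; the function name `correlation` is chosen for this example and the code does not rely on any library correlation routine.

```python
import statistics

def correlation(xs, ys):
    """Return r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])       # points on a rising line
r_down = correlation(x, [10, 8, 6, 4, 2])     # points on a falling line
r_units = correlation([v * 12 for v in x], [2, 4, 6, 8, 10])  # x rescaled

# A perfect curve (y = x squared) that is not linear.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_up, r_down, r_units, r_curve)
```

Rescaling x leaves r unchanged because r has no units. The last case gives r of essentially 0 even though y is completely determined by x, which is why r close to 0 means only that there is little or no linear association.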