Categorical Variables
Variables dividing cases into distinct groups.
Quantitative Variable
Variable measuring numerical quantity for each case.
Population
All individuals or objects of interest.
Sample
Subset of the population used for inference.
Descriptive Statistics
Numerical/graphical methods to summarize data patterns.
Data Distribution
Tells what values a variable takes and how often it takes each one.
Frequency Table
Shows number of cases in each category.
Relative Frequency
Proportion of cases in a category.
Proportion
Ratio of category instances to total observations.
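As a minimal sketch of the frequency table and relative frequency cards above, the counts and proportions can be computed directly. The `colors` data below is made up for illustration and is not from the card set.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (made-up data for illustration).
colors = ["red", "blue", "red", "green", "blue", "red"]

# Frequency table: number of cases in each category.
frequency = Counter(colors)

# Relative frequency (proportion): category count divided by total cases.
n = len(colors)
relative_frequency = {category: count / n for category, count in frequency.items()}

for category, count in frequency.items():
    print(category, count, round(relative_frequency[category], 3))
```

The relative frequencies always sum to 1 because every case falls in exactly one category.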
Bar Chart
Graphical representation of categorical variable frequencies.
Bar Chart Titling
Guidelines for labeling and titling bar charts.
Pie Chart
Circular chart divided into sectors to show proportions.
Pareto Chart
Bar chart with bars in decreasing order of frequency.
Two-Way Table
Table showing relationship between two categorical variables.
Mode
Most frequently occurring category in a distribution.
Unimodal
Distribution with one distinct mode.
Bimodal
Distribution with two modes of similar frequency.
Multimodal
Distribution with more than two modes of similar frequency.
Variability
Diversity of categories in a categorical distribution.
Interpretation of Distribution
Analyzing the values and frequencies of a variable.
Two Categorical Variable Relationship
Investigating the association between two categorical variables.
Proportion Calculation
Determining the ratio or percentage of a subset in a sample.
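One way to build a two-way table is to count each pair of category values. This is only a sketch: the `cases` data and the group and answer labels are hypothetical.

```python
from collections import Counter

# Hypothetical cases, each a (group, answer) pair (made-up data).
cases = [("A", "yes"), ("A", "no"), ("B", "yes"),
         ("A", "yes"), ("B", "no"), ("B", "no")]

# Count how often each combination of the two categorical variables occurs.
pair_counts = Counter(cases)

row_labels = sorted({group for group, _ in cases})
column_labels = sorted({answer for _, answer in cases})

# Two-way table: rows are groups, columns are answers, cells are counts.
two_way = {
    row: {col: pair_counts[(row, col)] for col in column_labels}
    for row in row_labels
}

for row in row_labels:
    print(row, two_way[row])
```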
Difference in Proportions
The proportion in one group or category minus the proportion in another.
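A difference in proportions is one proportion minus another. The sketch below uses made-up counts for two hypothetical groups, A and B.

```python
# Hypothetical counts (made-up): number answering "yes" and group sizes.
yes_a, n_a = 30, 50
yes_b, n_b = 18, 60

# Proportion in each group: cases in the category divided by the group total.
p_a = yes_a / n_a
p_b = yes_b / n_b

# Difference in proportions: group A minus group B.
difference = p_a - p_b

print(round(p_a, 3), round(p_b, 3), round(difference, 3))
```

The order of subtraction should be stated whenever the result is reported, since swapping the groups flips the sign.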
Segmented Bar Chart
Bar chart in which each bar is divided into segments representing the categories of a second variable.
Side-by-Side Bar Chart
Bar chart that places a separate set of bars next to each other for each group of a categorical variable.
Distribution of a Variable
Describes what values the variable takes and how often it takes these values.
Dot Plot
Records data values on a number line with a dot for each observed value, showing frequency and variation.
Outliers
Values notably distinct from other values in a dataset, often much larger or smaller.
Histogram
A graph for quantitative data that groups values into intervals (bins) showing frequency.
Bin Width
The difference between consecutive lower class limits in a histogram.
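The following sketch shows how a chosen bin width groups quantitative values into histogram bins. The data, the starting point of 0, and the convention that each bin includes its lower limit but not its upper limit are all assumptions made for illustration; software may use a different convention.

```python
import math
from collections import Counter

# Hypothetical quantitative data (made-up).
values = [1, 3, 4, 7, 8, 8, 12, 14, 19]

bin_width = 5   # difference between consecutive lower class limits
start = 0       # assumed lower limit of the first bin


def bin_lower_limit(x):
    """Return the lower class limit of the bin that contains x."""
    return start + bin_width * math.floor((x - start) / bin_width)


# Frequency of each bin, keyed by its lower class limit.
bin_counts = Counter(bin_lower_limit(v) for v in values)

for lower in range(start, max(values) + 1, bin_width):
    print(f"[{lower}, {lower + bin_width})", bin_counts.get(lower, 0))
```

Changing `bin_width` changes the number of bars, which is why a histogram's appearance depends on the user's or the software's choice.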
Number of Bars in Bar Chart
Equals the number of categories.
Number of Bars in Histogram
Varies and can be determined by the user or software.
Bar Order in Bar Chart
Order does not matter as it graphs categorical data.
Bar Order in Histogram
Must be presented in numerical order.
Width of Bars in Bar Chart
Meaningless and determined arbitrarily by software.
Width of Bars in Histogram
Defined as the bin width and can be edited.
Gaps between Bars in Bar Chart
Indicate impossibility of observations between categories.
Gaps between Bars in Histogram
Indicate no values were observed in the bin or class.
Shape of Distribution
Describes if the distribution is symmetric, mound-shaped, or has peaks or clusters.
Center of Distribution
Indicates where the distribution is centered and the typical value.
Variability of Data
Describes how spread out the data is and if most values are within a certain range.
Unusual Observations
Identifies outliers that deviate markedly from the overall pattern.
Smooth Curve in Histogram
Illustrates the general shape of the distribution with less jagged edges.
Interpreting Histogram
Involves analyzing the shape, center, variability, and unusual observations of the distribution.
Characteristics of Shape
Include symmetry, number of peaks, and presence of unusually large or small values.
Common Shapes of Distributions
Include symmetric shapes like bell-shaped and asymmetric shapes like right or left skewed.
Symmetric Shape: Bell-Shaped
Distribution where most values fall in the middle, frequencies tail off symmetrically, and the left and right halves mirror each other.
Symmetric Shape: Uniform
Distribution where values tend to occur with similar frequency, creating a flat histogram with bars of similar height.
Skewed Shape: Right-Skewed
Distribution where the tail extends to the right, with more data in the lower end.
Skewed Shape: Left-Skewed
Distribution where the tail extends to the left, with more data in the upper end.
Shape: Peaks
Classifies data by the number of peaks or mounds present (unimodal, bimodal, or multimodal).
Notes about Peaks or Mounds
Peaks of different heights in distributions may indicate distinct groups within the data.
Numerical Measures of Center and Spread
Focus on shape, center, and spread of the distribution for precise interpretation.
Populations and Samples
A population includes all individuals of interest, while a sample is a portion of the population; a parameter describes a population and a statistic describes a sample.
Numerical Measures of Center
Values representing the average or typical value of a quantitative variable.
Measure of Center: Mean
The arithmetic average of the values in a data set, calculated by summing the values and dividing by the number of values.
Summation Notation
Notation (Σ) used to add the observations in a data set, with each observation identified by a subscript.
Mean Formulas
Formulas for the population mean (μ) and the sample mean (x̄, read x-bar) as measures of center.
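Written out in summation notation, the two mean formulas take the standard form below. The symbols N (population size), n (sample size), and x_i (the i-th observation) are the usual textbook conventions and are assumed here, since the cards do not spell them out.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```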
Sample Mean
Uses the symbol x̄ (read x-bar) and is a statistic calculated using statistical software for a data set.
Median
The middle value or the value that splits the data in half; denoted as m.
Calculating Median
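Ordering quantitative data from smallest to largest and determining the median's location using the formula L(m) = (n + 1)/2; when n is even, the location falls between the two middle values and the median is their average.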
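The location formula can be sketched in code as follows. The function name `median_by_location` is hypothetical, and the sketch assumes a non-empty list of numbers.

```python
def median_by_location(values):
    """Median of a non-empty list using the location formula (n + 1) / 2."""
    ordered = sorted(values)      # order from smallest to largest
    n = len(ordered)
    location = (n + 1) / 2        # 1-based position of the median

    if n % 2 == 1:
        # Odd n: the location is a whole number, so take that value.
        return ordered[int(location) - 1]

    # Even n: the location ends in .5, so average the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_location([5, 1, 3]))
print(median_by_location([4, 1, 3, 2]))
```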
Measure of Center
The mean is sensitive to outliers and skewed distributions, while the median is resistant to outliers.
Comparing Mean and Median
Both provide measures of center, with the mean affected by outliers and skewness, unlike the median.
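A small made-up data set illustrates how one outlier moves the mean much more than the median. The numbers are hypothetical and chosen only to make the effect easy to see.

```python
from statistics import mean, median

# Hypothetical data without and with a single large outlier.
data = [2, 3, 4, 5, 6]
data_with_outlier = data + [100]

mean_before = mean(data)
median_before = median(data)

mean_after = mean(data_with_outlier)
median_after = median(data_with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

Here the mean moves from 4 to 20 while the median moves only from 4 to 4.5, which is what "resistant to outliers" means in practice.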
What to Do with Outliers
Consider if outliers are errors or part of the population; run analyses with and without outliers to assess their impact.
Measuring Spread
Variability in a quantitative distribution measured by how far data points spread along the x-axis.
Understanding Variation
Variation exists when data values differ from the mean, indicating the spread in the data set.
Standard Deviation
Measures how far observations typically are from the mean; for roughly bell-shaped data, about 68% of values fall within one standard deviation of the mean.
Standard Deviation Calculation
Involves finding the mean, calculating each deviation from the mean, squaring each deviation, summing the squares, dividing by n - 1 (for a sample), and taking the square root to get the standard deviation.
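The calculation steps can be followed one at a time in a short sketch. It assumes the sample standard deviation (dividing by n - 1) and uses made-up data. The result is checked against `statistics.stdev` from the Python standard library, which also computes the sample version.

```python
import math
from statistics import stdev

# Hypothetical sample (made-up data).
data = [2, 4, 4, 6, 9]
n = len(data)

# Step 1: find the mean.
mean_value = sum(data) / n

# Step 2: calculate each deviation from the mean.
deviations = [x - mean_value for x in data]

# Step 3: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 4: sum the squares and divide by n - 1 (sample variance).
sample_variance = sum(squared_deviations) / (n - 1)

# Step 5: take the square root to get the sample standard deviation.
sample_sd = math.sqrt(sample_variance)

print(round(sample_sd, 4), round(stdev(data), 4))
```

For a whole population the divisor would be the population size instead of n - 1; `statistics.pstdev` covers that case.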