Descriptive Statistics
Introduction
Focus on Descriptive Statistics in Criminal Justice Methods.
Difference between Descriptive and Inferential Statistics.
Descriptive Statistics
Purpose: To describe phenomena through data summarization.
Methods of summarization include averages, dispersion, and ranges.
Dataset Overview
Fake dataset provided as a teaching tool.
Columns include: Respondent ID, Name, Gender, Gender Recode, Number of Times Shoplifted.
Respondent IDs are assigned anonymously for privacy.
Gender is nominal data, recoded as binary numbers (e.g., Male = 1, Female = 0).
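The recoding step can be sketched in Python. This is a minimal illustration, not the lecture's actual dataset: the records, the field names, and the helper recode_gender are all hypothetical, and only the Male = 1, Female = 0 mapping comes from the notes above.

```python
# Hypothetical records mirroring the columns listed above; all values are made up.
records = [
    {"respondent_id": 1, "name": "Alex", "gender": "Male", "times_shoplifted": 0},
    {"respondent_id": 2, "name": "Blair", "gender": "Female", "times_shoplifted": 3},
    {"respondent_id": 3, "name": "Casey", "gender": "Female", "times_shoplifted": 1},
]

# The codes are labels only: 1 is not "more" than 0.
GENDER_CODES = {"Male": 1, "Female": 0}


def recode_gender(rows):
    """Return copies of the rows with an added 'gender_recode' column."""
    return [{**row, "gender_recode": GENDER_CODES[row["gender"]]} for row in rows]


recoded = recode_gender(records)
for row in recoded:
    print(row["respondent_id"], row["gender"], row["gender_recode"])
```

The original gender column is kept alongside the recode so the numeric code can always be traced back to its label.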
Types of Data
Nominal Data
Cannot be ordered.
Example: Gender – no natural hierarchy exists.
Numerical codes are assigned only for convenience in analysis; they carry no quantitative meaning.
Variables and Records
Each column represents a variable; each row represents a case or record.
Unit of analysis refers to the type of entity being studied (e.g., individuals in this dataset).
Statistics Types
Univariate vs. Bivariate Statistics
Univariate Statistics: Analysis of one variable (e.g., average shoplifting).
Bivariate Statistics: Analysis of the relationship between two variables.
Univariate statistics are typically descriptive, while bivariate statistics are often inferential.
Measures of Central Tendency
Mean: The average of the data points.
Median: The middle value when the data are ordered (with an even number of cases, the average of the two middle values).
Mode: The most common value in the dataset.
Example calculation for the shoplifting dataset:
Mode = 0 (most frequent)
Median = 1.5
Mean = 2.4
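The three measures can be computed with Python's standard statistics module. The lecture's dataset is not reproduced in these notes, so the ten counts below are hypothetical values chosen only because they yield the same mode, median, and mean as the example.

```python
import statistics

# Hypothetical shoplifting counts (NOT the lecture's data), picked to match
# the stated results: mode 0, median 1.5, mean 2.4.
shoplifting_counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 10]

mode = statistics.mode(shoplifting_counts)      # most frequent value
median = statistics.median(shoplifting_counts)  # average of the 5th and 6th ordered values
mean = statistics.mean(shoplifting_counts)      # sum of 24 divided by 10 cases

print("Mode:", mode)
print("Median:", median)
print("Mean:", mean)
```

Because there are ten cases, the median falls between two observations (1 and 2), which is why it is not a whole number.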
Normal Distribution
Describes data distributed in a symmetric bell-curve shape.
In a normal distribution: mean, median, and mode are equal.
Skewed Distributions:
Positive skew: Tail extends to the right, mean > median.
Negative skew: Tail extends to the left, mean < median.
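The mean-versus-median comparison can be turned into a quick check. The sketch below is a rule of thumb only, not a formal skewness statistic; the function skew_direction and all three sample lists are hypothetical.

```python
import statistics


def skew_direction(values):
    """Rough skew indicator based only on comparing the mean and the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "no skew indicated"


# Hypothetical data: a few large values pull the mean to the right.
right_tailed = [0, 0, 0, 1, 1, 2, 2, 3, 5, 10]
# Hypothetical data: a few small values pull the mean to the left.
left_tailed = [0, 5, 8, 9, 9, 10, 10]
# Hypothetical symmetric data: mean and median coincide.
symmetric = [1, 2, 3, 4, 5]

print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
print(skew_direction(symmetric))
```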
Dispersion of Data
Definition of Dispersion
Dispersion measures how spread out the data is.
Range vs. Standard Deviation
Range: Difference between highest and lowest values.
Standard Deviation: Indicates how much individual data points differ from the mean.
Preferred over the range because it uses every value in the dataset, not just the two extremes.
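A small comparison shows why the range alone can mislead. Both lists below are hypothetical and share the same range, yet their standard deviations differ. The population form (statistics.pstdev) is used here as an assumption; the notes do not say whether the population or sample formula is intended.

```python
import statistics


def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)


# Hypothetical datasets with identical extremes (0 and 10).
clustered = [0, 5, 5, 5, 5, 10]     # most cases sit at the mean
polarized = [0, 0, 0, 10, 10, 10]   # every case sits at an extreme

for label, data in (("clustered", clustered), ("polarized", polarized)):
    print(label, "range:", value_range(data),
          "standard deviation:", round(statistics.pstdev(data), 2))
```

The range reports the same spread for both lists, while the standard deviation is larger for the polarized list because every case lies far from the mean.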
Theoretical Normal Curve
Standard deviations help to explain the probability of values occurring in a normal distribution.
About 68.27% of the data lie within 1 standard deviation of the mean, 95.45% within 2, and 99.73% within 3.
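These percentages can be checked numerically. For a normal distribution, the share of values within k standard deviations of the mean equals erf(k / sqrt(2)), which Python exposes as math.erf. The helper name share_within is hypothetical. Printed tables sometimes differ in the last decimal place because of rounding.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} SD: {share_within(k) * 100:.2f}%")
```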
Practical Application of Standard Deviation
Standard deviations indicate how likely it is that a specific observation was drawn from a given population.
Low-probability observations (more than 3 standard deviations from the mean) may indicate something noteworthy.
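One common way to apply this idea is to convert an observation to a z-score, its distance from the mean measured in standard deviations. The sketch below assumes the population mean and standard deviation are already known; the function names and the example numbers are hypothetical.

```python
def z_score(observation, mean, sd):
    """Distance of an observation from the mean, in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (observation - mean) / sd


def is_noteworthy(observation, mean, sd, threshold=3.0):
    """Flag observations lying more than `threshold` standard deviations from the mean."""
    return abs(z_score(observation, mean, sd)) > threshold


# Hypothetical population: mean of 2.0 shoplifting incidents, standard deviation of 1.5.
print(z_score(5, 2.0, 1.5))         # 2 standard deviations above the mean
print(is_noteworthy(3, 2.0, 1.5))   # within 3 standard deviations
print(is_noteworthy(10, 2.0, 1.5))  # beyond 3 standard deviations
```

The absolute value matters: an observation far below the mean is as unusual as one far above it.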
Calculation Formula for Standard Deviation
The formula can be read as a sequence of steps: sum the values, compute the mean, then measure the variability around the mean.
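The calculation can be walked through one step at a time. This sketch assumes the population form of the formula (dividing by N); the sample form divides by N - 1 instead, and the notes do not say which one the lecture uses. The data are hypothetical, and population_sd is an illustrative name.

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation, computed step by step."""
    n = len(values)
    total = sum(values)                                      # step 1: sum the values
    mean = total / n                                         # step 2: compute the mean
    squared_deviations = [(x - mean) ** 2 for x in values]   # step 3: squared distance from the mean
    variance = sum(squared_deviations) / n                   # step 4: average squared distance
    return math.sqrt(variance)                               # step 5: back to the original units


# Hypothetical shoplifting counts, not the lecture's data.
counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 10]

print(round(population_sd(counts), 4))
# Cross-check against the standard library's population standard deviation.
print(round(statistics.pstdev(counts), 4))
```

Squaring the deviations keeps positive and negative distances from cancelling out, and the final square root returns the result to the same units as the data.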
Frequency Distributions
Overview
Shows how often different values occur within a dataset.
Helps in determining descriptive statistics and summarizing large data sets.
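A frequency distribution can be built from raw values with collections.Counter from Python's standard library. The counts below are hypothetical, not the lecture's data.

```python
from collections import Counter

# Hypothetical raw responses: number of times each respondent shoplifted.
responses = [0, 0, 0, 1, 1, 2, 2, 3, 5, 10]

# Map each distinct value to how often it occurs.
frequency_table = Counter(responses)

print("Value  Frequency")
for value in sorted(frequency_table):
    print(f"{value:>5}  {frequency_table[value]:>9}")
```

Ten raw responses collapse into six rows, and the saving grows quickly as the dataset gets larger.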
Calculating Central Tendency from Frequency Table
Mean: Multiply each value by its frequency, sum the products, and divide by the total number of responses.
Median: Find the middle observation in the ordered frequency distribution.
Mode: The most frequently occurring response.
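The three calculations can be sketched directly on a frequency table, without expanding it back into raw data. The table below is hypothetical, and the helper names are illustrative rather than taken from the lecture.

```python
# Hypothetical frequency table: value -> number of respondents reporting it.
frequency_table = {0: 3, 1: 2, 2: 2, 3: 1, 5: 1, 10: 1}


def freq_mean(table):
    """Sum of value * frequency, divided by the total number of responses."""
    total = sum(table.values())
    return sum(value * count for value, count in table.items()) / total


def value_at(table, position):
    """Value of the observation at a 1-based position in the ordered data."""
    running = 0
    for value in sorted(table):
        running += table[value]  # cumulative frequency so far
        if position <= running:
            return value
    raise ValueError("position exceeds the number of observations")


def freq_median(table):
    """Middle observation, or the average of the two middle ones if the total is even."""
    total = sum(table.values())
    if total % 2 == 1:
        return value_at(table, (total + 1) // 2)
    return (value_at(table, total // 2) + value_at(table, total // 2 + 1)) / 2


def freq_mode(table):
    """Value with the highest frequency (the first one found if there is a tie)."""
    return max(table, key=table.get)


print("Mean:", freq_mean(frequency_table))
print("Median:", freq_median(frequency_table))
print("Mode:", freq_mode(frequency_table))
```

The median relies on cumulative frequencies: with ten responses, the 5th and 6th ordered observations are located by adding up frequencies until each position is reached.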
Conclusion
Review of the importance of descriptive statistics in analyzing criminal justice datasets.
Invitation for questions and further elaboration on the topics discussed.