Descriptive Statistics

Introduction

Focus on Descriptive Statistics in Criminal Justice Methods.
Difference between Descriptive and Inferential Statistics.

Descriptive Statistics

Purpose: To describe phenomena through data summarization.
Methods of summarization include measures of central tendency (averages) and measures of dispersion (such as the range and standard deviation).

Dataset Overview

Fake dataset provided as a teaching tool.
Columns include: Respondent ID, Name, Gender, Gender Recode, Number of Times Shoplifted.
Respondent IDs are assigned anonymously for privacy.
Gender is nominal data and recoded into binary numbers (e.g., Male = 1, Female = 0).
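The recode step can be sketched in a few lines. This is a minimal illustration, not the course's own procedure: the records, the field names, and the helper `recode_gender` are all hypothetical; only the coding scheme (Male = 1, Female = 0) comes from the notes.

```python
# Hypothetical records; IDs and field names are illustrative only.
respondents = [
    {"respondent_id": 1, "gender": "Male"},
    {"respondent_id": 2, "gender": "Female"},
    {"respondent_id": 3, "gender": "Female"},
]

# Coding scheme from the notes: Male = 1, Female = 0.
GENDER_CODES = {"Male": 1, "Female": 0}


def recode_gender(records):
    """Return copies of the records with an added 'gender_recode' field."""
    return [{**r, "gender_recode": GENDER_CODES[r["gender"]]} for r in records]


recoded = recode_gender(respondents)
for row in recoded:
    print(row)
```

The codes are labels, not quantities: 1 is not "more" than 0 here. One convenience of a 0/1 coding is that the mean of the recoded column equals the proportion of cases coded 1.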

Types of Data

Nominal Data

Cannot be ordered.
Example: Gender – no natural hierarchy exists.
Numeric codes are assigned only for convenience in analysis; they carry no quantitative meaning.

Variables and Records

Each column represents a variable; each row represents a case or record.
Unit of analysis refers to the type of entity being studied (e.g., individuals in this dataset).

Statistics Types

Univariate vs. Bivariate Statistics

Univariate Statistics: Analysis of one variable (e.g., the average number of times shoplifted).
Bivariate Statistics: Analysis of the relationship between two variables.
Univariate statistics are typically descriptive, while bivariate statistics are often inferential.

Measures of Central Tendency

Mean: The average of the data points.
Median: The middle value when the data are ordered (with an even number of cases, the average of the two middle values).
Mode: The most common value in the dataset.
Example Calculation: For the shoplifting dataset:
- Mode = 0 (most frequent),
- Median = 1.5,
- Mean = 2.4.
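The three measures can be computed with Python's standard `statistics` module. The teaching dataset itself is not reproduced in these notes, so the ten values below are hypothetical, chosen only so that they yield the same mode, median, and mean as the example.

```python
import statistics

# Hypothetical shoplifting counts (NOT the actual teaching dataset);
# chosen so the results match the example: mode 0, median 1.5, mean 2.4.
shoplifting_counts = [0, 0, 0, 1, 1, 2, 3, 4, 5, 8]

mode_value = statistics.mode(shoplifting_counts)      # most common value
median_value = statistics.median(shoplifting_counts)  # average of 5th and 6th ordered values
mean_value = statistics.mean(shoplifting_counts)      # sum of values / number of values

print("mode:", mode_value)
print("median:", median_value)
print("mean:", mean_value)
```

With an even number of cases there is no single middle value, which is why the median can be a number (1.5) that no respondent actually reported.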

Normal Distribution

Describes data that are distributed in a symmetric, bell-shaped curve.
In a normal distribution: mean, median, and mode are equal.
Skewed Distributions:
- Positive skew: Tail extends to the right, mean > median.
- Negative skew: Tail extends to the left, mean < median.

Dispersion of Data

Definition of Dispersion

Dispersion measures how spread out the data is.

Range vs. Standard Deviation

Range: Difference between highest and lowest values.
Standard Deviation: Indicates how far individual data points typically fall from the mean.
- Preferred because it uses all values, not just the two extremes as the range does.
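A small comparison shows why the range can hide differences that the standard deviation picks up. Both lists below are hypothetical. The sketch uses the population form (`statistics.pstdev`); whether the course uses the population or the sample form is not stated in these notes.

```python
import statistics

# Two hypothetical sets of scores with the same lowest and highest values.
clustered = [0, 5, 5, 5, 10]   # most values sit at the mean
spread = [0, 0, 5, 10, 10]     # most values sit at the extremes


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


range_clustered = value_range(clustered)
range_spread = value_range(spread)
sd_clustered = statistics.pstdev(clustered)  # population standard deviation
sd_spread = statistics.pstdev(spread)

print("ranges:", range_clustered, range_spread)
print("standard deviations:", round(sd_clustered, 2), round(sd_spread, 2))
```

The two ranges are identical because only the extremes enter the calculation, while the standard deviation is larger for the second list because every value contributes.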

Theoretical Normal Curve

Standard deviations help to explain the probability of values occurring in a normal distribution.
About 68.27% of the data lie within 1 standard deviation of the mean, 95.45% within 2, and 99.73% within 3.
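The share of a normal distribution that falls within k standard deviations of the mean can be computed from the error function, which is available in Python's standard `math` module. The function name `normal_coverage` is just a label for this sketch.

```python
import math


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {normal_coverage(k) * 100:.2f}%")
```

Printed tables in textbooks sometimes differ from these figures in the second decimal place because of rounding.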

Practical Application of Standard Deviation

Standard deviations indicate how likely it is that a specific observation was drawn from a given population.
Low-probability observations (more than 3 standard deviations from the mean) may indicate something noteworthy.
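One common way to apply this idea is to convert an observation into a z-score, the number of standard deviations it lies from the mean. The sketch below assumes a hypothetical population mean and standard deviation; the helper names and the cutoff of 3 as a default are illustrative choices, not rules from the notes.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


def is_unusual(value, mean, sd, threshold=3.0):
    """True when value is more than `threshold` standard deviations from the mean."""
    return abs(z_score(value, mean, sd)) > threshold


# Hypothetical population parameters, for illustration only.
population_mean = 2.0
population_sd = 1.5

for observed in (3, 8):
    z = z_score(observed, population_mean, population_sd)
    print(observed, round(z, 2), is_unusual(observed, population_mean, population_sd))
```

An observation flagged this way is not necessarily an error; it is simply rare enough under the assumed distribution to deserve a closer look.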

Calculation Formula for Standard Deviation

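The formula can be read as a sequence of steps: find the mean, measure how far each value deviates from it, then sum and average the squared deviations and take the square root.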
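The formula itself is not reproduced in these notes. A standard textbook form, written here for a sample of n observations, is given below; the symbols are the conventional ones and are assumed rather than taken from the lecture.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

Here x_i is each observed value, \bar{x} is the mean, and s is the sample standard deviation. When the data cover an entire population rather than a sample, the denominator is the number of cases N instead of n - 1.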

Frequency Distributions

Overview

Shows how often different values occur within a dataset.
Helps in determining descriptive statistics and summarizing large datasets.

Calculating Central Tendency from Frequency Table

Mean: Multiply each value by its frequency, sum the products, and divide by the total number of responses.
Median: Find the middle observation in the ordered frequency distribution, using cumulative frequencies to locate it.
Mode: The most frequently occurring response.
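These three steps can be sketched directly on a frequency table stored as a dictionary of value to frequency. The table below is hypothetical (it is not the lecture's table), and the function names are illustrative.

```python
# Hypothetical frequency table: value -> number of respondents giving that value.
frequency_table = {0: 3, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 8: 1}


def mean_from_table(table):
    """Sum of (value * frequency) divided by the total number of responses."""
    total = sum(table.values())
    return sum(value * freq for value, freq in table.items()) / total


def value_at(table, position):
    """Value of the observation at a 1-based position in sorted order."""
    cumulative = 0
    for value in sorted(table):
        cumulative += table[value]
        if position <= cumulative:
            return value
    raise ValueError("position exceeds total number of responses")


def median_from_table(table):
    """Middle observation, or the average of the two middle ones if n is even."""
    n = sum(table.values())
    if n % 2 == 1:
        return value_at(table, (n + 1) // 2)
    return (value_at(table, n // 2) + value_at(table, n // 2 + 1)) / 2


def mode_from_table(table):
    """Value with the highest frequency (the first one found if there is a tie)."""
    return max(table, key=table.get)


print("mean:", mean_from_table(frequency_table))
print("median:", median_from_table(frequency_table))
print("mode:", mode_from_table(frequency_table))
```

Working from the table avoids listing every response individually, which matters once a dataset has thousands of cases but only a handful of distinct values.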

Conclusion

Review of the importance of descriptive statistics in analyzing criminal justice datasets.
Invitation for questions and further elaboration on the topics discussed.