Descriptive Statistics

Introduction

  • Focus on Descriptive Statistics in Criminal Justice Methods.

  • Difference between Descriptive and Inferential Statistics.

Descriptive Statistics

  • Purpose: To describe phenomena through data summarization.

  • Methods of summarization include measures of central tendency (averages) and measures of dispersion (such as the range and standard deviation).

Dataset Overview

  • Fake dataset provided as a teaching tool.

  • Columns include: Respondent ID, Name, Gender, Gender Recode, Number of Times Shoplifted.

  • Respondent IDs are assigned anonymously for privacy.

  • Gender is nominal data and is recoded into numeric codes (e.g., Male = 1, Female = 0).
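
The recoding step can be sketched in a few lines of Python. This is only an illustration: the records, names, and field names below are invented, and only the coding scheme (Male = 1, Female = 0) comes from the notes.

```python
# Hypothetical records mirroring the columns named in the notes; all values are invented.
records = [
    {"respondent_id": 1, "name": "Alex", "gender": "Male", "times_shoplifted": 0},
    {"respondent_id": 2, "name": "Blair", "gender": "Female", "times_shoplifted": 3},
    {"respondent_id": 3, "name": "Casey", "gender": "Female", "times_shoplifted": 1},
]

# Coding scheme from the notes: Male = 1, Female = 0.
GENDER_CODES = {"Male": 1, "Female": 0}

# Add the "Gender Recode" column to every record.
for record in records:
    record["gender_recode"] = GENDER_CODES[record["gender"]]

recoded = [record["gender_recode"] for record in records]
print(recoded)
```

The codes are labels only; swapping them (Male = 0, Female = 1) would change nothing about what the data mean.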

Types of Data

Nominal Data

  • Cannot be ordered.

  • Example: Gender – no natural hierarchy exists.

  • Numeric codes are assigned only for convenience in analysis; the numbers themselves carry no quantitative meaning.

Variables and Records

  • Each column represents a variable; each row represents a case or record.

  • Unit of analysis refers to the type of entity being studied (e.g., individuals in this dataset).

Statistics Types

Univariate vs. Bivariate Statistics

  • Univariate Statistics: Analysis of one variable (e.g., the average number of times shoplifted).

  • Bivariate Statistics: Analysis of the relationship between two variables.

  • Univariate statistics are typically descriptive, while bivariate statistics are often inferential.

Measures of Central Tendency

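  • Mean: The sum of all values divided by the number of values.

  • Median: The middle value when the data are ordered; with an even number of cases, the average of the two middle values.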

  • Mode: The most common value in the dataset.

  • Example Calculation: For shoplifting dataset:

    • Mode = 0 (most frequent),

    • Median = 1.5,

    • Mean = 2.4.
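
The three measures can be computed with Python's standard `statistics` module. The list below is a hypothetical set of ten responses, chosen only so that it reproduces the figures quoted above; it is not the actual teaching dataset.

```python
from statistics import mean, median, mode

# Hypothetical shoplifting counts (assumption), picked to match the example figures.
times_shoplifted = [0, 0, 0, 1, 1, 2, 3, 4, 5, 8]

shoplifting_mean = mean(times_shoplifted)      # sum of values / number of values
shoplifting_median = median(times_shoplifted)  # average of the two middle values (1 and 2)
shoplifting_mode = mode(times_shoplifted)      # most frequent value

print(shoplifting_mode, shoplifting_median, shoplifting_mean)
```

Because there are ten cases, the median falls halfway between the fifth and sixth ordered values, which is why it is not a whole number.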

Normal Distribution

  • Describes data that are distributed symmetrically around the center in a bell-curve shape.

  • In a normal distribution: mean, median, and mode are equal.

  • Skewed Distributions:

    • Positive skew: Tail extends to the right; the mean is typically greater than the median.

    • Negative skew: Tail extends to the left; the mean is typically less than the median.
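
As a rough illustration, the mean-versus-median comparison can be turned into a quick check. The helper name `skew_direction` and the sample lists are assumptions made for this sketch, and the comparison is a rule of thumb rather than a formal skewness statistic.

```python
from statistics import mean, median

def skew_direction(values):
    """Guess the direction of skew by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "none apparent"

# Invented examples: one long right tail, one long left tail, one symmetric.
right_tailed = [1, 1, 2, 2, 3, 20]
left_tailed = [1, 18, 19, 19, 20, 20]
symmetric = [1, 2, 3, 4, 5]

for data in (right_tailed, left_tailed, symmetric):
    print(data, skew_direction(data))
```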

Dispersion of Data

Definition of Dispersion

  • Dispersion measures how spread out the data is.

Range vs. Standard Deviation

  • Range: Difference between highest and lowest values.

  • Standard Deviation: Indicates how much individual data points typically differ from the mean.

    • Preferred because it uses all values, not just the two extremes as the range does.
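
A small sketch of why the standard deviation is more informative than the range: the two invented lists below share the same range but are spread very differently around the mean. The sketch assumes the population form of the standard deviation (`statistics.pstdev`); the notes do not say which form is used in class.

```python
from statistics import pstdev

# Two invented datasets with identical extremes (0 and 10).
values_a = [0, 5, 5, 5, 5, 10]    # most values sit at the mean
values_b = [0, 0, 0, 10, 10, 10]  # every value sits far from the mean

range_a = max(values_a) - min(values_a)
range_b = max(values_b) - min(values_b)

# Population standard deviation (assumption: all cases are treated as the population).
sd_a = pstdev(values_a)
sd_b = pstdev(values_b)

print(range_a, range_b)
print(round(sd_a, 3), round(sd_b, 3))
```

The range cannot tell the two lists apart, while the standard deviation is clearly larger for the second one.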

Theoretical Normal Curve

  • Standard deviations help to explain the probability of values occurring in a normal distribution.

  • About 68.27% of the data lie within 1 standard deviation of the mean, 95.45% within 2, and 99.73% within 3.
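
The shares of a normal distribution that fall within 1, 2, and 3 standard deviations of the mean can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.2%}")
```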

Practical Application of Standard Deviation

  • Standard deviations indicate how likely it is that a specific observation was drawn from a given population.

  • Low-probability observations (more than 3 standard deviations from the mean) may indicate something noteworthy.
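
One way to apply this idea is to express an observation as a number of standard deviations from the mean (a z-score) and flag it when that distance exceeds 3. The function names and the population figures below are hypothetical, chosen only to illustrate the arithmetic.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean, measured in standard deviations."""
    return (value - mean) / sd

def is_noteworthy(value, mean, sd, threshold=3.0):
    """Flag observations lying more than `threshold` standard deviations from the mean."""
    return abs(z_score(value, mean, sd)) > threshold

# Hypothetical population: mean of 2.4 shoplifting incidents, standard deviation of 2.0.
population_mean, population_sd = 2.4, 2.0

for observed in (4, 10):
    z = z_score(observed, population_mean, population_sd)
    print(observed, round(z, 2), is_noteworthy(observed, population_mean, population_sd))
```

A flagged value is not necessarily an error; it is simply unusual enough to deserve a closer look.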

Calculation Formula for Standard Deviation

  • The formula can be broken into visual steps: summing the values, finding the mean, and measuring variability around the mean.
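
The notes do not reproduce the formula itself. For reference, the standard population form is written out below; the symbols are the conventional ones and are an assumption about what was shown in class. The sample version divides by n - 1 instead of N.

```latex
% Step 1: the mean is the sum of the values divided by the number of cases.
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

% Step 2: the standard deviation is the square root of the
% average squared deviation from that mean.
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mu \right)^2}
```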

Frequency Distributions

Overview

  • Shows how often different values occur within a dataset.

  • Helps in determining descriptive statistics and summarizing large data sets.

Calculating Central Tendency from Frequency Table

  • Mean: Multiply each value by its frequency, sum the products, and divide by the total number of responses.

  • Median: Find the middle observation in the ordered frequency distribution.

  • Mode: The most frequently occurring response.
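
These three calculations can be sketched directly from a frequency table. The table below is hypothetical (each key is a response value, each entry is how many respondents gave it), and the function name `freq_median` is an assumption for illustration.

```python
# Hypothetical frequency table: value -> number of respondents giving that value.
frequency_table = {0: 3, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 8: 1}

total_responses = sum(frequency_table.values())

# Mean: multiply each value by its frequency, sum, divide by total responses.
freq_mean = sum(value * count for value, count in frequency_table.items()) / total_responses

# Mode: the value with the highest frequency.
freq_mode = max(frequency_table, key=frequency_table.get)

def freq_median(table):
    """Find the middle observation(s) by walking the cumulative frequencies."""
    n = sum(table.values())
    lower_pos, upper_pos = (n + 1) // 2, (n + 2) // 2  # 1-based middle positions
    cumulative, lower = 0, None
    for value in sorted(table):
        cumulative += table[value]
        if lower is None and cumulative >= lower_pos:
            lower = value
        if cumulative >= upper_pos:
            return (lower + value) / 2

print(freq_mode, freq_median(frequency_table), freq_mean)
```

With an odd number of responses the two middle positions coincide, so the function returns that single middle value.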

Conclusion

  • Review of the importance of descriptive statistics in analyzing criminal justice datasets.

  • Invitation for questions and further elaboration on the topics discussed.