stats 2
Understanding Statistics and Data
- Definition: Statistics is the study of data, including how to collect, analyze, and interpret information to make informed decisions.
- Datasets: A dataset is made up of cases (subjects or units), which can be people, animals, or objects.
- Variables: Characteristics of cases that can take on various values. Examples include:
  - Age
  - Gender
  - IQ Scores
  - Test Scores
- Labels: Special types of variables used to uniquely identify cases (e.g., participant numbers), which hold no substantive meaning.
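The terms above can be made concrete with a small sketch. The dataset, the field names, and the values are all hypothetical; only the roles (case, variable, label) come from the notes.

```python
# Hypothetical toy dataset: each dict is one case (here, a person).
cases = [
    {"participant_id": 1, "age": 23, "gender": "female", "iq": 110, "test_score": 78},
    {"participant_id": 2, "age": 31, "gender": "male", "iq": 98, "test_score": 85},
    {"participant_id": 3, "age": 27, "gender": "female", "iq": 104, "test_score": 91},
]

# The label identifies a case but carries no substantive meaning.
label = "participant_id"

# Every other field is a variable: a characteristic that varies across cases.
variables = [name for name in cases[0] if name != label]

# A label must be unique across cases.
ids = [case[label] for case in cases]
labels_unique = len(ids) == len(set(ids))

# The values that one variable takes across all cases.
ages = [case["age"] for case in cases]

print(variables, labels_unique, ages)
```

Averaging the participant numbers would be possible arithmetically but meaningless, which is what separates a label from a variable.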
Measurement Levels
Categorical/Qualitative Variables:
- Nominal: No order or measurable unit (e.g., ethnicity, gender).
- Ordinal: Have an order but no measurable unit (e.g., family position: youngest, middle, oldest).
Quantitative Variables:
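- Interval: Ordered values with equal spacing between them but no true zero point, so differences are meaningful and ratios are not (e.g., IQ).
- Ratio: Ordered, equally spaced values with a meaningful zero point, so ratios are meaningful too (e.g., age).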
Precision of Measurement Levels: Nominal is the least precise; ratio is the most precise.
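A short sketch of why the zero point matters. Temperature in Celsius is used here as an illustrative interval variable; it is not one of the examples in these notes. Changing the unit changes the ratio of two temperatures but not the ratio of two ages.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: the zero of the Celsius scale is arbitrary.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio scale: age has a true zero, so the ratio survives a unit change.
age_child_years, age_adult_years = 10, 20
ratio_years = age_adult_years / age_child_years
ratio_months = (age_adult_years * 12) / (age_child_years * 12)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios disagree
print(ratio_years, ratio_months)        # the two ratios agree
```

Because the temperature ratio depends on the unit chosen, "twice as warm" has no fixed meaning, whereas "twice as old" does.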
Data Analysis Process
- Identifying Variables:
  - Who?: Define the subjects or cases.
  - What?: Determine measurable characteristics (variables).
  - Why?: Understand the reasoning behind data collection.
- Data Distribution: After identifying variables, examine their distribution.
Displaying Data Distributions
Categorical Variable Distribution
- Methods:
  - Pie Chart: Shows each category as a slice sized by its relative frequency (the slices should sum to 100%).
  - Bar Graph: Plots the categories on the x-axis against their frequencies on the y-axis.
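Both charts are drawn from the same frequency table. A minimal sketch using made-up responses for the family position variable; the text-based bars stand in for a real plot.

```python
from collections import Counter

# Hypothetical responses for one categorical variable.
family_position = [
    "youngest", "oldest", "middle", "oldest",
    "youngest", "oldest", "middle", "oldest",
]

counts = Counter(family_position)  # frequencies: the bar heights of a bar graph
n = len(family_position)

# Relative frequencies: the slice sizes of a pie chart.
relative = {category: count / n for category, count in counts.items()}

# Crude text bar graph, most frequent category first.
for category, count in counts.most_common():
    print(f"{category:<9} {'#' * count} {count} ({relative[category]:.0%})")
```

If the relative frequencies do not add up to 1 (100%), some cases were dropped or counted twice.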
Quantitative Variable Distribution
- Methods:
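  - Histogram: Groups the values into intervals of equal width and draws one bar per interval, with the bar height showing how many values fall in it.
  - Stem Plot: Splits each value into a stem (the leading digits) and a leaf (the final digit), so the shape of the distribution is visible while the original values are kept.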
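A rough sketch of the bookkeeping behind both displays. The scores, the interval width of 10, and the helper names `histogram_counts` and `stem_plot` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical two-digit test scores.
scores = [52, 57, 61, 64, 68, 70, 73, 73, 75, 79, 81, 84, 88, 92]

def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def stem_plot(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

bins = histogram_counts(scores, start=50, width=10, n_bins=5)
stems = stem_plot(scores)

for stem, leaves in stems.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

With an interval width of 10, the number of leaves on each stem matches the height of the corresponding histogram bar; the stem plot simply keeps the individual values as well.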
Describing Distributions
Measures of Center:
- Mean: Sum of values divided by count (only for interval and ratio variables).
- Median: Middle value of the ordered data; if the number of values is even, average the two middle values.
- Mode: Most frequently occurring value.
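- Quartiles: Values that split the ordered data into four equal parts.
  - Q1: First quartile; 25% of the data fall at or below it.
  - Q2: Second quartile, which is the median; 50% of the data fall at or below it.
  - Q3: Third quartile; 75% of the data fall at or below it.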
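These measures can be checked on a small example with Python's standard `statistics` module. The scores are made up. Note that several conventions exist for computing quartiles; this sketch uses the module's default method, so a textbook or another tool may give slightly different Q1 and Q3 values.

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data, odd number of values

mean = statistics.mean(scores)      # sum of values divided by count
median = statistics.median(scores)  # middle value of the ordered data
mode = statistics.mode(scores)      # most frequently occurring value

# Even number of values: the median is the average of the two middle values.
median_even = statistics.median(scores[:-1])

# Three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(mean, median, mode, median_even)
print(q1, q2, q3)
```

Q2 and the median are the same number, which is why the median appears in both lists.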
Measures of Spread:
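- Variance: Average of the squared differences from the mean. For a sample, the sum of squared differences is usually divided by n - 1 rather than n.
- Standard Deviation: Square root of the variance; it indicates how far values typically lie from the mean, in the original units.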
- Range: Difference between maximum and minimum scores.
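A worked sketch of the three measures, written out step by step. The scores are made up, and both the population form (divide by n) and the sample form (divide by n - 1) of the variance are shown.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

n = len(scores)
mean = sum(scores) / n

# Squared difference of each value from the mean.
squared_diffs = [(x - mean) ** 2 for x in scores]

population_variance = sum(squared_diffs) / n    # divide by n
sample_variance = sum(squared_diffs) / (n - 1)  # divide by n - 1
std_dev = math.sqrt(population_variance)        # back in the original units
value_range = max(scores) - min(scores)

print(mean, population_variance, std_dev, value_range)
print(sample_variance)
```

The range depends only on the two most extreme values, while the variance and standard deviation use every value.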
Box Plots
- Components (together known as the five-number summary):
  - Minimum
  - First Quartile (Q1)
  - Median (Q2)
  - Third Quartile (Q3)
  - Maximum
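The five values a box plot is drawn from can be computed directly. A minimal sketch: the helper name `five_number_summary` and the data are hypothetical, and the quartiles follow the default convention of `statistics.quantiles`, which may differ slightly from a textbook's.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

# The input does not need to be sorted.
summary = five_number_summary([7, 1, 13, 5, 9, 3, 11])
print(summary)
```

In the plot, the box runs from Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.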
Probability Density Functions and Normal Distributions
Properties of Density Curves:
- Describes the overall pattern of a quantitative variable's distribution; the curve never drops below the horizontal axis.
- Area under the curve equals 1 (representing total probability).
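The area property can be checked numerically. This sketch uses a deliberately simple, hypothetical density, f(x) = 2x on the interval from 0 to 1, and a hand-written trapezoid rule; `area_under` is an illustrative helper, not a library function.

```python
def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(f, a, b, steps=1000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width

total_area = area_under(density, 0, 1)    # whole curve: total probability
p_lower_half = area_under(density, 0, 0.5)  # proportion of values between 0 and 0.5

print(total_area, p_lower_half)
```

The area over any interval is the proportion of values expected to fall in that interval, so the area over the whole range has to be 1.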
Normal Distribution Characteristics:
- Symmetrical shape
- Single peak located at the mean
- Bell-shaped curve
Notation: Written as N(µ, σ), where µ is the mean and σ is the standard deviation.
Relevance: Many real-world variables are approximately normal, so the curve can be used to estimate the probability that a value falls in a given range and to support statistical conclusions.
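As an illustration, probabilities under a normal curve can be computed with `statistics.NormalDist` from the Python standard library. The choice of N(100, 15) for IQ scores is an assumption made for this sketch, not something stated in these notes.

```python
from statistics import NormalDist

# Assumption for illustration: IQ scores follow N(100, 15).
iq = NormalDist(mu=100, sigma=15)

# Area under the curve between two values = probability of falling in that range.
within_one_sd = iq.cdf(115) - iq.cdf(85)   # within 1 standard deviation of the mean
within_two_sd = iq.cdf(130) - iq.cdf(70)   # within 2 standard deviations
above_130 = 1 - iq.cdf(130)                # upper tail beyond 2 standard deviations

print(f"P(85 <= IQ <= 115) = {within_one_sd:.4f}")
print(f"P(70 <= IQ <= 130) = {within_two_sd:.4f}")
print(f"P(IQ > 130)        = {above_130:.4f}")
```

Because the curve is symmetrical, the area below 70 equals the area above 130, and the proportions within one and two standard deviations are the same for every normal distribution regardless of µ and σ.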