Standard deviation and variance

Topics Covered:
- What to do with Quantitative Data
- Variance and Standard Deviation
- Z-Scores

Collect and Pre-Process Data: Gather relevant numerical data and prepare it for analysis to ensure accuracy and relevance.
Descriptive Statistics: Summarize and describe features of the data, providing insights into its characteristics.
Test for Differences: Analyze data to determine if there are significant differences between various groups or conditions within the dataset.
Look for Relationships: Explore correlations and dependencies between different variables to understand their interactions.

X Axis (Horizontal): Represents the range of values or bins.
Y Axis (Vertical): Represents the frequency or count of observations in each bin.
Bins: Ranges of values that group the data for analysis.
Purpose: A histogram shows how the observations are spread across the range of values, making the shape of the distribution visible (where the data cluster and where they are sparse).
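
The binning described above can be sketched in a few lines of Python. This is a minimal illustration, not a fixed method: the function name `histogram_counts`, the bin width, and the data values are all hypothetical, and the input is assumed to be non-empty.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0.0):
    """Count how many values fall in each bin [low, high).

    Assumes `values` is non-empty. Bins with no observations are kept
    (with a count of 0) so gaps in the distribution stay visible.
    """
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        counts[k] += 1
    first, last = min(counts), max(counts)
    # Map each bin index back to its (low, high) edges, in order.
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in range(first, last + 1)
    }

# Hypothetical data, invented for illustration only.
values = [1.2, 3.4, 4.9, 5.5, 6.1, 7.8, 9.0, 11.3, 12.7, 21.0]
hist = histogram_counts(values, bin_width=5.0)

# X axis: the bins; Y axis: the count in each bin (drawn here as '#').
for (low, high), count in hist.items():
    print(f"{low:5.1f} to {high:5.1f}: {'#' * count} ({count})")
```

The isolated value 21.0 lands in a bin of its own, separated from the rest by an empty bin, which is how a possible outlier tends to show up in a histogram.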

20 Class Count Distribution:
- Example bin heights, each giving the number of states whose percentage of foreign-born residents falls in that bin:
  - 0.1 to 5.0%: 13 states
  - 5.1 to 10.0%: 20 states
  - 10.1 to 15.0%: 10 states
  - ...

Two Peaks in Distribution:
- Indicates two distinct groups; in education data, such peaks might show the presence of 'ACT states' vs 'SAT states'.

Outliers: Observations that fall significantly outside the overall distribution pattern; may require further investigation.

Symmetric Distribution: Both sides of the histogram are mirror images.
Right-Skewed Distribution: Tail on the right extends further; common in income data. The mean is pulled upward, toward the tail, so it typically exceeds the median.
Left-Skewed Distribution: Tail on the left extends further. The mean is pulled downward, toward the tail, so it typically falls below the median.
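
A quick way to see the effect of skew is to compare the mean and the median. The sketch below uses two small datasets that are invented for illustration (they are not real income or mortality figures).

```python
import statistics

# Hypothetical incomes (in thousands): one very large value forms a right tail.
right_skewed = [28, 31, 35, 38, 42, 45, 51, 60, 250]

# Hypothetical ages at death: one very small value forms a left tail.
left_skewed = [12, 58, 66, 70, 74, 77, 80, 83, 86]

mean_r = statistics.mean(right_skewed)
median_r = statistics.median(right_skewed)
mean_l = statistics.mean(left_skewed)
median_l = statistics.median(left_skewed)

# The mean moves toward the tail; the median barely reacts to it.
print(f"right-skewed: mean={mean_r:.1f}, median={median_r}")
print(f"left-skewed:  mean={mean_l:.1f}, median={median_l}")
```

In the right-skewed data the mean ends up above the median, and in the left-skewed data it ends up below it. This is why the median is often the preferred summary for skewed data.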

Histogram Examples:
- Age at Death of Australian Males: Left-skewed; most deaths occur at older ages, with a long tail toward younger ages.
- Income Data: Right-skewed; a few very high incomes pull the mean upward, which is why income is typically reported as a median.

Characteristics of the Normal Distribution:
- Symmetrical and bell-shaped (Gaussian curve).
- Mean = Median = Mode.
- Many statistical methods assume data are normally distributed.

Definition (Variance): Measures how widely the data are spread around the mean. It is calculated by taking the distance of each observation from the mean, squaring it, summing the squared distances, and dividing by the number of observations. (When the data are a sample rather than the whole population, the sum is usually divided by n - 1 instead of n.)
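
The calculation described above, written out step by step. The function name `population_variance` and the data are hypothetical; the standard-library `statistics` module is used only to cross-check the result.

```python
import statistics

def population_variance(values):
    """Average squared distance of each observation from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return sum(squared_distances) / n  # divide by the number of observations

# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(data))   # divides by n
print(statistics.pvariance(data))  # library equivalent (population variance)
print(statistics.variance(data))   # sample variance: divides by n - 1
```

For these values the mean is 5 and the squared distances sum to 32, so the population variance is 32 / 8 = 4, while the sample variance is 32 / 7.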

Definition (Standard Deviation): The square root of the variance. It indicates how far observations typically lie from the mean, expressed in the same units as the data.
Empirical Rule for Normal Distribution:
- About 68% of observations are within +/- 1 standard deviation (SD) of the mean.
- About 95% are within +/- 2 SD.
- About 99.7% are within +/- 3 SD.
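
The empirical-rule percentages can be checked against the normal curve itself. This sketch uses `statistics.NormalDist` from the Python standard library; the variance value is a hypothetical example chosen for illustration.

```python
import math
from statistics import NormalDist

# Standard deviation is the square root of the variance.
variance = 4.0                     # hypothetical value
sd = math.sqrt(variance)
print(f"variance={variance}, SD={sd}")

# Share of a normal distribution lying within +/- k SD of the mean.
# The result is the same for any normal distribution, so the standard
# normal (mean 0, SD 1) is used here.
std_normal = NormalDist(mu=0.0, sigma=1.0)
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within +/- {k} SD: {share:.4f}")
```

The computed shares are close to 0.68, 0.95, and 0.997, which is where the rounded figures in the rule come from. They apply only to data that are approximately normal.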

Definition (Z-Score): Each data point has an associated z-score, which indicates how many standard deviations it lies from the mean: z = (value - mean) / SD.
Characteristics:
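- When every observation in a dataset is converted to a z-score, the z-scores have a mean of zero and a standard deviation of one.
- Z-scores can be calculated for any distribution, but interpreting them with the empirical rule (for example, treating a z-score beyond +/- 2 as unusual) is reliable only when the data are approximately normal.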
- A z-score can clarify whether an observation is typical or atypical within the dataset.
- Negative z-scores indicate values below the mean; positive z-scores indicate values above the mean.
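
A minimal sketch of the z-score calculation, assuming the population standard deviation is used (dividing by n). The function name `z_scores` and the data are hypothetical.

```python
import statistics

def z_scores(values):
    """Convert each value to its distance from the mean, in SD units."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (assumption: divide by n)
    return [(x - mean) / sd for x in values]

# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(data)

for x, z in zip(data, zs):
    side = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"value={x}: z={z:+.2f} ({side} the mean)")

# Check the properties of a standardized dataset.
print(statistics.fmean(zs), statistics.pstdev(zs))
```

Here the mean is 5 and the SD is 2, so the value 2 has a z-score of -1.5 and the value 9 has a z-score of +2.0. The z-scores as a whole have a mean of zero and a standard deviation of one.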