COMM 1503 IQR Outliers and ZScores

Definition: Central tendency refers to the typical or average value in a numerical dataset.

Mean
- Function in Excel: =AVERAGE(number1, number2, …)
- Example Calculation:
  - Mean value of dataset: 26.91 minutes.
- Implication: A useful estimation for the average commute, indicating that for another city, one might expect an average around this value.
Median
- Excel Function: =MEDIAN(number1, number2, …)
- Definition: Median is the middle value when the dataset is organized in ascending order.
- Example Calculation: Median found as 25.95 minutes.
- Advantage over Mean: Less affected by extreme values and skewed distributions.
Skewness
- Description: Evaluates how much the mean and median differ, indicating dataset symmetry.
- Indicators:
  - If mean > median: Skewed Right.
  - If mean < median: Skewed Left.
  - If both are similar: Dataset is relatively symmetric.

Standard Deviation (SD)
- Excel Functions:
  - Sample SD: =STDEV.S(number1, number2, …)
  - Population SD: =STDEV.P(number1, number2, …)
- Interpretation: Average distance of values from mean. A smaller SD indicates that data points are closer to the mean, while a larger SD indicates more variability.
- Example Calculation:
  - Standard Deviation: Approximately 5 minutes.
  - Expected range: from 22 to 32 minutes around the mean (26.91 ± 5).
Variance
- Definition: Variance is the standard deviation squared (SD²).
- Calculation in Excel:
  - Sample Variance: =VAR.S(number1, number2, …)
  - Population Variance: =VAR.P(number1, number2, …)
- Interpretation: Describes how spread out the dataset is; non-negative value.
- Connection: Variance and standard deviation are linked through:
  - Variance = (SD)²
  - Units: Variance units are squared, making it less interpretable than SD.
Interquartile Range (IQR)
- Definition: The range between the first and third quartiles (Q1 and Q3) that represents the middle 50% of the dataset.
- Formula: IQR = Q3 - Q1
- Usage: Specifically useful in identifying variability when using median as a central value.
- Interpretation: Larger IQR indicates more variability in the middle 50% of the data, while smaller indicates less.

Z-Scores
- Definition: The z-score indicates how many standard deviations an observation is from the mean.
- Formula:
  - For a sample: $Z<em>i = \frac{X</em>i - \bar{X}}{S}$
- Implications: A z-score > 3 or < -3 suggests a potential outlier.
- Example: A z-score of 1.6 indicates the observation is 1.6 standard deviations above the mean.
- Utility: Useful for comparing different datasets by standardizing measurements.
IQR Method
- Procedure:
  - Calculate Q1 and Q3.
  - Define outliers using:
  - Lower Bound: Q1 - 1.5 * IQR
  - Upper Bound: Q3 + 1.5 * IQR
- Example Application: Determine if a value lies outside these bounds to consider it an outlier.

Descriptive Statistics Tool:
- Access via the Data Analysis feature to quickly compute summary statistics such as mean, median, mode, variance, standard deviation, etc.
- Steps: Select Data Tab -> Data Analysis -> Descriptive Statistics, choose range, and output preferences.

Assumption: Applicable for symmetric, bimodal distributions (bell-shaped).
- Within 1 SD: 68% of data
- Within 2 SD: 95% of data
- Within 3 SD: 99.7% of data
Connection to Outliers: Helps identify potential outliers based on how far observations deviate from the mean.

Importance of Showing Work: Essential for transparent evaluation and reassures the methodology used in calculations.
Emphasis on practice with manual calculation (hand-on exercises) for understanding before relying on tools like Excel.
Final Notes: Always relate findings to real-world implications, ask clarifying questions for methodologies, and exercise ethical data treatment (account for potential errors without presuming mistakes).

Consistent and thorough practice with Excel and statistical calculations will be pivotal in understanding and applying statistical analysis to datasets effectively.
Aim to integrate knowledge of both descriptive statistics and contextual implications in results interpretation.