L4- Distribution of Observations

Learning Objectives

By the end of this lecture you should be able to:
- Describe and identify several statistical distributions (Normal, t, \chi^2, Binomial, Bernoulli).
- Apply reference ranges to the Normal distribution when appropriate.
- Calculate the probability that an observation lies between two values using the standard Normal distribution.

Core Definitions & Concepts

Population vs Sample
- Population: Entire group of interest; possesses a fixed, but usually unknown, numerical characteristic (parameter).
- Sample: Subset drawn from the population, used to estimate population characteristics.
Parameter vs Statistic
- Parameter (Greek letters): Numerical characteristic of a population (e.g., population mean cholesterol of all Asian males in Melbourne).
- Statistic (Latin letters): Numerical summary derived from a sample (e.g., mean cholesterol of a sample of 50 Asian males in Melbourne) used to infer the parameter.

Notational Conventions

Mean
- Population mean: \mu
- Sample mean: \bar{x} (spoken “x-bar”).
Standard Deviation
- Population SD: \sigma
- Sample SD: s (also written SD)
- Excel command: STDEV(range) computes the sample standard deviation.
Proportion
- Population proportion: \pi
- Sample proportion: p
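
As a rough illustration of the notation above, the sketch below computes \bar{x}, s and p with Python's standard library. The data values are made up for the example and do not come from the lecture.

```python
import statistics

# Hypothetical sample of cholesterol readings (mmol/L), invented for illustration.
sample = [4.8, 5.2, 5.9, 6.1, 5.0]

x_bar = statistics.mean(sample)           # sample mean, x-bar
s = statistics.stdev(sample)              # sample SD: divides by n - 1, like Excel's STDEV
sd_divide_by_n = statistics.pstdev(sample)  # divides by n; only right if the data ARE the population

# Hypothetical binary outcomes (1 = "success"), also invented.
outcomes = [1, 0, 0, 1, 1, 0, 1, 1]
p = sum(outcomes) / len(outcomes)         # sample proportion

print(f"x-bar = {x_bar:.2f}, s = {s:.3f}, p = {p:.3f}")
```

The n - 1 version is always slightly larger than the divide-by-n version, which is why it matters which one a software command computes.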

Overview of Distributions Covered

Bernoulli
- Used for a single trial (sample size n=1) with two outcomes.
- Rarely applied in practice; not covered further in the course.
Binomial
- Two mutually exclusive outcomes; fixed number of trials n; constant probability \pi.
- Concerned with the count of “successes”.
- Example question: “In 10 events, what is P(X=5)?”
Normal (Gaussian) Distribution – Focus of this lecture.
t Distribution – Will be addressed in later sessions.
Chi-Square (\chi^2) Distribution – Will be addressed in later sessions.
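
The Binomial example question above ("In 10 events, what is P(X=5)?") can be sketched as follows. The notes do not give \pi, so \pi = 0.5 is an assumed value chosen only for illustration; `binomial_pmf` is a helper name introduced here.

```python
from math import comb

def binomial_pmf(k: int, n: int, pi: float) -> float:
    """P(X = k) when X counts successes in n independent trials with success probability pi."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

# pi = 0.5 is an assumption; the lecture notes do not specify it.
p_five = binomial_pmf(5, 10, 0.5)
print(round(p_five, 4))  # 0.2461
```

Setting n = 1 reduces this to the Bernoulli case: `binomial_pmf(1, 1, pi)` is simply `pi`.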

The Normal (Gaussian) Distribution

Also called the Gaussian distribution.
Applies to continuous data.
Characteristics:
- Shape: Symmetric, bell-shaped curve.
- Parameters:
  - Central location: \mu
  - Spread: \sigma
Visual: Mean at center, curve spreads outward in units of \sigma.
Importance: Foundation for many statistical methods and probability calculations.
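
For reference, the standard form of the Normal density, written with the same \mu and \sigma as above (the formula itself is not shown in these notes):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```

The squared term (x-\mu)^2 is what makes the curve symmetric about \mu, and a larger \sigma flattens and widens it.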

Summary Statistics Revisited

Provide concise, essential information about data distribution.
Two classes:
- Measures of Central Tendency: Mean, median, mode.
- Measures of Dispersion: Standard deviation, inter-quartile range (IQR), range.
Interpretation of dispersion (illustrated by two curves):
- Narrow curve ⇒ small variability.
- Wide curve ⇒ large variability.

Shapes of Distributions

Symmetric / Normal: Mean = Median = Mode.
Negatively (Left) Skewed:
- Tail extends to smaller values.
- Outliers pull mean left: \text{Mean} < \text{Median} < \text{Mode}.
Positively (Right) Skewed:
- Tail extends to larger values.
- Outliers pull mean right: \text{Mode} < \text{Median} < \text{Mean}.
Terminology:
- Symmetric and bell-shaped ⇒ treated as approximately normal (symmetry alone does not guarantee normality).
- Asymmetric ⇒ skewed
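
The ordering of mode, median and mean under skew can be checked on a small made-up data set. The values below are hypothetical (not from the lecture), and the left-skewed set is just a mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed data: many small values, a few large ones.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]

# Mirror the data around a fixed point to get a left-skewed version.
left_skewed = [21 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    mode = statistics.mode(data)
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{name}-skewed: mode={mode}, median={median}, mean={mean}")
```

In the right-skewed set the large values drag the mean above the median and mode; mirroring the data reverses the order.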

Choosing Appropriate Statistics to Report

If data ~ Normal ⇒ report mean ± SD.
If data skewed ⇒ report median & IQR (mean is distorted by outliers).
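
A minimal sketch of this reporting rule, assuming the analyst has already judged whether the data are skewed. The function name `summary_for_report` and the data are hypothetical. Note that quartile conventions differ between software packages, so the IQR from `statistics.quantiles` may not match Excel or other tools exactly.

```python
import statistics

def summary_for_report(values, skewed):
    """Return median & IQR for skewed data, otherwise mean & SD."""
    if skewed:
        # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
        q1, median, q3 = statistics.quantiles(values, n=4)
        return {"median": median, "iqr": (q1, q3)}
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}

# Hypothetical length-of-stay data (days), invented for illustration.
stays = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]
skewed_summary = summary_for_report(stays, skewed=True)
normal_summary = summary_for_report(stays, skewed=False)
print(skewed_summary)
print(normal_summary)
```

On these data the single 20-day stay pulls the mean well above the median, while the median and IQR barely notice it.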

Checking for Normality

Graphical Methods
- Histogram
- Box-plot
- Look for symmetry, bell shape, absence of long tails.
Numerical Comparison
- Compare mean and median:
  - Close ⇒ likely symmetric/normal.
  - Far apart ⇒ likely skewed.
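
One way to make "close" and "far apart" concrete is to express the mean-median gap in units of the sample SD. This is only a rough screening sketch: the helper `mean_median_gap`, the cutoff of 0.2, and both data sets are assumptions made for illustration, not part of the lecture, and the check does not replace looking at a histogram or box-plot.

```python
import statistics

def mean_median_gap(values):
    """(mean - median) / SD: near 0 suggests symmetry, positive suggests right skew."""
    return (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

CUTOFF = 0.2  # arbitrary illustrative threshold, not a standard rule

# Hypothetical data sets, invented for illustration.
weights = [70, 74, 77, 79, 81, 84, 88]      # roughly symmetric
stays = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]     # long right tail

for name, data in [("weights", weights), ("stays", stays)]:
    gap = mean_median_gap(data)
    verdict = "likely skewed" if abs(gap) > CUTOFF else "roughly symmetric"
    print(f"{name}: gap = {gap:.2f} SD -> {verdict}")
```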

Graphical & Numerical Examples

Weight Histogram
- Visually symmetric; mean = median = 79 ⇒ weight ~ Normal.
Age Distribution (Left-Skewed)
- Most values > 70; tail toward younger ages.
- Mean < Median ⇒ left skew.
Post-Procedure Length of Stay (Right-Skewed)
- Many short stays, few very long stays.
- Mean > Median ⇒ right skew.

Clinical Data Example – Cardiac Surgery Database

Variables: Pre-operative creatinine (mmol/L) & dialysis status.
Overall Creatinine
- Histogram & box-plot show a long right tail (outliers approaching 2 mmol/L).
Grouped by Dialysis Status
- No Dialysis:
  - Mean = 0.10, Median = 0.09 (similar) ⇒ approximately normal despite the long tail.
- Yes Dialysis:
  - Mean = 0.462, Median = 0.426 (mean noticeably higher) ⇒ strong right skew.
Statistical Reporting Decision
- Because the Yes Dialysis group is skewed, compare the groups using median & IQR rather than mean & SD.

Additional Example – Body Mass Index (BMI)

Histogram nearly symmetric; a few outliers, up to a BMI of 68.
Mean ≈ Median (red lines overlap).
Conclusion: BMI data ~ Normal; mean ± SD appropriate.

Practical Implications & Next Steps

Always examine distribution shape before selecting summary statistics or formal tests.
Reference ranges (e.g., \mu \pm 2\sigma, which covers about 95% of values) are valid only under approximate normality.
A standard Normal table or software allows probability calculations once the data are standardized, i.e. converted to z = (x - \mu)/\sigma.
Future lectures will build on these ideas to cover t and \chi^2 distributions, and to perform inferential tests based on Normal theory.
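
The reference-range and probability calculations mentioned above can be sketched with Python's `statistics.NormalDist`. The mean of 79 and SD of 10 are assumed values for illustration (the notes do not give an SD), and `prob_between` is a helper name introduced here.

```python
from statistics import NormalDist

# Hypothetical Normal variable; mu and sigma are assumed values.
mu, sigma = 79.0, 10.0

# Reference range: mu ± 2 * sigma.
lower, upper = mu - 2 * sigma, mu + 2 * sigma

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ Normal(mu, sigma), computed by standardizing to z-scores."""
    standard_normal = NormalDist()      # mean 0, SD 1
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal.cdf(z_b) - standard_normal.cdf(z_a)

coverage = prob_between(lower, upper, mu, sigma)
print(f"Reference range: {lower} to {upper}")
print(round(coverage, 4))  # 0.9545
```

The same function answers any "between two values" question: the z-scores are what would be looked up in a standard Normal table, and the difference of the two cumulative probabilities is the area between them.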
