Notes on Error, Uncertainty, and Statistical Analysis

Types of Error

Systematic Error:
- A flaw in equipment or design.
- The error is reproducible, meaning it consistently affects measurements in the same way.
- Example (Precision vs. Accuracy): Measurements like 12.7, 12.6, 12.8 for an expected value of 12. These measurements are precise (close to each other) but not accurate (shifted from the true value), indicating a systematic error.
Random Error:
- Caused by uncontrolled (and sometimes uncontrollable) variables.
- Has an equal chance of being positive or negative.
- Example (Precision vs. Accuracy): Measurements like 8, 10, 14, 16 for an expected value of 12. These measurements are accurate (their average is close to the true value) but not precise (spread out), indicating random error.
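The two examples above can be checked numerically. The following is a minimal sketch, assuming that "accuracy" is judged by the offset of the mean from the expected value and "precision" by the sample standard deviation; the function name `summarize` is illustrative only.

```python
from statistics import mean, stdev

TRUE_VALUE = 12  # expected value used in both examples

systematic = [12.7, 12.6, 12.8]  # precise but not accurate
random_err = [8, 10, 14, 16]     # accurate on average but not precise


def summarize(values, true_value):
    """Return (offset of the mean from the true value, sample standard deviation)."""
    return mean(values) - true_value, stdev(values)


sys_bias, sys_spread = summarize(systematic, TRUE_VALUE)
rnd_bias, rnd_spread = summarize(random_err, TRUE_VALUE)

print(f"systematic: offset = {sys_bias:+.2f}, spread = {sys_spread:.2f}")
print(f"random:     offset = {rnd_bias:+.2f}, spread = {rnd_spread:.2f}")
```

The first data set shows a large offset with a small spread; the second shows no offset with a large spread.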

Uncertainty

Absolute Uncertainty:
- The margin of uncertainty associated with a direct measurement.
- Indicates the range within which the true value is expected to lie.
- Notation: Can be expressed as 70 \text{°F} \rightarrow 70 \pm 5 \text{°F} or 70(5) \text{°F}.
- This means the true value is expected to lie between 65 \text{°F} and 75 \text{°F}.
Relative (Relative Percent) Uncertainty:
- The size of the uncertainty with respect to the measurement itself.
- Calculated as the ratio of absolute uncertainty to the measurement.
- Formula: Relative uncertainty = \frac{\text{absolute uncertainty}}{\text{measurement}}
- Example: For 70 \pm 5 \text{°F}, relative uncertainty = \frac{5}{70} \approx 0.07.
- Percent Uncertainty: Relative uncertainty multiplied by 100 .
- Formula: Percent uncertainty = \text{relative uncertainty} \times 100 (or \frac{\text{absolute uncertainty}}{\text{measurement}} \times 100 )
- Example: 0.07 \times 100 = 7\% .
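As a quick check of the two formulas, here is a small sketch that reproduces the 70 \pm 5 example. The function names are illustrative, not part of the notes.

```python
def relative_uncertainty(measurement, absolute_uncertainty):
    """Ratio of the absolute uncertainty to the size of the measurement."""
    return absolute_uncertainty / abs(measurement)


def percent_uncertainty(measurement, absolute_uncertainty):
    """Relative uncertainty multiplied by 100."""
    return 100 * relative_uncertainty(measurement, absolute_uncertainty)


rel = relative_uncertainty(70, 5)
pct = percent_uncertainty(70, 5)
print(f"relative = {rel:.3f}, percent = {pct:.1f}%")  # relative = 0.071, percent = 7.1%
```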

Estimating Uncertainty

For Estimated Digits (e.g., from a graduated scale):
- Convention 1: The estimated digit corresponds to \frac{1}{10} of the smallest mark on the measuring device.
  - Example: On a scale with 1 \text{mL} marks, you might read 14.5 \text{mL}. Estimating to 1/10 of the smallest division gives an uncertainty of \pm 0.1 \text{mL}, which implies a range of 14.4 \text{mL} to 14.6 \text{mL}.
- Convention 2: A common convention is to estimate uncertainty to \frac{1}{2} of the smallest mark.
  - Example: For a device with 1 \text{mL} marks, if you read 14 \text{mL}, the uncertainty could be \pm 0.5 \text{mL}. This means the true value is between 13.5 \text{mL} and 14.5 \text{mL}.
- The choice of estimation method reflects your confidence in the measurement.
For Numerical Readouts (e.g., digital balances):
- Uncertainty is assumed to be \pm 1 in the last displayed digit.
- Example: A mass reading of 1.7346 \text{ g} has an uncertainty of \pm 0.0001 \text{ g}, meaning the actual mass is between 1.7345 \text{ g} and 1.7347 \text{ g}.
Propagation of Error: When performing calculations involving multiple measurements, the uncertainty associated with each measurement must be propagated through the calculation to determine the final uncertainty.
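The notes do not state which propagation rules are used. A common choice for independent random uncertainties, assumed here, is to add absolute uncertainties in quadrature for sums and differences, and relative uncertainties in quadrature for products and quotients. The sketch below applies those assumed rules to hypothetical buret and balance readings.

```python
import math


def add_sub_uncertainty(*absolute_uncertainties):
    """Sum or difference: quadrature sum of the absolute uncertainties (assumed rule)."""
    return math.sqrt(sum(u ** 2 for u in absolute_uncertainties))


def mul_div_rel_uncertainty(*relative_uncertainties):
    """Product or quotient: quadrature sum of the relative uncertainties (assumed rule)."""
    return math.sqrt(sum(r ** 2 for r in relative_uncertainties))


# Hypothetical buret readings in mL, each read to +/- 0.02 mL
initial, final, e_read = 0.05, 17.88, 0.02
delivered = final - initial
e_delivered = add_sub_uncertainty(e_read, e_read)

# Hypothetical density from a balance reading (g) and the delivered volume (mL)
mass, e_mass = 1.7346, 0.0001
density = mass / delivered
rel_density = mul_div_rel_uncertainty(e_mass / mass, e_delivered / delivered)
e_density = density * rel_density

print(f"volume  = {delivered:.2f} +/- {e_delivered:.2f} mL")
print(f"density = {density:.5f} +/- {e_density:.5f} g/mL")
```

In this example the volume uncertainty dominates the result, because its relative uncertainty is far larger than that of the mass.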

Statistics

Average (Mean - \bar{x} ):
- The sum of all values divided by the number of measurements.
- Formula: \bar{x} = \frac{\sum x_i}{n}
  - Where \sum x_i is the sum of individual measurements and n is the total number of measurements.
Standard Deviation (STDEV - s or \sigma ):
- A measure of how closely the data cluster around the mean.
- Population Standard Deviation ( \sigma or STDEV.P): Calculated when data from the entire population is available (rare in experimental settings).
  - Formula: \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}
- Sample Standard Deviation ( s or STDEV.S): Calculated from a sample of a larger population (most common in practical applications).
  - Formula: s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
  - The denominator (n-1) is used for samples because the deviations are measured from the sample mean \bar{x} rather than the unknown population mean, which makes them slightly too small on average. Dividing by n-1 instead of n gives a slightly larger value and a less biased estimate of the population standard deviation.
  - Example: A series of mass measurements (e.g., 2.7196, 2.7201, 2.7210, \dots ) can be summarized by their average and standard deviation (e.g., 2.7202 \pm 0.0007 \text{ g} ).
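The mean and both standard deviation formulas can be written out directly. This sketch uses only the three mass values that are actually listed in the example; any elided measurements are unknown and are not included. The hand-written formulas are cross-checked against Python's `statistics` module.

```python
import math
from statistics import mean, pstdev, stdev

# Only the three values listed in the example (grams); the rest are elided
masses = [2.7196, 2.7201, 2.7210]

n = len(masses)
x_bar = sum(masses) / n                           # mean
sum_sq = sum((x - x_bar) ** 2 for x in masses)    # sum of squared deviations

sigma = math.sqrt(sum_sq / n)      # population formula (divide by n)
s = math.sqrt(sum_sq / (n - 1))    # sample formula (divide by n - 1)

# Cross-check against the standard library implementations
assert math.isclose(x_bar, mean(masses), rel_tol=1e-9)
assert math.isclose(sigma, pstdev(masses), rel_tol=1e-6)
assert math.isclose(s, stdev(masses), rel_tol=1e-6)

print(f"mean = {x_bar:.4f} g, s = {s:.4f} g, sigma = {sigma:.4f} g")
```

The sample value s comes out larger than \sigma, as expected from the smaller denominator.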
Confidence Intervals:
- A range of values calculated from a sample that is likely to contain the true value of a population parameter.
- Expressed as a probability that the population parameter will fall between a set of values.
- Example: For a measurement 2.7202 \pm 0.0007 \text{ g}, the range is 2.7195 - 2.7209 \text{ g}.
- Confidence Levels related to Standard Deviations (for normally distributed data):
  - One Standard Deviation ( \pm 1s ): Approximately 68\% of the population will fall within this range.
  - Two Standard Deviations ( \pm 2s ): Approximately 95\% of the population will fall within this range.
  - Three Standard Deviations ( \pm 3s ): Approximately 99.7\% of the population will fall within this range.
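The 68 / 95 / 99.7 percentages follow from the normal distribution: the fraction of values within k standard deviations of the mean is erf(k / \sqrt{2}). A short sketch using the standard library's error function (the function name `coverage` is illustrative):

```python
import math


def coverage(k):
    """Fraction of a normal population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * coverage(k):.1f}%")
```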

Comparison of Means (t-test)

Purpose: To determine if the averages from two data sets are statistically different from each other.
Prerequisites: This test assumes that the data is normally distributed. You must also decide whether the variances (standard deviations squared) of the two populations can be treated as equal (Case 1) or must be treated as different (Case 2).
Case 1: Standard Deviations (STDEV) are Assumed to be Equal
- Formula for calculated t-value ( t_{calculated} ):
  t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}} or equivalently t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
  Where:
  - \bar{x}_1 and \bar{x}_2 are the averages of the two samples.
  - n_1 and n_2 are the number of measurements in each sample.
  - \text{STDEV} is the pooled standard deviation when standard deviations are considered equal. (Note: sometimes a simpler STDEV average is used directly if the STDEVs are very close.)
- Example: Comparing the mass of two groups of frogs: ME ( 4.036 \pm 0.003 \text{ g} for 10 frogs) and MA ( 4.002 \pm 0.003 \text{ g} for 8 frogs).
  - Here, STDEV_1 = STDEV_2 = 0.003 \text{ g}. (The example provided in the transcript shows a calculation for this case, resulting in t_{calculated} \approx 23.82 ).
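The Case 1 formula can be applied to the frog example as a sketch. The function name `t_equal_sd` is illustrative. Direct substitution of the rounded values shown gives about 23.9, slightly above the 23.82 quoted; the gap presumably comes from rounding in the original calculation, which is an assumption rather than something the notes state.

```python
import math


def t_equal_sd(mean1, mean2, stdev, n1, n2):
    """Calculated t for two means that share one (pooled) standard deviation."""
    return abs(mean1 - mean2) / (stdev * math.sqrt(1 / n1 + 1 / n2))


# Frog example: ME = 4.036 g (10 frogs), MA = 4.002 g (8 frogs), STDEV = 0.003 g
t_frogs = t_equal_sd(4.036, 4.002, 0.003, 10, 8)
print(f"t_calculated = {t_frogs:.2f}")  # t_calculated = 23.89
```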
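Case 2: Standard Deviations (STDEV) are Different
- When the standard deviations of the two data sets are significantly different, they are not pooled; each sample's own standard deviation enters the formula.
- Formula for calculated t-value ( t_{calculated} ) when STDEV is different:
  t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\text{STDEV}_1^2}{n_1} + \frac{\text{STDEV}_2^2}{n_2}}}
- Degrees of freedom for this case (Welch-Satterthwaite approximation, rounded to an integer for the table lookup):
  df = \frac{\left(\frac{\text{STDEV}_1^2}{n_1} + \frac{\text{STDEV}_2^2}{n_2}\right)^2}{\frac{(\text{STDEV}_1^2/n_1)^2}{n_1 - 1} + \frac{(\text{STDEV}_2^2/n_2)^2}{n_2 - 1}}
- Formula for Pooled Standard Deviation ( STDEV_{pooled} ), which supplies the STDEV used in Case 1:
  STDEV_{pooled} = \sqrt{\frac{(\text{STDEV}_1^2)(n_1-1) + (\text{STDEV}_2^2)(n_2-1)}{n_1 + n_2 - 2}}
- With the pooled value, t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{STDEV_{pooled} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}}. This agrees with the Case 2 formula only when n_1 = n_2 or STDEV_1 = STDEV_2.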
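To compare the pooled and unpooled forms side by side, here is a minimal sketch. The function names are illustrative, and the second data set (with STDEV_2 = 0.009) is hypothetical. The two forms give the same t when the standard deviations (or the sample sizes) are equal, and different values otherwise; `welch_dof` implements the standard Welch-Satterthwaite estimate of the degrees of freedom for the unpooled case.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples."""
    return math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))


def t_pooled(m1, s1, n1, m2, s2, n2):
    """Calculated t using the pooled standard deviation."""
    sp = pooled_sd(s1, n1, s2, n2)
    return abs(m1 - m2) / (sp * math.sqrt((n1 + n2) / (n1 * n2)))


def t_unpooled(m1, s1, n1, m2, s2, n2):
    """Calculated t using each sample's own standard deviation."""
    return abs(m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def welch_dof(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom for the unpooled test."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


equal_case = (4.036, 0.003, 10, 4.002, 0.003, 8)    # frog example
unequal_case = (4.036, 0.003, 10, 4.002, 0.009, 8)  # hypothetical STDEV_2

print(t_pooled(*equal_case), t_unpooled(*equal_case))      # same value
print(t_pooled(*unequal_case), t_unpooled(*unequal_case))  # different values
print(welch_dof(0.003, 10, 0.009, 8))                      # fewer than n1 + n2 - 2
```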
Decision Rule for t-test:
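- Compare the calculated t-value ( t_{calculated} ) with a critical t-value from a Student's t-table ( t_{table} ).
- The t_{table} value is found using:
  - Degrees of Freedom (DOF): df = n_1 + n_2 - 2 when the standard deviations are pooled (Case 1). For Case 2, the degrees of freedom are estimated from the two standard deviations and sample sizes (Welch-Satterthwaite approximation) and are usually smaller.
  - Confidence Level: Typically 95\% (or 0.05 significance level for a two-tailed test).
- Conclusion: If t_{calculated} > t_{table}, then the two results (averages) are statistically different at the specified confidence level. Otherwise, the difference is not statistically significant at that level.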
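The decision rule reduces to a single comparison. In this sketch the function name `means_differ` is illustrative. The frog example is tested against 2.131, the tabulated 95\% value for 15 degrees of freedom, used here as a conservative stand-in for 16 degrees of freedom because critical values decrease as the degrees of freedom increase. The second call uses a hypothetical t-value.

```python
def means_differ(t_calculated, t_table):
    """True when the two averages are statistically different at the chosen confidence level."""
    return t_calculated > t_table


T_TABLE_95_DF15 = 2.131  # 95% critical value for 15 degrees of freedom

frogs_differ = means_differ(23.82, T_TABLE_95_DF15)   # value quoted for the frog example
small_gap_differs = means_differ(1.9, T_TABLE_95_DF15)  # hypothetical t-value

print(frogs_differ, small_gap_differs)  # True False
```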

Student's t-table (Excerpt from Harris. Quantitative Chemical Analysis. Eighth Edition)

This table provides critical t-values for various degrees of freedom and confidence levels.

Degrees of Freedom	Confidence Level (\%)
	50	90	95	99.9
1	1.000	6.314	12.706	636.578
2	0.816	2.920	4.303	31.598
3	0.765	2.353	3.182	12.924
4	0.741	2.132	2.776	8.610
…	…	…	…	…
15	0.691	1.753	2.131	4.073
…	…	…	…	…
\infty	0.674	1.645	1.960	3.291

To use the table, locate the row corresponding to your degrees of freedom ( n_1 + n_2 - 2 ) and the column for your desired confidence level (commonly 95\% ). The intersecting value is t_{table} .
For example, with 16 degrees of freedom ( n_1=10, n_2=8 \rightarrow 10+8-2=16 ), at a 95\% confidence level, the t_{table} would be between 2.131 (for df=15 ) and 2.086 (for df=20 ). A precise value for df=16 (about 2.120) would be found in a full table. The purpose is to determine if the t_{calculated} exceeds this critical value.
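When a full table is not at hand, one rough option is to interpolate linearly between the two bracketing rows. This is an approximation introduced here for illustration, not the method of the source table, and a full table should be preferred. The sketch uses only the two 95\% values quoted above (df = 15 and df = 20); the names `T95` and `t_table_95` are illustrative.

```python
# 95% critical t-values quoted in the text: df = 15 and df = 20
T95 = {15: 2.131, 20: 2.086}


def t_table_95(df, table=T95):
    """Return the tabulated value, or interpolate linearly between bracketing rows.

    Raises ValueError if df lies outside the range covered by the table.
    """
    if df in table:
        return table[df]
    lower = max(d for d in table if d < df)
    upper = min(d for d in table if d > df)
    fraction = (df - lower) / (upper - lower)
    return table[lower] + fraction * (table[upper] - table[lower])


t_16 = t_table_95(16)
print(round(t_16, 3))  # 2.122
```

The interpolated estimate falls between the two tabulated values, as the text describes.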