Notes on Error, Uncertainty, and Statistical Analysis
Types of Error
Systematic Error:
A flaw in equipment or design.
The error is reproducible, meaning it consistently affects measurements in the same way.
Example (Precision vs. Accuracy): Measurements like 12.7, 12.6, 12.8 for an expected value of 12 . These measurements are precise (close to each other) but not accurate (shifted from the true value), indicating a systematic error.
Random Error:
Caused by uncontrolled (and sometimes uncontrollable) variables.
Has an equal chance of being positive or negative.
Example (Precision vs. Accuracy): Measurements like 8, 10, 14, 16 for an expected value of 12 . These measurements are accurate (their average is close to the true value) but not precise (spread out), indicating random error.
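The two example data sets can be compared numerically. The sketch below is illustrative only: it treats the offset of the mean from the expected value as a rough indicator of accuracy and the sample standard deviation as a rough indicator of precision. The helper name `summarize` is an assumption made for this example.

```python
from statistics import mean, stdev

TRUE_VALUE = 12  # expected value used in both examples above

systematic = [12.7, 12.6, 12.8]  # tightly grouped but shifted from 12
random_err = [8, 10, 14, 16]     # centred on 12 but widely spread


def summarize(values, true_value):
    """Return (bias, spread): mean offset from the true value and sample standard deviation."""
    return mean(values) - true_value, stdev(values)


sys_bias, sys_spread = summarize(systematic, TRUE_VALUE)
rnd_bias, rnd_spread = summarize(random_err, TRUE_VALUE)

print(f"systematic set: bias = {sys_bias:+.2f}, spread = {sys_spread:.2f}")
print(f"random set:     bias = {rnd_bias:+.2f}, spread = {rnd_spread:.2f}")
```

A large bias with a small spread points toward systematic error; a small bias with a large spread points toward random error.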
Uncertainty
Absolute Uncertainty:
The margin of uncertainty associated with a direct measurement.
Indicates the range within which the true value is expected to lie.
Notation: Can be expressed as 70 \text{°F} \rightarrow 70 \pm 5 \text{°F} or 70(5) \text{°F}. This means the true value is expected to lie between 65 \text{°F} and 75 \text{°F}.
Relative (Relative Percent) Uncertainty:
The size of the uncertainty with respect to the measurement itself.
Calculated as the ratio of absolute uncertainty to the measurement.
Formula: Relative uncertainty = \frac{\text{absolute uncertainty}}{\text{measurement}}
Example: For 70 \pm 5 \text{°F}, relative uncertainty = \frac{5}{70} \approx 0.07.
Percent Uncertainty: Relative uncertainty multiplied by 100.
Formula: Percent uncertainty = \text{relative uncertainty} \times 100 (or \frac{\text{absolute uncertainty}}{\text{measurement}} \times 100 )
Example: 0.07 \times 100 = 7\%.
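The two formulas above translate directly into code. This is a minimal sketch; the function names are chosen for illustration and are not part of the notes.

```python
def relative_uncertainty(measurement, absolute_uncertainty):
    """Absolute uncertainty divided by the measurement (dimensionless)."""
    return abs(absolute_uncertainty / measurement)


def percent_uncertainty(measurement, absolute_uncertainty):
    """Relative uncertainty expressed as a percentage."""
    return 100 * relative_uncertainty(measurement, absolute_uncertainty)


# The 70 +/- 5 example from the notes
rel = relative_uncertainty(70, 5)
pct = percent_uncertainty(70, 5)
print(f"relative = {rel:.3f}, percent = {pct:.1f}%")
```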
Estimating Uncertainty
For Estimated Digits (e.g., from a graduated scale):
Convention 1: The estimated digit corresponds to \frac{1}{10} of the smallest mark on the measuring device.
Example: On a scale with 1 \text{ mL} marks, you might read 14.5 \text{ mL}, estimating the last digit to one tenth of the smallest division. The uncertainty is then \pm 0.1 \text{ mL}, which implies a range of 14.4 \text{ mL} to 14.6 \text{ mL}.
Convention 2: A common convention is to estimate uncertainty to \frac{1}{2} of the smallest mark.
Example: For a device with 1 \text{ mL} marks, if you read 14 \text{ mL}, the uncertainty could be \pm 0.5 \text{ mL}. This means the true value is between 13.5 \text{ mL} and 14.5 \text{ mL}.
The choice of estimation method reflects your confidence in the measurement.
For Numerical Readouts (e.g., digital balances):
Uncertainty is assumed to be \pm 1 in the last displayed digit.
Example: A mass reading of 1.7346 \text{ g} has an uncertainty of \pm 0.0001 \text{ g}, meaning the actual mass is between 1.7345 \text{ g} and 1.7347 \text{ g}.
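The estimation conventions above can be sketched as small helpers. The function names are hypothetical, and `Decimal` is used only so that the interval endpoints print exactly as they would be written by hand; a digital reading is passed as a string so that its displayed digits are preserved.

```python
from decimal import Decimal


def scale_uncertainty(smallest_division, fraction="0.1"):
    """Graduated scale: uncertainty = fraction x smallest division (0.1 or 0.5 by convention)."""
    return Decimal(str(smallest_division)) * Decimal(str(fraction))


def readout_uncertainty(displayed):
    """Digital readout: +/- 1 in the last displayed digit of the reading string."""
    exponent = Decimal(displayed).as_tuple().exponent
    return Decimal(1).scaleb(exponent)


def interval(value, uncertainty):
    """Return (low, high) for value +/- uncertainty."""
    value = Decimal(str(value))
    return value - uncertainty, value + uncertainty


print(interval("14.5", scale_uncertainty(1, "0.1")))        # Convention 1 example
print(interval("14", scale_uncertainty(1, "0.5")))          # Convention 2 example
print(interval("1.7346", readout_uncertainty("1.7346")))    # digital balance example
```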
Propagation of Error: When performing calculations involving multiple measurements, the uncertainty associated with each measurement must be propagated through the calculation to determine the final uncertainty.
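The notes do not state which propagation rules are used. The sketch below assumes the common rules for independent random errors: absolute uncertainties add in quadrature for sums and differences, and relative uncertainties add in quadrature for products and quotients. The burette uncertainty of 0.02 mL and the pairing of the mass and volume readings into a density are hypothetical, chosen only to exercise the functions.

```python
import math


def propagate_sum(*abs_uncertainties):
    """Absolute uncertainty of a sum or difference: sqrt(e1^2 + e2^2 + ...)."""
    return math.sqrt(sum(e ** 2 for e in abs_uncertainties))


def propagate_product(result, *pairs):
    """Absolute uncertainty of a product or quotient.

    `pairs` are (value, absolute uncertainty) tuples for each factor;
    relative uncertainties are combined in quadrature, then scaled by `result`.
    """
    relative = math.sqrt(sum((e / v) ** 2 for v, e in pairs))
    return abs(result) * relative


# Hypothetical: volume delivered = final - initial burette reading, each +/- 0.02 mL
e_volume = propagate_sum(0.02, 0.02)

# Hypothetical: density from a 1.7346 +/- 0.0001 g mass and a 14.5 +/- 0.1 mL volume
density = 1.7346 / 14.5
e_density = propagate_product(density, (1.7346, 0.0001), (14.5, 0.1))

print(f"volume uncertainty  = {e_volume:.3f} mL")
print(f"density = {density:.4f} +/- {e_density:.4f} g/mL")
```

In the density example the volume term dominates, which shows how the least certain measurement tends to control the final uncertainty.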
Statistics
Average (Mean - \bar{x} ):
The sum of all values divided by the number of measurements.
Formula: \bar{x} = \frac{\sum x_i}{n}
Where \sum x_i is the sum of individual measurements and n is the total number of measurements.
Standard Deviation (STDEV - s or \sigma ):
A measure of how closely the data cluster around the mean.
Population Standard Deviation ( \sigma or STDEV.P): Calculated when data from the entire population is available (rare in experimental settings).
Formula: \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}
Sample Standard Deviation ( s or STDEV.S): Calculated from a sample of a larger population (most common in practical applications).
Formula: s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}
The denominator (n-1) is used for samples because the deviations are measured from the sample mean \bar{x} rather than from the unknown population mean, which uses up one degree of freedom. Dividing by n-1 instead of n gives a slightly larger value and a less biased estimate of the population standard deviation.
Example: A series of mass measurements (e.g., 2.7196, 2.7201, 2.7210, \dots ) can be summarized by their average and standard deviation (e.g., 2.7202 \pm 0.0007 \text{ g} ).
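The mean and both standard deviation formulas are available in Python's standard `statistics` module. The sketch below uses only the three mass values that are actually listed in the example; the remaining values are elided in the notes and are not guessed here, so the result is illustrative rather than a reproduction of the full data set.

```python
from statistics import mean, pstdev, stdev

# Only the three values shown in the notes; the rest of the series is elided there
masses = [2.7196, 2.7201, 2.7210]

x_bar = mean(masses)     # sum of values / n
s = stdev(masses)        # sample standard deviation, divides by n - 1
sigma = pstdev(masses)   # population standard deviation, divides by n

print(f"mean = {x_bar:.4f} g")
print(f"sample s = {s:.4f} g, population sigma = {sigma:.4f} g")
```

`stdev` corresponds to STDEV.S and `pstdev` to STDEV.P; for the same data the sample value is always the larger of the two.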
Confidence Intervals:
A range of values calculated from a sample that is likely to contain the true value of a population parameter.
Stated together with a confidence level (for example 95\% ): the probability that the interval contains the population parameter.
Example: For a measurement 2.7202 \pm 0.0007 \text{ g}, the \pm 1s range is 2.7195 to 2.7209 \text{ g}.
Confidence Levels related to Standard Deviations (for normally distributed data):
One Standard Deviation ( \pm 1s ): Approximately 68\% of the population will fall within this range.
Two Standard Deviations ( \pm 2s ): Approximately 95\% of the population will fall within this range.
Three Standard Deviations ( \pm 3s ): Approximately 99.7\% of the population will fall within this range.
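These percentages follow from the normal distribution: the fraction of values within k standard deviations of the mean is erf(k / sqrt(2)). A short check using only the standard library (the function name `coverage` is chosen for this example):

```python
import math


def coverage(k):
    """Fraction of a normal population lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k}s: {100 * coverage(k):.2f}%")
```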
Comparison of Means (t-test)
Purpose: To determine if the averages from two data sets are statistically different from each other.
Prerequisites: This test assumes that the data are normally distributed. Which formula applies depends on whether the variances (standard deviations squared) of the two populations can be treated as equal (Case 1) or are significantly different (Case 2).
Case 1: Standard Deviations (STDEV) are Assumed to be Equal
Formula for calculated t-value ( t_{calculated} ): t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}} or equivalently t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
Where:
\bar{x}_1 and \bar{x}_2 are the averages of the two samples.
n_1 and n_2 are the number of measurements in each sample.
\text{STDEV} is the pooled standard deviation when standard deviations are considered equal. (Note: sometimes a simple average of the two STDEVs is used directly if they are very close.)
Example: Comparing the mass of two groups of frogs: ME ( 4.036 \pm 0.003 \text{ g} for 10 frogs) and MA ( 4.002 \pm 0.003 \text{ g} for 8 frogs).
Here, \text{STDEV}_1 = \text{STDEV}_2 = 0.003 \text{ g}. The transcript's worked calculation for this case reports t_{calculated} \approx 23.82 ; substituting the values above into the formula without intermediate rounding gives t_{calculated} = \frac{0.034}{0.003 \sqrt{18/80}} \approx 23.9.
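The Case 1 formula can be evaluated for the frog example as follows. This is a minimal sketch with a function name chosen for illustration; without intermediate rounding it evaluates to roughly 23.9.

```python
import math


def t_equal_sd(mean1, mean2, sd, n1, n2):
    """t for two means that share a common standard deviation `sd` (Case 1)."""
    return abs(mean1 - mean2) / (sd * math.sqrt((n1 + n2) / (n1 * n2)))


# ME: 4.036 g, 10 frogs; MA: 4.002 g, 8 frogs; common STDEV 0.003 g
t_calc = t_equal_sd(4.036, 4.002, 0.003, 10, 8)
print(f"t_calculated = {t_calc:.2f}")
```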
Case 2: Standard Deviations (STDEV) are Different
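When the standard deviations of the two data sets are significantly different, they are not pooled; each sample's variance enters the t calculation separately. Pooling applies to Case 1, where the standard deviations are assumed equal.
Formula for Pooled Standard Deviation ( STDEV_{pooled} ), the \text{STDEV} used in Case 1:
STDEV_{pooled} = \sqrt{\frac{(\text{STDEV}_1^2)(n_1-1) + (\text{STDEV}_2^2)(n_2-1)}{n_1 + n_2 - 2}}
Formula for calculated t-value ( t_{calculated} ) when STDEV is different:
t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\text{STDEV}_1^2}{n_1} + \frac{\text{STDEV}_2^2}{n_2}}}
This matches the pooled form t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{STDEV_{pooled} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}} only when n_1 = n_2 or \text{STDEV}_1 = \text{STDEV}_2 ; otherwise the two expressions give different values.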
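The pooled standard deviation and the two t expressions above can be sketched as follows. The function names are hypothetical, and the second standard deviation of 0.006 g is an invented value used only to show the two t expressions diverging; it does not come from the notes.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples."""
    return math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))


def t_pooled(mean1, mean2, s1, n1, s2, n2):
    """t computed from the pooled standard deviation."""
    sp = pooled_sd(s1, n1, s2, n2)
    return abs(mean1 - mean2) / (sp * math.sqrt((n1 + n2) / (n1 * n2)))


def t_separate(mean1, mean2, s1, n1, s2, n2):
    """t computed with each sample's own variance: sqrt(s1^2/n1 + s2^2/n2)."""
    return abs(mean1 - mean2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Frog means and sample sizes from the notes; s2 = 0.006 g is hypothetical
args = (4.036, 4.002, 0.003, 10, 0.006, 8)
print(f"pooled SD  = {pooled_sd(0.003, 10, 0.006, 8):.4f} g")
print(f"t_pooled   = {t_pooled(*args):.2f}")
print(f"t_separate = {t_separate(*args):.2f}")
```

The two t values coincide when the sample sizes or the standard deviations are equal, and differ otherwise, which is why the choice of case matters.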
Decision Rule for t-test:
Compare the calculated t-value ( t_{calculated} ) with a critical t-value from a Student's t-table ( t_{table} ).
The t_{table} value is found using:
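Degrees of Freedom (DOF): df = n_1 + n_2 - 2 when a pooled standard deviation is used. When the standard deviations are treated as different (unpooled t formula), the degrees of freedom are instead estimated as df = \frac{(\text{STDEV}_1^2/n_1 + \text{STDEV}_2^2/n_2)^2}{\frac{(\text{STDEV}_1^2/n_1)^2}{n_1-1} + \frac{(\text{STDEV}_2^2/n_2)^2}{n_2-1}} , rounded to an integer.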
Confidence Level: Typically 95\% (or 0.05 significance level for a two-tailed test).
Conclusion: If t_{calculated} > t_{table} , then the two results (averages) are statistically different at the specified confidence level.
Student's t-table (Excerpt from Harris. Quantitative Chemical Analysis. Eighth Edition)
This table provides critical t-values for various degrees of freedom and confidence levels.
Degrees of Freedom | 50\% confidence | 90\% confidence | 95\% confidence | 99.9\% confidence |
---|---|---|---|---|
1 | 1.000 | 6.314 | 12.706 | 636.578 |
2 | 0.816 | 2.920 | 4.303 | 31.598 |
3 | 0.765 | 2.353 | 3.182 | 12.924 |
4 | 0.741 | 2.132 | 2.776 | 8.610 |
… | … | … | … | … |
15 | 0.691 | 1.753 | 2.131 | 4.073 |
… | … | … | … | … |
\infty | 0.674 | 1.645 | 1.960 | 3.291 |
To use the table, locate the row corresponding to your degrees of freedom ( n_1 + n_2 - 2 ) and the column for your desired confidence level (commonly 95\% ). The intersecting value is t_{table}.
For example, with 16 degrees of freedom ( n_1 = 10, n_2 = 8 \rightarrow 10 + 8 - 2 = 16 ), at a 95\% confidence level, t_{table} lies between 2.131 (for df = 15 ) and 2.086 (for df = 20 , a row not shown in this excerpt). A full table gives the precise value for df = 16. The test then asks whether t_{calculated} exceeds this critical value; in the frog example t_{calculated} is above 23, far larger than either bound, so the two means are statistically different at the 95\% confidence level.
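The lookup and decision rule can be sketched with the 95% column of the excerpt. Rows elided in the excerpt are not filled in. When the exact degrees of freedom are missing, this sketch falls back to the nearest smaller tabulated row, which is an assumption made here: critical t-values decrease as degrees of freedom increase, so the smaller row gives a slightly larger, more cautious threshold. All names are hypothetical.

```python
import math

# 95% confidence column of the excerpt; elided rows are intentionally absent
T_TABLE_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 15: 2.131, math.inf: 1.960}


def conservative_t_table(df, table=T_TABLE_95):
    """Critical t for the largest tabulated df that does not exceed `df`."""
    usable = [d for d in table if d <= df]
    if not usable:
        raise ValueError("degrees of freedom must be at least 1")
    return table[max(usable)]


def means_differ(t_calculated, df):
    """Decision rule: the means differ if t_calculated exceeds the critical value."""
    return t_calculated > conservative_t_table(df)


df = 10 + 8 - 2  # frog example: n1 = 10, n2 = 8
print(f"df = {df}, t_table used = {conservative_t_table(df)}")
print("means differ at 95%:", means_differ(23.8, df))
```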