
Notes on Error, Uncertainty, and Statistical Analysis

Types of Error

  • Systematic Error:

    • A flaw in equipment or design.

    • The error is reproducible, meaning it consistently affects measurements in the same way.

    • Example (Precision vs. Accuracy): Measurements like 12.7, 12.6, 12.8 for an expected value of 12. These measurements are precise (close to each other) but not accurate (shifted from the true value), indicating a systematic error.

  • Random Error:

    • Caused by uncontrolled (and sometimes uncontrollable) variables.

    • Has an equal chance of being positive or negative.

    • Example (Precision vs. Accuracy): Measurements like 8, 10, 14, 16 for an expected value of 12. These measurements are accurate (their average is close to the true value) but not precise (spread out), indicating random error.
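The two examples above can be checked numerically. The following is a minimal sketch, not part of the original notes: it treats the offset of the mean from the expected value as a stand-in for accuracy and the sample standard deviation (defined later in these notes) as a stand-in for precision. The names `summarize`, `systematic_set`, and `random_set` are illustrative only.

```python
from statistics import mean, stdev

TRUE_VALUE = 12

# Values taken from the two examples above.
systematic_set = [12.7, 12.6, 12.8]
random_set = [8, 10, 14, 16]


def summarize(values, true_value):
    """Return (bias, spread): mean offset from the true value and sample standard deviation."""
    return mean(values) - true_value, stdev(values)


for label, data in (("systematic", systematic_set), ("random", random_set)):
    bias, spread = summarize(data, TRUE_VALUE)
    print(f"{label}: bias = {bias:+.2f}, spread = {spread:.2f}")
```

The first set shows a large bias with a small spread (precise but not accurate), while the second shows essentially zero bias with a large spread (accurate but not precise).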

Uncertainty

  • Absolute Uncertainty:

    • The margin of uncertainty associated with a direct measurement.

    • Indicates the range within which the true value is expected to lie.

    • Notation: Can be expressed as 70 \text{°F} \rightarrow 70 \pm 5 \text{°F} or 70(5) \text{°F}.

    • This means the true value is expected to lie between 65 \text{°F} and 75 \text{°F}.

  • Relative (Relative Percent) Uncertainty:

    • The size of the uncertainty with respect to the measurement itself.

    • Calculated as the ratio of absolute uncertainty to the measurement.

    • Formula: Relative uncertainty = \frac{\text{absolute uncertainty}}{\text{measurement}}

    • Example: For 70 \pm 5 \text{°F}, relative uncertainty = \frac{5}{70} \approx 0.07.

    • Percent Uncertainty: Relative uncertainty multiplied by 100 .

    • Formula: Percent uncertainty = \text{relative uncertainty} \times 100 (or \frac{\text{absolute uncertainty}}{\text{measurement}} \times 100 )

    • Example: 0.07 \times 100 = 7\% .
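The two formulas above translate directly into code. This is a small sketch using the 70 \pm 5 example; the function names are illustrative and not from the notes.

```python
def relative_uncertainty(measurement, absolute_uncertainty):
    """Absolute uncertainty divided by the size of the measurement."""
    return absolute_uncertainty / abs(measurement)


def percent_uncertainty(measurement, absolute_uncertainty):
    """Relative uncertainty expressed as a percentage."""
    return relative_uncertainty(measurement, absolute_uncertainty) * 100


rel = relative_uncertainty(70, 5)
pct = percent_uncertainty(70, 5)
print(f"relative uncertainty = {rel:.3f}")
print(f"percent uncertainty  = {pct:.1f}%")
```

Without intermediate rounding the values are about 0.071 and 7.1%; the notes round the relative uncertainty to 0.07 first, which gives 7%.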

Estimating Uncertainty

  • For Estimated Digits (e.g., from a graduated scale):

    • Convention 1: The estimated digit corresponds to \frac{1}{10} of the smallest mark on the measuring device.

      • Example: On a scale with 1 \text{ mL} marks, you might read 14.5 \text{ mL}. Estimating to one tenth of the smallest division gives an uncertainty of \pm 0.1 \text{ mL}, which implies a range of 14.4 \text{ mL} to 14.6 \text{ mL}.

    • Convention 2: A common convention is to estimate uncertainty to \frac{1}{2} of the smallest mark.

      • Example: For a device with 1 \text{ mL} marks, if you read 14 \text{ mL}, the uncertainty could be \pm 0.5 \text{ mL}. This means the true value is between 13.5 \text{ mL} and 14.5 \text{ mL}.

    • The choice of estimation method reflects your confidence in the measurement.

  • For Numerical Readouts (e.g., digital balances):

    • Uncertainty is assumed to be \pm 1 in the last displayed digit.

    • Example: A mass reading of 1.7346 \text{ g} has an uncertainty of \pm 0.0001 \text{ g}, meaning the actual mass is between 1.7345 \text{ g} and 1.7347 \text{ g}.

  • Propagation of Error: When performing calculations involving multiple measurements, the uncertainty associated with each measurement must be propagated through the calculation to determine the final uncertainty.
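The estimation conventions above, plus one propagation step, can be sketched as follows. The notes do not state a propagation rule; the quadrature rule used in `combined_uncertainty_of_sum` is the usual rule for adding or subtracting independent measurements and is an assumption here, as are all function names.

```python
import math
from decimal import Decimal


def scale_uncertainty(smallest_division, fraction):
    """Uncertainty as a fraction (1/10 or 1/2) of the smallest marked division."""
    return smallest_division * fraction


def readout_uncertainty(displayed):
    """+/- 1 in the last displayed digit of a digital reading given as text."""
    exponent = Decimal(displayed).as_tuple().exponent
    return Decimal(1).scaleb(exponent)


def combined_uncertainty_of_sum(*uncertainties):
    """Assumed rule: independent absolute uncertainties add in quadrature."""
    return math.sqrt(sum(u * u for u in uncertainties))


print(scale_uncertainty(1.0, 0.1))    # Convention 1 for 1 mL marks
print(scale_uncertainty(1.0, 0.5))    # Convention 2 for 1 mL marks
print(readout_uncertainty("1.7346"))  # digital balance example
print(round(combined_uncertainty_of_sum(0.1, 0.1), 2))  # two readings, each +/- 0.1 mL
```

Passing the digital reading as text keeps trailing zeros, so a display of "1.7340" would still report an uncertainty of 0.0001.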

Statistics

  • Average (Mean - \bar{x} ):

    • The sum of all values divided by the number of measurements.

    • Formula: \bar{x} = \frac{\sum x_i}{n}

      • Where \sum x_i is the sum of individual measurements and n is the total number of measurements.

  • Standard Deviation (STDEV - s or \sigma ):

    • A measure of how closely the data cluster around the mean.

    • Population Standard Deviation ( \sigma or STDEV.P): Calculated when data from the entire population are available (rare in experimental settings).

      • Formula: \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}

    • Sample Standard Deviation ( s or STDEV.S): Calculated from a sample of a larger population (most common in practical applications).

      • Formula: s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}

      • The denominator (n-1) is used for samples because \bar{x} is itself computed from the same data, so dividing by n would tend to underestimate the spread. Dividing by n-1 gives a less biased estimate of the population standard deviation and a slightly larger value than the population formula.

      • Example: A series of mass measurements (e.g., 2.7196, 2.7201, 2.7210, \dots ) can be summarized by their average and standard deviation (e.g., 2.7202 \pm 0.0007 \text{ g} ).
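The mean and both standard deviation formulas above can be written out directly. This sketch uses only the three mass values quoted in the example; the rest of that series is elided in the notes, so the output is not expected to reproduce the quoted summary exactly. The function names are illustrative.

```python
import math


def mean(values):
    """Sum of all values divided by the number of measurements."""
    return sum(values) / len(values)


def std_dev(values, sample=True):
    """Sample (n - 1) or population (n) standard deviation about the mean."""
    x_bar = mean(values)
    sum_of_squares = sum((x - x_bar) ** 2 for x in values)
    denominator = len(values) - 1 if sample else len(values)
    return math.sqrt(sum_of_squares / denominator)


masses = [2.7196, 2.7201, 2.7210]  # only the values quoted above

x_bar = mean(masses)
s = std_dev(masses, sample=True)
sigma = std_dev(masses, sample=False)
print(f"mean = {x_bar:.4f} g, s = {s:.4f} g, sigma = {sigma:.4f} g")
```

For the same data the sample value `s` is always larger than the population value `sigma`, which is the effect of the n-1 denominator.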

  • Confidence Intervals:

    • A range of values calculated from a sample that is likely to contain the true value of a population parameter.

    • Expressed as a probability that the population parameter will fall between a set of values.

    • Example: For a measurement 2.7202 \pm 0.0007 \text{ g}, the range is 2.7195 \text{ g} to 2.7209 \text{ g}.

    • Confidence Levels related to Standard Deviations (for normally distributed data):

      • One Standard Deviation ( \pm 1s ): Approximately 68\% of the population will fall within this range.

      • Two Standard Deviations ( \pm 2s ): Approximately 95\% of the population will fall within this range.

      • Three Standard Deviations ( \pm 3s ): Approximately 99.7\% of the population will fall within this range.
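The three percentages above can be reproduced from the normal distribution itself. This is a short check using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal population lying within +/- k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {fraction_within(k):.1%}")
```

The computed fractions are about 68.3%, 95.4%, and 99.7%, matching the rounded values in the notes.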

Comparison of Means (t-test)

  • Purpose: To determine if the averages from two data sets are statistically different from each other.

  • Prerequisites: This test assumes that the data is normally distributed and that the variances (standard deviations squared) of the two populations are either equal or known to be different.

  • Case 1: Standard Deviations (STDEV) are Assumed to be Equal

    • Formula for calculated t-value ( t_{calculated} ): t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}} or equivalently t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} Where:

      • \bar{x}_1 and \bar{x}_2 are the averages of the two samples.

      • n_1 and n_2 are the number of measurements in each sample.

      • \text{STDEV} is the pooled standard deviation when standard deviations are considered equal. (Note: sometimes a simpler STDEV average is used directly if the STDEVs are very close.)

    • Example: Comparing the mass of two groups of frogs: ME ( 4.036 \pm 0.003 \text{ g} for 10 frogs) and MA ( 4.002 \pm 0.003 \text{ g} for 8 frogs).

      • Here, \text{STDEV}_1 = \text{STDEV}_2 = 0.003 \text{ g}. (The example provided in the transcript shows a calculation for this case, resulting in t_{calculated} \approx 23.82 ).
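The Case 1 formula can be applied to the frog example as a quick check. This is a minimal sketch with an illustrative function name; it uses the rounded means and standard deviation quoted above, not the underlying raw data.

```python
import math


def t_equal_stdev(mean_1, mean_2, stdev, n_1, n_2):
    """t-value for two means that share one standard deviation (Case 1)."""
    return abs(mean_1 - mean_2) / (stdev * math.sqrt((n_1 + n_2) / (n_1 * n_2)))


t_calc = t_equal_stdev(4.036, 4.002, 0.003, 10, 8)
print(f"t_calculated = {t_calc:.2f}")
```

With these rounded inputs the result is about 23.89, slightly above the 23.82 reported from the transcript; the notes do not show the intermediate steps, so the source of the small gap is not known.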

  • Case 2: Standard Deviations (STDEV) are Different

    • When the standard deviations of the two data sets differ, a pooled standard deviation is calculated to combine them into a single value.

    • Formula for Pooled Standard Deviation ( \text{STDEV}_{pooled} ):
      \text{STDEV}_{pooled} = \sqrt{\frac{(\text{STDEV}_1^2)(n_1 - 1) + (\text{STDEV}_2^2)(n_2 - 1)}{n_1 + n_2 - 2}}

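    • Formula for calculated t-value ( t_{calculated} ) when STDEV is different:
      t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\text{STDEV}_{pooled} \sqrt{\frac{n_1 + n_2}{n_1 n_2}}}. A form that keeps the two standard deviations separate is t_{calculated} = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{\text{STDEV}_1^2}{n_1} + \frac{\text{STDEV}_2^2}{n_2}}}; the two expressions give the same value only when n_1 = n_2 or \text{STDEV}_1 = \text{STDEV}_2.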
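The pooled standard deviation and the t-value built on it can be sketched as below. The function names and the numbers in the usage lines are hypothetical, chosen only to exercise the formulas; they are not data from the notes.

```python
import math


def pooled_stdev(stdev_1, n_1, stdev_2, n_2):
    """Combine two sample standard deviations, weighting each by its degrees of freedom."""
    numerator = stdev_1 ** 2 * (n_1 - 1) + stdev_2 ** 2 * (n_2 - 1)
    return math.sqrt(numerator / (n_1 + n_2 - 2))


def t_pooled(mean_1, mean_2, stdev_1, n_1, stdev_2, n_2):
    """t-value computed with the pooled standard deviation."""
    s_pooled = pooled_stdev(stdev_1, n_1, stdev_2, n_2)
    return abs(mean_1 - mean_2) / (s_pooled * math.sqrt((n_1 + n_2) / (n_1 * n_2)))


# Hypothetical inputs: two samples with different standard deviations.
s_pooled = pooled_stdev(0.002, 10, 0.004, 8)
t_calc = t_pooled(4.036, 4.002, 0.002, 10, 0.004, 8)
print(f"pooled STDEV = {s_pooled:.4f}, t_calculated = {t_calc:.2f}")
```

When both standard deviations are equal, `pooled_stdev` returns that common value, so this calculation reduces to Case 1.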

  • Decision Rule for t-test:

    • Compare the calculated t-value ( t_{calculated} ) with a critical t-value from a Student's t-table ( t_{table} ).

    • The t_{table} value is found using:

      • Degrees of Freedom (DOF): df = n_1 + n_2 - 2.

      • Confidence Level: Typically 95\% (or 0.05 significance level for a two-tailed test).

    • Conclusion: If t_{calculated} > t_{table}, then the two results (averages) are statistically different at the specified confidence level.
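The decision rule above amounts to a lookup followed by a comparison. The sketch below hard-codes the 95% column of the Student's t-table excerpt that follows; the sample sizes and t-value in the usage lines are hypothetical, and the function names are illustrative.

```python
import math

# 95% confidence column of the t-table excerpt, keyed by degrees of freedom.
T_TABLE_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 15: 2.131, math.inf: 1.960}


def degrees_of_freedom(n_1, n_2):
    """Degrees of freedom for comparing two means."""
    return n_1 + n_2 - 2


def means_differ(t_calculated, t_table):
    """True when the calculated t-value exceeds the critical value."""
    return t_calculated > t_table


# Hypothetical comparison: 9 and 8 measurements, t_calculated = 2.50.
df = degrees_of_freedom(9, 8)
t_table = T_TABLE_95[df]
print(f"df = {df}, t_table = {t_table}, different: {means_differ(2.50, t_table)}")
```

A dictionary lookup only works for degrees of freedom that appear in the excerpt; other values need a fuller table.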

Student's t-table (Excerpt from Harris. Quantitative Chemical Analysis. Eighth Edition)

This table provides critical t-values for various degrees of freedom and confidence levels.

Degrees of Freedom    Confidence Level (\%)
                      50        90        95        99.9
1                     1.000     6.314     12.706    636.578
2                     0.816     2.920     4.303     31.598
3                     0.765     2.353     3.182     12.924
4                     0.741     2.132     2.776     8.610
15                    0.691     1.753     2.131     4.073
\infty                0.674     1.645     1.960     3.291

  • To use the table, locate the row corresponding to your degrees of freedom ( n_1 + n_2 - 2 ) and the column for your desired confidence level (commonly 95\% ). The intersecting value is t_{table}.

  • For example, with 16 degrees of freedom ( n_1 = 10, n_2 = 8 \rightarrow 10 + 8 - 2 = 16 ), at a 95\% confidence level, t_{table} would lie between 2.131 (for df = 15 ) and 2.086 (for df = 20, a row not shown in this excerpt). A precise value for df = 16 would be found in a full table. The purpose is to determine whether t_{calculated} exceeds this critical value.
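When a full table is not at hand, the bracketing values above can be used to estimate the critical value. The sketch below uses linear interpolation between the df = 15 and df = 20 entries; this is an approximation and not a method given in the notes, and the function name is illustrative. The t_calculated value is the 23.82 quoted for the frog example.

```python
def interpolate_t(df, df_low, t_low, df_high, t_high):
    """Linearly interpolate a critical t-value between two tabulated rows."""
    fraction = (df - df_low) / (df_high - df_low)
    return t_low + fraction * (t_high - t_low)


n_1, n_2 = 10, 8
df = n_1 + n_2 - 2

# 95% critical values quoted above for df = 15 and df = 20.
t_table_estimate = interpolate_t(df, 15, 2.131, 20, 2.086)
t_calculated = 23.82  # value quoted for the frog example

print(f"df = {df}, estimated t_table = {t_table_estimate:.3f}")
print("statistically different:", t_calculated > t_table_estimate)
```

The estimate is about 2.122. Because t_calculated is far larger than either bracketing value, the conclusion does not depend on the interpolation: the two frog averages are statistically different at the 95% confidence level.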