Exam 1 Prep: t-tests, Q-test, regression, calibration curves, and practical notes

  • Exam format and policy

    • Take-home exam, as approved by Garrett; the deadline is set sooner rather than later.
    • First question is a famous qualitative-quantitative problem known as the “man in the bat.”
    • Exam structure: roughly the first 65 points are paragraph-style questions; the last 35 points mirror what would have been an in-class portion.
    • Advice on AI: encouraged not to rely on AI; real examples were given from a prior course to illustrate potential pitfalls.
    • One example warned about a “10,000 Kelvin” excitation leading to misleading emissions; it emphasizes the need to reason and type carefully rather than copy-paste.
    • Deadline: October 2 (two weeks from the day of talk).
    • Grading time for the instructor: about two hours to complete an answer key; plan ahead to avoid late-night stress.
    • No class next Thursday due to the career fair; be mindful of lab atmosphere and appropriate attire around solvents.
    • Lab context: the exam and lab practices emphasize careful data handling and proper statistical interpretation.
  • Confidence intervals and estimation

    • Point estimate and interval: the general CI form is
    • $\mu = \bar{x} \pm t_{\alpha/2,\, n-1}\left(\frac{s}{\sqrt{n}}\right)$
    • The smaller the standard deviation ($s$) and the larger the sample size ($n$), the narrower the CI, so the more tightly it brackets the true value ($\mu$).
    • Example intuition: distance to football stadium as a real-world analogue:
    • Suppose the average distance is about $\bar{x} = 10.7\text{ miles}$.
    • 50% confidence interval example mentioned: roughly from 10.5 to 11.2 miles.
    • 95% confidence (gold standard) would be a symmetric interval around the mean with a wider span (specific numbers depend on data).
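
As a rough illustration of the interval formula above, the sketch below computes a 95% CI from summary statistics. It borrows the values from the one-sample example in these notes ($\bar{x} = 5.07$, $s = 0.02$, $n = 3$) together with the two-tailed table value $t = 4.303$ for 2 degrees of freedom. The function name `confidence_interval` is made up for this example.

```python
import math

def confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics from the one-sample example in these notes.
# t_crit = 4.303 is the two-tailed 95% table value for df = n - 1 = 2.
lower, upper = confidence_interval(mean=5.07, s=0.02, n=3, t_crit=4.303)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

The interval comes out to roughly 5.02 to 5.12. It contains the known value 5.05, which is the same conclusion the one-sample t-test reaches for these numbers.
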
  • Hypothesis testing with the t-test (one-sample, known value)

    • Concept: you can compare your measured value to a known value ($\mu_0$).
    • Test statistic (one-sample t):
    • $t = \frac{\lvert \bar{x} - \mu_0 \rvert}{s/\sqrt{n}}$
    • Degrees of freedom: $df = n - 1$; compare to $t_{\alpha/2, df}$ from the t-table.
    • Decision rule (95% confidence level, two-tailed): if $t_{calc} > t_{table}$, the evidence suggests a difference; if $t_{calc} < t_{table}$, there is not sufficient evidence to indicate a difference (the values agree at the 95% confidence level).
    • Example from talk:
    • Actual value = 5.05, average = 5.07, $s = 0.02$, $n = 3$.
    • $t_{calc} = \frac{|5.05 - 5.07|}{0.02/\sqrt{3}} = 1.732$.
    • For $df = 2$, $t_{table} \approx 4.303$; since $t_{table} > t_{calc}$, the methods agree at 95% confidence.
    • Interpretation phrasing: at 95% level, there is not significant evidence to indicate a difference.
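
The one-sample calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `one_sample_t` and the small lookup dictionary are assumptions for the example, and the critical values are the usual two-tailed 95% t-table entries.

```python
import math

# Two-tailed 95% critical values of Student's t, keyed by degrees of freedom
# (standard table values; extend from your own t-table as needed).
T_TABLE_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def one_sample_t(mean, mu0, s, n):
    """Return t_calc = |mean - mu0| / (s / sqrt(n))."""
    return abs(mean - mu0) / (s / math.sqrt(n))

# Numbers from the example in the notes: known value 5.05, mean 5.07, s 0.02, n 3.
n = 3
t_calc = one_sample_t(mean=5.07, mu0=5.05, s=0.02, n=n)
t_table = T_TABLE_95[n - 1]      # df = n - 1 = 2
differs = t_calc > t_table       # True would indicate a significant difference

print(f"t_calc = {t_calc:.3f}, t_table = {t_table}, significant difference: {differs}")
```

Here `differs` is `False`, matching the conclusion that there is no significant evidence of a difference at the 95% level.
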
  • Comparing two data sets (two-sample t-test, assuming equal variances)

    • When you have two samples of the same size (as in the example: $\bar{x}_1 = 5.03$, $s_1 = 0.02$, $n_1 = 3$ and $\bar{x}_2 = 5.07$, $s_2 = 0.02$, $n_2 = 3$):
    • Pooled standard deviation:
    • $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
    • With numbers: $s_p = 0.02$ (since both s-values are the same here).
    • Test statistic:
    • $t = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
    • Degrees of freedom: $df = n_1 + n_2 - 2$ (here, $df = 4$).
    • In the example, $t_{calc} = 2.449$; compare to $t_{table}$ for $df = 4$ (two-tailed 5%, approximately 2.776).
    • Since $t_{calc} < t_{table}$, there is not significant evidence to conclude a difference at 5% significance (the two samples are not significantly different).
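
A minimal sketch of the pooled two-sample calculation, using the summary statistics from the example above. The function name `pooled_t` is hypothetical, and the critical value 2.776 is the two-tailed 5% table entry for 4 degrees of freedom quoted in the notes.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (s_pooled, t_calc, df) for a two-sample t-test assuming equal variances."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t_calc = abs(mean1 - mean2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
    return s_pooled, t_calc, df

# Example from the notes: two sets of three measurements each.
s_p, t_calc, df = pooled_t(5.03, 0.02, 3, 5.07, 0.02, 3)

T_TABLE_95_DF4 = 2.776               # two-tailed 5% value for df = 4
differs = t_calc > T_TABLE_95_DF4    # True would indicate a significant difference

print(f"s_p = {s_p:.3f}, t_calc = {t_calc:.3f}, df = {df}, significant difference: {differs}")
```

Because both standard deviations are equal, the pooled value stays at 0.02, and $t_{calc} \approx 2.449$ falls below the table value.
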
  • Comparing two methods or paired data (t-test on differences / paired t-test)

    • Approach: collect paired data from two methods, compute differences $d_i = y_i - x_i$, then perform a one-sample t-test on the differences.
    • Example data (differences): average difference $\bar{d} = 0.025$ and standard deviation of differences $s_d = 0.01$, $n = 4$.
    • Test statistic:
    • $t_{calc} = \frac{\lvert \bar{d} \rvert}{s_d / \sqrt{n}}$
    • With numbers: $t_{calc} = \frac{0.025}{0.01/\sqrt{4}} = 5.0$.
    • Degrees of freedom: $df = n - 1 = 3$. Critical value: $t_{table} \approx 3.182$ at 5% two-tailed.
    • Since $t_{calc} > t_{table}$, reject the null hypothesis: the methods disagree at the 5% level.
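
The notes give only summary statistics for the paired example, so the sketch below does two things: it shows how the differences would be computed from raw pairs, and it re-checks the $t_{calc} = 5.0$ result from the summary numbers. The raw data and the function name `paired_t` are hypothetical and were not part of the lecture.

```python
import math
import statistics

def paired_t(x, y):
    """Return (d_bar, s_d, t_calc) for paired measurements x[i], y[i]."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 in the denominator)
    t_calc = abs(d_bar) / (s_d / math.sqrt(len(diffs)))
    return d_bar, s_d, t_calc

# Hypothetical paired results (NOT from the lecture): four samples, two methods.
method_x = [1.00, 2.00, 3.00, 4.00]
method_y = [1.02, 2.03, 3.01, 4.04]
d_bar, s_d, t_calc = paired_t(method_x, method_y)
print(f"hypothetical data: d_bar = {d_bar:.3f}, s_d = {s_d:.4f}, t_calc = {t_calc:.2f}")

# Summary numbers from the notes: d_bar = 0.025, s_d = 0.01, n = 4.
t_notes = 0.025 / (0.01 / math.sqrt(4))
print(f"notes example: t_calc = {t_notes:.1f} vs. t_table = 3.182 (df = 3)")
```

Pairing removes sample-to-sample variation, which is why the test works on the differences rather than on the two columns separately.
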
  • Z-test (brief mention) and Q-test (outlier test)

    • Z-test: mentioned as another test, not elaborated in depth in transcript; intended for scenarios where population standard deviation is known.
    • Q-test (outlier test) purpose: identify and possibly discard a suspect data point when there is doubt about its validity.
    • How Q-test works:
    • Define the gap: the difference between the suspect value and its nearest neighbor.
    • Define the range: difference between the maximum and minimum values in the data set.
    • Q value: $Q = \frac{\text{gap}}{\text{range}}$
    • Compare to a Q-table to decide whether to discard the outlier (data point deemed suspect).
    • Worked example:
    • Data: 20.63, 21.40, 14.21, 20.79, 21.02, 20.37; suspect value is 14.21.
    • Gap = |14.21 - 20.37| = 6.16 (20.37 is the nearest neighbor of the suspect value); Range = 21.40 - 14.21 = 7.19.
    • Calculated Q = 6.16/7.19 ≈ 0.857; compare to the Q-table value for the sample size ($n = 6$) and the chosen confidence level to decide whether to discard.
    • If a trial is discarded via Q-test, you would report the data after discarding and recalculate statistics.
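
The Q-test steps above can be sketched as follows, using the six values from the worked example. The function name `q_test` is made up for illustration. The critical value 0.560 is an assumption: it is a commonly tabulated 90% confidence value for $n = 6$, so check it against the Q-table used in the course.

```python
def q_test(values):
    """Return (suspect, q_calc), testing whichever end of the sorted data has the larger gap."""
    data = sorted(values)
    data_range = data[-1] - data[0]
    gap_low = data[1] - data[0]        # gap if the lowest value is the suspect
    gap_high = data[-1] - data[-2]     # gap if the highest value is the suspect
    if gap_low >= gap_high:
        return data[0], gap_low / data_range
    return data[-1], gap_high / data_range

data = [20.63, 21.40, 14.21, 20.79, 21.02, 20.37]
suspect, q_calc = q_test(data)

# Assumed critical value (commonly tabulated 90% value for n = 6); verify with your own table.
Q_CRIT = 0.560
discard = q_calc > Q_CRIT

# If the point is discarded, recalculate statistics on the remaining data.
kept = [v for v in data if v != suspect] if discard else list(data)
mean_kept = sum(kept) / len(kept)

print(f"suspect = {suspect}, Q_calc = {q_calc:.3f}, discard: {discard}")
print(f"mean of retained values = {mean_kept:.3f}")
```

With the assumed table value, $Q_{calc} \approx 0.857$ exceeds the critical value, so 14.21 would be dropped and the mean recomputed from the remaining five points.
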
  • Practical example: Paralloid B72 and delamination study

    • Context: Paralloid B72 polymer coating used for optical quality; under sunlight, delamination can occur.
    • Experimental setup: copper coupons coated with Paralloid B72 were analyzed with a Raman spectrometer at varying temperatures (room temperature, 10°C, 20°C, 140°C, etc.).
    • Observation: as temperature increases, peaks indicating delamination become more prominent, indicating polymer mobility and loosening from the surface.
    • Takeaway: temperature-dependent spectral changes can reveal coating stability and delamination behavior.
  • Linear regression and calibration curves (least squares and regression)

    • What you need it for: generating calibration curves and fitting lines to quantify concentration or amount from measured signal.
    • Key outputs:
    • Intercept and slope of the best-fit line.
    • Intercept formula (conceptual): when you fit a line to data, the intercept corresponds to where the line crosses the y-axis; algebraically this is often expressed as
    • $b_0 = \bar{y} - b_1 \bar{x}$
    • Slope (calculation):
    • $b_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$
    • Note: most calculators or software (Excel, SigmaPlot) will compute both slope and intercept from a set of (x, y) data points.
    • Practical workflow:
    • Collect calibration data (known x, measured y).
    • Use software to compute regression line (slope and intercept).
    • Use the line to determine unknown quantities from measured signals.
    • Recommendation: become proficient with a tool (Excel, SigmaPlot, or calculator) to generate calibration curves quickly and accurately.
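
For anyone who wants to see what the software is doing, here is a small sketch of the slope and intercept formulas and of reading an unknown off the fitted line. The calibration data, the unknown signal of 0.55, and the function name `fit_line` are all hypothetical; they were chosen so the arithmetic is easy to check by hand.

```python
def fit_line(x, y):
    """Least-squares slope (b1) and intercept (b0) from the summation formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi**2 for xi in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    intercept = sum_y / n - slope * sum_x / n    # b0 = y_bar - b1 * x_bar
    return slope, intercept

# Hypothetical calibration standards (NOT from the lecture): known x, measured y.
conc = [0.0, 2.0, 4.0, 6.0, 8.0]
signal = [0.05, 0.25, 0.45, 0.65, 0.85]
slope, intercept = fit_line(conc, signal)

# Invert the line to estimate an unknown from its measured signal.
unknown_signal = 0.55
unknown_conc = (unknown_signal - intercept) / slope

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, unknown = {unknown_conc:.2f}")
```

For this made-up data the fit gives a slope of 0.1 and an intercept of 0.05, so a signal of 0.55 corresponds to a concentration of about 5.0. Real calibration data will scatter around the line, and Excel or SigmaPlot will also report the uncertainty in the fit.
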
  • Additional practical notes and study tips

    • The instructor emphasizes understanding over mechanical execution; practice interpreting t-test results and confidence levels, not just computing numbers.
    • Expect to be tested on: one-sample t-test, two-sample t-test (pooled variance), paired t-test (differences), Q-test, regression/calibration basics, and the interpretation of confidence intervals.
    • The lab portion is expected to be time-consuming (roughly six hours) but the exam itself is designed to be manageable with careful work.
    • Real-world relevance: t-tests and regression underpin quality control, analytical chemistry data interpretation, and method validation.
  • Quick reference to core formulas (summary)

    • Confidence interval for one mean:
    • $\mu = \bar{x} \pm t_{\alpha/2,\, n-1}\left(\frac{s}{\sqrt{n}}\right)$
    • One-sample t-test:
    • $t = \frac{\lvert \bar{x} - \mu_0 \rvert}{s/\sqrt{n}}$
    • Two-sample t-test (equal variances, pooled):
    • $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
    • $t = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
    • Paired t-test (differences):
    • $t = \frac{\lvert \bar{d} \rvert}{s_d / \sqrt{n}}$
    • Q-test for outliers:
    • $Q = \frac{\text{gap}}{\text{range}}$
    • Linear regression (slope and intercept):
    • $b_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$
    • $b_0 = \bar{y} - b_1 \bar{x}$
  • Closing reminder

    • The lab is open and students should prepare to work efficiently; collaboration with peers is common, but it’s essential to understand and justify every step, especially in hypothesis testing and regression analysis.