Exam 1 Prep: t-tests, Q-test, regression, calibration curves, and practical notes

Exam format and policy
- Take-home exam, as approved by Garrett; the deadline is sooner rather than later.
- First question is a famous qualitative-quantitative problem known as the “man in the bat.”
- Exam structure: roughly the first 65 points are paragraph-style questions; the last 35 points mirror what would have been an in-class portion.
- Advice on AI: students are encouraged not to rely on AI; real examples from a prior course illustrated potential pitfalls.
- One example warned about “10,000 Kelvin” excitation leading to misleading emissions, emphasizing the need to reason and type answers carefully rather than copy-paste.
- Deadline: October 2 (two weeks from the day of the talk).
- Time estimate: the instructor needed about two hours to complete an answer key, so plan ahead to avoid late-night stress.
- No class next Thursday due to the career fair; be mindful of lab atmosphere and appropriate attire around solvents.
- Lab context: the exam and lab practices emphasize careful data handling and proper statistical interpretation.
Confidence intervals and estimation
- Point estimate and interval: the general CI form is
- $\mu = \bar{x} \pm t_{\alpha/2,\, n-1}\left(\frac{s}{\sqrt{n}}\right)$
- The smaller the standard deviation ($s$) and the larger the sample size ($n$), the narrower the CI, so the interval brackets the true value ($\mu$) more tightly.
- Example intuition: the distance to a football stadium as a real-world analogue:
- Suppose the average distance is about $\bar{x} = 10.7\text{ miles}$.
- 50% confidence interval example mentioned: roughly from $10.5$ to $11.2$ miles.
- 95% confidence (gold standard) would be a symmetric interval around the mean with a wider span (specific numbers depend on data).
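A minimal Python sketch of the CI formula above, assuming a 95% confidence level and a hard-coded excerpt of a standard two-tailed t-table (the function name `confidence_interval` and the `T_95` dictionary are illustrative, not from the course). The inputs reuse the numbers from the one-sample t-test example in the next section.

```python
import math

# Two-tailed critical t values at 95% confidence (alpha = 0.05), keyed by
# degrees of freedom; standard t-table entries for df = 1..10.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval(mean, s, n):
    """Return (lower, upper) for mu = mean +/- t * s / sqrt(n) at 95% confidence."""
    t = T_95[n - 1]                      # df = n - 1
    half_width = t * s / math.sqrt(n)    # t * (s / sqrt(n))
    return mean - half_width, mean + half_width


# Mean 5.07, s = 0.02, n = 3 (values from the one-sample example).
low, high = confidence_interval(5.07, 0.02, 3)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The known value 5.05 from that example falls inside this interval, which is consistent with the t-test conclusion that the difference is not significant.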
Hypothesis testing with the t-test (one-sample, known value)
- Concept: you can compare your measured value to a known value ($\mu_0$).
- Test statistic (one-sample t):
- $t = \frac{\lvert \bar{x} - \mu_0 \rvert}{s/\sqrt{n}}$
- Degrees of freedom: $df = n - 1$; compare to $t_{\alpha/2, df}$ from the t-table.
- Decision rule (two-tailed, 95% confidence level, i.e., 5% significance): if $t_{calc} > t_{table}$, the evidence indicates a significant difference; if $t_{calc} < t_{table}$, there is not sufficient evidence to indicate a difference (the methods agree at the 5% level).
- Example from talk:
- Known (actual) value $\mu_0 = 5.05$, average $\bar{x} = 5.07$, $s = 0.02$, $n = 3$.
- $t_{calc} = \frac{|5.05 - 5.07|}{0.02/\sqrt{3}} = 1.732$.
- For $df = 2$, $t_{table} \approx 4.303$; since $t_{table} > t_{calc}$, the methods agree at 95% confidence.
- Interpretation phrasing: at the 95% confidence level, there is no significant evidence to indicate a difference.
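The one-sample test and decision rule can be sketched in Python as follows, assuming the 95% two-tailed t-table excerpt shown in the code (the helper name `one_sample_t` is hypothetical). It reproduces the worked example.

```python
import math

# Two-tailed critical t values at 95% confidence, keyed by df (1..10).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def one_sample_t(mean, mu0, s, n):
    """Compare a measured mean to a known value mu0.

    Returns (t_calc, t_table, significant), where significant is True
    when t_calc > t_table (a difference at the 5% level).
    """
    t_calc = abs(mean - mu0) / (s / math.sqrt(n))
    t_table = T_95[n - 1]                # df = n - 1
    return t_calc, t_table, t_calc > t_table


t_calc, t_table, significant = one_sample_t(mean=5.07, mu0=5.05, s=0.02, n=3)
print(f"t_calc = {t_calc:.3f}, t_table = {t_table}, significant: {significant}")
```

Here $t_{calc} \approx 1.732$ is below 4.303, so `significant` is `False`, matching the conclusion above.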
Comparing two data sets (two-sample t-test, assuming equal variances)
- When you have two samples of the same size (as in the example: $\bar{x}_1 = 5.03$, $s_1 = 0.02$, $n_1 = 3$ and $\bar{x}_2 = 5.07$, $s_2 = 0.02$, $n_2 = 3$):
- Pooled standard deviation:
- $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
- With numbers: $s_p = 0.02$ (since both $s$ values are the same here).
- Test statistic:
- $t = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
- Degrees of freedom: $df = n_1 + n_2 - 2$ (here, $df = 4$).
- In the example, $t_{calc} = 2.449$; compare to $t_{table} = 2.776$ for $df = 4$ (two-tailed, 5%).
- Since $t_{calc} < t_{table}$, there is no significant evidence of a difference at the 5% significance level (the two samples are not significantly different).
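A minimal sketch of the pooled (equal-variance) two-sample t-test, assuming the same 95% two-tailed t-table excerpt; the function name `pooled_t` is hypothetical. It follows the pooled-variance and test-statistic formulas above step by step.

```python
import math

# Two-tailed critical t values at 95% confidence, keyed by df (1..10).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t-test assuming equal variances.

    Returns (s_p, t_calc, t_table, significant).
    """
    df = n1 + n2 - 2
    # Pooled standard deviation from the weighted sample variances.
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t_calc = abs(mean1 - mean2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    t_table = T_95[df]
    return s_p, t_calc, t_table, t_calc > t_table


s_p, t_calc, t_table, significant = pooled_t(5.03, 0.02, 3, 5.07, 0.02, 3)
print(f"s_p = {s_p:.3f}, t_calc = {t_calc:.3f}, t_table = {t_table}, significant: {significant}")
```

With the example numbers, $t_{calc} \approx 2.449 < 2.776$, so the samples are not significantly different.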
Comparing two methods or paired data (t-test on differences / paired t-test)
- Approach: collect paired data from two methods, compute the differences $d_i = y_i - x_i$, then perform a one-sample t-test on the differences (comparing $\bar{d}$ to zero).
- Example data (differences): average difference $\bar{d} = 0.025$ and standard deviation of differences $s_d = 0.01$, $n = 4$.
- Test statistic:
- $t_{calc} = \frac{\lvert \bar{d} \rvert}{s_d / \sqrt{n}}$
- With numbers: $t_{calc} = \frac{0.025}{0.01/\sqrt{4}} = 5.0$.
- Degrees of freedom: $df = n - 1 = 3$. Critical value: $t_{table} \approx 3.182$ at 5% two-tailed.
- Since $t_{calc} > t_{table}$, reject the null hypothesis: the methods disagree at the 5% level.
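A Python sketch of the paired t-test, assuming the 95% two-tailed t-table excerpt shown; the helper names `paired_t_from_summary` and `paired_t` are hypothetical. The first works from $\bar{d}$, $s_d$, and $n$ as in the example; the second builds the differences $d_i = y_i - x_i$ from raw paired lists and uses `statistics.stdev` (sample standard deviation, $n - 1$ denominator).

```python
import math
import statistics

# Two-tailed critical t values at 95% confidence, keyed by df (1..10).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_t_from_summary(d_bar, s_d, n):
    """One-sample t-test of the mean difference against zero.

    Returns (t_calc, t_table, significant).
    """
    t_calc = abs(d_bar) / (s_d / math.sqrt(n))
    t_table = T_95[n - 1]                # df = n - 1
    return t_calc, t_table, t_calc > t_table


def paired_t(x, y):
    """Paired t-test from raw data: d_i = y_i - x_i for each pair."""
    d = [yi - xi for xi, yi in zip(x, y)]
    return paired_t_from_summary(statistics.mean(d), statistics.stdev(d), len(d))


# Summary values from the example: d_bar = 0.025, s_d = 0.01, n = 4.
t_calc, t_table, significant = paired_t_from_summary(0.025, 0.01, 4)
print(f"t_calc = {t_calc:.2f}, t_table = {t_table}, significant: {significant}")
```

For the example, $t_{calc} = 5.0 > 3.182$, so `significant` is `True` and the methods disagree at the 5% level.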
Z-test (brief mention) and Q-test (outlier test)
- Z-test: mentioned as another test but not covered in depth; it is intended for cases where the population standard deviation ($\sigma$) is known.
- Q-test (outlier test) purpose: identify and possibly discard a suspect data point when there is doubt about its validity.
- How Q-test works:
- Define the gap: the difference between the suspect value and its nearest neighbor.
- Define the range: difference between the maximum and minimum values in the data set.
- Q value: $Q = \frac{\text{gap}}{\text{range}}$
- Compare $Q_{calc}$ to the tabulated $Q_{table}$ for the sample size and confidence level: if $Q_{calc} > Q_{table}$, the suspect point may be discarded.
- Worked example:
- Data: 20.63, 21.40, 14.21, 20.79, 21.02, 20.37; suspect value is 14.21.
- Gap = |14.21 - 20.37| = 6.16 (20.37 is the nearest neighbor of 14.21); Range = 21.40 - 14.21 = 7.19.
- Calculated $Q = 6.16/7.19 \approx 0.857$; refer to the Q-table to decide whether to discard, based on the sample size ($n = 6$) and confidence level.
- If a point is discarded via the Q-test, report the data without it and recalculate the mean and standard deviation.
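A minimal Q-test sketch in Python. The `Q_95` critical values are an assumption: they are the commonly tabulated 95% values for Dixon's Q-test for $n = 3$ to $10$, and should be checked against the Q-table used in the course. The function name `q_test` is hypothetical.

```python
import statistics

# Commonly tabulated 95% critical Q values for n = 3..10
# (assumed here; confirm against the course Q-table).
Q_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568,
        8: 0.526, 9: 0.493, 10: 0.466}


def q_test(data, suspect):
    """Return (q_calc, q_table, discard) for a suspect value in data."""
    others = list(data)
    others.remove(suspect)                         # drop one copy of the suspect
    gap = min(abs(suspect - v) for v in others)    # distance to nearest neighbor
    spread = max(data) - min(data)                 # range of the full data set
    q_calc = gap / spread
    q_table = Q_95[len(data)]
    return q_calc, q_table, q_calc > q_table


data = [20.63, 21.40, 14.21, 20.79, 21.02, 20.37]
q_calc, q_table, discard = q_test(data, suspect=14.21)
print(f"Q_calc = {q_calc:.3f}, Q_table = {q_table}, discard: {discard}")

# If the point is discarded, recompute statistics from the remaining values.
kept = [v for v in data if v != 14.21] if discard else data
print(f"mean = {statistics.mean(kept):.3f}, s = {statistics.stdev(kept):.3f}")
```

Under these assumed table values, $Q_{calc} \approx 0.857$ exceeds 0.625, so 14.21 would be discarded and the statistics recomputed from the other five values.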
Practical example: Paraloid B-72 and delamination study
- Context: Paraloid B-72 is a polymer coating valued for its optical quality; under sunlight, delamination can occur.
- Experimental setup: copper coupons coated with Paraloid B-72 were analyzed with a Raman spectrometer at varying temperatures (room temperature, 10°C, 20°C, 140°C, etc.).
- Observation: as temperature increases, peaks associated with delamination become more prominent, suggesting increased polymer mobility and loosening from the surface.
- Takeaway: temperature-dependent spectral changes can reveal coating stability and delamination behavior.
Linear regression and calibration curves (least squares)
- What you need it for: generating calibration curves and fitting lines to quantify concentration or amount from measured signal.
- Key outputs:
- Intercept and slope of the best-fit line.
- Intercept formula (conceptual): when you fit a line to data, the intercept corresponds to where the line crosses the y-axis; algebraically this is often expressed as
- $b_0 = \bar{y} - b_1 \bar{x}$
- Slope (calculation):
- $b_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$
- Note: most calculators or software (Excel, SigmaPlot) will compute both slope and intercept from a set of (x, y) data points.
- Practical workflow:
- Collect calibration data (known x, measured y).
- Use software to compute regression line (slope and intercept).
- Use the line to determine unknown quantities from measured signals.
- Recommendation: become proficient with a tool (Excel, SigmaPlot, or calculator) to generate calibration curves quickly and accurately.
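The workflow above can be sketched in plain Python using the summation formulas for $b_1$ and $b_0$ in place of Excel or SigmaPlot. The function names and the calibration data are hypothetical: the standards were chosen to lie exactly on $y = 2x + 1$ so the fit is easy to check by hand.

```python
def linear_fit(x, y):
    """Least-squares slope b1 and intercept b0 from the summation formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n      # b0 = y_bar - b1 * x_bar
    return b1, b0


def unknown_from_signal(signal, b1, b0):
    """Invert the calibration line y = b1 * x + b0 to estimate x."""
    return (signal - b0) / b1


# Hypothetical calibration standards (known x, measured y).
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [1.0, 3.0, 5.0, 7.0, 9.0]
b1, b0 = linear_fit(conc, signal)
print(f"slope = {b1:.3f}, intercept = {b0:.3f}")
print(f"unknown with signal 6.0 -> {unknown_from_signal(6.0, b1, b0):.3f}")
```

Step 3 of the workflow is the inversion $x = (y - b_0)/b_1$; real calibration data will scatter around the line, so the fitted slope and intercept will not be exact integers.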
Additional practical notes and study tips
- The instructor emphasizes understanding over mechanical execution; practice interpreting t-test results and confidence levels, not just computing numbers.
- Expect to be tested on: one-sample t-test, two-sample t-test (pooled variance), paired t-test (differences), Q-test, regression/calibration basics, and the interpretation of confidence intervals.
- The lab portion is expected to be time-consuming (roughly six hours), but the exam itself is designed to be manageable with careful work.
- Real-world relevance: t-tests and regression underpin quality control, analytical chemistry data interpretation, and method validation.
Quick reference to core formulas (summary)
- Confidence interval for one mean:
- $\mu = \bar{x} \pm t_{\alpha/2,\, n-1}\left(\frac{s}{\sqrt{n}}\right)$
- One-sample t-test:
- $t = \frac{\lvert \bar{x} - \mu_0 \rvert}{s/\sqrt{n}}$
- Two-sample t-test (equal variances, pooled):
- $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
- $t = \frac{\lvert \bar{x}_1 - \bar{x}_2 \rvert}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
- Paired t-test (differences):
- $t = \frac{\lvert \bar{d} \rvert}{s_d / \sqrt{n}}$
- Q-test for outliers:
- $Q = \frac{\text{gap}}{\text{range}}$
- Linear regression (slope and intercept):
- $b_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$
- $b_0 = \bar{y} - b_1 \bar{x}$
Closing reminder
- The lab is open and students should prepare to work efficiently; collaboration with peers is common, but it’s essential to understand and justify every step, especially in hypothesis testing and regression analysis.