Exam 1 Prep: t-tests, Q-test, regression, calibration curves, and practical notes
Exam format and policy
- Take-home exam, as approved by Garrett; the deadline is sooner rather than later.
- First question is a famous qualitative-quantitative problem known as the “man in the bat.”
- Exam structure: roughly the first 65 points are paragraph-style questions; the last 35 points mirror what would have been an in-class portion.
- Advice on AI: encouraged not to rely on AI; real examples were given from a prior course to illustrate potential pitfalls.
- Example warned about “10,000 Kelvin” excitation leading to misleading emissions; emphasizes the need to reason and type carefully rather than copy-paste.
- Deadline: October 2 (two weeks from the day of talk).
- Grading time for the instructor: about two hours to complete an answer key; plan ahead to avoid late-night stress.
- No class next Thursday due to the career fair; be mindful of lab atmosphere and appropriate attire around solvents.
- Lab context: the exam and lab practices emphasize careful data handling and proper statistical interpretation.
Confidence intervals and estimation
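- Point estimate and interval: the general CI form is $\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the number of measurements, and $t$ the t-table value for $df = n - 1$ at the chosen confidence level.
- The smaller the standard deviation ($s$) and the larger the sample size ($n$), the narrower the interval and the more tightly it brackets the true value ($\mu$).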
- Example intuition: distance to football stadium as a real-world analogue:
- Suppose the average distance is about .
- 50% confidence interval example mentioned: roughly from to miles.
- 95% confidence (gold standard) would be a symmetric interval around the mean with a wider span (specific numbers depend on data).
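As a minimal sketch of the interval calculation, the function below evaluates mean ± t·s/√n. The function name `confidence_interval` is illustrative, and the t-value is assumed to be looked up by hand in a t-table. The numbers are borrowed from the one-sample example in these notes (mean 5.07, s = 0.02, n = 3, t = 4.303 for 95% and df = 2), since the stadium example's figures are not recorded.

```python
import math

def confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * s / sqrt(n).

    t_crit comes from a t-table for df = n - 1 at the chosen confidence level.
    """
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Values from the one-sample example in these notes (95% confidence, df = 2).
low, high = confidence_interval(5.07, 0.02, 3, 4.303)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The interval contains the known value 5.05, which matches the t-test conclusion that there is no significant difference at 95% confidence.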
Hypothesis testing with the t-test (one-sample, known value)
- Concept: you can compare your measured value to a known value (mu0).
- Test statistic (one-sample t): $t_{calc} = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}}$.
- Degrees of freedom: $df = n - 1$; compare to $t_{\alpha/2, df}$ from the t-table.
- Decision rule (95% confidence level, i.e. 5% significance, two-tailed): if $t_{calc} > t_{table}$, the evidence suggests a difference; if $t_{calc} < t_{table}$, there is not sufficient evidence to indicate a difference (the measured and known values agree at the 95% confidence level).
- Example from talk:
- Actual value = 5.05, average = 5.07, $s = 0.02$, $n = 3$.
- $t_{calc} = \frac{|5.05 - 5.07|}{0.02/\sqrt{3}} = 1.732$.
- For df = 2, $t_{table} \approx 4.303$; since $t_{table} > t_{calc}$, the measured mean agrees with the known value at 95% confidence.
- Interpretation phrasing: at 95% level, there is not significant evidence to indicate a difference.
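The one-sample comparison can be sketched in a few lines. This is an illustration, not the instructor's procedure: the function name `one_sample_t` is made up here, and the critical value is typed in from the t-table rather than computed.

```python
import math

def one_sample_t(mean, mu0, s, n):
    """t_calc = |mean - mu0| / (s / sqrt(n)) for a one-sample t-test."""
    return abs(mean - mu0) / (s / math.sqrt(n))

# Example from the talk: known value 5.05, average 5.07, s = 0.02, n = 3.
t_calc = one_sample_t(5.07, 5.05, 0.02, 3)
t_table = 4.303  # two-tailed, 95% confidence, df = n - 1 = 2 (from the t-table)

# True would mean a significant difference at the 95% confidence level.
differs = t_calc > t_table
print(f"t_calc = {t_calc:.3f}, significant difference: {differs}")
```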
Comparing two data sets (two-sample t-test, assuming equal variances)
- Example from talk, with two samples of the same size: $\bar{x}_1 = 5.03$, $s_1 = 0.02$, $n_1 = 3$ and $\bar{x}_2 = 5.07$, $s_2 = 0.02$, $n_2 = 3$.
- Pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$.
- With numbers: $s_p = 0.02$ (since both s-values are the same here).
- Test statistic: $t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_p}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$.
- Degrees of freedom: $df = n_1 + n_2 - 2$ (here, $df = 4$).
- In the example, $t_{calc} = \frac{|5.03 - 5.07|}{0.02}\sqrt{\frac{3 \cdot 3}{3 + 3}} = 2.449$; compare to $t_{table}$ for $df = 4$ (two-tailed 5%, about 2.776).
- Since $t_{calc} < t_{table}$, there is not significant evidence to conclude a difference at 5% significance (the two samples are not significantly different).
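A short sketch of the pooled two-sample calculation, assuming equal variances as in this section. The function names `pooled_sd` and `two_sample_t` are illustrative, and the critical value is again taken from the t-table by hand.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two samples (equal-variance assumption)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """t_calc = |x1 - x2| / s_p * sqrt(n1 * n2 / (n1 + n2))."""
    sp = pooled_sd(s1, n1, s2, n2)
    return abs(x1 - x2) / sp * math.sqrt(n1 * n2 / (n1 + n2))

# Example from the talk: means 5.03 and 5.07, both s = 0.02, both n = 3.
t_calc = two_sample_t(5.03, 0.02, 3, 5.07, 0.02, 3)
t_table = 2.776  # two-tailed, 5% significance, df = 3 + 3 - 2 = 4

differs = t_calc > t_table
print(f"t_calc = {t_calc:.3f}, significant difference: {differs}")
```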
Comparing two methods or paired data (t-test on differences / paired t-test)
- Approach: collect paired data from two methods, compute differences $d_i = y_i - x_i$, then perform a one-sample t-test on the differences.
- Example data (differences): average difference $\bar{d} = 0.025$ and standard deviation of differences $s_d = 0.01$, $n = 4$.
- Test statistic: $t_{calc} = \frac{|\bar{d}|}{s_d/\sqrt{n}}$.
- With numbers: $t_{calc} = \frac{0.025}{0.01/\sqrt{4}} = 5.0$.
- Degrees of freedom: $df = n - 1 = 3$. Critical value: $t_{table} \approx 3.182$ at 5% two-tailed.
- Since $t_{calc} > t_{table}$, reject the null hypothesis: the methods disagree at the 5% level.
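The paired test can be sketched as below. Only the summary statistics are given in the notes, so the worked call uses those directly. The helper `paired_t_from_data`, which starts from raw paired readings, is a hypothetical convenience and is shown without data because no raw measurements were recorded.

```python
import math
import statistics

def paired_t(d_bar, s_d, n):
    """t_calc = |d_bar| / (s_d / sqrt(n)) from summary statistics of the differences."""
    return abs(d_bar) / (s_d / math.sqrt(n))

def paired_t_from_data(x, y):
    """Same test starting from raw paired readings x_i and y_i (d_i = y_i - x_i)."""
    d = [yi - xi for xi, yi in zip(x, y)]
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator).
    return paired_t(statistics.mean(d), statistics.stdev(d), len(d))

# Summary values from the notes: mean difference 0.025, s_d = 0.01, n = 4.
t_calc = paired_t(0.025, 0.01, 4)
t_table = 3.182  # two-tailed, 5% significance, df = n - 1 = 3

methods_disagree = t_calc > t_table
print(f"t_calc = {t_calc:.2f}, methods disagree: {methods_disagree}")
```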
Z-test (brief mention) and Q-test (outlier test)
- Z-test: mentioned as another test, not elaborated in depth in transcript; intended for scenarios where population standard deviation is known.
- Q-test (outlier test) purpose: identify and possibly discard a suspect data point when there is doubt about its validity.
- How Q-test works:
- Define the gap: the difference between the suspect value and its nearest neighbor.
- Define the range: difference between the maximum and minimum values in the data set.
- Q value: $Q = \frac{\text{gap}}{\text{range}}$.
- Compare to a Q-table to decide whether to discard the outlier (data point deemed suspect).
- Worked example:
- Data: 20.63, 21.40, 14.21, 20.79, 21.02, 20.37; suspect value is 14.21.
- Gap = |14.21 - 20.37| = 6.16, where 20.37 is the nearest neighbor of the suspect value; Range = 21.40 - 14.21 = 7.19.
- Calculated $Q = 6.16 / 7.19 \approx 0.857$; compare with the Q-table value for the sample size ($n = 6$) and the chosen confidence level, and discard the point if the calculated Q is larger.
- If a trial is discarded via Q-test, you would report the data after discarding and recalculate statistics.
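The gap-over-range calculation can be sketched for the worked data set as follows. The function name `q_statistic` is illustrative. The critical value 0.56 is the commonly tabulated 90% value for six observations. It does not appear in the notes, so treat it as an assumption and check it against the Q-table used in the course.

```python
def q_statistic(values, suspect):
    """Return Q = gap / range, where the suspect is the lowest or highest value."""
    ordered = sorted(values)
    if suspect == ordered[0]:
        neighbor = ordered[1]
    elif suspect == ordered[-1]:
        neighbor = ordered[-2]
    else:
        raise ValueError("the suspect value must be the minimum or maximum")
    gap = abs(suspect - neighbor)
    data_range = ordered[-1] - ordered[0]
    return gap / data_range

data = [20.63, 21.40, 14.21, 20.79, 21.02, 20.37]
q_calc = q_statistic(data, 14.21)

q_table = 0.56  # assumed: tabulated 90% critical value for n = 6; verify in your table
discard = q_calc > q_table

# If discarded, the statistics are recalculated on the remaining points.
kept = [v for v in data if not (discard and v == 14.21)]
print(f"Q = {q_calc:.3f}, discard: {discard}, remaining n = {len(kept)}")
```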
Practical example: Paralloid B72 and delamination study
- Context: Paralloid B72 polymer coating used for optical quality; under sunlight, delamination can occur.
- Experimental setup: copper coupons coated with Paralloid B72 were analyzed with a Raman spectrometer at varying temperatures (room temperature, 10°C, 20°C, 140°C, etc.).
- Observation: as temperature increases, peaks indicating delamination become more prominent, indicating polymer mobility and loosening from the surface.
- Takeaway: temperature-dependent spectral changes can reveal coating stability and delamination behavior.
Linear regression and calibration curves (least-squares regression)
- What you need it for: generating calibration curves and fitting lines to quantify concentration or amount from measured signal.
- Key outputs:
- Intercept and slope of the best-fit line.
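- Intercept formula (conceptual): when you fit a line $y = mx + b$ to data, the intercept $b$ is where the line crosses the y-axis; algebraically this is often expressed as $b = \bar{y} - m\bar{x}$, where $m$ is the slope.
- Slope (calculation): $m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$.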
- Note: most calculators or software (Excel, SigmaPlot) will compute both slope and intercept from a set of (x, y) data points.
- Practical workflow:
- Collect calibration data (known x, measured y).
- Use software to compute regression line (slope and intercept).
- Use the line to determine unknown quantities from measured signals.
- Recommendation: become proficient with a tool (Excel, SigmaPlot, or calculator) to generate calibration curves quickly and accurately.
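The workflow above can be sketched without a spreadsheet, using the standard least-squares expressions for slope and intercept. The calibration points below are hypothetical values chosen only for illustration (they are not from the talk), and the function names `fit_line` and `concentration_from_signal` are made up for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
    b = (sum_y - m * sum_x) / n  # equivalent to y_mean - m * x_mean
    return m, b

def concentration_from_signal(signal, m, b):
    """Invert the calibration line to get x from a measured y."""
    return (signal - b) / m

# Hypothetical calibration standards: known concentration (x), measured signal (y).
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 2.1, 4.1, 6.1, 8.1]

m, b = fit_line(conc, signal)
unknown = concentration_from_signal(5.1, m, b)
print(f"slope = {m:.3f}, intercept = {b:.3f}, unknown = {unknown:.3f}")
```

Excel or SigmaPlot should return the same slope and intercept for the same points, so a sketch like this is mainly useful for checking that the software output makes sense.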
Additional practical notes and study tips
- The instructor emphasizes understanding over mechanical execution; practice interpreting t-test results and confidence levels, not just computing numbers.
- Expect to be tested on: one-sample t-test, two-sample t-test (pooled variance), paired t-test (differences), Q-test, regression/calibration basics, and the interpretation of confidence intervals.
- The lab portion is expected to be time-consuming (roughly six hours) but the exam itself is designed to be manageable with careful work.
- Real-world relevance: t-tests and regression underpin quality control, analytical chemistry data interpretation, and method validation.
Quick reference to core formulas (summary)
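- Confidence interval for one mean: $\mu = \bar{x} \pm \frac{t\,s}{\sqrt{n}}$, with $df = n - 1$.
- One-sample t-test: $t_{calc} = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}}$, with $df = n - 1$.
- Two-sample t-test (equal variances, pooled): $t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_p}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$, where $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ and $df = n_1 + n_2 - 2$.
- Paired t-test (differences): $t_{calc} = \frac{|\bar{d}|}{s_d/\sqrt{n}}$, with $df = n - 1$.
- Q-test for outliers: $Q = \frac{\text{gap}}{\text{range}}$.
- Linear regression (slope and intercept): $m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$ and $b = \bar{y} - m\bar{x}$.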
Closing reminder
- The lab is open and students should prepare to work efficiently; collaboration with peers is common, but it’s essential to understand and justify every step, especially in hypothesis testing and regression analysis.