Full study
Chapter 9: Inferential Statistics – Making Sense of Significance, Confidence, and Inference
Overview: In social/policy research, data often come from random samples intended to represent a larger population; the goal is to make inferences about population parameters from sample estimates. The three basic components of statistical inference are:
Point estimates: the sample statistic as the best guess of the population parameter.
Precision: how close the estimate is likely to be to the true parameter (described by standard errors and confidence intervals).
Significance tests (hypothesis tests): whether observed differences/relationships are real or likely due to sampling fluctuation.
The Sampling Distribution: Foundation for inference
Imagine repeating the same sampling procedure many times and collecting the distribution of the resulting estimates (a sampling distribution).
With enough repetitions (e.g., 100–1,000 samples), the sampling distribution tends toward a normal shape; its mean centers at the population parameter P.
The center of the sampling distribution equals the population parameter (P) and the spread is governed by the standard error.
The normal shape makes inference tractable, since the distribution is defined by its mean and standard deviation.
The Standard Error (SE)
SE measures the typical distance between a sample statistic and the population parameter due to sampling variability.
For a proportion p (sample proportion) with population proportion P and sample size n: SE(p) = √(P(1 − P)/n).
In practice, P is unknown; we substitute the sample proportion p for P to compute SE.
Example: If P = 0.05 and n = 400, the SE for a proportion is √(0.05 × 0.95/400) ≈ 0.011.
For a mean, SE = S/√n, where S is the population standard deviation (substitute s, the sample SD, when S is unknown).
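The two SE formulas above can be sketched in a few lines of Python (the example values are the chapter's P = 0.05, n = 400):

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(s: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

# Chapter example: P = 0.05, n = 400
print(round(se_proportion(0.05, 400), 4))  # → 0.0109
```

Note how SE shrinks with the square root of n: quadrupling the sample size halves the standard error.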
The Empirical Rule (for a Normal Sampling Distribution)
68% of the distribution lies within ±1 SE of the center
95% within ±2 SE (more precisely, ±1.96 SE)
99.7% within ±3 SE
These rules underpin construction of approximate confidence intervals (CIs).
Confidence Intervals (CIs)
A CI provides a range in which the true parameter is likely to lie, given the sample and a chosen confidence level.
General form: estimate ± Z* × SE, where Z* is the number of SEs corresponding to the desired confidence level.
Common values:
95% CI: Z* = 1.96 (appropriate in large samples; in small samples, the t distribution is preferred).
90% CI: Z* = 1.65
99% CI: Z* = 2.58
For a 95% CI for a proportion using the basic empirical rule, you can approximate with p̂ ± 2 × SE, but exact calculation uses Z* = 1.96.
Important caveat: Confidence intervals reflect only sampling error; they do not account for other sources of error (measurement error, coverage error, nonresponse, data processing errors, causal inference error, etc.).
Confidence Intervals for Proportions: Worked example
Example: A hospital patient-satisfaction survey with n = 100 and p̂ = 0.67 (67% satisfied).
SE(p̂) = √(0.67 × 0.33/100) ≈ 0.047.
95% CI: 0.67 ± 1.96 × 0.047 = 0.67 ± 0.09, i.e., roughly (0.58, 0.76).
Interpretation: We are 95% confident that the true satisfaction rate lies between 58% and 76%.
Substitution principle: The population parameter P is unknown; we replace it with p̂ in SE calculations.
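The hospital worked example can be reproduced in Python (a minimal sketch of the z-based interval described above):

```python
import math

def ci_proportion(p_hat: float, n: int, z: float = 1.96):
    """Confidence interval for a proportion: p_hat ± z * sqrt(p_hat(1-p_hat)/n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Chapter example: n = 100, p_hat = 0.67
lo, hi = ci_proportion(0.67, 100)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")  # → 95% CI: (0.58, 0.76)
```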
Confidence Intervals for Means
For a mean, use SE = s/√n (sample SD s) if the population SD S is unknown.
95% CI for the mean: x̄ ± 1.96 × SE (the t distribution is more accurate for small samples: use t_df instead of 1.96; with large samples, t ≈ Z).
Example: If sample mean wait time is 8.3 hours, s = 5.6, n = 100:
SE = 5.6/√100 = 0.56
95% CI using Z: 8.3 ± 1.96 × 0.56 = 8.3 ± 1.10, i.e., about 7.2 to 9.4 hours.
With the t distribution, the critical value would be slightly larger than 1.96 for 95% in small samples, yielding a similar but slightly wider CI.
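The wait-time example can be checked the same way; the t critical value 1.984 for df = 99 is an assumption of this sketch (the text only says the t value is slightly larger than 1.96):

```python
import math

def ci_mean(xbar: float, s: float, n: int, crit: float = 1.96):
    """Confidence interval for a mean: xbar ± crit * (s / sqrt(n))."""
    se = s / math.sqrt(n)
    return xbar - crit * se, xbar + crit * se

# Chapter example: mean wait 8.3 hours, s = 5.6, n = 100
lo, hi = ci_mean(8.3, 5.6, 100)                   # large-sample z interval
print(f"({lo:.2f}, {hi:.2f})")                    # → (7.20, 9.40)
lo_t, hi_t = ci_mean(8.3, 5.6, 100, crit=1.984)   # t_{99} critical value (assumed)
print(f"({lo_t:.2f}, {hi_t:.2f})")                # slightly wider
```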
Power, Significance, and Hypothesis Testing (Inference about Population Parameters)
Significance tests (hypothesis tests) assess whether a difference or relationship is real (not a fluke of sampling).
Key components:
Null hypothesis (H0): typically a statement of no difference/no effect (e.g., no difference between groups, slope = 0).
Alternative hypothesis (H1 or Ha): the statement being tested (e.g., there is a difference, slope ≠ 0).
p-value: the probability, under H0, of observing results as extreme as or more extreme than those observed.
Decision rule (conventional): reject H0 if p-value < chosen significance level (alpha), commonly 0.05, but other levels (0.10, 0.01) are also used depending on context.
Important nuance: a p-value is not the probability that H0 is true; it is the probability of obtaining the observed data (or more extreme) given that H0 is true.
Significance levels and practical significance can diverge: a result can be statistically significant but with a trivially small effect size (practical significance).
Example (t-test for difference in means): If the observed difference is −1.072 with SE = 0.620, then t = −1.072/0.620 ≈ −1.73,
and the p-value is about 0.084, which is not significant at the 5% level but would be at the 10% level, depending on the context.
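The t statistic and its two-sided p-value can be verified with a short sketch; it uses the normal approximation to the t distribution, which is adequate in large samples:

```python
import math

def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value using the normal approximation to the t distribution."""
    phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

# Chapter example: difference = -1.072, SE = 0.620
t = -1.072 / 0.620
print(round(t, 2))                      # → -1.73
print(round(two_sided_p_normal(t), 3))  # → 0.084
```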
Interpreting Significance: Practical vs Statistical Significance
Box 9.1 outlines sources of statistical significance and statistical insignificance (e.g., large differences or small SEs vs large SEs or tiny differences).
It is possible to have statistically significant results that are not practically significant (and vice versa).
Publication bias toward statistically significant results is a concern; consider robustness and practical relevance, not just p-values.
When many tests are performed, multiple comparison corrections (e.g., Bonferroni, Scheffé) may be necessary to control the overall error rate.
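A minimal sketch of the Bonferroni correction mentioned above (the p-values are made up for illustration):

```python
def bonferroni(p_values, alpha: float = 0.05):
    """Bonferroni correction: compare each p-value to alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Five tests at overall alpha = 0.05 → per-test threshold 0.01
print(bonferroni([0.003, 0.04, 0.02, 0.009, 0.6]))
# → [True, False, False, True, False]
```

Note that 0.04 and 0.02 would be "significant" at 0.05 individually but survive the correction only if they beat 0.01, illustrating how multiple-comparison adjustments guard against Type I error inflation.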
Hypothesis Testing in Regression and Related Tests
Significance testing in regression typically uses a t-test for the slope (or intercept) to assess whether a relationship is present.
Example: NELS data show a slope for hours of homework on test scores; t = 2.45 with p = 0.016 indicates a statistically significant relationship at conventional levels.
For regression, the null hypothesis is "Population slope β = 0"; the alternative is β ≠ 0.
For relationships between categorical variables, chi-square tests examine whether there is an association; the null is no relationship; the alternative is there is a relationship.
p-values for different tests (t, F, chi-square) are interpreted similarly: small p-values imply rejection of the null; large p-values imply insufficient evidence to reject the null.
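The chi-square statistic can be sketched directly from its formula; the observed and expected counts below are hypothetical, not from a real table:

```python
def chi_square(observed, expected) -> float:
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 2x2 table flattened to four cells; in a real test the
# expected counts would come from the table's row and column margins.
obs = [30, 20, 10, 40]
exp = [20, 30, 20, 30]
print(round(chi_square(obs, exp), 2))  # → 16.67
```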
Practical Topics in Inference
Power and Type II errors: power = 1 − P(Type II error); power increases with larger samples, stronger effects, and a larger alpha (a smaller alpha decreases power).
Minimal Detectable Effect (MDE): the smallest effect size that a study has power to detect at a given alpha and sample size.
Lehr’s equation for planning sample size in comparing two means: n per group = 16/Δ², where Δ = (difference in means)/σ is the standardized difference in means (effect size).
Cohen’s conventions for effect sizes: small ≈ 0.2, medium ≈ 0.5, large ≈ 0.8 (in terms of standardized mean difference).
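Lehr's rule of thumb, combined with Cohen's conventions, gives quick sample-size estimates; a sketch (the rule assumes roughly 80% power at alpha = 0.05, two-sided):

```python
def lehr_n_per_group(diff: float, sigma: float) -> float:
    """Lehr's rule of thumb: n per group ≈ 16 / Delta^2,
    for ~80% power at alpha = 0.05 (two-sided)."""
    delta = diff / sigma  # standardized effect size
    return 16 / delta ** 2

# Sample sizes implied by Cohen's small/medium/large conventions
for delta in (0.2, 0.5, 0.8):
    print(delta, round(16 / delta ** 2))  # 0.2 → 400, 0.5 → 64, 0.8 → 25
```

Small effects are expensive: detecting Δ = 0.2 requires roughly 400 per group, versus about 25 per group for a large effect.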
Precision with Complex Sampling and Nonprobability Samples
Complex sampling (clustering, stratification, oversampling) requires adjusted standard errors; statistical software can often compute design-corrected SEs directly, or design effects can be applied to simple-random-sample SEs.
Nonprobability samples (convenience samples, voluntary samples) challenge the basis of inference; superpopulation concepts and model-based inference are used in some cases, but interpretations must be cautious.
Bootstrapping provides an alternative inference approach when standard error formulas are hard to obtain or when assumptions are dubious.
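A percentile bootstrap for a mean can be sketched with the standard library alone; the wait-time data below are made up for illustration:

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, reps=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample with replacement, take empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(stat([rng.choice(data) for _ in range(n)]) for _ in range(reps))
    lo = boot[int((alpha / 2) * reps)]
    hi = boot[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

waits = [2, 3, 5, 8, 8, 9, 11, 12, 14, 20]  # hypothetical wait times (hours)
print(bootstrap_ci(waits))
```

The appeal is that no SE formula is needed: the sampling distribution is approximated empirically, which helps when formulas are unavailable or their assumptions are dubious.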
Bayesian vs Frequentist Inference
Frequentist inference treats probability as long-run frequency and relies on sampling distributions and p-values without prior information.
Bayesian inference starts with prior probabilities and updates them with data to form posterior probabilities; priors can be subjective but provide a coherent framework for updating beliefs.
In practice, most applied work uses frequentist methods, though Bayesian intuition influences how researchers interpret results.
Summary of Chapter 9 Takeaways
Confidence intervals quantify precision; they reflect sampling error but not all sources of error.
Significance tests help determine whether observed patterns are unlikely under the null, but p-values do not convey practical importance.
Larger samples reduce SE and can render small effects statistically significant; always consider practical significance and power.
When multiple tests are performed, consider corrections to control for type I error inflation.
Chapter 10: Multivariate Statistics – Making Sense of Multiple Variables
What multivariate statistics is about
Real-world phenomena involve many variables; multivariate methods help analyze multiple independent and dependent variables simultaneously.
The centerpiece in many applied settings is multiple regression, which predicts a dependent variable y from several independent variables x1, x2, …, xk.
Core equation: y = a + b1·x1 + b2·x2 + ⋯ + bk·xk.
Interpretation:
The constant a is the predicted value of y when all x's equal 0 (often of limited substantive meaning).
Each coefficient bj is the predicted change in y for a one-unit increase in xj, holding all other x’s constant (the key advantage over simple regression).
R-squared (R²) is the proportion of variation in y explained by all independent variables together.
Adjusted R-squared adjusts for the number of predictors, giving a less biased estimate of explained variance when comparing models with different numbers of predictors.
Example from the text: earnings predicted by education (x1) and experience (x2), with:
R² = 0.57
b1 (education) = $3,292 per year
b2 (experience) = $415 per year
Interpretation of the example: with 4 more years of education, holding experience constant, predicted earnings rise by 4 × $3,292 = $13,168.
Dummy variables and interaction terms: an interaction term (e.g., b_int × Hypertension × Diabetes) modifies the slope for one predictor depending on the level of the interacting variable.
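The mechanics of a multiple-regression fit can be illustrated with a plain-Python least-squares solve via the normal equations; the data below are fabricated for illustration (not the chapter's earnings data):

```python
def ols(X, y):
    """OLS via the normal equations (X'X)b = X'y, solved by Gaussian elimination.
    X: list of predictor rows (without intercept); y: list of outcomes.
    Returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]  # augmented matrix
    for c in range(k):  # forward elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        beta[c] = (A[c][k] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta

# Fabricated data generated exactly from y = 1 + 2*x1 + 0.5*x2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1 + 2 * x1 + 0.5 * x2 for x1, x2 in X]
print([round(b, 3) for b in ols(X, y)])  # → [1.0, 2.0, 0.5]
```

Because the fabricated data contain no noise, the fit recovers the generating coefficients exactly; with real data the coefficients are estimates with standard errors.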
Nonlinearity and Transformations
Not all relationships are linear; nonlinear patterns (e.g., U-shaped) can be modeled with polynomial terms (e.g., Experience and Experience$^2$) or by transforming the dependent/independent variables (log, exponential, etc.).
When a quadratic term is included, interpret the two coefficients jointly to understand the marginal effect at different levels of the predictor.
Example: Earnings = a + b_exp × Experience + b_exp2 × Experience²; a negative b_exp2 implies diminishing returns to experience (an inverted-U pattern). As before, each coefficient bj gives the effect of xj holding other predictors constant, and R² summarizes overall fit.
Formula summary (from the chapter):
Multiple regression: y = a + b1·x1 + b2·x2 + ⋯ + bk·xk
SE of a proportion: SE(p) = √(P(1 − P)/n), substituting the sample p for the unknown P
SE of a mean: SE(x̄) = S/√n
Confidence intervals: p̂ ± Z* × SE(p); x̄ ± Z* × SE(x̄) (or t_df for small samples)
Margin of error: MOE ≈ 2 × SE
Test statistic: t = (Estimate − Null)/SE
Chi-square: χ² = Σ (Observed − Expected)²/Expected
Power = 1 − Pr(Type II error)
Lehr’s equation: n = 16/Δ², where Δ = (difference in means)/σ
Potential outcomes: each unit i has Y_1i (treated) and Y_0i (untreated); with treatment indicator D_i ∈ {0, 1}, the observed outcome is Y_i = D_i·Y_1i + (1 − D_i)·Y_0i; if D_i = 1 we never see Y_0i, and if D_i = 0 we never see Y_1i; the individual treatment effect is ITE_i = Y_1i − Y_0i; the average treatment effect is ATE = E[Y_1i − Y_0i] = E[ITE_i] = E[Y(1) − Y(0)]
Mediation: indirect effect IE = a × b; direct effect DE = c′; total effect TE = DE + IE = c′ + ab
Relative risk examples: RR = 2.5 (marijuana/tobacco), RR = 1.5 (alcohol); an effect of −16 percentage points
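Interpreting quadratic coefficients jointly amounts to evaluating the marginal effect b1 + 2·b2·x at different predictor values; the coefficient values below are hypothetical:

```python
def marginal_effect(b1: float, b2: float, x: float) -> float:
    """Marginal effect of x when y = a + b1*x + b2*x^2:
    dy/dx = b1 + 2*b2*x (declines with x when b2 < 0)."""
    return b1 + 2 * b2 * x

# Hypothetical earnings coefficients: b1 = 1200, b2 = -20 (dollars per year)
b1, b2 = 1200.0, -20.0
print(marginal_effect(b1, b2, 5))   # → 1000.0 (early career: strong returns)
print(marginal_effect(b1, b2, 30))  # → 0.0 (peak of the inverted U)
```

With these hypothetical values, the marginal return to experience peaks at 30 years (where b1 + 2·b2·x = 0), the inverted-U shape the text describes.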
How to Use These Notes on the Exam
Distinguish correlation from causation in every question.
Identify possible alternative explanations: reverse causation, common causes, or unknown confounders.
Determine whether a study design achieves exogeneity (randomization, intervention) or if it relies on observational evidence with control variables.
Be ready to discuss potential mechanisms and mediators that explain how a proposed causal effect would work.
If given a scenario, sketch a simple path diagram showing possible causal pathways (direct, mediated, and confounded) and indicate where bias might arise.
When asked to critique a study, consider time order, replication across contexts, plausibility of mechanisms, and the adequacy of controls for confounding.
Theories, Models, and Research Questions
Real-world example: Broken windows theory applied to New York City crime and subway crime reduction (Kelling & Wilson, 1982; Bratton). The idea: addressing small disorders (vandalism, graffiti, public drinking, loitering) can prevent more serious crime. In NYC, quality-of-life policing targeted petty disorder; crime fell in the 1990s. Debates exist: other factors (end of crack epidemic, economy) also plausible explanations. The central aim of the chapter: define theory and models, discuss variables, relationships, and causal mechanisms, and show how path diagrams and logic models help plan, manage, and evaluate programs.
What Is a Theory?
Theory in social science often means a logical idea about how part of the world works. Centered on middle-range theory (Merton, 1967).
King, Keohane, and Verba (1994): A social science theory is a reasoned, precise speculation answering a research question, including why the proposed answer is correct (p. 19).
Theories prompt questions and guide the search for plausible explanations; theories can describe large-scale (e.g., war) or small-scale (e.g., reading ability) phenomena. Theories are practical because they illuminate how to change the world.
Theories are not automatically true; they must withstand questioning and empirical testing.
The Key Functions of Theories
Identify key variables: The broken windows theory highlights disorder as a key variable, potentially affecting crime.
Tell causal stories: The theory posits a causal mechanism where disorder signals lack of social control, emboldening criminals.
Recognize that a theory is one of many possible causes; outcomes almost always have multiple causes (economic conditions, demography, weather, etc.).
Theories produce probabilistic predictions: they describe what is likely to happen on average, not guaranteed outcomes in every case.
Theories explain variation: they account for longitudinal variation (over time) and cross-sectional variation (across places or groups). Visuals referenced: Fig. 2.1 (murder rate over time, longitudinal variation) and Fig. 2.2 (murder rates across large U.S. cities, cross-sectional variation).
Theories Generate Testable Hypotheses
Good theories yield observable implications; they should be falsifiable (Popper, 1959).
Examples from broken windows: more crime in neighborhoods with vandalism and graffiti; less crime where such disorder is reduced.
A theory can be tested against data and compared to alternative theories.
Theories Focus on Modifiable Variables
Some causes are nonmodifiable (economy downturns, weather, population age structure), but theory often targets modifiable variables because they offer policy/practice leverage (e.g., increasing patrols against disorder).
Other theories may study nonmodifiable factors to understand broader influences on crime.
Where Do Theories Come From?
Grand theories (paradigms) vs. middle-range theories. Grand theories include structural functionalism, symbolic interactionism, rational choice, Marxist materialism, Freudian psychoanalysis, critical theory, feminism, postmodernism, etc. They shape how researchers frame variables and mechanisms.
Rational choice: crime explained as opportunistic self-interest where rewards outweigh costs.
Some grand theories (critical theory, postmodernism) are antipositivist and question causation and empirical testing in human behavior, leading to different forms of theory.
Theories come from
Academic disciplines (economics, sociology, psychology).
Induction (building theory from empirical observations) and deduction (starting from principles and testing them).
Exploratory studies, lived experience, and practitioner observations (police chiefs noting patterns).
Qualitative research and exploratory work can generate theory and hypotheses.
Induction and Deduction; Testing Theories
Induction: theory emerges from empirical observations; testing with independent data is essential to falsify.
Deduction: theory starts from principles; testing requires new data not used to generate the theory.
The distinction matters because using the same data to both generate and test a theory yields a non-falsifiable test.
Question prompt: sidewalk litter – induction vs. deduction? Identify the test needed to evaluate a proposed relation (e.g., income and crime). (From Box discussions in the chapter).
Exploratory and Qualitative Research
Qualitative research provides insight into processes and influences in social settings (e.g., high-crime neighborhoods).
Qualitative insights can suggest variables (e.g., social control) and potential policy targets.
Theories, Norms, and Values
Theories are descriptive (positive) rather than normative; they describe what produces crime, not what ought to happen.
Theories are not value-free: different theories emphasize different causes and thus imply different policy options (e.g., deterrence vs. social investment).
Underlying assumptions influence theories; these assumptions may come from discipline, culture, or politics. It’s important to make assumptions explicit when interpreting or creating theories.
What Is a Model?
A model communicates a theory; it is a representation of the causal process.
Types: graphical (path diagrams) or mathematical (equations).
Path diagrams will be used extensively in this book (e.g., Fig. 2.3 Broken Windows path diagram).
Variables and Relationships
A model consists of variables (ovals) and relationships (arrows).
The plus sign (+) on an arrow indicates the direction of the relationship.
A variable is something that can take different values (it must vary) and can be independent (X) or dependent (Y) in the causal order.
Independent vs. Dependent Variables:
Independent variable (X): the cause, symbolized as X.
Dependent variable (Y): the effect, symbolized as Y.
In simple models: Class size (X) → Test scores (Y), with a − sign on the arrow indicating a negative relationship. Use + and − signs on the arrows to reflect directionality; some categorical variables may not have a direction.
Tip 8: Recognize levels of detail; simpler models for big-picture proposals vs. more detailed models for implementation and evaluation; ideally, each link has empirical backing or clearly labeled gaps.
Inputs, Activities, Outputs, and Outcomes
Logic models can include program implementation aspects: inputs (resources), activities (actions), outputs (immediate products).
Outcomes can be short-term, intermediate, and long-term.
Example: CDC logic model for hypertension management focusing on chronic care management (CCM) shows inputs, activities, outputs, and cascading outcomes from better treatment to fewer heart disease/strokes.
The caution: avoid over-emphasizing implementation details at the expense of clearly articulating the causal mechanism to desired outcomes.
Additional Issues in Theory Building
Interpretivist theory: contrasts with the quantitative, middle-range view; focuses on interpreting meanings, norms, and symbols; aims for understanding rather than causal prediction.
Does Theory Shape Observation? Observations can be theory-laden; survey question design can influence responses and shape observed relations.
Theories of the Independent Variable: Sometimes a theory may posit that both X and Y are effects of a common cause (e.g., socioeconomic disadvantage). Attacking X alone may be less effective; consider underlying common causes.
Moderators (interactions): A moderator changes the strength/direction of a relationship (e.g., teacher experience moderating the effect of instruction time on test scores). In path diagrams, moderators are shown as arrows toward the relationship they influence (Figure 2.11).
Hierarchical (multilevel) models and contextual variables: Some relationships operate across different units (students within classrooms within schools). Higher-level contextual variables can affect relationships at lower levels.
Theoretical research vs. empirical research: Theoretical work synthesizes existing theories to predict new situations; examples include Rosen (1981) on technology changes and earnings concentration in opera singers; Christensen & Remler (2009) on electronic health records adoption and system lock-in.
How to Find and Focus Research Questions
A research question is the motivation for a study; real-world research often involves messy, iterative processes: starting with a broad question, refining it through data access, feasibility, and theoretical framing.
Applied research questions often arise from policymakers and practitioners (e.g., do smaller classes improve learning? does telework increase productivity? does lowering speed limit reduce fatalities?).
The process includes defining the intervention (X) and expected outcomes (Y), considering intervening variables, and exploring unintended consequences.
Example questions: Does the JPS class-size reduction improve third-grade exit-test scores? Through what mechanisms (instruction time, individual attention) does it operate? Are there unintended consequences (reduced resources for libraries, arts, or after-school programs)?
Chapter provides practical guidance on forming research questions; the process involves/benefits from using model-building tools, developing a path model, and evaluating feasibility given data constraints.
Descriptive vs Causal Questions
Descriptive (what is) vs. causal (what if) questions require different methods and data.
Researchers should clearly define whether their primary aim is description or causal inference.
Positive vs Normative Framing of Questions
Theories are positive (describing how the world is) rather than normative (how it should be).
Framing a question positively (without normative assumptions) improves testability and empirical focus.
Example: Instead of asking, Why aren’t more young people interested in politics and voting? (normative), ask: How interested are young people in politics, and how often do they vote? Does interest predict voting behavior?
Generating Questions and Ideas
How to generate research questions:
Review scholarly literature for anomalies or unanswered questions.
Explore policy/practice concerns and current events to surface relevant problems.
Read widely across disciplines to gain new perspectives.
Maintain a notebook of ideas.
Heuristics for question generation (Andrew Abbott, 2004): analogies (e.g., voting as an economic transaction), reversals (asking why people vote when individual impact is small), cross-disciplinary borrowing to develop new insights.
Conclusion: Theories Are Practical
The broken windows example shows theory guiding policy decisions and practical action.
Theories help explain causes of social problems and guide interventions; logic models help show how programs work and how to improve them.
Path diagrams are essential tools for representing causal reasoning and guiding analysis.
Boxed Highlights, Key Terms, and Exercises
Key terms to know include: Causal mechanism, Causal and noncausal relationships, Cross-sectional variation, Ecological fallacy, Grand theory, Hierarchical models, Hypothesis, Independent and dependent variables, Logic model, Moderation, Intervening variable, Path diagram, Unit of analysis, etc.
Boxes cover practical points: Box 2.1 on independent/dependent variables; Box 2.2 on equations as models; Box 2.3 on what a logic model is; Box 2.4 on a real-world AIDS program example; Box 2.5 on critical questions to ask about theory; Box 2.6 on doing your own research with heuristics; Box 2.7 (not explicitly numbered here) on interpretivism and observation; Box 2.8 on other examples explaining logic models (inputs/activities/outputs).
Figures mentioned include:
Figure 2.3: Path diagram of Broken Windows Theory (disorder → crime).
Equations as models: Y = a + bX, where Y is the dependent variable, X is the independent variable, a is the intercept, and b is the slope.
Direction of relationships via sign on arrows:
Positive relationships get a + sign on the arrow; negative relationships a − sign. Signs multiply along a causal path: (−) × (+) × (+) = (−). Logic models flow from Inputs → Activities → Outputs → Outcomes.
Fig. 2.10: Common Cause illustration (independent variable and dependent variable both influenced by a common cause).
Fig. 2.11: Moderator in a Path Diagram (moderator effect on a relationship).
Box 2.1: Independent and Dependent Variables – naming and interpreting directions.
Box 2.2: Equations as Models – right-hand side vs. left-hand side variables; Y = a + bX.
Box 2.3: What Is a Logic Model? – definition and purpose.
Box 2.4: China AIDS Prevention Program – real-world logic-model example with practical tips.
Box 2.5: Critical Questions to Ask About Theory, Models, and Research Questions.
Box 2.6: Tips on Doing Your Own Research – practical guidance for developing questions and models.
Chapter Resources and Exercises (Overview)
EXERCISES 2.1–2.6 prompt you to create theories, identify variables, determine directions, specify units of analysis, and draft a logic model for a real-world program.
STUDENT STUDY SITE: online resources including self-quiz, eFlashcards, and related materials.
Key terms and glossary appear throughout, with opportunities to test understanding using the included questions.
Purpose and Value of Evidence
We want to make a difference in the world (education, health, crime reduction, arts, innovation, housing, leadership) and need evidence beyond personal experience to know what works and to persuade decision-makers with authority and resources.
Good evidence comes from well-made research. Evidence can take many forms: journal studies, internal analyses, government or foundation reports, performance briefs, program evaluations, needs assessments, or surveys of clients or employees.
Government and international data sources provide empirical evidence across topics like health services, education, labor markets, crime, housing, and environment. Examples of data sources: data.gov (U.S.), data.un.org (UN). Similar data portals exist in many countries.
The Internet era creates an abundance of studies and statistics, but we must know how to choose, interpret, and apply them. Good research is well designed and well made; trust in brand names is limited because each study is unique with strengths and weaknesses.
Research methods are the techniques and procedures that produce evidence (sampling, measurement instruments, planned comparisons, statistical techniques). Understanding methods helps us judge study quality and the strength of its evidence.
May the Best Methods Win
Understanding research methods helps us argue about evidence that supports or undermines our aims.
Controversy example: abstinence-only sex education vs comprehensive sex education.
Abstinence-only advocates argue against condom distribution; proponents of comprehensive education argue it better addresses real-life behaviors and reduces pregnancy and STIs.
A review by Douglas Kirby (2007) identified 115 studies on pregnancy prevention programs for U.S. teens (abstinence and comprehensive).
The takeaway: in public policy, neither side wins merely by citing a single study; the battle centers on how well the studies are designed and conducted (methods matter).
Research-Savvy People Rule
Research methods are essential whether you are a researcher, analyst, practitioner, or administrator.
Reasons to know methods:
Good research provides a factual basis for decisions and strengthens arguments.
In the information age, the ability to find, understand, and apply complex information is highly valuable to organizations.
Funding and policy-making demand evidence-based programs and management reforms; to win support, you must demonstrate methodological understanding.
Without method literacy, you are at a disadvantage in securing jobs, advancing, and obtaining financial and political support.
Research, Policy, and Practice
Research has become central to modern public policy and management, reflected in performance measurement, program evaluation, and evidence-based practices.
Performance Measurement
The idea: measure performance to manage and improve. Examples include data-driven crime tracking (e.g., CompStat in New York City).
Emphasis on measuring performance across education, health care, and other sectors.
The information revolution supplies data; logic models are used to decide what to measure; valid and reliable measurements are discussed in later chapters.
Evaluation Research
Core questions: Did a program have an impact? Did it improve outcomes? Also describes implementation processes.
Evaluation is a standard requirement for government and foundation funding; Rossi, Lipsey, & Freeman (2003) and C. Weiss (1997) are key references.
Evidence-Based Policy and Programs
Governments, businesses, and nonprofits increasingly favor evidence-based approaches.
Decision-makers compare programs for effectiveness and cost-effectiveness (magnitude of effect relative to cost).
Chapters teach how to identify, assess, and produce good evidence to support aims.
Evidence Can Mislead
Not all evidence is perfect; methodological flaws can mislead.
Misleading measurements: flawed data collection can inflate or distort outcomes (NCLB and the Houston TAAS vs. Stanford TAAS discrepancy).
Misleading samples: nonrandom samples (e.g., USA Today poll) can overstate or misrepresent population characteristics; proper random sampling (e.g., GSS) provides more accurate estimates.
Misleading correlations: correlation does not imply causation (fluoridated water example; Hillier et al. 2000 showed that controlling for age, sex, weight, lifestyle can remove the apparent link).
What Is Research?
Research is a social and intellectual activity involving systematic inquiry to describe and explain the world.
Primary vs Secondary Research:
Secondary research: searches and syntheses of published sources (not the primary focus of this text).
Primary research: original collection or analysis of data to answer new questions; includes both primary data collection and the original analysis of secondary data (e.g., existing surveys, administrative records).
Data terminology:
Data: refers to raw observations or a data set; not the published facts.
It Comes in Various Shapes and Sizes
Research varies: large-scale vs small-scale, snapshots vs longitudinal, lab experiments vs naturalistic observation, carefully planned interventions vs opportunistic discoveries, theoretical analyses, informal internal analyses.
Inventiveness and creativity are important; good research often involves new methods or clever strategies.
It’s Never Perfect; It’s Uncertain and Contingent; It Aims to Generalize; Bits and Pieces of a Puzzle
No study is perfect; good consumers spot weaknesses but also identify strengths.
Uncertainty is inherent; results are often probabilistic and context-dependent.
Generalizability: the extent to which results apply beyond the original setting; real-world studies are often less generalizable; researchers strive for generalizability but must acknowledge limits.
Empirical evidence is cumulative; rarely is a single study definitive; scientific consensus emerges within bounds of probability.
Explain Generalizability
Generalizability is the ability to apply research results beyond the exact setting (time, place, circumstances) studied.
Example: emergency-visit policies with out-of-pocket payments tested in one insurance plan may not apply to older, less healthy, lower-income populations with different incentives and behaviors; results may be limited to that context.
Although generalizability is a goal, real-world research has limitations; still, the evidence can inform policy and practice when interpreted carefully.
Global Warming and Scientific Consensus
Across thousands of studies on global warming, none alone proves human causation, but the body of evidence supports a consensus that warming is occurring and is very likely caused by human activity (UN IPCC 2007; U.S. Global Change Research Program 2009).
Establishing consensus requires years of diverse research and debate; consensus can be tempered by new evidence or contested by dissenting researchers.
The process involves competition and critique, especially peer review, in which researchers assess each other’s methods and conclusions.
Peer review is usually blind to avoid bias; readers should approach research with honest, critical thinking even when peer-reviewed.
Quantitative, Qualitative, and Mixed Methods; Triangulation
Research can be quantitative (numbers, statistics), qualitative (language, images, meanings), or a mix.
Mixed methods combine strengths of both approaches; triangulation uses multiple methods to confirm findings.
Qualitative research can be rigorous; numbers alone do not determine quality.
Chapter emphasizes that qualitative research is foundational (Chapter 3) and good quantitative work relies on solid qualitative groundwork; both perspectives enhance each other.
Applied vs Basic Research
Applied research: conducted to solve practical problems; has direct policy or practice implications (e.g., unemployment, smaller class sizes, policing strategies).
Basic research: pursuit of knowledge for its own sake; may be less immediately practical but builds theoretical foundations that inform policy and practice.
Both types advance knowledge, though the link from basic research to application is often indirect.
Descriptive and Causal Research
Descriptive research describes the world: what things are, the size of phenomena, and how variables relate (associations or correlations).
Causal research seeks to determine what would happen if we change something: estimating the effect of interventions or policy changes.
In practice, practitioners need both descriptive understanding and causal evidence for effective policies and programs.
Autism example: policymakers and practitioners seek descriptions (how many, where, severity) and causal understanding (what would reduce incidence or severity).
Distinguishing description from causation is central to the text; Part II covers description, Part IV covers causation.
Correlation Is Not Causation
It is easy to confuse a correlation with a causal effect.
Example: educated mothers and autism incidence is a correlation, not proof that education causes autism; confounding factors may explain the relationship.
The fluoridation example shows a spurious correlation that disappears when other factors are controlled (Hillier et al. 2000).
Important skill: distinguish correlation from causation and assess evidence of causal effects (Chapter 11 and beyond).
Epistemology: Ways of Knowing
Ways of knowing include direct measurement, trusted authorities, tradition, intuition, and common sense.
The book emphasizes the scientific method as a privileged, systematic approach to knowledge production.
Readers should question scientific knowledge just as they question common sense or tradition.
The Scientific Method
Key characteristics of the scientific method:
Systematic observation or measurement (including qualitative observation).
Logical explanation via theory or model that aligns with logic and established facts.
Prediction in the form of a hypothesis derived from the theory; falsifiability is preferred over post hoc explanations.
Openness: methods are documented and available for review to enable replication.
Skepticism: peer review and critique to identify shortcomings or alternative explanations.
The scientific method is a privileged form of knowing because it is transparent, logical, and evidence-based; however, science can be misrepresented or misused, so critical appraisal remains essential.
Understand that interpretations of the method vary across disciplines and over time.
Is There One Truth in Social Science?
The social world differs from natural sciences due to human consciousness, culture, history, and politics; social phenomena vary more across places and times, making knowledge more contingent.
Social science is shaped by language and socially constructed categories, influencing what is observed and how it is interpreted.
Some reject the relevance of the scientific method to social policy (antipositivism), while others defend a broader, pragmatic version of scientific realism.
The authors describe a stance of scientific realism: social reality exists and can be studied with methods modeled on science, despite social constructions.
Induction and Deduction
Researchers use either induction (from systematic observation to theory) or deduction (from theory to hypotheses/tests), or a combination.
Induction is common in qualitative research; in quantitative work, patterns may inspire theory.
Structuralists argue for starting with theory; most researchers use a mix: theory informs data collection and data leads to new theories.
Fresh data are required to truly test a theory or hypothesis; data cannot be used both to develop and definitively confirm a theory in the same way.
Research is often iterative, alternating between deduction and induction.
Approaching Research From Different Angles
The book addresses three perspectives:
Consuming Research: readers as researchers, policymakers, journalists, or students who digest and apply findings.
Commissioning Research: clients frame questions, approve methods (sampling, measures), review briefs, and decide on changes; the choice of researchers influences quality; open communication with researchers is essential.
Conducting Research: applied research in government, nonprofits, business, and consulting; researchers may have diverse backgrounds; informal research by practitioners is also valuable.
Ethics of Research: research involving human subjects raises ethical concerns that shape study design and methods. Case studies illustrate historical ethical breaches and the development of ethical standards.
Ethical Issues in Research; History and Principles
Historical abuses led to formal ethical principles and procedures for human subjects research:
Nuremberg Code (1947): informed consent, voluntary participation, no harm, beneficence.
Declaration of Helsinki (1964): ethical principles for medical research.
Belmont Report (1979): U.S. framework for ethics regulations (45 C.F.R. Part 46).
Core ethical standards (the Belmont standards):
Respect for persons: informed consent and voluntary participation.
Beneficence: minimize harm and maximize benefits.
Justice: fairness in the distribution of research benefits and burdens.
Many countries have ethics review processes (IRBs in the U.S.; analogous bodies elsewhere).
Informed consent involves ensuring understanding and voluntary participation; challenges include comprehension levels, language/cultural barriers, and power dynamics.
Privacy and confidentiality vary by research form and context (administrative data vs in-depth interviews; public data vs restricted data).
Informed consent and deception: debates about using deception versus transparency; safeguards include allowing withdrawal, debriefing, and minimizing risk.
Ethical issues depend on the form and context of research (qualitative interviews, measurement length, secondary data usage, laboratory experiments, randomized trials, quasi-/natural experiments).
The text signals that ethics must be considered throughout the research lifecycle, including policy applications and IRB processes.
Informed Consent: What It Entails
Informed consent requires understanding what participation involves and being competent to consent.
Voluntary consent may be complicated by power imbalances or potential consequences (e.g., benefits eligibility).
Challenges include language proficiency, reading level of consent forms, and cultural differences in interpreting participation.
Researchers must balance providing enough information with ensuring comprehension; different contexts raise different consent considerations.
Ethical Issues Depend on Form and Context
Confidentiality means different things across study types (statistical health data vs in-depth interviews about abuse).
Informed consent, confidentiality, deception, and acceptable data use vary by research method and context.
The chapters preview where ethical considerations will be discussed in detail (qualitative methods, measurement burden, secondary data, primary data collection, laboratory/causal experiments, randomized trials, quasi/natural experiments, and policy applications).
Conclusion: The Road Ahead
Research methods draw on multiple disciplines (sociology, economics, health sciences, education) and form a complex landscape.
Even experienced researchers struggle to communicate across disciplinary dialects; the goal is to cut through terminological confusion and understand core issues.
The book aims to equip readers to think critically about theory, models, and research questions and to engage with research as both consumers and producers.
Chapter Resources and Key Terms
Key terms introduced: Applied research, Basic research, Beneficence, Causal research, Contingent, Data, Deduction, Descriptive research, Epistemology, Evaluation research, Generalizability, Induction, Justice, Peer review, Performance measurement, Positivism, Primary data, Primary research, Relationships, Research methods, Respect for persons, Scientific method, Scientific realism, Secondary data, Secondary research, Spurious correlation, Structuralists.
Exercises (PROMPTS)
Battleship Research 1.1: Identify other policy debates where opponents use research; discuss role of research methods.
Research in the Corner Office 1.2: Identify leadership roles in your field that will use or commission research; consider interviewing someone.
Following the Trends 1.3: Provide examples of a performance measure, a program evaluation, and an evidence-based policy in your area.
Misleading Evidence 1.4: Find a news article about a study; assess whether criticisms relate to misleading measurements, samples, or correlations.
Descriptive vs Causal Research 1.5: Propose descriptive and causal questions for a social issue.
Ways of Knowing 1.6–1.7: Explore sources like Wikipedia and FiveThirtyEight for methodological alignment with the scientific method; evaluate sources and citations.
Ethical Research: Informed Consent 1.8–1.9: Find a study involving human subjects; discuss ethical issues; plan an interview-based study ensuring respect for persons, beneficence, and justice; define informed consent contents.
Study Site and Further Resources: Access the Sage Study Site for quizzes, eFlashcards, and additional resources.
Notes on Notable Examples Mentioned
Discrepancy between Houston students' TAAS scores and their Stanford Achievement Test results, criticized by the New York Times (Schemo & Fessenden, 2003), illustrating measurement issues and differences in test validity.
USA Today quick poll vs General Social Survey (GSS) on gun ownership showing how sampling methods affect results.
Hillier et al. (2000) UK study showing correlations between fluoridated water and bone fractures can be explained by confounding variables; demonstrates spurious correlation.
Milgram (1960s) obedience studies and Humphreys (1966–67) social observation study are classic ethical debates illustrating risk, deception, consent, and the balance of scientific value and ethical safeguards.
Belmont Report (1979) established foundational ethics standards: respect for persons, beneficence, justice, and the role of IRBs.
References to Frameworks and Theorists Mentioned
Rossi, Lipsey, & Freeman (2003) on evaluation research
C. Weiss (1997) on evaluation and policy
Hatry (2007); Kaplan & Harvard Business School (2009); Poister (2003) on performance measurement and management
Davies, Nutley, & Smith (2000) on evidence-based policy and practice
Scola (2012) on data-driven political campaigns
Becker & Becker (1998) on deduction in economics
Godfrey-Smith (2003); Bunge (1993) on scientific realism and philosophy of science
Hillier et al. (2000) on controlling for confounders in an observational study (UK)
UN IPCC (2007); U.S. Global Change Research Program (2009) on global warming consensus
Key Takeaways
The credibility of policy and practice hinges on the quality of methods; better-designed studies tend to produce more reliable conclusions.
No study is perfect; critical appraisal requires weighing weaknesses against strengths and the breadth of the evidence base.
Descriptive and causal research serve different purposes; understanding their differences is essential for applying research to real-world problems.
Ethical considerations are integral to all research with humans, with historical cases driving current safeguards and procedures.
Researchers and practitioners should be literate in methods to be effective consumers, commissioners, and conductors of research.
Secondary Data: Overview
Secondary data are data that already exist because they were collected for a prior administrative or research activity.
Importance in social and policy research: low-cost computing/storage and the Internet make secondary data widely available and useful.
Examples from flu tracking:
CDC uses administrative data: hospital ER visits for flu symptoms, OTC cold/flu medication purchases, vaccine uptake from surveys.
Real-time indicators include Google query searches about flu symptoms (Ginsberg et al., 2009).
Big Data context:
Big Data refers to vast stores of qualitative data (texts, images, audio, video) and quantitative data (tax records, spending, etc.), often linked and analyzed to solve problems.
Real-world Big Data examples: CDC’s Google Flu Trends; NYC’s sewer mapping to detect illegal grease dumping by restaurants (Feuer, 2013).
Learning goal of chapter: understand sources and types of existing data and how computing advances turn qualitative data into quantitative data; contrast with primary data collection (to be covered in the next chapter).
Real-world relevance: secondary data underpin many policy analyses and program evaluations; they enable large-scale, cost-effective research.
Big Data and the Virtual World
Our digital lives generate vast data streams across contexts: shopping, socializing, studying, collaborating online.
Much nonvirtual activity leaves electronic traces: medical tests, search terms, YouTube videos, etc.
Much of this data is qualitative (texts, images, audio, video). Examples: emails, blogs, tweets, webpages, government and organization documents.
Numerical/quantitative data also proliferate: taxes, spending, school attendance, crime, economic indicators; private sector data include financial transactions, inventory, payroll, prices, stock values.
Big Data enables problem-solving through data integration and analysis across sources.
Practical examples:
Google Flu Trends as an instance of Big Data in public health.
New York City example using Big Data to identify non-compliant grease disposal by linking sewer data to restaurant data (Feuer, 2013).
Takeaway: learning about data sources and structures is the first step toward leveraging Big Data in policy analysis.
Quantitative Data—and Their Forms
Definition: Quantitative data are information recorded, coded, and stored in numerical form. ("Data" is a plural noun; a "dataset" is a single collection.)
Quantitative data typically arise from measurement, though not always strictly; statistics provides the tools for analyzing them.
Quantitative data can be of two broad kinds:
Quantitative variables: numeric measurements representing quantities (e.g., age in years, income in dollars); examples include counts (ER visits, Google searches) and monetary amounts. These are inherently numeric variables.
Categorical variables recorded as numbers: many categorical variables are coded numerically for analysis, e.g., gender 1 = male, 2 = female; happiness 1 = very happy, 2 = somewhat happy, 3 = not happy; region 1 = Northeast, 2 = Midwest, 3 = South, 4 = West. Note: numerically coded categorical variables count as quantitative data because they can be sorted, counted, summarized, and analyzed; many statistical methods address categorical data specifically (Agresti, 2007). Example 1: years of education is numeric but can be grouped into categories (e.g., less than high school, high school, college) when analyzed against other variables like income and health status. Example 2: gender can be numerically coded (e.g., male = 1, female = 2) for use in statistical analyses like regression.
Qualitative data can be automatically coded into categories (qualitative → quantitative): example—Twitter messages with hashtags (#) indicating topic; hashtag counts yield quantitative trends.
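The hashtag-counting idea can be sketched in a few lines of Python; the sample messages below are invented for illustration:

```python
import re
from collections import Counter

# Invented sample messages for illustration
tweets = [
    "Feeling awful today #flu",
    "Got my shot! #vaccine #Flu",
    "Big game tonight #football",
    "Stocking up on tissues #flu",
]

def hashtag_counts(messages):
    """Extract hashtags and tally them: qualitative text becomes quantitative counts."""
    tags = []
    for msg in messages:
        # Lowercase so "#Flu" and "#flu" count as the same topic
        tags.extend(re.findall(r"#(\w+)", msg.lower()))
    return Counter(tags)

counts = hashtag_counts(tweets)
print(counts)  # Counter({'flu': 3, 'vaccine': 1, 'football': 1})
```

Aggregated over time, such counts become a quantitative trend line, which is essentially how search-query and social-media signals feed into tools like Google Flu Trends.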
Data forms/structures depend on aggregation level and time dimension (Table 6.1 referenced):
Forms include microdata, aggregated (ecological) data, single-measure data, and multilevel data.
Real-world linkages:
Administrative records, surveys, and other sources can generate both quantitative and coded qualitative data.
Data Management: Then and Now
Administrative data originate from MIS (management information systems): financial records, employee records, production records, client records, performance indicators.
Paper vs electronic storage varies; much data still in paper form, though electronic medical records growing.
Converting to research-ready form often requires flattening relational databases to flat-file formats for statistical analysis.
Relational databases contain multiple linked tables; statistical analysis generally uses flat files where rows = units of analysis and columns = variables (Figure 6.4).
Data cleaning is a crucial step: verify/correct/code fields, handle string data (e.g., “flu” vs “FLU” vs “influenza”); fix spelling variations; address inconsistent entries.
Data quality issues: inaccurate, incomplete, or inconsistent cases; range checks; logical consistency across variables; duplicate removal.
As Big Data expands, data cleaning becomes even more important.
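A minimal sketch of these cleaning steps in Python, using invented records and a hypothetical numeric code (23 = influenza): spelling variants are normalized to one coded category, implausible values fail a range check, and duplicate entries are dropped.

```python
# Invented raw administrative records for illustration
raw_records = [
    {"id": 1, "diagnosis": "flu", "age": 34},
    {"id": 2, "diagnosis": "FLU", "age": 29},
    {"id": 3, "diagnosis": "influenza", "age": 210},  # fails the age range check
    {"id": 2, "diagnosis": "FLU", "age": 29},         # duplicate of id 2
]

# Map spelling variants to one canonical numeric code (23 = influenza, hypothetical)
DIAGNOSIS_CODES = {"flu": 23, "influenza": 23}

def clean(records):
    """Drop duplicates, apply a range check, and code string fields numerically."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue                      # remove duplicate entries
        seen.add(rec["id"])
        if not (0 <= rec["age"] <= 120):
            continue                      # range check: drop implausible ages
        code = DIAGNOSIS_CODES.get(rec["diagnosis"].strip().lower())
        cleaned.append({"id": rec["id"], "diagnosis_code": code, "age": rec["age"]})
    return cleaned

cleaned = clean(raw_records)
print(cleaned)  # two clean rows remain (ids 1 and 2), both coded 23
```

Real cleaning jobs add logical-consistency checks across variables (e.g., pregnancy coded for a male patient), but the shape of the work is the same.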
Privacy and ethics: nonidentifiability (de-identification) is essential when releasing administrative data; strip direct identifiers (names, SSNs) before release, while enabling linkage through secure, approved processes.
Flattening and integration challenges: turning relational data into flat files is time-consuming but often necessary; advanced analyses may require multi-level or panel data structures.
Administrative Records and Their Preparation for Research
Administrative records commonly used for policy research:
Financial and expenditure records; employee records; output production records; client records (patients, students); performance indicators.
Stored in MIS; some still on paper.
Adapting administrative data for research involves:
Data cleaning and formatting into flat-file structures suitable for software like SPSS, SAS, or Stata.
Verifying and recoding variables; ensuring numeric formats; handling missing data.
Potentially linking datasets across programs (e.g., Medicare, Social Security, cancer registry, death certificates) to address broader questions; requires security and de-identification.
Data management challenges:
Paper records present retrieval and coding challenges.
String formats require numeric coding (e.g., illness type coded as 23 for influenza).
Duplicate entries and inconsistent field types must be addressed.
Data cleaning tasks are time-consuming and often more work than the statistical analysis itself.
Flattening relational databases: the process of converting multi-table databases into flat, cross-sectional data suitable for analysis; multi-level analyses may require combining units of analysis across levels.
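One way to picture flattening, assuming a simple two-table database (patients and visits, both invented): the child table is aggregated and joined onto the parent table so that rows = units of analysis and columns = variables.

```python
# Parent table: one row per patient (the unit of analysis)
patients = [
    {"patient_id": 1, "age": 34},
    {"patient_id": 2, "age": 29},
]

# Child table: many rows per patient, as stored in a relational database
visits = [
    {"patient_id": 1, "date": "2013-01-05"},
    {"patient_id": 1, "date": "2013-02-11"},
    {"patient_id": 2, "date": "2013-01-20"},
]

def flatten(patients, visits):
    """Aggregate the child table, then join it onto the parent table."""
    visit_counts = {}
    for v in visits:
        visit_counts[v["patient_id"]] = visit_counts.get(v["patient_id"], 0) + 1
    return [
        {**p, "n_visits": visit_counts.get(p["patient_id"], 0)}
        for p in patients
    ]

flat = flatten(patients, visits)
# Each row is now one patient with a derived variable (n_visits),
# ready to load into statistical software as a flat file.
```

A multilevel analysis would instead keep both levels and link them, but for most cross-sectional statistical work this aggregate-and-join pattern is the workhorse.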
Administrative data privacy and ethics:
HIPAA regulates health data privacy; data must be nonidentifiable when used for research.
Access often requires secure procedures; data may be matched across sources with identifiers removed before release.
Protecting privacy becomes more challenging as data linkage increases; privacy scandals highlight the need for careful governance.
Data sources that produce published tables (aggregate data) vs. microdata:
Published aggregate tables (e.g., poverty by state) are easier to access but lack detail of microdata; larger studies may rely on microdata with proper governance.
Where Do Quantitative Data Come From?
Quantitative data come from diverse sources and methods:
Administrative record data, commercial transactions, sample surveys, and linked data across sources.
The Internet-era data include web traffic, search terms, online transactions, etc.
Creative researchers often combine/link data from multiple sources to create richer variable sets.
Administrative data continue to be central to program evaluation and policy analysis; increasingly, vendors provide commercially purchased data (e.g., pharmaceutical sales, bankruptcy data).
Public/private data linkages and data fusion are common, including GIS-based linking by geography.
Data quality/compatibility issues persist across sources; data cleaning and matching are essential steps before analysis.
Administrative Data: Data Availability, Ethics, and Public vs. Private Data
Public and private organizations collect administrative data; data may be used for research with appropriate approvals.
Ethical considerations:
Administrative records often contain private information not collected with consent for research; use is regulated.
HIPAA governs health information sharing; researchers must follow strict privacy rules.
A key ethical obligation is nonidentifiability: data released should not enable identification of individuals.
When linking confidential datasets, an approved process allows researchers to receive stripped data (identifiers removed) to preserve confidentiality.
Data access and privacy in practice:
PUMS (Public Use Microdata Samples) are de-identified microdata with restrictions on geography to protect privacy.
Census Research Data Centers (RDCs) provide secure facilities for accessing confidential microdata.
Data purchasing and licensing:
Some data are purchasable (e.g., IMS Health pharmaceutical sales) and may come from publicly available sources or proprietary providers.
Commercial vendors add value by cleaning/formatting data for research use.
Ethical questions for researchers:
How to access and use data while preserving privacy and complying with legal restrictions?
How to balance public value against individual privacy concerns?
Practical exercise prompts:
Identify administrative data you might have access to at work or school; describe storage, access, and ethical issues.
Published Data Tables and Data Archives
Many agencies publish aggregate data tables online; these are accessible and useful for macro-level analyses.
Example: Table 6.2 Aggregate Data Table (U.S. Census Bureau) showing poverty by state, with counts (in thousands) and percentages; notes explain data universe changes (e.g., pre- vs post-2006 ACS universe).
Important notes when using published data:
Always read notes/documentation to understand variables, sources, years, and calculations.
Properly cite the data source and download date since data can be updated or corrected.
Published data tables can be used for aggregate panel studies but lack micro-level detail.
The process of assembling data from published tables often involves carefully documenting definitions and sampling frames.
Where to Find Published Tables and Data Archives
Data archives and portals include:
ICPSR (Inter-university Consortium for Political and Social Research) – University of Michigan
GESIS (German Social Science Infrastructure Service)
Roper Center Public Opinion Archives – University of Connecticut
SDA (Survey Documentation and Analysis), UC Berkeley
CESSDA (Council of European Social Science Data Archives)
UK Data Archive (UKDA)
Data archives store microdata and documentation to facilitate reuse; access may require registration or formal agreements.
Ethics of public-use microdata:
Public-use microdata reveal individual responses; risk of re-identification exists, so many archives top-code sensitive values and limit geography.
RDCs provide controlled access for researchers needing more detailed data.
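Top-coding itself is simple to illustrate: values above a disclosure threshold are replaced by the threshold, masking the extreme values that could identify someone. The threshold and incomes below are invented.

```python
# Hypothetical disclosure threshold for a sensitive variable (income)
TOP_CODE = 150_000

incomes = [42_000, 87_500, 2_300_000, 61_000]

# Replace any value above the threshold with the threshold itself
topcoded = [min(x, TOP_CODE) for x in incomes]
print(topcoded)  # [42000, 87500, 150000, 61000]
```

The $2.3 million income, which might single out one respondent, becomes indistinguishable from other high earners, at the cost of some analytic detail at the top of the distribution.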
Public-use microdata examples:
CPS (Current Population Survey)
NHIS, NHANES, BRFSS, YRBSS, MEPS, NAEP, NELS, NHES, NAAL, PSID, NLSY, NCVS, AHS, HRS, SIPP, ANES, Eurobarometer, WVS, ISSP, ESS, and more.
Access tools and online analysis:
WEAT (Web Enabled Analysis Tool) for online analysis of certain datasets; NAEP Data Explorer; DataFerrett (DataWeb) for CPS
Public-use microdata vs. restricted data:
Some microdata are publicly downloadable; others require application or restricted access due to privacy.
Public Use Microdata: Examples, Access, and Ethics
Public-use microdata are provided by major surveys in many policy areas (education, health, labor, crime, housing, politics).
Data access options:
Downloadable public-use microdata
Online analysis tools (WEAT, NAEP Data Explorer, GSS Nesstar, ANES public data tools)
Some datasets provide both microdata and aggregated tables
Major surveys (Table 6.4 overview):
Health: NHIS, NHANES, NHCS (Healthcare), BRFSS, YRBSS, MEPS
Education: NAEP, NELS, NHES, NAAL, CPS (labor/economics aspects)
Labor and employment: SIPP, PSID, NLSY79/97, etc.
Crime and housing: NCVS, AHS, HRS (retirement), PSID families
Political/social attitudes: GSS, ANES
International: Eurobarometer, WVS, ISSP, ESS
Data documentation and codebooks are essential for understanding how data were collected, coded, weighted, and analyzed.
Ethics of public-use microdata:
Even when data are anonymized, detailed microdata can pose privacy risks when linked with other data.
Access often requires confidentiality agreements and may restrict certain geographic detail (e.g., PUMS geographies).
Secondary Qualitative Data and Linking Data
Secondary qualitative data reuse: UK Data Archive and other repositories archive qualitative data (interviews, focus groups, narratives) for secondary analysis.
Ethical considerations for secondary qualitative data:
Original consent may not cover secondary uses; possible mismatch with new research questions.
Additional consent or fit within the language of the original consent is often required.
Qualitative data can be linked with administrative data or aggregate data to create richer analyses; examples include linking interview data to neighborhood characteristics or to survey results with area-level data via GIS.
Linking data across sources is a core feature of Big Data research, enabling richer variable sets and more powerful analyses.
Some Limitations of Secondary Data
Availability biases: some data are more readily accessible than others; e.g., data on elderly populations are centralized under Medicare, while younger populations may be harder to study because their data are dispersed across sources.
Data availability can distort research questions: researchers may study topics that are well-covered by available data rather than the most important questions.
While data access is expanding, choice of data sources can constrain analyses and require compromises.
When to Collect Original Data?
Reasons to collect primary data include:
Small-area studies requiring data at city/neighborhood scales not available in microdata or publicly available data.
Need to measure variables not captured in existing data; missing or insufficient variables.
Desire for a specific combination of variables or up-to-date data.
Privacy/confidentiality restrictions limit reuse of existing data.
Next chapter focuses on collecting primary data (surveys, observations).
Conclusion: Chapter summarizes the sources and uses of quantitative data, highlighting the need to understand data provenance and quality for policy-relevant research.
Chapter Resources and Key Terms (Chapter 6)
Key terms to review (selected):
Aggregate (ecological) data, Big Data, Codebook, Cross-sectional data, Data archive, Data cleaning, Flat file, Longitudinal data, Metadata, Microdata, Multilevel (hierarchical) data, Nonidentifiable, Online data analysis tool, Panel data, Pooled cross sections, Prospective cohort, Quantitative data, Relational database, Time series, Unit of observation, and more.
Exercises focus on identifying data forms, sources, and methods for using secondary data, as well as exploring public-use microdata and online analysis tools.
Surveys and Other Primary Data
Chapter 7 introduces the collection of primary data through surveys and observation, plus other primary data sources (trained observation, instruments, experimental data).
Context: US economic indicators in 2010 provide a backdrop for the central role of surveys in understanding the economy (e.g., unemployment, consumer sentiment; 9.9% unemployment in April 2010; 7 in 10 adults believed the country was on the wrong track).
Surveys underpin knowledge about health, housing, crime, transportation, education, and more; they guide public policy and organizational management.
Core questions before conducting a survey:
Do you know enough about the topic to design questions? If not, qualitative methods (focus groups) may be needed first.
Does the information exist in another source? Avoid duplicating data collection if data already exist.
Can people provide the information you want? Some data may be hard for respondents to recall or measure accurately.
Will people provide truthful answers? Especially for sensitive topics.
Steps in the survey research process:
Identify the Population: Clearly define the target group for the survey.
Develop a Questionnaire: Create a set of questions designed to elicit the desired information.
Pretest Questionnaire: Test the questionnaire with a small group similar to the target population to ensure the wording is clear and unambiguous and to validate that the questions measure what they intend to measure.
Recruit and Train Interviewers: If applicable, select and equip interviewers with the necessary skills to avoid biasing interviewees.
Collect Data: Administer the survey to the identified population.
Analyze and Present Findings: Process the collected data and communicate the results.
Modes of survey data collection:
Intercept Interview Surveys: brief interviews typically conducted in high-traffic public spaces such as shopping malls; offer quick data collection and generally high response rates, though samples may not be representative of the broader population.
Household Interview Surveys: In-person interviews conducted in respondents' homes. Often high-quality data; advantages include high response rates; disadvantages include time, cost, the need for geographic clustering, potential social desirability bias, and interviewer effects; CAPI/CASI can improve administration and confidentiality. Common for sensitive topics like income or health.
Telephone Interview Surveys: Conducted over the phone, often for customer satisfaction or public opinion polling. Fast and cost-effective; use RDD for sampling; CATI for live data capture; high effort required to reach respondents; declining response rates; BRFSS as a major US example.
Automated Telephone Surveys (IVR/robocalls): Surveys where questions are asked and responses collected using automated voice systems. Very low cost but often low response rates; best for short, simple questionnaires; risk of leading questions and push polls.
Mail Self-Administered Surveys: Questionnaires sent via mail for respondents to complete and return on their own. Cost-effective when a complete sampling frame of mailing addresses exists; Dillman's Tailored Design Method (TDM) emphasizes multi-contact strategies and careful design to boost response rates; limitations include literacy requirements and potential skip-pattern and response problems.
Group Self-Administered Surveys: A practical and cost-effective method where surveys are completed by a group of respondents simultaneously, often in a classroom or meeting setting. Administered in settings like schools/worksites; advantages include efficiency and supervision; disadvantages include clustering effects and potential response bias due to group dynamics; YRBSS as a prominent example.
Web/Internet Surveys: Surveys administered online, typically via email invitations or website links. Rapid, low-cost, flexible; best when you have an established email list or use opt-in panels; can incorporate sophisticated skip patterns and multimedia; drawbacks include spam fatigue, panel attrition, duplicate responses, and tech issues across devices.
Establishment Surveys: Surveys focused on collecting data from companies and/or organizations rather than individuals. Involve multiple potential respondents within an organization; mixed-mode approaches may be used to reach diverse respondents; gatekeepers can make contact challenging.
Panel or longitudinal surveys: track the same respondents over time; challenges include attrition and maintaining contact information; use incentives and careful tracking to reduce loss to follow-up.
Mixing modes: mixed-mode surveys can improve coverage but introduce mode effects (responses may differ by mode), sample frame incompatibilities, and interpretation challenges.
Crafting a questionnaire: start with purpose/constructs; consider one or two essential questions; use mock tables to design questions that will yield needed analyses; replicate questions from established surveys where possible to enable comparability.
Opening questions matter: start with simple, relevant questions to engage respondents; sensitive or complex questions should be placed later; avoid early hard questions (e.g., income) to prevent dropout.
Closed-ended vs open-ended questions: strike a balance; open-ended questions provide rich data but are time-consuming to analyze; including too many open-ended questions often leads to discarded data.
19 principles for writing survey questions (Dillman, 2007):
Use simple words; avoid jargon.
Be concise; use complete sentences.
Avoid vague quantifiers; use precise response options.
Avoid overly detailed or unrealistic recall requests; provide bounded ranges.
Ensure equal numbers of positive/negative response options when using scales.
Distinguish undecided/neutral options; consider explicit neutral categories.
Present balanced wording to avoid bias in response options.
State both sides of attitude scales in the stem (e.g., “satisfied or dissatisfied” rather than “satisfied” only).
Use mutually exclusive response categories; avoid overlapping categories.
Use cognitive design to aid recall (priming).
Give appropriate time referents (e.g., “in the last 7 days”).
Ensure the question is technically precise (avoid ambiguity like “Do you own your home?” which might omit mortgages).
Use standardized questions where possible to enable comparisons.
Avoid yes/yes double negatives; avoid double-barreled questions.
Avoid asking respondents to perform unnecessary calculations; do the calculations during analysis when possible.
Ensure response categories cover realistic distributions (update ranges to reflect current distributions).
Consider the sequence of questions to minimize fatigue and bias.
Pretest to identify problems and revise accordingly.
Physical and graphical design: layout, instructions, navigation aids, shading, and cross-device consistency for web surveys; pretesting recommended.
Ethics of survey research:
Informed Consent: Ensuring participants understand the survey's purpose, risks, and benefits before agreeing to participate. A fundamental principle but implemented differently across modes; tacit consent often occurs in online surveys.
Pushing for High Response Rate: Balancing the need for sufficient data with avoiding undue pressure on potential respondents. High response rates are desirable but should not involve coercion or deception.
Overburdening Respondents: Designing surveys that are not excessively long or demanding, respecting respondents' time and effort.
Protecting Privacy and Confidentiality: Safeguarding personal information and ensuring responses cannot be linked back to individual participants. Anonymize data; use codes to separate identifiers; manage geocoding to avoid precise location disclosure.
Surveying Minors and Other Vulnerable Populations: Implementing additional safeguards and obtaining appropriate permissions when surveying individuals who may be more susceptible to coercion or harm. Special considerations for surveying vulnerable populations (children, prisoners, cognitively impaired, etc.).
Making Survey Data Available for Public Use: Considering the ethical implications of sharing data, including anonymization and data security. Public-use data require restrictions to limit identification; data-sharing must balance research value with privacy.
Geocoding and linking data raise privacy concerns; precise locations can enable identification; use aggregated location data when possible.
Other primary data sources:
Trained observation (quantitative coding of observed conditions; e.g., Sampson & Raudenbush’s Chicago neighborhoods project; street cleanliness scorecards in NYC using photographic standards).
Use of handheld devices to capture qualitative and quantitative data (e.g., ComNET project).
Scientific instruments (biometric readings, lab tests, brain imaging like fMRI/PET/EEG).
Data extraction algorithms and web crawling for extracting data from the Internet; Big Data methods (trawling) expand data sources beyond traditional surveys.
Conclusion: Surveys are central to primary data collection but are not the only method; other primary data sources complement surveys in policy research; the next steps focus on statistical analysis and interpretation of collected data.
Boxes 7.2, 7.3, 7.4, and 7.5 (Key Guidance)
BOX 7.2: Opening questions should be engaging and directly related to the topic; avoid early burdensome questions; examples compare two opening designs.
BOX 7.3: Example of an open-ended questionnaire that can be time-consuming to analyze; use sparingly and plan for qualitative analysis if used.
BOX 7.4: Critical questions to ask about surveys and other primary data (scope, mode, who conducted the survey, question wording, availability of questions, etc.).
BOX 7.5: Practical tips for doing your own survey (avoid duplicating existing data, mock tables, replicate standardized questions, write a purpose statement, pretest, etc.).
Exercise and Study Site Reminders
Exercises in the chapter encourage identifying data sources, choosing survey modes, designing questionnaires, and exploring online data tools.
The study site (www.sagepub.com/remler2e) offers a self-quiz, eFlashcards, and additional resources to reinforce learning.
Summary of Key Themes
Secondary data are foundational for policy research due to accessibility and cost advantages, but require careful attention to data provenance, quality, and ethics.
Big Data expands the potential to link diverse data sources (administrative, microdata, qualitative data) to generate richer insights, while also raising privacy concerns.
Quantitative data come in multiple forms and time dimensions; understanding units of observation vs. units of analysis and the time structure (cross-sectional, panel, time series) is critical for proper analysis.
Administrative data require substantial preparation (cleaning, formatting, de-identification) but remain a powerful source due to breadth and cost savings.
Public-use microdata and data archives democratize access to large, high-quality data sets, but come with ethical constraints and documentation requirements.
Primary data collection (surveys and other methods) remains essential when data do not exist or are not fit for purpose; choosing the right mode, designing rigorous questionnaires, and attending to ethics are crucial for valid results.
The field increasingly relies on mixed-methods and mixed-mode approaches to cover diverse populations, while being mindful of mode effects and sampling frame compatibility.
Key Formulas and Notation (LaTeX)
Rate example (per 100,000 inhabitants):
\text{rate} = \frac{\text{number of events}}{\text{population}} \times 10^5
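A minimal sketch of the rate formula above in Python, using hypothetical counts (the city and event numbers are made up for illustration):

```python
# Events per 100,000 inhabitants: rate = (events / population) * 10^5.
def rate_per_100k(events: int, population: int) -> float:
    """Convert a raw event count to a rate per 100,000 inhabitants."""
    return events * 100_000 / population

# Hypothetical example: 1,250 events in a population of 2,500,000.
print(rate_per_100k(1_250, 2_500_000))  # 50.0
```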
Poverty data notes (Table 6.2): units are in thousands (1,000s); e.g., United States: 33,311 (2000) to 42,868 (2009) below poverty (in thousands).
Data types shorthand:
Microdata: individual-level observations
Aggregated data: summarized by larger units (e.g., state-level averages)
Panel data: repeated measures on the same units over time
Time-series: measurements over time for a single or few units
Generalizability and External Validity
Generalizability: the extent to which findings from a study can be projected to a larger population, time period, or different contexts.
External validity is another term for generalizability (Shadish, Cook, & Campbell, 2002).
In practice, researchers care about what the study implies for the broader world, not just the specific sample.
Katrina example: CBS poll of 725 adults was used to infer thinking of a few hundred million people; the value lies in broader implications, not the exact individuals polled.
Population, Sampling Frame, and Generalizability
Population of interest: the entire group the study aims to learn about (e.g., all U.S. adults in the Katrina poll).
Sampling frame: a concrete list or operational representation from which the sample is drawn (e.g., voter lists, phone numbers, organizational rosters).
Parameters vs statistics: the study aims to learn about population characteristics (parameters); the sample yields statistics that estimate these parameters.
The closer the sample’s results are to true population parameters, the more generalizable the results.
Broader population, geography, time, and groups increase generalizability; studying many places and times tends to improve external validity.
Random (probability) sampling tends to be more generalizable than nonrandom sampling; small random samples often beat large nonprobability samples for generalizability.
Examples: Katrina poll (random sample) vs Red Cross shelter study (convenience/nonrandom sampling) – both informative but with different generalizability limits.
Question: which features of a sample affect generalizability? This is a prelude to later sections.
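A small simulation can illustrate the claim that a modest random sample often beats a much larger nonprobability sample; the population size, trait prevalence, and biased frame below are all hypothetical assumptions, not data from the chapter:

```python
# Sketch: small simple random sample vs. large biased convenience sample,
# both estimating a population proportion of 0.30 (hypothetical population).
import random

random.seed(42)

# Hypothetical population of 100,000 people; 30% have the trait of interest.
population = [1] * 30_000 + [0] * 70_000
random.shuffle(population)

# Small simple random sample (n = 500) drawn from the whole population.
srs = random.sample(population, 500)
srs_est = sum(srs) / len(srs)

# Large convenience sample (n = 20,000) drawn from a biased frame in which
# trait holders are heavily overrepresented (e.g., an opt-in panel).
biased_frame = [p for p in population if p == 1][:15_000] + \
               [p for p in population if p == 0][:5_000]
conv_est = sum(biased_frame) / len(biased_frame)

print("True proportion: 0.300")
print(f"Small random sample (n=500):         {srs_est:.3f}")   # close to 0.300
print(f"Large convenience sample (n=20,000): {conv_est:.3f}")  # badly biased
```

The random sample's error is governed only by sampling variability, while the convenience sample's error is dominated by frame bias that no increase in n can fix.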
Are Experiments More Generalizable?
Many biological, psychological, or economic processes are fairly universal, making findings generalizable even from small, unrepresentative samples (especially in controlled experiments).
Experiments are used to determine causal relationships ("what if" questions).
Examples:
Drug/medical trials often use clinical volunteers (generalizability can be limited).
Psychological experiments often use undergraduates and still yield generalizable laws of perception, cognition, and behavior.
Experimental economists study altruism and risk aversion with small samples (e.g., ultimatum game, prisoners’ dilemma).
Caveat: generalizability is not guaranteed; experiments can be criticized for homogeneous or idiosyncratic samples and limited external validity.
Replication and Meta-Analysis
Replication: repeating a study with different samples, places, times, or designs to test robustness and generalizability.
Replication enhances generalizability of findings from small or nonrandom samples.
Meta-analysis: pooling multiple studies to produce a larger, more generalizable estimate of a treatment effect or relationship.
Formal definitions: meta-analysis combines separate effects into a single, generalizable estimate.
Examples: air pollution and daily mortality across many cities; second-generation antipsychotics efficacy across 124 experiments (Davis, Chen, & Glick, 2003).
Applications: health, education, social work, criminal justice, job training, etc.
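As a sketch of how meta-analysis pools separate effects into a single estimate, the following uses fixed-effect inverse-variance weighting, a standard convention (the study effects and standard errors are made up for illustration):

```python
# Fixed-effect meta-analysis: weight each study by 1/SE^2 and average.
import math

# (effect estimate, standard error) for each hypothetical study
studies = [(0.30, 0.10), (0.25, 0.08), (0.40, 0.15), (0.28, 0.06)]

weights = [1 / se**2 for _, se in studies]          # inverse-variance weights
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))             # SE of the pooled estimate

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```

Note that the pooled standard error is smaller than any single study's, which is the sense in which pooling yields a "larger, more generalizable" estimate.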
Relationships and Generalizability (Health and Happiness in Moldova)
Relationships among variables (not just descriptive percentages) tend to generalize better.
World Values Survey data: Moldova (small Eastern European country) has means and correlations similar to global patterns in health and happiness, despite Moldova’s low GDP per capita.
Global correlation (health vs happiness): \text{corr(Health, Happiness)} \approx 0.333 across all nations; Moldova sample (n \approx 974): \text{corr} \approx 0.333 (similar to the global pattern); a larger sample (n \approx 2{,}021): \text{corr} \approx 0.359.
Key survey-inference formulas:
Standard error of a proportion: SE = \sqrt{\frac{p(1-p)}{n}}
Approximate 95% confidence interval: \text{CI} = \hat{p} \pm 2 \times SE; example: 0.055 \pm 0.022 \rightarrow (0.033, 0.077)
Typical national poll margin of error: about \pm 3.1 percentage points
Required sample size for margin of error E: n \approx \frac{Z^2 \cdot p(1-p)}{E^2} = \frac{Z^2 \cdot 0.25}{E^2} (using the conservative p = 0.5)
Effective sample size n_{\text{eff}}: adjusts n for design effects such as clustering and weighting.
\text{Response rate} = \text{Contact rate} \times \text{Cooperation rate}
Measurement model: [\text{construct}] \rightarrow [\text{measure}] \leftarrow [\text{error}]
True-score decomposition: X_i = T_i + B_i + N_i, where T_i is the true score, B_i systematic bias, and N_i random noise; an unbiased, reliable measure has B_i \approx 0 and N_i \approx 0.
Cronbach’s alpha: values of about 0.70 are often considered acceptable, but the threshold depends on use (higher stakes require higher reliability).
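The standard error, confidence interval, sample size, and response rate formulas above can be sketched in Python; the inputs (p̂ = 0.055, n = 400, E = 0.031) are the illustrative values used in the notes:

```python
# Core survey-inference calculations for a sample proportion.
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def ci_approx(p: float, n: int) -> tuple[float, float]:
    """Approximate 95% CI: p-hat +/- 2*SE."""
    se = se_proportion(p, n)
    return (p - 2 * se, p + 2 * se)

def sample_size(E: float, z: float = 1.96, p: float = 0.5) -> float:
    """n ~ Z^2 * p(1-p) / E^2; p = 0.5 is the conservative worst case."""
    return z**2 * p * (1 - p) / E**2

def response_rate(contact: float, cooperation: float) -> float:
    """Response rate = contact rate * cooperation rate."""
    return contact * cooperation

print(ci_approx(0.055, 400))   # roughly (0.033, 0.077), as in the notes
print(sample_size(0.031))      # ~1,000 respondents for a +/-3.1-point margin
```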
Parallel Forms Reliability: Use different but equivalent forms of a test to assess consistency across versions; important when tests must change over time (e.g., yearly standardized tests). Administer the equivalent forms to the same respondents and measure the agreement between the two sets of scores; high agreement indicates both versions measure the same construct.
Reliability is necessary but not sufficient for validity; a reliable measure can still be biased or fail to capture the intended construct.
Figure 4.4 and related text illustrate reliability concepts (good vs. poor reliability); Figure 4.5 shows how increasing random error affects averages and confidence intervals; Figure 4.6 shows how reliability impacts relationships.
Reliability in qualitative research: validity and reliability concepts apply; intercoder reliability; code-recode reliability; qualitative validity concerns (does interpretation capture participants’ experiences consistently?).
Validity vs reliability: a practical contrast
A measure can be valid but unreliable, or reliable but invalid, or both, or neither (bull’s-eye analogy in Figure 4.7):
Reliable but not valid: shots clustered but off-target
Valid but not reliable: centered on target on average but widely dispersed
Both reliable and valid: clustered tightly around target
Neither: dispersed and off-target
Implications for measurement in practice:
For job performance measures, self-reports may be reliable (consistent) but not valid (biased upward);
Supervisor assessments may be more valid but face reliability concerns (inter-rater differences).
In qualitative research, validity and reliability translate to credible, trustworthy interpretations; intercoder reliability and code validity are key concerns.
The chapter emphasizes that validity and reliability are context-dependent and different measures may be valid for different purposes.
Levels of measurement, units of analysis, and data types
Levels of measurement (two broad types):
Quantitative Variables: Numbers that refer to actual quantities (e.g., age, income, hours worked, weight). Unit of measurement matters (e.g., dollars, kilograms).
Interval: Data with ordered categories where intervals between categories are equal, but there is no true zero point (e.g., temperature in Celsius or Fahrenheit).
Ratio: Data with ordered categories where intervals between categories are equal, and there is a true zero point, allowing for meaningful ratios (e.g., height, weight, income).
Categorical Variables: Numbers refer to categories; can be nominal or ordinal.
Nominal: Data that are purely categorical, without any intrinsic order or ranking (e.g., gender, race, religion).
Ordinal: Data with ordered categories, but the intervals between categories are not necessarily equal or meaningful (e.g., education level: high school, bachelor's, master's).
Important distinctions:
Level of measurement (nominal, ordinal, interval, ratio): affects allowable statistics.
Unit of measurement: the unit (e.g., dollars, kilograms) that defines the quantitative variable.
Unit of analysis: the object described by the measure (people, households, neighborhoods, organizations).
Box 4.9 clarifies unit vs level vs unit of analysis distinctions.
Examples:
Household income coded into 12 categories in ESS; although labeled in euros, this is a categorical variable (not a precise quantity) unless midpoints are used.
Income could be treated as a quantitative variable if midpoints are assigned to categories (midpoint approximation) or if a multi-item scale sums to a continuous score.
Turning categorical variables into quantitative measures
Dummy variables (indicator variables): two-value categories (0/1) used to represent presence/absence (e.g., Employed: 0=no, 1=yes).
Using dummy variables for multi-category nominal variables: create a separate dummy per category (e.g., White, Black, Hispanic, Asian, Other).
Midpoint approximation: for ordinal measures with categories, use midpoints of ranges to approximate a quantitative score (e.g., income categories become approximate euros).
Multi-item scales: add up ordinal indicators to form a composite score; often treated as quantitative for analysis.
Endpoint scales and thermometers: 1–7 or 1–10 scales with endpoints anchored; some argue equal-interval interpretation across the scale; feeling thermometers (0–100) used for attitudes toward groups or leaders.
Figure 4.8 example: feeling thermometer illustrating usage of scales to quantify attitudes.
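The dummy-variable and midpoint-approximation steps above can be sketched as follows (the category labels and income bands are hypothetical, not taken from the ESS):

```python
# Turning categorical variables into quantitative measures:
# one 0/1 dummy per nominal category, and midpoints for ordinal income bands.
respondents = [
    {"race": "White",    "income_band": "20k-40k"},
    {"race": "Black",    "income_band": "40k-60k"},
    {"race": "Hispanic", "income_band": "under 20k"},
]

categories = ["White", "Black", "Hispanic", "Asian", "Other"]
midpoints = {"under 20k": 10_000, "20k-40k": 30_000, "40k-60k": 50_000}

for r in respondents:
    # Indicator (dummy) variables: 1 if the respondent is in the category.
    r.update({f"race_{c}": int(r["race"] == c) for c in categories})
    # Midpoint approximation: ordinal band -> approximate quantitative income.
    r["income_approx"] = midpoints[r["income_band"]]

print(respondents[0]["race_White"], respondents[0]["income_approx"])  # 1 30000
```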
Levels of measurement and unit of analysis in practice
Unit of analysis vs level of measurement interact: a variable can be categorical at the individual level but become quantitative when aggregated to a geographic level (e.g., poverty rate by census tract).
Poverty example: individual poverty is a binary (categorical) variable; poverty rate by tract or county is a continuous quantitative measure.
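A minimal sketch of the poverty example, using made-up tract data: an individual-level binary (poor: 0/1) becomes a continuous rate when aggregated to the tract level:

```python
# Aggregating a categorical individual-level variable to a quantitative
# geographic-level variable (poverty rate by tract).
from collections import defaultdict

# (tract, poor) pairs for hypothetical individuals
people = [("tract_A", 1), ("tract_A", 0), ("tract_A", 0),
          ("tract_B", 1), ("tract_B", 1), ("tract_B", 0), ("tract_B", 0)]

counts = defaultdict(lambda: [0, 0])   # tract -> [poor count, total count]
for tract, poor in people:
    counts[tract][0] += poor
    counts[tract][1] += 1

rates = {t: poor / total for t, (poor, total) in counts.items()}
print(rates)  # tract_A: about 0.33, tract_B: 0.5
```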
The measurement in the real world: trade-offs and choices
Measurement is rarely perfect; practitioners balance validity, reliability, cost, and feasibility.
Costs and practicality:
Objective measures (clinical exams, bank records, hair analyses) can be more valid but expensive.
Longer questionnaires increase reliability but raise respondent burden and reduce response rates; shorter measures reduce burden but may sacrifice reliability.
Reliability tends to improve with more indicators; multi-item scales generally provide more reliable measurements, but longer instruments increase cost and respondent burden.
Validity-reliability trade-off: lengthy, nuanced assessments (e.g., essays) may be valid in capturing complex constructs but less reliable due to interrater variability and scoring concerns; shorter tests improve reliability but may oversimplify constructs.
Established measures often preferred for reliability/validity; inventing new measures risks lower comparability across time and studies.
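The point that reliability improves with more indicators is commonly formalized with the Spearman-Brown prophecy formula; the formula is standard psychometrics rather than something stated explicitly in these notes, so treat this as an illustrative sketch:

```python
# Spearman-Brown prophecy formula: projected reliability of a scale
# lengthened by factor k, given current reliability r.
def spearman_brown(r: float, k: float) -> float:
    """Return k*r / (1 + (k-1)*r), the reliability of the lengthened scale."""
    return k * r / (1 + (k - 1) * r)

# A single item with reliability 0.50, extended to a 4-item scale:
print(round(spearman_brown(0.50, 4), 3))  # 0.8
```

The gains are real but diminishing, which is exactly the trade-off the notes describe: each added indicator buys less reliability while adding cost and respondent burden.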
High-stakes measurement can induce behavior changes (gaming) that threaten validity; examples include test prep and coaching; public sector examples include fraud in performance-based pay schemes.
Multi-dimensional measures (dashboards) vs single headline indicators: EU and UN often use multiple dimensions; some argue for a single summary indicator for policy clarity; both approaches have trade-offs regarding comprehensiveness and comparability.
Measurement aggregation can obscure important differences across dimensions; a dashboard approach may be preferable for a fuller picture; aggregation weights are inherently arbitrary.
The chapter concludes with a call to thoughtful measurement: define concepts clearly, justify instrumentation and protocols, assess validity and reliability, and ask critical questions about measures.
Critical questions and practical guidance (Box summaries)
Box 4.10: Critical questions to ask about measurement
What is the purpose and origin of the measure?
What is the conceptual definition and its dimensions?
How is the measure operationalized (instruments, personnel, protocols)? Is it a single indicator or a multi-item scale? Is it a proxy or proxy reporting?
How valid is the measure (face, content, criterion-related; specific forms like concurrent, predictive, convergent, discriminant, nomological)?
How reliable is the measure (evidence and strength of reliability tests)?
What is the level of measurement?
Box 4.11: Tips on doing your own research: measurement planning steps (develop conceptual definitions, search for established measures, plan operationalization, decide on single item vs multi-item scales, consider proxies, test validity/reliability, review existing literature).
Key terms (glossary-style references)
Bias, measurement bias, random measurement error (noise)
Construct, latent vs manifest construct
Conceptualization, operationalization
Instrument, protocol, proxy, proxy respondent, proxy reporting
Indicator, composite measure, scale, index
Validity (face, content, criterion-related, convergent, discriminant, nomological)
Reliability (test-retest, interrater, split-half, internal consistency, Cronbach’s alpha, parallel forms)
Item Response Theory (IRT), computer-adaptive testing (CAT)
Levels of measurement (nominal, ordinal, interval, ratio)
Unit of measurement, unit of analysis
Dimensionality and multi-item scales (SF-36 dimensions; Rosenberg self-esteem scale)
Qualitative validity and reliability concepts (code validity, intercoder reliability)
Technical concepts: constructs, indicators, proxies, and the measurement model
Connections to theory, policy, and practice
Measurement translates abstract policy concepts (like poverty) into observable data, enabling evaluation, accountability, and resource allocation.
The debate over poverty measures illustrates how theory, politics, and data availability shape measurement choices and policy implications.
Logic models and theory-driven measurement connect theoretical propositions to empirical tests via carefully defined constructs and indicators.
The balance between validity and reliability, and the choice between single-item measures vs. multi-item scales, reflect practical trade-offs in policy research and program evaluation.
The chapter emphasizes that measurements are always context-dependent: a measure can be valid for one purpose and not for another; the same measure can have different validity in different settings or times.
Real-world relevance and ethical considerations
Measurement choices affect policy conclusions, program funding, and public accountability.
Cost, respondent burden, and ethical concerns