Untitled Flashcards Set

Q1: How can spatial weighting matrices be useful when thinking about bias and standard errors in spatial econometrics?;;;A1: For BIAS: Spatial weighting matrices encode which observations are neighbors and how strongly they are linked. They allow inclusion of spatial lags that capture how nearby observations influence each other, reducing omitted variable bias from spatially correlated factors left out of the model. For STANDARD ERRORS: When residuals are positively spatially correlated, conventional standard errors are typically too small. Spatial weighting matrices allow construction of spatially corrected standard errors (e.g., Conley standard errors) that account for geographic clustering of residuals.

Q2: What are the key considerations when choosing a cutoff distance for Conley standard errors?;;;A2: Larger cutoff distances: PROS - Better correction for long-range spatial correlation, typically more conservative standard errors. CONS - Higher computational burden, noisier variance estimates, potential over-adjustment that reduces power. Smaller cutoff distances: PROS - Lower computation costs, less loss of power. CONS - May under-correct for spatial dependence, increasing Type I error risk. The choice should balance the expected range of spatial correlation against computational constraints and power concerns.
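
Note on A2: the effect of the cutoff can be seen in a minimal sketch, assuming a single regressor with no intercept and a uniform kernel (pairs within the cutoff get weight 1, all others 0). The function name, the toy data, and the kernel choice are illustrative assumptions, not a reference implementation.

```python
import math

def conley_se(x, resid, coords, cutoff):
    """Conley-style standard error for the slope in y = b*x + e (no intercept).

    Assumes a uniform kernel: pairs no farther apart than `cutoff` get weight 1.
    """
    sxx = sum(xi * xi for xi in x)
    meat = 0.0
    for i in range(len(x)):
        for j in range(len(x)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                meat += x[i] * resid[i] * x[j] * resid[j]
    # The uniform kernel does not guarantee a non-negative variance, so clamp at zero.
    return math.sqrt(max(meat, 0.0)) / sxx

# Toy data: six units on a line, with outcomes clustered in space.
coords = [(float(i), 0.0) for i in range(6)]
x = [1.0] * 6                      # constant regressor, so b is the sample mean
y = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0]
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
resid = [yi - b * xi for xi, yi in zip(x, y)]

se_hc0 = conley_se(x, resid, coords, cutoff=0.0)   # own-pairs only: robust (HC0) SE
se_near = conley_se(x, resid, coords, cutoff=1.0)  # adds adjacent neighbors
se_all = conley_se(x, resid, coords, cutoff=5.0)   # every pair in the sample
```

In this toy case the standard error grows when adjacent neighbors are included, because nearby residuals share a sign. With the uniform kernel, a cutoff that spans the whole sample drives the estimate to zero, since OLS scores sum to zero; this is one reason to keep the cutoff well below the extent of the data.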

Q3: How does the reflection problem complicate identification in spatial models?;;;A3: The reflection problem arises when trying to separately identify endogenous effects (peer outcomes) from contextual effects (peer characteristics). When individual outcomes are linear in group-mean outcomes and group-mean characteristics, group-mean outcomes are themselves linear in group-mean characteristics, so the two regressors are perfectly collinear. Solutions include: 1) Imposing nonlinear functional forms 2) Adding exclusion restrictions on parameters 3) Using incomplete network structures where GG≠G (some peers of peers are not direct peers). Without such restrictions, the parameters cannot be separately identified.
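
Note on A3: the collinearity can be written out for a linear-in-means model. The notation below follows the cards (β endogenous effect, θ contextual effect, γ individual effect); the intercept α and the group index g are assumed here for illustration.

```latex
% Structural model: individual i in group g
y_i = \alpha + \beta\,\mathbb{E}[y \mid g] + \gamma\,x_i + \theta\,\mathbb{E}[x \mid g] + \varepsilon_i,
\qquad \mathbb{E}[\varepsilon_i \mid x_i, g] = 0

% Take expectations within the group and solve (requires \beta \neq 1)
\mathbb{E}[y \mid g] = \frac{\alpha}{1-\beta} + \frac{\gamma+\theta}{1-\beta}\,\mathbb{E}[x \mid g]

% Reduced form after substituting back
y_i = \frac{\alpha}{1-\beta} + \gamma\,x_i + \frac{\beta\gamma+\theta}{1-\beta}\,\mathbb{E}[x \mid g] + \varepsilon_i
```

The reduced form delivers three coefficients, but the structural model has four parameters (α, β, γ, θ), so β and θ cannot be recovered separately without further restrictions.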

Q4: Compare LASSO and Random Forest methods for economic applications.;;;A4: LASSO: Best for linear relationships, automatic feature selection, high interpretability. Suitable for policy impact studies where understanding coefficient values matters. Weakness: Cannot capture nonlinear relationships or interactions unless those terms are added by hand. Random Forest: Excels at capturing complex nonlinear relationships and interactions, relatively robust to overfitting. Better for prediction tasks with many interaction effects. Weakness: Less interpretable. Example applications: LASSO for selecting control variables in policy evaluation, Random Forest for predicting labor market outcomes with complex interactions.

Q5: How does the SUTVA assumption relate to spatial spillovers in difference-in-differences designs?;;;A5: SUTVA (Stable Unit Treatment Value Assumption) requires that treatment of one unit does not affect outcomes of other units. Spatial spillovers directly violate this assumption. Example: When studying crime policy in San Francisco, if crime displaces to nearby cities, using these cities as controls violates SUTVA. Solution: Choose control group from cities far enough away to avoid spillover effects while maintaining similar characteristics for parallel trends.

Q6: What are the key challenges in using spatial regression discontinuity when administrative boundaries overlap with treatment boundaries?;;;A6: Main challenge relates to the continuity assumption - units on either side of boundary should be similar except for treatment. When administrative boundaries overlap, there may be multiple discontinuities (e.g., in public services, regulations) making it impossible to isolate the treatment effect. Example: If redlining boundaries match current administrative borders, differences in outcomes could reflect historical redlining effects or current service provision differences.

Q7: Why is cross validation particularly important in machine learning applications, and how is it implemented?;;;A7: Cross validation helps prevent overfitting and assess how models generalize to unseen data. Process: 1) Split data into k folds 2) Train model on k-1 folds 3) Validate on remaining fold 4) Repeat k times 5) Average performance metric across folds. For LASSO, particularly useful in selecting optimal regularization parameter λ. Common metric is Mean Squared Error as it captures both bias and variance components of prediction error.
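
Note on A7: the five steps can be sketched for choosing the LASSO penalty λ. This is a minimal illustration, assuming a single regressor with no intercept (where the LASSO solution is a closed-form soft-threshold) and simulated data; the function names and the λ grid are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, setting it to exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_lasso_1d(x, y, lam):
    """Minimize (1/2n) * sum((y - b*x)^2) + lam * |b| for one regressor."""
    n = len(x)
    sxy = sum(xi * yi for xi, yi in zip(x, y)) / n
    sxx = sum(xi * xi for xi in x) / n
    return soft_threshold(sxy, lam) / sxx

def cv_mse(x, y, lam, k=5):
    """Average validation MSE across k folds (fold of unit i is i % k)."""
    n = len(x)
    fold_mse = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        valid = [i for i in range(n) if i % k == fold]
        b = fit_lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
        errors = [(y[i] - b * x[i]) ** 2 for i in valid]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k

# Simulated data: true slope 2 plus noise (illustrative only).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [2 * xi + rng.gauss(0, 0.5) for xi in x]

grid = [0.0, 0.1, 0.5, 1.0, 10.0]
scores = {lam: cv_mse(x, y, lam) for lam in grid}
best_lambda = min(scores, key=scores.get)
```

A very large λ shrinks the coefficient to zero and gives a high validation MSE, while a small λ tracks the true slope; the selected λ is whichever grid value has the lowest average MSE.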

Q8: How does Google Earth Engine facilitate large-scale spatial data analysis?;;;A8: Two key advantages: 1) Cloud-based processing - allows analysis of massive satellite datasets without local downloading, utilizing Google's infrastructure for computation 2) Built-in aggregation tools - provides pre-built functions for temporal and spatial aggregation, eliminating need for manual handling of thousands of images. These features make previously intractable analyses (like 20-year US tree cover changes) computationally feasible.

Q9: What different types of spatial data formats exist and what are their typical applications in economics?;;;A9: Key formats: 1) Raster (gridded data like satellite imagery, used for nighttime lights analysis) 2) Polygon shapefiles (administrative boundaries, for analyzing regional policies) 3) Point shapefiles (business locations, for studying agglomeration) 4) Line shapefiles (transportation networks, for infrastructure analysis) 5) Administrative area tables (regional statistics) 6) Coordinate tables (geocoded microdata). Choice depends on research question and spatial unit of analysis.

Q10: How can machine learning methods help with control variable selection in empirical economics?;;;A10: ML helps when theory provides limited guidance on control variable selection by: 1) Automated feature selection through regularization (LASSO/elastic net) 2) Capturing complex nonlinear relationships without pre-specification (Random Forests) 3) Handling high-dimensional data efficiently. Particularly useful when many potential controls exist but theoretical guidance on inclusion/functional form is limited.

Q11: What are the main sources of bias in spatial models and how can they be addressed?;;;A11: Main sources: 1) Omitted spatial variables (addressed through spatial fixed effects or differencing) 2) Spatial spillovers (addressed through careful control group selection) 3) Sorting (addressed through panel methods or instrumental variables) 4) Reflection problem (addressed through network structure restrictions). Choice of correction depends on data structure and identifying assumptions researcher is willing to make.

Q12: What considerations are important when constructing data for a spatial regression discontinuity design around enterprise zones?;;;A12: Key steps: 1) Obtain enterprise zone boundary polygons and business location data with coordinates 2) Use spatial join to assign treatment status 3) Calculate distance to boundary as running variable 4) Restrict sample to bandwidth around boundary 5) Merge with outcome data (e.g., business income). Critical to ensure precise distance calculations and appropriate bandwidth selection to maintain RD validity.
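
Note on A12: the steps can be sketched with a toy example. This assumes a hypothetical rectangular zone and made-up firm records so that it runs with the standard library alone; with real boundary polygons, the treatment assignment and distance steps would normally be done with a GIS library's spatial join and distance functions.

```python
import math

def signed_distance_to_boundary(x, y, zone):
    """Distance to the zone edge: positive inside the zone, negative outside."""
    xmin, ymin, xmax, ymax = zone
    if xmin <= x <= xmax and ymin <= y <= ymax:
        return min(x - xmin, xmax - x, y - ymin, ymax - y)
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return -math.hypot(dx, dy)

def build_rd_sample(firms, zone, bandwidth):
    """Attach the running variable and treatment flag, keeping firms near the boundary."""
    sample = []
    for firm in firms:
        running = signed_distance_to_boundary(firm["x"], firm["y"], zone)
        if abs(running) <= bandwidth:
            sample.append({**firm, "running": running, "treated": running >= 0})
    return sample

# Hypothetical zone (xmin, ymin, xmax, ymax) and made-up firm records.
zone = (0.0, 0.0, 10.0, 10.0)
firms = [
    {"id": "A", "x": 5.0, "y": 5.0, "income": 120.0},
    {"id": "B", "x": 9.0, "y": 5.0, "income": 95.0},
    {"id": "C", "x": 11.0, "y": 5.0, "income": 90.0},
    {"id": "D", "x": 13.0, "y": 14.0, "income": 80.0},
    {"id": "E", "x": 0.5, "y": 2.0, "income": 100.0},
]

rd_sample = build_rd_sample(firms, zone, bandwidth=2.0)
```

Firms A and D fall outside the bandwidth and are dropped; the remaining records carry the signed distance as the running variable alongside the outcome.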

Q13: How does spatial autocorrelation affect inference in persistence studies?;;;A13: Spatial autocorrelation can severely inflate t-statistics, leading to over-rejection of null hypotheses. Even modest spatial correlation can cause nominal significance levels to differ from true levels by several orders of magnitude. This is particularly problematic when studying long-run persistence where spatial correlation is often strong. Solutions include using Conley standard errors or randomization inference.

Q14: What role can satellite data play in economic research?;;;A14: Satellite data provides: 1) Objective measurement of physical characteristics (e.g., nighttime lights as proxy for economic activity) 2) High temporal and spatial resolution 3) Coverage of areas lacking traditional data collection 4) Consistent measurement across political boundaries. Useful for studying economic development, urbanization, environmental impacts, and policy effects in data-scarce environments.

Q15: How do fixed effects compare to spatial differencing for addressing spatial unobservables?;;;A15: Fixed effects eliminate time-invariant location-specific factors but require panel data and sufficient within-unit variation. Spatial differencing removes unobservables that are shared by nearby units (i.e., that vary smoothly over space) by comparing neighbors, but requires assuming a spatial correlation structure. FE is better suited to persistent, location-specific unobservables; spatial differencing is better suited to cross-sectional data. Both methods implicitly make assumptions about the structure of spatial interactions.

Q16: What challenges arise when using cluster randomization in spatial settings?;;;A16: Key challenges: 1) Reflection problem not solved by randomization alone 2) Intracluster correlation reduces effective sample size 3) Standard errors must account for clustering 4) Treatment spillovers may violate SUTVA 5) Control over both group membership and individual assignment needed for full parameter identification. May require additional structure on interaction patterns for identification.

Q17: How can you test for nonrandomness in spatial data?;;;A17: Methods include: 1) Point pattern analysis comparing to complete spatial randomness 2) Global indicators of spatial association (Moran's I, Getis-Ord statistics) 3) Local indicators of spatial association for identifying clusters 4) Kernel density estimation for visualizing patterns. Choice of test depends on data type (point vs. areal) and null hypothesis of interest.
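
Note on A17: a global Moran's I can be computed directly from its definition. The sketch below assumes binary contiguity weights on a hypothetical line of six units; the helper names are illustrative.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)

def line_contiguity(n):
    """Binary weights for n units on a line: adjacent units are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

w = line_contiguity(6)
i_clustered = morans_i([1, 1, 1, 0, 0, 0], w)    # similar values sit together
i_alternating = morans_i([1, 0, 1, 0, 1, 0], w)  # neighbors always differ
```

Clustered values give a positive statistic and alternating values give a negative one. Judging significance would still require a reference distribution, for example from permuting values across locations.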

Q18: How do endogenous and contextual effects differ in spatial models?;;;A18: Endogenous effects (β) capture how individual outcomes respond to group outcomes (e.g., peer behavior directly affects individual behavior). Contextual effects (θ) capture how individual outcomes respond to group characteristics (e.g., peer characteristics affect individual behavior). Distinguishing between these requires specific identification strategies due to the reflection problem.

Q19: What criteria should be considered when choosing between LASSO and Random Forest for a specific application?;;;A19: Consider: 1) Linear vs nonlinear relationships (LASSO for linear, RF for nonlinear) 2) Importance of interpretability (LASSO more interpretable) 3) Computational resources (RF more intensive) 4) Presence of interactions (RF better at capturing) 5) Number of irrelevant features (LASSO good at feature selection) 6) Sample size relative to features 7) Whether prediction or inference is primary goal.

Q20: How does spatial correlation affect the modifiable areal unit problem (MAUP)?;;;A20: MAUP arises because parameter estimates can change substantially with different levels of spatial aggregation. Spatial correlation exacerbates this by changing the relative weights of individual effects (γ) versus spatial interaction effects (θ) at different aggregation levels. Higher aggregation typically increases weight on spatial effects.

Q21: What are the key considerations in specifying a spatial weights matrix?;;;A21: Consider: 1) Structure (contiguity, distance-based, network-based) 2) Normalization (row standardization vs. raw weights) 3) Treatment of self-connections (zeros on diagonal) 4) Cut-off distance/number of neighbors 5) Whether GG≠G for identification 6) Economic justification for chosen structure. Choice affects both interpretation and identification.
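
Note on A21: several of these choices can be checked mechanically. The sketch below assumes a hypothetical four-unit line network (0 - 1 - 2 - 3) with contiguity weights, row-standardizes it, computes a spatial lag, and checks whether GG differs from G.

```python
def row_standardize(adj):
    """Scale each row to sum to 1 (rows with no neighbors stay all zero)."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([w / total if total > 0 else 0.0 for w in row])
    return out

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def spatial_lag(g, y):
    """Weighted average of neighbors' values for each unit (G times y)."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in g]

# Contiguity on a line: zeros on the diagonal, so no unit is its own neighbor.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
G = row_standardize(adj)
GG = matmul(G, G)

lag = spatial_lag(G, [10.0, 20.0, 30.0, 40.0])
gg_differs = GG != G  # peers of peers are not the same set as peers
```

With row standardization the spatial lag is a neighbor average, which keeps coefficients comparable across units with different numbers of neighbors.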

Q22: How can you address weak instruments in spatial models?;;;A22: Strategies include: 1) Using higher-order spatial lags as instruments when network is incomplete 2) Exploiting network structure where "friends of friends" provide variation 3) Using institutional features that create plausibly exogenous variation 4) Testing instrument strength with first-stage F-statistics 5) Considering alternative estimation strategies when instruments are weak.

Q23: What role does spatial correlation play in the precision of treatment effect estimates?;;;A23: Spatial correlation: 1) Reduces effective sample size due to dependence between observations 2) Requires adjustment of standard errors to avoid inflated Type I error rates 3) Affects power calculations for experimental design 4) May invalidate traditional clustering approaches if correlation crosses cluster boundaries 5) Requires consideration of appropriate inference methods (e.g., Conley standard errors).
