Mapping

Introduction to Map Making in R

  • Discussion on the basics of creating maps in R.

  • R as a powerful map-making tool:

    • Future courses will explore sophisticated map-making techniques.

  • Focus on using two main map-making packages:

    • maps

    • mapdata

  • Incorporating tidyverse tools:

    • Primarily ggplot2, for displaying maps.

  • Two main topics of discussion:

    1. Basics of utilizing R’s built-in map data.

    2. Creating choropleth maps to visualize data variations across geographic units.

Drawing a Map of the United States

  • Initial goal: draw a map of the United States.

  • Steps taken:

    • Load necessary libraries (ensure they are installed first).

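    • Code:
    ```R
    library(maps)       # boundary data: usa, state, county, world, ...
    library(mapdata)    # supplementary map databases
    library(tidyverse)  # ggplot2 (including map_data()) and dplyr
    ```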

  • Understanding maps:

    • Definition: A map consists of connected points delineating boundaries of geographic entities (counties, states, countries).

    • Representation: Boundaries are represented by sets of longitude and latitude coordinates.

  • Requirement: to draw the U.S. map, we need the longitude and latitude of points along its boundary.

  • Data retrieval for the U.S.:

    • Code:
    ```R
    usa <- map_data("usa")  # convert the "usa" map database to a data frame
    head(usa)
    ```

  • Output of head(usa):

    • Variables in the dataset:

    1. long: longitude of a boundary point

    2. lat: latitude of a boundary point

    3. group: grouping variable that tells ggplot which points to connect (points in the same group are joined into one polygon; points in different groups are not connected).

    4. order: the order in which the points within a group are connected.

    5. region: region/category of the data (e.g., state).

    6. subregion: sub-region/category of the data (e.g., county).
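To see how `group` and `order` work in practice, the rows of `usa` can be summarized by group. This is a minimal sketch, assuming the ggplot2, dplyr, and maps packages are installed; the name `group_summary` is introduced here for illustration only.

```r
library(ggplot2)  # provides map_data()
library(dplyr)

usa <- map_data("usa")

# One row per polygon: each group is a separate closed outline
# (the mainland or an island) that ggplot draws on its own.
group_summary <- usa %>%
  group_by(group, region) %>%
  summarise(n_points = n(), first_order = min(order), .groups = "drop")

group_summary
```

Each row of the summary corresponds to one polygon that `geom_polygon()` would draw separately.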

Plotting the Map

  • Next step: actual plot creation.

  • Code to draw the map:
    ```R
    ggplot() +
      geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
      coord_quickmap()
    ```

- Explanation of components:
  - `ggplot()` sets up the plotting environment.
  - `geom_polygon()` connects the boundary points within each group and draws them as polygons.
  - `coord_quickmap()` sets the aspect ratio so that the plot approximates the Mercator projection.
    - Explanation of projections:
      - A projection is necessary because the Earth is round and maps are flat.
      - The Mercator projection is commonly used.
    - Alternatives: `coord_map()`, which relies on the **mapproj** package, applies a true projection and supports projections other than Mercator.
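The difference between the quick approximation and a true projection can be sketched as follows. This assumes the mapproj package is installed in addition to ggplot2 and maps; the object names (`base_map`, `conic_version`, and so on) are illustrative, and the conic projection is just one of the alternatives mapproj offers.

```r
library(ggplot2)  # coord_map() additionally needs the mapproj package installed

usa <- map_data("usa")

base_map <- ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group))

# Quick approximation: only fixes the aspect ratio
quick_version <- base_map + coord_quickmap()

# True projection via mapproj; Mercator is the default of coord_map()
mercator_version <- base_map + coord_map()

# A conic projection; lat0 is passed through to mapproj
conic_version <- base_map + coord_map("conic", lat0 = 30)

conic_version
```

Printing `quick_version`, `mercator_version`, and `conic_version` in turn shows how the choice of coordinate system changes the shape of the map.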

# Customization Options
- By default, `geom_polygon()` fills polygons with a dark grey that looks nearly black and draws no outline.
- **Code to make the fill transparent and draw the boundary in black:**

```R
ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group),
               fill = NA, color = "black") +
  coord_quickmap()
```

- Why a map can come out blank:
  - With `fill = NA` alone, the polygons are unfilled and, because the default outline `color` is also `NA`, no boundary line is drawn.
  - Setting `color` explicitly makes the boundaries visible.

# Modifying the Background
- **Removing the background:** `theme_void()` removes the background, including the axes and the latitude/longitude grid lines:

```R
ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group)) +
  coord_quickmap() +
  theme_void()
```

- Importance of aesthetic choices in map-making.
- Titles and other stylistic elements can be adjusted by adding further layers, such as `labs()` and `theme()`.
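As one illustration of such stylistic adjustments, a title and caption can be added to a map with an empty background. This is a sketch only; the title and caption text, the light grey fill, and the name `titled_map` are placeholders chosen for this example.

```r
library(ggplot2)

usa <- map_data("usa")

titled_map <- ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group),
               fill = "grey80", color = "black") +
  coord_quickmap() +
  theme_void() +
  labs(title = "Outline of the United States",   # placeholder title text
       caption = "Boundary data: maps package") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title

titled_map
```

The `theme()` call comes after `theme_void()` so that its settings are not overwritten by the complete theme.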

# Exploring Other Maps
- Besides the U.S., the maps package provides boundary data for several other countries.
- Example: drawing a map of France.
- **Code to create a map of France:**

```R
France <- map_data("france")
ggplot() +
  geom_polygon(data = France, aes(x = long, y = lat, group = group)) +
  coord_quickmap()
```

- To list the country names available in the `world` database:

```R
options(max.print = 2000)  # raise the print limit so the full list is shown
maps::map("world", namesonly = TRUE, plot = FALSE)
```
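A country that has no database of its own can be drawn by extracting its polygons from the `world` database. The sketch below uses Italy purely as an example and assumes the `region` argument of `map_data()` is given a name that appears in the `world` list; the object name `italy` is illustrative.

```r
library(ggplot2)

# Pull only Italy's polygons out of the "world" database
italy <- map_data("world", region = "Italy")

ggplot() +
  geom_polygon(data = italy, aes(x = long, y = lat, group = group),
               fill = NA, color = "black") +
  coord_quickmap()
```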

# Drawing Maps of US States and Counties
- The maps package also provides state and county boundaries.
- **Code to draw a state map:**

```R
states <- map_data("state")
ggplot(data = states) +
  geom_polygon(aes(x = long, y = lat, group = group)) +
  coord_quickmap()
```

- Drawing state borders in white makes individual states easier to distinguish:

```R
ggplot(data = states) +
  geom_polygon(aes(x = long, y = lat, group = group), col = "white", lwd = 0.15) +
  coord_quickmap()
```

- **Aesthetic adjustments:**
  - `col = "white"` draws the state boundaries in white so they stand out against the dark fill.
  - `lwd` sets the width of the boundary lines.
- Alaska and Hawaii are not part of the `state` database, so including them in the map takes extra work.
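A quick way to confirm which states the `state` database covers is to inspect its region names. This is a small sketch; `state_names` is a name introduced here for illustration.

```r
library(ggplot2)

states <- map_data("state")
state_names <- unique(states$region)

length(state_names)                     # number of regions in the database
c("alaska", "hawaii") %in% state_names  # are the two states present?
```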

# Subset Maps
- Maps can be drawn for a specific subset of states (e.g., the West Coast states).
- **Code for Pacific Coast states:**

```R
westcoast <- filter(states, region %in% c("california", "oregon", "washington"))
ggplot(data = westcoast) +
  geom_polygon(aes(x = long, y = lat, group = group)) +
  coord_quickmap()
```

- Uses `filter()` from dplyr to subset `states` by the `region` variable.

# County-Level Mapping
- Code to draw a map of counties:

```R
counties <- map_data("county")
ggplot(data = counties) +
  geom_polygon(aes(x = long, y = lat, group = group)) +
  coord_quickmap()
```

- To show state boundaries on top of the counties:

```R
ggplot() +
  geom_polygon(data = counties, aes(x = long, y = lat, group = group)) +
  geom_polygon(data = states, aes(x = long, y = lat, group = group),
               fill = NA, col = "white", lwd = 0.15) +
  coord_quickmap()
```

- Layering and styling choices like these improve the readability and appearance of a map.

# Practical Example: Pennsylvania County Map
- **Test Yourself:**
  - Create a county-level map of Pennsylvania.
  - **Answer code:**  
    ```R 
    pa_counties <- filter(counties,region=="pennsylvania")
    ggplot() + 
      geom_polygon(data=pa_counties, aes(x=long,y=lat,group=group)) + 
      coord_quickmap()
    ```

# Enhancing Maps with Data
- Maps become more informative when combined with additional data.
- A rich source: the U.S. Census, which provides information such as median income and poverty percentages.
- Dataset example: `census_poverty.csv`, which contains the relevant socio-economic data.
- The key to merging census data with map data is the FIPS code:
  - **FIPS (Federal Information Processing Standards) code:** identifies each county.
- The mapping process requires:
  1. Attaching FIPS codes to the map data, using `mutate` to create a `polyname` variable ("region,subregion") that matches the `polyname` column of `county.fips` from the maps package.
  2. Inspecting the datasets and merging them via the FIPS codes.

# Joining Datasets Example
- **Code showing creation of `polyname` and merging:**

```R
county_with_fips <- counties %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  left_join(county.fips, by = "polyname")
```

- Verifying the join: `head(county_with_fips)` shows the combined variables:
  - **long, lat, group, order, region, subregion, polyname, fips**.
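Beyond looking at the first rows, it can help to check whether every county actually received a FIPS code. The following sketch repeats the join and then lists counties whose `polyname` found no match; `unmatched` is a name introduced here for illustration, and whether any rows turn up depends on the data.

```r
library(ggplot2)
library(dplyr)
library(maps)  # provides the county.fips lookup table

counties <- map_data("county")

county_with_fips <- counties %>%
  mutate(polyname = paste(region, subregion, sep = ",")) %>%
  left_join(county.fips, by = "polyname")

# After a left join, counties without a matching polyname have fips = NA
unmatched <- county_with_fips %>%
  filter(is.na(fips)) %>%
  distinct(region, subregion)

nrow(unmatched)
unmatched
```

Any counties listed in `unmatched` would be dropped by a later `inner_join()` on `fips` and would appear as holes in the map.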

# Read and Merge Census Data
- Steps to read in and merge Census data:  
  - **Read data:**  
    ```R 
    setwd("~/Dropbox/DATA101")
    census_income <- read_csv(file="Data/Processed/Census_Poverty.csv")
    ```
  - **Merge datasets:**  
    ```R 
    county_with_income <- inner_join(county_with_fips, census_income, by = c("fips" = "fips_code"))
    ```
- Result: Combined dataset ready for mapping visualizations.
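A join on FIPS codes only works if both key columns have the same type. The sketch below uses two tiny made-up tables (`map_side` and `census_side` are hypothetical stand-ins, and the income values are invented) and assumes, for illustration, that the census file stores the codes as text with leading zeros while the map side stores them as integers.

```r
library(dplyr)

# Hypothetical stand-ins for the real data (income values are made up)
map_side <- tibble(
  fips     = c(1001L, 1003L),
  polyname = c("alabama,autauga", "alabama,baldwin")
)
census_side <- tibble(
  fips_code     = c("01001", "01003"),
  median_income = c(50000, 55000)
)

# Convert the text codes to integers so both keys have the same type
census_side <- census_side %>%
  mutate(fips_code = as.integer(fips_code))

joined <- inner_join(map_side, census_side, by = c("fips" = "fips_code"))
joined
```

If the real columns already share a type, the conversion step is unnecessary.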

# Choropleth Mapping
- **Basic choropleth map creation:**
  - Map the variable of interest to the `fill` aesthetic:

```R
ggplot(data = county_with_income) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = pctpvty)) +
  coord_quickmap()
```

- Explanation of the result:
  - The map shows how poverty is distributed geographically across counties.
  - **Choropleth map definition:** a map that shades geographic units to show how a variable varies geographically (from the Greek *choros*, "area", and *plethos*, "multitude").

# Additional Mapping Capabilities
- The same syntax, with small changes, produces state-level or county-level maps of other variables.
- Suggested task: map `median_income` instead:

```R
ggplot(data = county_with_income) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = median_income)) +
  coord_quickmap()
```

# Aesthetic Adjustments in Maps
- Three common aesthetic choices:
  1. Visibility of boundaries
  2. Color schemes
  3. How differences in values are represented
- Mapping `color` to the same variable as `fill` colors each boundary like its county:

```R
ggplot(data = county_with_income) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = pctpvty, color = pctpvty)) +
  coord_quickmap()
```

- Each county's outline then matches its fill, so the boundary lines blend into the filled areas.

# Further Customizations
- Changing the colors of the fill gradient:

```R
ggplot(data = county_with_income) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = pctpvty)) +
  scale_fill_gradient(low = "lightyellow", high = "darkgreen") +
  coord_quickmap()
```

  - The names of all available colors are listed by the `colours()` command in R.

# Discrete vs Continuous Variables
- Converting a continuous variable into a discrete one can make broad patterns easier to see.
- Example: the `medinc_quart` variable divides median income into quartiles to highlight income variation.
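One way such a quartile variable could be built is with `ntile()` from dplyr. This is a sketch on a small made-up table (`census_toy` and its income values are hypothetical), not necessarily how `medinc_quart` was created in the course data.

```r
library(dplyr)

# Hypothetical county-level data; the income values are made up for illustration
census_toy <- tibble(
  fips_code     = c(1001L, 1003L, 1005L, 1007L, 1009L, 1011L, 1013L, 1015L),
  median_income = c(38000, 61000, 45000, 52000, 41000, 70000, 47000, 58000)
)

census_toy <- census_toy %>%
  mutate(medinc_quart = ntile(median_income, 4))  # 1 = lowest quarter, 4 = highest

# Quartile cut points, useful when writing legend labels
quantile(census_toy$median_income, probs = c(0.25, 0.5, 0.75))

count(census_toy, medinc_quart)
```

Computing the quartiles on a table with one row per county, before joining it to the polygon data, keeps counties with many boundary points from counting more heavily than others.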

# Color Palette Selection for Visualization
- Choosing an appropriate color scheme matters; Cynthia Brewer's ColorBrewer palettes were designed for maps.
- Accessing Brewer’s palettes with the RColorBrewer package:

```R
library(RColorBrewer)
display.brewer.all()  # plot every available palette
```

- Types of color palettes:
  1. **Sequential palettes**: for ordered data that runs from low to high
  2. **Qualitative palettes**: for nominal/categorical data
  3. **Diverging palettes**: for data where both the low and the high extremes should stand out around a midpoint
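The three palette types can be inspected directly in RColorBrewer. The sketch below picks one palette of each type as an example (the particular palettes and the choice of four classes are illustrative).

```r
library(RColorBrewer)

# Hex codes for a 4-class version of one palette of each type
brewer.pal(4, "YlGn")    # sequential
brewer.pal(4, "Set2")    # qualitative
brewer.pal(4, "RdYlGn")  # diverging

# Table of all palettes with their category (seq, qual, div) and maximum size
head(brewer.pal.info)

# Preview a single palette
display.brewer.pal(4, "RdYlGn")
```

The hex codes returned by `brewer.pal()` can also be passed to `scale_fill_manual()` when finer control over the colors is needed.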

# Using Diverging Color Scheme
- **Diverging example for income map:**

```R
ggplot(data = county_with_income) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = as.factor(medinc_quart))) +
  coord_quickmap() +
  scale_fill_brewer(palette = "RdYlGn", name = "County Median Income",
                    labels = c("Below $42,275", "$42,275 - $48,885",
                               "$48,885 - $56,696", "Above $56,696"))
```

  • Visual assessment notes:

    • The distinct color classes make the geographic spread of income clearly visible.

Summary and Conclusion

  • Final observations compare the diverging-palette maps with the original gradient maps.

  • Potential for further analysis using the maps and additional R packages (ggmap, sf, tmap).

Appendix

  • All code used in these notes is collected here for reference.

  • Applying the discussed concepts in practice reinforces understanding.

  • Closing thoughts: Happy map-making!