Advanced Raster Analysis and Spatial Statistics in ArcGIS

Spatial Analyst and Raster Data Models Review

  • Raster Data Model Recap:
    * Review of the raster data model and the various forms it takes.
    * Discussion of digital elevation models (DEMs) and other elevation representations.
    * Introduction to the Spatial Analyst extension for ArcGIS, a toolbox designed specifically for analyzing and manipulating raster data.

  • Hillshades and Surface Modeling:
    * Hillshades are grayscale shaded-relief rasters that give a surface a 3D appearance.
    * They use the sun's position relative to the surface to shade each cell.
    * The tool defines the sun's location with two properties:
        * Altitude: the sun's angle above the horizon.
        * Azimuth: the sun's compass direction, measured clockwise from north.
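How altitude and azimuth shade a cell can be sketched in Python for a single 3 × 3 elevation window. This is a minimal sketch modeled on the slope/aspect formulation Esri documents for its Hillshade tool (slope and aspect from Horn's 3 × 3 method), assuming square cells and the common default sun position of azimuth 315° and altitude 45°. `hillshade_cell` is a hypothetical helper, not an ArcGIS API.

```python
import math

def hillshade_cell(window, cellsize, altitude=45.0, azimuth=315.0, z_factor=1.0):
    """Hillshade value (0-255) for the center cell of a 3x3 elevation window.

    window is [[a, b, c], [d, e, f], [g, h, i]] with the first row to the north.
    Hypothetical helper; square cells are assumed.
    """
    (a, b, c), (d, e, f), (g, h, i) = window

    # Horn's method: rate of change toward the east (x) and toward the south (y).
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cellsize)
    slope = math.atan(z_factor * math.hypot(dzdx, dzdy))

    # Aspect in radians, counterclockwise from east (math convention).
    if dzdx != 0:
        aspect = math.atan2(dzdy, -dzdx)
        if aspect < 0:
            aspect += 2 * math.pi
    elif dzdy > 0:
        aspect = math.pi / 2
    elif dzdy < 0:
        aspect = 2 * math.pi - math.pi / 2
    else:
        aspect = 0.0  # flat cell: aspect does not matter because slope is 0

    # Sun altitude -> zenith angle; compass azimuth -> math-convention angle.
    zenith = math.radians(90.0 - altitude)
    azimuth_math = math.radians((360.0 - azimuth + 90.0) % 360.0)

    value = 255.0 * (math.cos(zenith) * math.cos(slope)
                     + math.sin(zenith) * math.sin(slope)
                     * math.cos(azimuth_math - aspect))
    return max(0.0, value)  # cells facing away from the sun are clamped to 0
```

With the default northwest sun, a slope rising toward the east (so it faces west, toward the light) gets roughly 218, while the mirror-image slope facing east gets roughly 37, which is what produces the 3D effect.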

Distance Functions in ArcGIS

  • Overview of Distance Analysis: There are two primary methods for performing distance analysis in ArcGIS: Straight line distance and Cost weighted distance.

  • Straight Line (Euclidean) Distance:
    * Calculates the distance from each cell to the closest source.
    * Source Definition: Sources are the objects of interest, such as oil wells, roads, or forest stands.
    * Calculations are performed "as the crow flies," using the projection units of the raster.
    * Distance is computed from the center of the source cell to the center of each surrounding cell.
    * If multiple sources exist, the cell value represents the distance to the single nearest source.
    * Source Format: If the source is a raster, it must contain only source values, with all other cells set to "No Data." If the source is a vector feature class, ArcGIS converts it to a raster internally during the process.
    * Standalone Applications: Used for emergency flight planning (e.g., finding the nearest hospital).
    * Suitability Analysis: Used for finding distances to specific features, such as a red-cockaded woodpecker cavity tree.
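The definition above (center-to-center distance to the single nearest source, with non-source cells as No Data) can be sketched as a brute-force loop in plain Python. This is only an illustration of the definition under stated assumptions: `None` stands in for No Data, rasters are nested lists, and `euclidean_distance` is a hypothetical name. Real implementations use far more efficient algorithms than checking every source for every cell.

```python
import math

def euclidean_distance(source, cellsize):
    """Distance from each cell center to the nearest source cell center.

    source: 2D list; non-None cells are sources, None marks No Data (non-source).
    Returns a 2D list of distances in map units (0 at source cells).
    """
    sources = [(r, c) for r, row in enumerate(source)
               for c, v in enumerate(row) if v is not None]
    if not sources:
        raise ValueError("source raster contains no source cells")
    out = []
    for r, row in enumerate(source):
        out_row = []
        for c, _ in enumerate(row):
            # Brute force: measure to every source, keep only the closest one.
            d = min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            out_row.append(d * cellsize)  # cell counts -> projection units
        out.append(out_row)
    return out
```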

  • The Three Euclidean Tools:
    * Euclidean Distance: Provides the distance from every cell to the nearest source.
    * Euclidean Direction: Assigns each cell a value in degrees (0 to 360) representing the direction to the nearest source.
        * Directions follow a compass: North is 360, East is 90, and values increase clockwise.
        * The value 0 is reserved for the source cells themselves.
        * Example: The direction from a hiker's location to the nearest town for medical evacuation.
        * Example Angles: Source 1 at 15°, Source 2 at 135°, Source 3 at 320°.
    * Euclidean Allocation: Creates a raster in which every cell receives the value of the source feature it is closest to.
        * Partitions the surface into zones, each dedicated to one feature (e.g., store or hospital service areas).
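Direction and allocation both come from the same nearest-source search, so they can be sketched together. This minimal Python sketch assumes `None` marks No Data, row 0 is the northern edge, and ties between equally distant sources go to whichever source is found first (an arbitrary choice made here, not documented ArcGIS behavior). `euclidean_direction_and_allocation` is a hypothetical name.

```python
import math

def euclidean_direction_and_allocation(source):
    """Return (direction, allocation) rasters for a source raster.

    source: 2D list; each non-None value identifies a source (e.g. a hospital ID).
    direction: degrees clockwise from north toward the nearest source
               (90 = east, 360 = north, 0 = the source cell itself).
    allocation: the value of the nearest source.
    """
    sources = [(r, c, v) for r, row in enumerate(source)
               for c, v in enumerate(row) if v is not None]
    direction, allocation = [], []
    for r, row in enumerate(source):
        dir_row, alloc_row = [], []
        for c, _ in enumerate(row):
            # Nearest source; ties go to the first source in row-major order.
            sr, sc, sv = min(sources, key=lambda s: math.hypot(r - s[0], c - s[1]))
            if (sr, sc) == (r, c):
                dir_row.append(0.0)            # 0 is reserved for source cells
            else:
                dx = sc - c                    # positive toward the east
                dy = r - sr                    # positive toward the north
                bearing = math.degrees(math.atan2(dx, dy)) % 360.0
                dir_row.append(bearing if bearing > 0 else 360.0)  # north = 360
            alloc_row.append(sv)
        direction.append(dir_row)
        allocation.append(alloc_row)
    return direction, allocation
```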

Cost Weighted Distance Analysis

  • Definition: Modifies Euclidean distance by incorporating a cost factor: the effort or economic expense required to travel through any given cell.
    * Example: It might be shorter to climb over a mountain, but faster (lower cost) to walk around it.

  • Cost Surfaces:
    * Represent factors affecting travel, such as terrain slope, snow depth, or financial cost.
    * Slope as a Factor: Steep terrain increases road construction costs. Slope values are often transformed into rank values (e.g., a common scale of 1 through 9).
        * Value 1: Very low travel cost.
        * Value 9: Highest travel cost.
    * Ranking note: A value of 9 is not necessarily nine times more costly than 1; it is simply the most costly value in the index.
    * An analysis can combine multiple cost surfaces (e.g., slope and snow depth) into a single master cost surface.
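Ranking raw values onto the 1 through 9 scale and merging several ranked surfaces can be sketched in Python. The class breaks, the snow-depth units, and the equal weights below are all hypothetical values chosen for illustration; a weighted sum is one common way to build a master cost surface, not necessarily the method used in this course.

```python
# Hypothetical class upper bounds; 8 breaks produce ranks 1 through 9.
SLOPE_BREAKS = [5, 10, 15, 20, 25, 30, 35, 40]               # degrees
SNOW_BREAKS = [0.1, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]     # meters

def rank(value, breaks):
    """Return a rank from 1 to len(breaks) + 1 using ascending upper bounds."""
    for i, upper in enumerate(breaks, start=1):
        if value <= upper:
            return i
    return len(breaks) + 1   # above the last break: highest cost rank

def rank_raster(raster, breaks):
    return [[rank(v, breaks) for v in row] for row in raster]

def combine(cost_rasters, weights):
    """Weighted cell-by-cell sum of equally sized ranked cost rasters."""
    rows, cols = len(cost_rasters[0]), len(cost_rasters[0][0])
    return [[sum(w * ras[i][j] for ras, w in zip(cost_rasters, weights))
             for j in range(cols)] for i in range(rows)]
```

Because ranks are ordinal, the combined value only orders cells from cheap to expensive; it does not say one cell is literally twice as costly as another.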

  • Cost Distance Tool Logic:
    * Calculates the least accumulative cost distance for each cell to the nearest source.
    * Starting from the source, the tool evaluates neighboring cells, multiplying the average cost of each pair of adjacent cells by the distance between their centers (the cell size for horizontal and vertical moves, about 1.414 times the cell size for diagonal moves).
    * The process repeatedly moves to the unprocessed cell with the lowest accumulated cost and evaluates that cell's neighbors.
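The "always expand the cheapest cell next" logic is essentially Dijkstra's shortest-path algorithm on the grid, which can be sketched with Python's `heapq`. This is a minimal sketch, assuming `None` marks impassable No Data cells, source cells have valid costs, and all costs are positive; `cost_distance` is a hypothetical name, not the ArcGIS implementation.

```python
import heapq
import math

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def cost_distance(cost, sources, cellsize=1.0):
    """Least accumulated cost from every cell to its nearest source.

    cost: 2D list of per-cell travel costs (None = No Data, impassable).
    sources: iterable of (row, col) source cells (assumed to have valid costs).
    Unreachable and No Data cells keep math.inf.
    """
    rows, cols = len(cost), len(cost[0])
    acc = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        acc[r][c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)       # cell with the lowest accumulated cost
        if d > acc[r][c]:
            continue                         # stale entry: a cheaper route was found
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or cost[nr][nc] is None:
                continue
            # Average cost of the two cells times the center-to-center distance
            # (cellsize for lateral moves, sqrt(2) * cellsize for diagonals).
            step = math.hypot(dr, dc) * cellsize
            nd = d + step * (cost[r][c] + cost[nr][nc]) / 2.0
            if nd < acc[nr][nc]:
                acc[nr][nc] = nd
                heapq.heappush(heap, (nd, nr, nc))
    return acc
```

On the hypothetical grid `[[1, 9, 1], [1, 1, 1]]` with the source in the upper-left corner, the cheapest route to the upper-right corner detours through the bottom row (about 2.83) instead of crossing the costly middle cell (10), which is the "walk around the mountain" idea.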

  • Cost Back Link Tool:
    * Produces a direction raster showing which way to move from any cell to reach the source along the least-cost path.
    * Output values range from 0 to 8.
    * 0: Reserved for the source cell.
    * 1: Reach the next cell by moving Right.
    * 2: Diagonally to the Lower Right.
    * 3: To the cell Below.
    * 4: Diagonally to the Lower Left.
    * 5 through 8 continue clockwise: Left, Upper Left, Above, and Upper Right.
    * This provides the sequence of cells for a "roadmap" back to the source.
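The back link codes can be illustrated by recovering, for each cell, the neighbor through which its least accumulated cost was reached. This Python sketch works from an already computed cost raster and accumulated-cost raster, which is an assumption made for illustration (ArcGIS produces the back link raster as part of its own cost calculation); `back_link` and `BACKLINK_CODES` are hypothetical names, and costs are assumed positive so that only sources have an accumulated cost of 0.

```python
import math

# Back link codes: offset (d_row, d_col) to the next cell toward the source.
BACKLINK_CODES = {(0, 1): 1, (1, 1): 2, (1, 0): 3, (1, -1): 4,
                  (0, -1): 5, (-1, -1): 6, (-1, 0): 7, (-1, 1): 8}

def back_link(cost, acc, cellsize=1.0, tol=1e-9):
    """Back link raster (0-8) from cost and accumulated-cost rasters.

    None marks No Data or unreachable cells in the output.
    """
    rows, cols = len(cost), len(cost[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if cost[r][c] is None or math.isinf(acc[r][c]):
                continue                     # No Data or unreachable
            if acc[r][c] == 0:
                out[r][c] = 0                # source cell
                continue
            best = None
            for (dr, dc), code in BACKLINK_CODES.items():
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or cost[nr][nc] is None:
                    continue
                # Cost of arriving here from that neighbor.
                step = math.hypot(dr, dc) * cellsize * (cost[r][c] + cost[nr][nc]) / 2.0
                total = acc[nr][nc] + step
                if best is None or total < best[0] - tol:
                    best = (total, code)
            out[r][c] = best[1]
    return out
```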

  • Cost Path Generation:
    * Requires both a cost-weighted distance surface and a direction surface.
    * Evaluates the eight neighbors of each cell and moves to the neighbor with the smallest accumulated value until the source and destination are connected.
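Once a back link raster exists, tracing the path from a destination is a matter of repeatedly stepping in the coded direction until a 0 (source) cell is reached. This is a minimal Python sketch, assuming the 0 to 8 coding listed for the Cost Back Link tool and `None` for No Data; `cost_path` and `OFFSETS` are hypothetical names, and the loop guard is a safety check added here.

```python
# Offsets for back link codes 1-8 (0 marks the source).
OFFSETS = {1: (0, 1), 2: (1, 1), 3: (1, 0), 4: (1, -1),
           5: (0, -1), 6: (-1, -1), 7: (-1, 0), 8: (-1, 1)}

def cost_path(backlink, destination):
    """List of (row, col) cells from the destination back to a source."""
    r, c = destination
    path = [(r, c)]
    max_steps = len(backlink) * len(backlink[0])   # a valid path never revisits cells
    while backlink[r][c] != 0:
        code = backlink[r][c]
        if code is None:
            raise ValueError(f"cell {(r, c)} has no back link (No Data or unreachable)")
        dr, dc = OFFSETS[code]
        r, c = r + dr, c + dc                      # step toward the source
        path.append((r, c))
        if len(path) > max_steps:
            raise ValueError("back link raster contains a loop")
    return path
```

For the hypothetical back link raster `[[0, 5, 5], [7, 6, 6]]`, starting at the lower-right cell moves up and left, then left, ending at the source: `[(1, 2), (0, 1), (0, 0)]`.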

Density Tools and Surface Spreading

  • Concept: Spreads the values of input features over a surface to show where points are concentrated.

  • Mechanism: A circular search area (radius) is applied around each sample point.
    * Search Radius Impact: A larger radius produces a smoother, more generalized surface, spreading point values over a wider area and lowering the peak density values.

  • Population Distribution Example:
    * Calculating density for town population points shows how the population is predicted to spread across the landscape, rather than treating everyone as living at a single point coordinate.
    * The total volume under the density surface (each cell's density multiplied by the cell area, summed over all cells) equals the total population of the original point layer, apart from small rasterization error.
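Spreading each point's population under a circular kernel can be sketched in Python with a quartic (biweight) kernel, the kernel shape Esri describes for its Kernel Density tool. This is a minimal sketch under stated assumptions: coordinates start at the grid's lower-left corner, row 0 is the northern row, and `kernel_density` is a hypothetical name that ignores refinements such as automatic radius selection.

```python
import math

def kernel_density(points, nrows, ncols, cellsize, radius):
    """Quartic-kernel density surface (population per unit area).

    points: list of (x, y, population); x grows east and y grows north
            from the lower-left corner of the grid.
    Returns a 2D list; row 0 is the northernmost row.
    """
    grid = [[0.0] * ncols for _ in range(nrows)]
    r2 = radius ** 2
    norm = 3.0 / (math.pi * r2)          # makes each kernel integrate to 1
    for row in range(nrows):
        y = (nrows - row - 0.5) * cellsize       # cell-center y coordinate
        for col in range(ncols):
            x = (col + 0.5) * cellsize           # cell-center x coordinate
            total = 0.0
            for px, py, pop in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                if d2 < r2:                      # only points within the radius
                    total += pop * norm * (1.0 - d2 / r2) ** 2
            grid[row][col] = total
    return grid
```

Summing density times cell area over the grid gives back (approximately) the input population, and rerunning with a larger `radius` lowers the peak, matching the two behaviors described above.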

Spatial Operation Categories

  • Local Operations:
    * The simplest map algebra functions, performed cell by cell.
    * Only the same cell location in each participating raster is involved (e.g., adding two rasters to get a sum).

  • Focal (Neighborhood) Operations:
    * Computes an output value for each cell from the values in its neighborhood.
    * Also known as a "moving window" operation.
    * Statistics such as the focal mean typically produce smoothed values.

  • Zonal Operations:
    * Computes output values based on "zones" (groups of cells sharing a common characteristic, such as a watershed, county, or soil series).
    * Cells in a zone do not need to be contiguous (a zone can consist of separate "regions").

  • Global Operations:
    * Uses all cells of the input raster to determine the value of each output cell.
    * Example: Euclidean distance.

Statistics Tools in Spatial Analyst

  • Cell Statistics (Local):
    * Calculates per-cell statistics from two or more input rasters.
    * Available statistics: SUM, MEAN, MAXIMUM, MINIMUM, RANGE, STD (Standard Deviation), VARIANCE, MEDIAN, etc.
    * The No Data Rule: In map algebra, if any input cell in an expression is "No Data," the output cell is "No Data."
        * Example: Summing 100 rasters where one has "No Data" at a specific cell results in "No Data" at that cell.
        * Exception: The "Ignore No Data" option in the Cell Statistics dialog box calculates the statistic from the available values instead.
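The No Data rule and the "Ignore No Data" exception can be sketched in Python with nested lists, using `None` for No Data. This is a minimal sketch; `cell_statistics` is a hypothetical name, only a subset of the statistics is included, and using the population standard deviation for STD is an assumption.

```python
import statistics

STATS = {
    "SUM": sum,
    "MEAN": statistics.mean,
    "MAXIMUM": max,
    "MINIMUM": min,
    "RANGE": lambda vals: max(vals) - min(vals),
    "STD": statistics.pstdev,       # population standard deviation (assumption)
    "MEDIAN": statistics.median,
}

def cell_statistics(rasters, stat="SUM", ignore_nodata=False):
    """Per-cell statistic across equally sized rasters (None = No Data)."""
    func = STATS[stat]
    rows, cols = len(rasters[0]), len(rasters[0][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [ras[r][c] for ras in rasters]
            valid = [v for v in values if v is not None]
            if not valid or (len(valid) < len(values) and not ignore_nodata):
                continue   # No Data rule: any No Data input -> No Data output
            out[r][c] = func(valid)
    return out
```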

  • Neighborhood (Focal) Statistics:
    * Uses a moving window (the default is a 3 × 3 rectangle).
    * Available Shapes:
        * Rectangle/Square.
        * Annulus (donut-shaped, with inner and outer radii).
        * Circle.
        * Wedge (a section of a circle).
        * Irregular (defined by a user-specified file).
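The neighborhood shapes can be represented as lists of (row, column) offsets from the center cell, and a moving-window mean then averages whatever offsets fall inside the raster. This Python sketch makes several assumptions: radii are in cells, a cell belongs to a circle or annulus when its center distance falls within the radius (with the annulus excluding distances up to the inner radius), rectangles have odd dimensions, and edge cells use only the neighbors that exist. All function names are hypothetical.

```python
import math

def rectangle(width=3, height=3):
    """Offsets for an odd-sized width x height rectangle centered on the cell."""
    return [(dr, dc) for dr in range(-(height // 2), height // 2 + 1)
            for dc in range(-(width // 2), width // 2 + 1)]

def circle(radius):
    """Offsets whose center distance is within the radius (in cells)."""
    r = int(radius)
    return [(dr, dc) for dr in range(-r, r + 1) for dc in range(-r, r + 1)
            if math.hypot(dr, dc) <= radius]

def annulus(inner, outer):
    """Donut: offsets farther than inner and no farther than outer."""
    r = int(outer)
    return [(dr, dc) for dr in range(-r, r + 1) for dc in range(-r, r + 1)
            if inner < math.hypot(dr, dc) <= outer]

def focal_mean(raster, offsets):
    """Moving-window mean; skips No Data (None) and cells outside the raster."""
    rows, cols = len(raster), len(raster[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [raster[r + dr][c + dc] for dr, dc in offsets
                    if 0 <= r + dr < rows and 0 <= c + dc < cols
                    and raster[r + dr][c + dc] is not None]
            if vals:
                out[r][c] = sum(vals) / len(vals)
    return out
```

A single high value is spread across its 3 × 3 neighborhood by `focal_mean`, which is the smoothing effect noted for focal operations.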

  • Zonal Statistics:
    * Calculates statistics (e.g., minimum, mean) for a "value raster" within zones defined by a separate dataset.
    * Tools:
        * Zonal Statistics: Outputs a raster in which every cell in a zone receives the zone's calculated value (e.g., the minimum suitability value for an entire watershed zone).
        * Zonal Statistics as Table: Outputs the statistics to a non-spatial table.
        * Zonal Histogram: Produces a table and graph of the frequency distribution of cell values within each zone.
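The three zonal outputs (a table, a raster, and a histogram) can be sketched in Python by grouping value-raster cells by zone ID. This is a minimal sketch assuming nested lists with `None` for No Data; the function names are hypothetical. Note that zone "A" in the test data is non-contiguous, which the grouping handles naturally.

```python
from collections import Counter, defaultdict

def _group(zones, values):
    """Collect value-raster cells by zone ID, skipping No Data in either raster."""
    groups = defaultdict(list)
    for zrow, vrow in zip(zones, values):
        for z, v in zip(zrow, vrow):
            if z is not None and v is not None:
                groups[z].append(v)
    return groups

def zonal_table(zones, values, stat=min):
    """Like a statistics table: one statistic per zone."""
    return {z: stat(vals) for z, vals in _group(zones, values).items()}

def zonal_raster(zones, values, stat=min):
    """Every cell in a zone receives that zone's statistic."""
    table = zonal_table(zones, values, stat)
    return [[None if z is None else table.get(z) for z in zrow] for zrow in zones]

def zonal_histogram(zones, values):
    """Frequency of each cell value within each zone."""
    return {z: Counter(vals) for z, vals in _group(zones, values).items()}
```

Any function that takes a list of numbers, such as `max` or `statistics.mean`, can be passed as `stat`.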

Map Algebra and Raster Calculator

  • Raster Calculator:
    * The primary tool in the Map Algebra toolset.
    * Uses Python syntax in a calculator interface.
    * Allows Spatial Analyst tools and logical operators to be combined in a single expression.
    * Integrates well into ModelBuilder.
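A typical Raster Calculator expression combines a tool with logical operators, for example `Con(("slope" < 10) & ("landuse" == 3), 1, 0)`, where "slope" and "landuse" are hypothetical layer names. The cell-by-cell evaluation behind such an expression can be emulated in plain Python as below; `con` and `local` are hypothetical helpers that only mimic the idea of Spatial Analyst's `Con` tool, with `None` standing in for No Data.

```python
def local(op, *rasters):
    """Apply op to aligned cells of one or more rasters (No Data propagates)."""
    return [[None if any(v is None for v in cells) else op(*cells)
             for cells in zip(*rows)] for rows in zip(*rasters)]

def con(condition, true_value, false_value):
    """Cell-by-cell conditional: true_value where condition holds, else false_value."""
    return [[None if cond is None else (true_value if cond else false_value)
             for cond in row] for row in condition]

# Hypothetical inputs: slope in degrees and land use codes.
slope = [[4, 12], [8, None]]
landuse = [[3, 3], [1, 3]]

# Emulates Con(("slope" < 10) & ("landuse" == 3), 1, 0)
suitable = con(local(lambda s, l: s < 10 and l == 3, slope, landuse), 1, 0)
```

Here `suitable` is `[[1, 0], [0, None]]`: only the first cell is both gentle and land use 3, and the No Data slope cell stays No Data.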

Reclassification Methods

  • Purpose of Reclassifying:
    * Assigning Preference: Changing land use types to values representing habitat quality (e.g., Forest = 3, Pasture = 2, Residential = 1, Urban = No Data).
    * Grouping Values: Merging multiple forest species codes into a single "Evergreen" class.
    * Normalizing Scales: Setting multiple layers to a common scale (e.g., 1 to 10) for suitability analysis to ensure "apples to apples" comparisons.
    * Data Management: Setting specific values to "No Data" or vice versa.
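Two of these purposes, assigning preference values through a remap table and normalizing onto a common 1 to 10 scale, can be sketched in Python. Assumptions: land use is stored as readable names here for clarity (real rasters store integer codes), `None` means No Data, linear stretching is only one way to normalize, and the function names are hypothetical.

```python
def reclassify(raster, remap, missing_to_nodata=True):
    """Replace cell values via a remap dict; None means No Data."""
    out = []
    for row in raster:
        out_row = []
        for v in row:
            if v in remap:
                out_row.append(remap[v])
            else:
                # Values not in the table become No Data or are kept as-is.
                out_row.append(None if missing_to_nodata else v)
        out.append(out_row)
    return out

def rescale(raster, new_min=1, new_max=10):
    """Linearly stretch the raster's values onto [new_min, new_max]."""
    vals = [v for row in raster for v in row if v is not None]
    lo, hi = min(vals), max(vals)
    span = hi - lo
    return [[None if v is None else
             new_min + (new_max - new_min) * ((v - lo) / span if span else 0.0)
             for v in row] for row in raster]

# Habitat preference example from the notes above.
HABITAT = {"Forest": 3, "Pasture": 2, "Residential": 1, "Urban": None}
```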

  • Combining Layers with Reclassification:
    * Problem: Adding two binary rasters (e.g., Summer Dry/Wet and Winter Dry/Wet, where 1 = Dry and 2 = Wet) can cause data loss.
        * Because 1 + 2 = 3 and 2 + 1 = 3, a total of 3 cannot tell you whether a cell was dry in summer and wet in winter, or wet in summer and dry in winter.
    * Solution: Reclassify one layer to a different magnitude (e.g., Summer values 10 and 20, Winter values 1 and 2).
        * The totals are then unique: 11 (Dry/Dry), 12 (Dry/Wet), 21 (Wet/Dry), 22 (Wet/Wet), with the summer condition listed first.
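The magnitude trick can be checked with a short Python sketch: summer classes are multiplied by 10 before adding, so the tens digit records summer and the ones digit records winter. `combine_seasons` and `LABELS` are hypothetical names, and `None` stands in for No Data.

```python
def combine_seasons(summer, winter):
    """Summer classes (1 = Dry, 2 = Wet) become 10/20, then winter's 1/2 is added."""
    summer_tens = [[None if v is None else v * 10 for v in row] for row in summer]
    return [[None if s is None or w is None else s + w
             for s, w in zip(srow, wrow)]
            for srow, wrow in zip(summer_tens, winter)]

# Summer condition first, winter second.
LABELS = {11: "Dry/Dry", 12: "Dry/Wet", 21: "Wet/Dry", 22: "Wet/Wet"}
```

Adding the unscaled layers would map both mixed cases to 3; after scaling they become 12 and 21.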

Spatial Interpolation

  • Mechanism: Predicts values for unsampled raster cells based on a limited number of sample points.

  • Spatial Dependency: Interpolation assumes that points close together are more similar than points far apart.
    * Rainfall Analogy: If it is raining here, you can be highly confident that it is raining on the other side of the street, less confident about the other side of town, and not confident at all about a different country.

  • Requirement: Requires a sufficient density of representative points to create an accurate surface; inadequate sampling leads to inaccurate predictive results.
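Inverse distance weighting (IDW) is one interpolation method that builds the spatial-dependency assumption directly into its weights: nearer samples count more. This minimal Python sketch uses a power of 2 (a common default) and every sample point; ArcGIS's IDW tool can also limit the search to nearby points, which is not modeled here. `idw` is a hypothetical name.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (x, y, value) sample points.
    """
    num = den = 0.0
    for sx, sy, v in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return v                 # exactly on a sample: use its value
        w = 1.0 / d ** power          # closer samples get larger weights
        num += w * v
        den += w
    return num / den
```

With samples of 0 at x = 0 and 10 at x = 10, the midpoint estimate is 5, and a location at x = 2 is pulled strongly toward the nearby 0. Sparse or unrepresentative samples leave large gaps where the estimate is driven by distant points, which is why sampling density matters.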