Comprehensive Guide to Geocoding in ArcGIS

Fundamentals of Geocoding in ArcGIS

  • Geocoding is the specific process of transforming a description of a location into a precise location on the earth's surface.

  • A description of a location can take several forms:
    - A pair of coordinates (e.g., latitude and longitude).
    - A specific street address.
    - The name of a place (e.g., a park or a business).

  • Users may already be familiar with transforming tabular coordinate data into a feature class using the "Add XY Data" functionality in ArcMap.

  • While place names, such as "Bastrop State Park," can be geocoded, the primary focus in professional GIS applications is the transformation of tabular address data into GIS features.

  • Geocoding can be performed individually (entering one location description at a time) or in batch mode (processing many locations at once from a table).

  • The resulting output of geocoding exists as geographic features with associated attributes suitable for mapping and spatial analysis.

  • The process creates features using either:
    - Geographic coordinates (latitudes and longitudes).
    - Projected coordinates (x and y values in a map projection).
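
The simplest case above, turning tabular coordinates into point features, can be sketched in plain Python. This is only an illustration of the idea, not how ArcGIS implements it: the function name, the column names "lon" and "lat", and the sample row are all assumptions, and GeoJSON is used simply as a convenient feature format.

```python
import json

def rows_to_point_features(rows, x_field="lon", y_field="lat"):
    """Turn table rows with coordinate columns into GeoJSON point features."""
    features = []
    for row in rows:
        # Every non-coordinate column rides along as an attribute of the feature.
        attributes = {k: v for k, v in row.items() if k not in (x_field, y_field)}
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates as [x, y], i.e. [longitude, latitude].
                "coordinates": [float(row[x_field]), float(row[y_field])],
            },
            "properties": attributes,
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical sample row; the coordinates are illustrative only.
table = [{"name": "Sample site", "lon": "-96.34", "lat": "30.62"}]
collection = rows_to_point_features(table)
print(json.dumps(collection, indent=2))
```

Each output feature carries both a geometry and the remaining table columns, which is what makes the result usable for mapping and analysis.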

The Logic and Utility of Geocoding

  • The geocoding process mimics how humans find real-world locations based on descriptions. For example, to find 474 Olsen Blvd, College Station, TX:
    - First, locate the state of Texas.
    - Next, find the city of College Station.
    - If zip codes are available, identify the specific zip code area.
    - Finally, identify the street and estimate the position of the address based on the house number.

  • Geocoding serves as a tool to narrow down geographic options within a map or the physical world.
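
The narrowing-down logic can be sketched as a chain of filters, each one shrinking the set of candidate records. The reference records, field names, and the `narrow` helper below are hypothetical and exist only to illustrate the idea; the ZIP codes are placeholders.

```python
# Hypothetical reference records; names and ZIP codes are illustrative only.
REFERENCE = [
    {"state": "TX", "city": "College Station", "zip": "77843", "street": "OLSEN BLVD"},
    {"state": "TX", "city": "College Station", "zip": "77840", "street": "GEORGE BUSH DR"},
    {"state": "TX", "city": "Bryan", "zip": "77803", "street": "TEXAS AVE"},
    {"state": "OK", "city": "Norman", "zip": "73019", "street": "ASP AVE"},
]

def narrow(records, **criteria):
    """Filter from the broadest element to the most specific one.

    A criterion that is missing or None is skipped, which mirrors the
    "if zip codes are available" step in the description.
    """
    remaining = records
    for field in ("state", "city", "zip", "street"):
        wanted = criteria.get(field)
        if wanted is None:
            continue
        remaining = [r for r in remaining if r[field].upper() == wanted.upper()]
    return remaining

by_state = narrow(REFERENCE, state="TX")
by_city = narrow(REFERENCE, state="TX", city="College Station")
by_street = narrow(REFERENCE, state="TX", city="College Station", street="Olsen Blvd")
print(len(by_state), len(by_city), len(by_street))
```

Once a single street remains, the house number is used to estimate a position along it.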

  • Applications of Geocoded Data:
    - Spatial Pattern Recognition: By converting addresses into points, patterns within information become visible through visual inspection or ArcGIS analysis tools.
    - Crime Mapping: Police reports often collect addresses rather than GPS coordinates. Geocoding these reports helps visualize crime hotspots and place individual incidents in a broader spatial context.
    - Customer Data Management: Organizations maintain tables with customer names, addresses, and buying habits. Geocoding this data allows for:
        - Establishing marketing strategies.
        - Targeting specific customer clusters.
        - Producing route maps and directions.

Core Components of a Geocoder

  • Address Locator Style (Address Locator Template):
    - This serves as the "skeleton" of the address locator.
    - It determines the type of address that can be geocoded, defines field mapping for reference data, and dictates what output information is returned when a match is found.
    - Example: The "US Address - Dual Ranges" style is used for street centerlines containing left and right address ranges.

  • Address Locator:
    - This is the primary tool for geocoding within ArcGIS.
    - It contains all geocoding parameters, properties, and a snapshot of address attributes from the reference data.
    - It functions like a "street guide" or "map book" that directs a user to a specific page and pinpoints a location.

  • Reference Data:
    - This is the GIS feature class source used to build the address locator.
    - It must contain attributes such as house number ranges, street names, and street types.
    - The match rate of geocoding depends heavily on the completeness, spatial accuracy, and attribute accuracy of this data.

Address Elements and Parsing Mechanisms

  • The geocoding engine breaks an address into subunits called "address elements."

  • Common Address Elements:
    - House number.
    - Prefix direction/Prefix type.
    - Street name.
    - Street type.
    - Suffix direction.
    - City, State, and Zip code.

  • Parsing Rules: The address locator uses defined rules to break down strings. This can lead to multiple interpretations (e.g., the word "Park" could be a street name or a street type).
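
To make the ambiguity concrete, here is a minimal sketch of a rule-based parser that returns every plausible interpretation rather than just one. It is not the parsing logic used by ArcGIS: the street type and direction lists, the function name, and the output keys are all assumptions made for illustration.

```python
# Hypothetical, deliberately tiny vocabularies.
STREET_TYPES = {"ST", "AVE", "BLVD", "DR", "RD", "LN", "PARK"}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def parse_street(address):
    """Return every plausible split of a street address into address elements."""
    tokens = address.upper().replace(".", "").replace(",", " ").split()

    house_number = None
    if tokens and tokens[0].isdigit():
        house_number, tokens = tokens[0], tokens[1:]

    prefix_direction = None
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        prefix_direction, tokens = tokens[0], tokens[1:]

    base = {"house_number": house_number, "prefix_direction": prefix_direction}
    interpretations = []
    # Interpretation A: the last token is a street type.
    if len(tokens) > 1 and tokens[-1] in STREET_TYPES:
        interpretations.append(
            {**base, "street_name": " ".join(tokens[:-1]), "street_type": tokens[-1]}
        )
    # Interpretation B: every remaining token belongs to the street name.
    interpretations.append(
        {**base, "street_name": " ".join(tokens), "street_type": None}
    )
    return interpretations

readings = parse_street("100 Central Park")
for reading in readings:
    print(reading)
```

For "100 Central Park" the sketch yields two readings, one with "PARK" as the street type and one with "CENTRAL PARK" as the full street name. Deciding between them is left to a later matching step.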

  • Scoring: The locator searches all element combinations, finds possible candidates, and assigns each a score based on how well it matches. The best matches are presented based on these scores.
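
Candidate scoring can be illustrated with a weighted string-similarity measure. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in; the weights, field names, and sample candidates are assumptions, and the actual scoring used by an ArcGIS locator is not described in these notes.

```python
from difflib import SequenceMatcher

# Hypothetical weights: how much each address element contributes to the score.
WEIGHTS = {"street_name": 0.6, "street_type": 0.2, "zip": 0.2}

def similarity(a, b):
    """Case-insensitive similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, (a or "").upper(), (b or "").upper()).ratio()

def score(query, candidate):
    """Weighted similarity across address elements, scaled to 0-100."""
    total = sum(
        weight * similarity(query.get(field), candidate.get(field))
        for field, weight in WEIGHTS.items()
    )
    return round(100 * total, 1)

def rank(query, candidates):
    """Order candidates from best to worst score."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)

# Hypothetical query and candidates; the ZIP code is a placeholder.
query = {"street_name": "Olsen", "street_type": "Blvd", "zip": "77843"}
exact = {"street_name": "OLSEN", "street_type": "BLVD", "zip": "77843"}
near = {"street_name": "OLSON", "street_type": "ST", "zip": "77843"}

for candidate in rank(query, [near, exact]):
    print(score(query, candidate), candidate)
```

A candidate that agrees on every element scores 100, and each disagreement lowers the score in proportion to that element's weight.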

  • Data Consistency Requirements:
    - Addresses can be a single string or split across multiple fields.
    - ArcGIS performs better when street address, city, state, and zip code are in separate fields.
    - Database-compatible formats include:
        - CSV: Comma Separated Values (text files with fields separated by commas).
        - DBF: Database File (an older standard from the dBase software package).
        - Excel: Spreadsheet formats.
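
As a small illustration of the multi-field layout, the sketch below reads a CSV address table with Python's standard `csv` module, checks that the expected columns exist, and shows how the same row looks as a single string. The column names and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical address table in CSV form, one address element per column.
CSV_TEXT = """Name,Street,City,State,Zip
Customer A,100 Main St,Anytown,TX,75001
Customer B,22 N Oak Ave,Sampleville,MA,02139
"""

REQUIRED = ("Street", "City", "State", "Zip")

def load_address_table(text):
    """Read the table and confirm each address element has its own column."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [field for field in REQUIRED if field not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [dict(row) for row in reader]

def single_line(row):
    """Collapse the separate fields into the single-string form of the address."""
    return f"{row['Street']}, {row['City']}, {row['State']} {row['Zip']}"

rows = load_address_table(CSV_TEXT)
for row in rows:
    print(single_line(row))
```

The `csv` module reads every value as text, so zip codes keep their leading zeros. Spreadsheet software that treats the column as numeric may drop them.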

  • ArcGIS Pro Tools: The "Split Address Into Components" tool in the Geocoding Tools toolbox can be used to parse street address information into separate fields.

The Geocoding Workflow

  • Step 1: Obtain and modify reference data to match locator style requirements.

  • Step 2: Select an address locator style appropriate for the data and available reference attributes.

  • Step 3: Create the address locator and specify geocoding options.

  • Step 4: Use the locator to geocode the address table.

  • Reference Data Considerations:
    - Extent: The reference data must cover the entire area of interest (e.g., Texas-wide data is needed if addresses are state-wide; Brazos County data will not suffice).
    - Resolution: The data must be detailed enough to find the specific target (e.g., it must have address ranges, not just street names).

US Census Bureau and TIGER Files

  • TIGER Files: Stands for Topologically Integrated Geographic Encoding and Referencing.

  • These files provide topological encoding, which allows for address finding and road network creation (routing from point A to point B).

  • Content of TIGER Files: Roads (street centerlines), railroads, zip codes, census blocks, legal/statistical areas, and demographic/population data.

  • Downloading TIGER Data:
    - Available via the US Census Bureau website.
    - Address range data is typically found in "All Lines" county-level shapefiles (not state or national levels).
    - "All Lines" files include roads, railroads (coded "R"), and hydrology/streams (coded "H").
    - Definition Query: Users should use a where clause to restrict features to "S" (Streets) to isolate road networks.
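
The street-only restriction can be sketched as a simple attribute filter. The notes do not name the field that holds the "S", "R", and "H" codes; the sketch assumes it is called MTFCC and that road codes begin with "S". The sample records, the FULLNAME field, and the where clause string are likewise assumptions to be checked against the downloaded data.

```python
# Hypothetical records standing in for rows of an "All Lines" attribute table.
# Assumption: the feature class code is stored in a field named MTFCC, with
# road codes starting with "S", rail with "R", and hydrography with "H".
edges = [
    {"FULLNAME": "Olsen Blvd", "MTFCC": "S1400"},
    {"FULLNAME": "Sample Rail Spur", "MTFCC": "R1011"},
    {"FULLNAME": "Sample Creek", "MTFCC": "H3010"},
]

# The same restriction written as a SQL where clause for a definition query.
WHERE_CLAUSE = "MTFCC LIKE 'S%'"

def streets_only(records, code_field="MTFCC"):
    """Keep only the records whose feature code marks them as streets."""
    return [
        record for record in records
        if str(record.get(code_field, "")).upper().startswith("S")
    ]

streets = streets_only(edges)
print([record["FULLNAME"] for record in streets])
```

Filtering before building the locator keeps railroads and streams from being treated as street candidates.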

  • Attribute Fields in TIGER Street Centerlines:
    - Left from address (LFROMADD).
    - Left to address (LTOADD).
    - Right from address (RFROMADD).
    - Right to address (RTOADD).
    - Left and Right Zip codes.

  • Directionality: Each segment has a starting point and an ending point. The geocoder uses the "from" and "to" values to determine which side of the street (left or right) and where along the segment a point feature should be placed.
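
Placing a point from an address range is essentially linear interpolation. The sketch below assumes integer range values, a straight two-point segment, and odd/even parity to pick the side of the street. The function name, the `from_xy`/`to_xy` keys, and the sample segment are hypothetical, and no sideways offset from the centerline is applied.

```python
def locate_on_segment(house_number, segment):
    """Return (side, fraction, (x, y)) for a house number, or None if out of range."""
    sides = (("L", "LFROMADD", "LTOADD"), ("R", "RFROMADD", "RTOADD"))
    for side, from_key, to_key in sides:
        start, end = segment[from_key], segment[to_key]
        low, high = min(start, end), max(start, end)
        # A side matches when the number falls in its range and shares its parity.
        if low <= house_number <= high and house_number % 2 == start % 2:
            # Fraction of the way from the segment's start point to its end point.
            fraction = 0.5 if start == end else (house_number - start) / (end - start)
            (x0, y0), (x1, y1) = segment["from_xy"], segment["to_xy"]
            point = (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))
            return side, fraction, point
    return None

# Hypothetical segment: odd numbers on the left, even numbers on the right.
segment = {
    "LFROMADD": 401, "LTOADD": 499,
    "RFROMADD": 400, "RTOADD": 498,
    "from_xy": (0.0, 0.0), "to_xy": (100.0, 0.0),
}

side, fraction, point = locate_on_segment(474, segment)
print(side, round(fraction, 3), point)
```

Here 474 is even, so it falls in the right-hand range 400 to 498, and it sits 74/98 of the way along the segment from the "from" end.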

Troubleshooting and Matching Issues

  • Unmatched Results Causes:
    - Name Mismatches: Street names in tables differ from reference data (e.g., "William Joel Bryan" vs. "FM 158"). This is solved using an "alternate name table."
    - Data Entry Errors: Typos, misspellings, or combining words.
    - Place Name vs. Address: Searching for a name when reference data only has addresses. This is solved using an "alias table" (e.g., linking "Texas A&M University" to the address 400 Bizzell St).
    - PO Boxes: These lack physical geographic locations and will not geocode to a street segment.
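
Conceptually, both an alias table and an alternate name table are lookups applied before matching. The sketch below models them as dictionaries, using the two examples from the notes, and adds a simple check for PO Boxes. The function names are hypothetical, and it is an assumption that the reference data uses "FM 158" while the address table uses "William Joel Bryan".

```python
import re

# Hypothetical lookup tables built from the examples in the notes.
ALIAS_TABLE = {"TEXAS A&M UNIVERSITY": "400 Bizzell St"}
ALTERNATE_NAMES = {"WILLIAM JOEL BRYAN": "FM 158"}

PO_BOX = re.compile(r"^\s*P\.?\s*O\.?\s*BOX\b", re.IGNORECASE)

def resolve_place(text):
    """Swap a known place name for its street address; otherwise return the input."""
    return ALIAS_TABLE.get(text.strip().upper(), text)

def resolve_street_name(name):
    """Swap a street name for the name assumed to be used in the reference data."""
    return ALTERNATE_NAMES.get(name.strip().upper(), name)

def is_po_box(text):
    """Flag PO Box addresses, which cannot be placed on a street segment."""
    return bool(PO_BOX.match(text))

print(resolve_place("Texas A&M University"))
print(resolve_street_name("William Joel Bryan"))
print(is_po_box("P.O. Box 123"))
```

Screening the table this way before a batch run separates records that can be fixed with a lookup from records, such as PO Boxes, that will never match a street.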

  • Standardization Best Practices:
    - Use standardized abbreviations for cardinal directions (N, S, E, W, and NW for Northwest).
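
A minimal sketch of direction standardization, assuming a simple whole-word replacement is acceptable for the data at hand. The mapping and function name are illustrative. The rule is blunt: it would also shorten a street that is literally named "North St", so a real cleanup pass may need exceptions.

```python
import re

DIRECTION_ABBREVIATIONS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
}

# Longest words first so the pattern lists compound directions before their parts.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(DIRECTION_ABBREVIATIONS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def standardize_directions(address):
    """Replace spelled-out directions with their standard abbreviations."""
    return _PATTERN.sub(
        lambda match: DIRECTION_ABBREVIATIONS[match.group(0).upper()], address
    )

print(standardize_directions("100 North Main St"))
print(standardize_directions("12 Oak St Northwest"))
```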

ArcGIS World Geocoding Service and Pro Implementation

  • Esri Geocoding Service:
    - Accessible via ArcGIS Online.
    - Requires an ArcGIS license and uses "credits" (though one-year licenses usually include a set amount).
    - Building a personal address locator is free of charge.

  • Walkthrough of ArcGIS Pro Geocoding (Address and Place Layer tool):
    - Step 1: Select Locator (e.g., ArcGIS World Geocoding Service).
    - Step 2: Input Address Table and define structure (one field or more than one field).
    - Step 3: Map Fields (the tool attempts to auto-map columns like "Street," "City," "State," and "Zip").
    - Step 4: Define Output (specify the path for the new point feature class).
    - Step 5: Select Country (e.g., United States) to increase accuracy.
    - Step 6: Select Category (e.g., Address).

  • Final Review:
    - The tool provides an estimated credit consumption (e.g., 0.28 credits for a small batch).
    - Outcome results show: "Matched" (successful), "Unmatched" (failure), and "Tie" (multiple potential locations).
    - The final result is a point feature class suitable for visualization and overlaying with other GIS layers.
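
After a batch run, the three outcomes can be tallied from the output table. The sketch below assumes the output has a field named "Status" holding the codes "M", "U", and "T" for matched, unmatched, and tied records. Both the field name and the sample rows are assumptions to verify against the actual output.

```python
from collections import Counter

# Assumed status codes in the geocoding output table.
STATUS_LABELS = {"M": "Matched", "U": "Unmatched", "T": "Tie"}

def summarize(results, status_field="Status"):
    """Count geocoding outcomes and report the share of matched records."""
    counts = Counter(row[status_field] for row in results)
    total = sum(counts.values())
    summary = {label: counts.get(code, 0) for code, label in STATUS_LABELS.items()}
    summary["matched_pct"] = round(100 * counts.get("M", 0) / total, 1) if total else 0.0
    return summary

# Hypothetical output rows from a small batch.
results = [{"Status": s} for s in ["M", "M", "M", "M", "M", "U", "T"]]
summary = summarize(results)
print(summary)
```

Unmatched and tied records are the ones worth reviewing by hand, using the troubleshooting causes listed earlier as a checklist.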