Comprehensive Guide to Geocoding in ArcGIS
Fundamentals of Geocoding in ArcGIS
Geocoding is the process of transforming a description of a location into a precise position on the earth's surface.
A description of a location can take several forms: - A pair of coordinates (e.g., latitude and longitude). - A specific street address. - The name of a place (e.g., a park or a business).
Users may already be familiar with turning tabular coordinate data into a feature class with the "Add XY Data" command in ArcMap.
While place names, such as "Bastrop State Park," can be geocoded, the primary focus in professional GIS applications is the transformation of tabular address data into GIS features.
Geocoding can be performed individually (entering one location description at a time) or in batch mode (processing many locations at once from a table).
Geocoding produces geographic features with associated attributes that are suitable for mapping and spatial analysis.
The resulting features are located using either geographic coordinates or projected coordinates in a map projection.
The Logic and Utility of Geocoding
The geocoding process mimics how people find real-world locations from descriptions. For example, to find an address on Olsen Blvd, College Station, TX: - First, locate the state of Texas. - Next, find the city of College Station. - If a zip code is available, identify that zip code area. - Finally, find the street and estimate the address's position along it from the house number.
Each step narrows the set of possible locations; geocoding applies the same successive narrowing to a map.
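The narrowing strategy above can be sketched as a chain of filters over street segments, from the broadest area to the narrowest. This is a minimal illustration under simplifying assumptions, not how ArcGIS implements matching; the `StreetSegment` class and every segment value (address ranges, zip codes) are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    # Hypothetical, simplified reference record for one street segment.
    state: str
    city: str
    zip_code: str
    street: str
    from_num: int
    to_num: int


def narrow_down(segments, state, city, street, house_number, zip_code=None):
    """Filter candidate segments from the broadest area to the narrowest."""
    candidates = [s for s in segments if s.state == state]   # 1. state
    candidates = [s for s in candidates if s.city == city]   # 2. city
    if zip_code is not None:                                  # 3. zip, if known
        candidates = [s for s in candidates if s.zip_code == zip_code]
    candidates = [s for s in candidates if s.street == street]  # 4. street
    # 5. keep segments whose address range contains the house number
    return [
        s for s in candidates
        if min(s.from_num, s.to_num) <= house_number <= max(s.from_num, s.to_num)
    ]


# Invented sample data for illustration only.
segments = [
    StreetSegment("TX", "College Station", "77843", "Olsen Blvd", 100, 198),
    StreetSegment("TX", "College Station", "77843", "Olsen Blvd", 200, 298),
    StreetSegment("TX", "Bryan", "77801", "Main St", 100, 198),
]
matches = narrow_down(segments, "TX", "College Station", "Olsen Blvd", 250)
print([(m.from_num, m.to_num) for m in matches])  # [(200, 298)]
```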
Applications of Geocoded Data: - Spatial Pattern Recognition: Once addresses are converted to points, patterns in the data become visible through visual inspection or ArcGIS analysis tools. - Crime Mapping: Police reports often record addresses rather than GPS coordinates. Geocoding these reports helps reveal crime hotspots and places individual incidents in a broader spatial context. - Customer Data Management: Organizations maintain tables of customer names, addresses, and buying habits. Geocoding this data supports: - Establishing marketing strategies. - Targeting specific customer clusters. - Producing route maps and directions.
Core Components of a Geocoder
Address Locator Style (Address Locator Template): - This serves as the "skeleton" of the address locator. - It determines the type of address that can be geocoded, defines field mapping for reference data, and dictates what output information is returned when a match is found. - Example: The "US address dual ranges" style is used for street centerlines containing left and right address ranges.
Address Locator: - This is the primary tool for geocoding within ArcGIS. - It contains all geocoding parameters, properties, and a snapshot of address attributes from the reference data. - It functions like a "street guide" or "map book" that directs a user to a specific page and pinpoints a location.
Reference Data: - This is the GIS feature class source used to build the address locator. - It must contain attributes such as house number ranges, street names, and street types. - The match rate of geocoding depends heavily on the completeness, spatial accuracy, and attribute accuracy of this data.
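The division of labor among the three components can be sketched in a few lines, assuming a style is reduced to a list of required field "roles" and building a locator means mapping each role onto a reference-data field. The role names and the `map_fields` helper are hypothetical; the field names follow the TIGER/Line edges convention but should be checked against your own data.

```python
# Hypothetical "roles" a dual-range style requires; names are illustrative.
DUAL_RANGES_ROLES = ("left_from", "left_to", "right_from", "right_to", "street_name")


def map_fields(style_roles, reference_fields, field_map):
    """Map each role the style requires onto a field in the reference data.

    Raises ValueError if the reference data has no field for some role,
    i.e. the data cannot support this locator style.
    """
    missing = [r for r in style_roles if field_map.get(r) not in reference_fields]
    if missing:
        raise ValueError(f"reference data lacks fields for roles: {missing}")
    return {r: field_map[r] for r in style_roles}


# Field names assumed to follow the TIGER/Line edges convention.
tiger_fields = {"LFROMADD", "LTOADD", "RFROMADD", "RTOADD", "FULLNAME", "ZIPL", "ZIPR"}
field_map = {
    "left_from": "LFROMADD", "left_to": "LTOADD",
    "right_from": "RFROMADD", "right_to": "RTOADD",
    "street_name": "FULLNAME",
}
print(map_fields(DUAL_RANGES_ROLES, tiger_fields, field_map)["street_name"])  # FULLNAME
```

Passing a layer that has street names but no range fields raises an error, which mirrors why a dual-ranges style cannot be built from street names alone.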
Address Elements and Parsing Mechanisms
The geocoding engine breaks an address into subunits called "address elements."
Common Address Elements: - House number. - Prefix direction/Prefix type. - Street name. - Street type. - Suffix direction. - City, State, and Zip code.
Parsing Rules: The address locator uses defined rules to break an address string into elements. Some strings allow more than one interpretation (e.g., the word "Park" could be part of a street name or a street type).
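A toy parser shows how the same string can yield more than one set of address elements. The `parse_interpretations` function and its small word lists are illustrative assumptions, not the ArcGIS parsing rules.

```python
# Small illustrative vocabularies, not a complete USPS list.
STREET_TYPES = {"ST", "AVE", "BLVD", "DR", "PARK"}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def parse_interpretations(address):
    """Return every plausible split of a one-line street address into elements."""
    tokens = address.upper().split()
    if len(tokens) < 2 or not tokens[0].isdigit():
        return []
    house, rest = tokens[0], tokens[1:]
    prefix = None
    if len(rest) > 1 and rest[0] in DIRECTIONS:
        prefix, rest = rest[0], rest[1:]
    results = []
    # Reading 1: the final token is a street type (e.g. "LAKE" + "PARK").
    if len(rest) > 1 and rest[-1] in STREET_TYPES:
        results.append({"house_number": house, "prefix_dir": prefix,
                        "street_name": " ".join(rest[:-1]), "street_type": rest[-1]})
    # Reading 2: all remaining tokens form the street name ("LAKE PARK").
    results.append({"house_number": house, "prefix_dir": prefix,
                    "street_name": " ".join(rest), "street_type": None})
    return results


for reading in parse_interpretations("500 Lake Park"):
    print(reading["street_name"], "|", reading["street_type"])
# LAKE | PARK
# LAKE PARK | None
```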
Scoring: The locator evaluates the possible element combinations, finds candidate features in the reference data, and scores each candidate by how closely it matches. The highest-scoring candidates are returned as the best matches.
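Candidate scoring can be sketched as a weighted similarity between the parsed elements and each candidate's attributes. The weights, the 85-point cutoff, and the use of Python's `difflib.SequenceMatcher` are assumptions chosen for illustration; ArcGIS uses its own scoring.

```python
from difflib import SequenceMatcher

# Illustrative weights; this is not Esri's scoring formula.
WEIGHTS = {"street_name": 0.6, "street_type": 0.2, "zip": 0.2}


def similarity(a, b):
    """String similarity in [0, 1]; two missing values count as agreement."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def score(parsed, candidate):
    """Weighted 0-100 score of how well a candidate matches the parsed address."""
    total = sum(w * similarity(parsed.get(k), candidate.get(k)) for k, w in WEIGHTS.items())
    return round(100 * total, 1)


def best_candidates(parsed, candidates, min_score=85):
    """Score all candidates, drop weak ones, and sort best first."""
    scored = [(score(parsed, c), c) for c in candidates]
    return sorted((sc for sc in scored if sc[0] >= min_score),
                  key=lambda sc: sc[0], reverse=True)


parsed = {"street_name": "Olsen", "street_type": "Blvd", "zip": "77843"}
candidates = [  # invented reference candidates
    {"street_name": "Olson", "street_type": "Blvd", "zip": "77843"},
    {"street_name": "Olsen", "street_type": "Blvd", "zip": "77843"},
    {"street_name": "Texas", "street_type": "Ave", "zip": "77840"},
]
for s, c in best_candidates(parsed, candidates):
    print(s, c["street_name"])
# 100.0 Olsen
# 88.0 Olson
```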
Data Consistency Requirements: - An address can be stored as a single string or split across multiple fields. - ArcGIS performs better when street address, city, state, and zip code are in separate fields. - Compatible table formats include: - CSV: Comma-Separated Values (text files with fields separated by commas). - DBF: Database File (an older standard from the dBASE software package). - Excel: Spreadsheet formats.
ArcGIS Pro Tools: The "Split Address Into Components" tool in the Geocoding Tools toolbox can parse a single street-address field into separate fields.
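When addresses arrive as one string, a preprocessing step like the one below can split them into separate fields before geocoding. This sketch assumes a simple "street, city, ST 12345" layout in a column named `Address`; it is not the ArcGIS tool, and real address strings will need more robust handling. Rows that do not fit the assumed layout are set aside for review rather than guessed.

```python
import csv
import io
import re

# Assumed layout: "street, city, ST 12345" with an optional ZIP+4.
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def split_address_column(csv_text, column="Address"):
    """Split one combined address column into street/city/state/zip fields."""
    rows, rejects = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = ADDRESS_RE.match(row[column])
        if match:
            rows.append({**row, **match.groupdict()})
        else:
            rejects.append(row)  # does not fit the assumed layout
    return rows, rejects


sample = 'ID,Address\n1,"123 Main St, Bryan, TX 77801"\n2,"PO Box 55, Bryan TX"\n'
rows, rejects = split_address_column(sample)
print(rows[0]["street"], "|", rows[0]["city"], "|", rows[0]["state"], "|", rows[0]["zip"])
# 123 Main St | Bryan | TX | 77801
print(len(rejects))  # 1
```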
The Geocoding Workflow
Step 1: Obtain and modify reference data to match locator style requirements.
Step 2: Select an address locator style appropriate for the data and available reference attributes.
Step 3: Create the address locator and specify geocoding options.
Step 4: Use the locator to geocode the address table.
Reference Data Considerations: - Extent: The reference data must cover the entire area of interest (e.g., Texas-wide data is needed if addresses are state-wide; Brazos County data will not suffice). - Resolution: The data must be detailed enough to find the specific target (e.g., it must have address ranges, not just street names).
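Both considerations can be checked roughly before a locator is built. This sketch assumes ZIP codes are a good-enough proxy for extent and that fields are named as in TIGER/Line edges (`LFROMADD`, `LTOADD`, `RFROMADD`, `RTOADD`, `ZIPL`, `ZIPR`); the `check_reference_data` helper and the sample rows are invented.

```python
RANGE_FIELDS = ("LFROMADD", "LTOADD", "RFROMADD", "RTOADD")  # assumed TIGER-style names
ZIP_FIELDS = ("ZIPL", "ZIPR")                                 # assumed TIGER-style names


def check_reference_data(address_zips, reference_rows):
    """Rough extent and resolution checks before building a locator.

    Extent: which address ZIP codes never appear in the reference data.
    Resolution: share of reference segments that carry any address range.
    """
    ref_zips = {row[f] for row in reference_rows for f in ZIP_FIELDS if row.get(f)}
    uncovered = sorted(set(address_zips) - ref_zips)
    with_ranges = sum(1 for row in reference_rows if any(row.get(f) for f in RANGE_FIELDS))
    share = with_ranges / len(reference_rows) if reference_rows else 0.0
    return {"uncovered_zips": uncovered, "share_with_ranges": share}


# Invented county-level rows; values are strings, as often read from a table.
brazos_rows = [
    {"LFROMADD": "101", "LTOADD": "199", "RFROMADD": "100", "RTOADD": "198",
     "ZIPL": "77801", "ZIPR": "77801"},
    {"LFROMADD": "", "LTOADD": "", "RFROMADD": "", "RTOADD": "",
     "ZIPL": "77840", "ZIPR": "77840"},
]
print(check_reference_data(["77801", "77840", "78701"], brazos_rows))
# {'uncovered_zips': ['78701'], 'share_with_ranges': 0.5}
```

A non-empty `uncovered_zips` list signals that the reference data does not span the area of interest, much like using Brazos County data for state-wide addresses.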
US Census Bureau and TIGER Files
TIGER Files: TIGER stands for Topologically Integrated Geographic Encoding and Referencing.
These files provide topological encoding, which allows for address finding and road network creation (routing from point A to point B).
Content of TIGER Files: Roads (street centerlines), railroads, zip codes, census blocks, legal/statistical areas, and demographic/population data.
Downloading TIGER Data: - Available from the US Census Bureau website. - Address range data is typically found in the county-level "All Lines" shapefiles. - "All Lines" files include roads, railroads (coded "R"), and hydrography/streams (coded "H"). - Definition Query: Apply a where clause that keeps only features coded "S" (streets) to isolate the road network.
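The definition query can be written as a where clause and mirrored in plain Python. This assumes the S/R/H codes are the first letter of the `MTFCC` field in the All Lines (edges) shapefile; confirm the field name in your download. The `streets_only` helper and the sample features are invented.

```python
# Assumption: the S/R/H codes live in the MTFCC field (e.g. S1400, R1011, H3010).
WHERE_CLAUSE = "\"MTFCC\" LIKE 'S%'"


def streets_only(features, code_field="MTFCC"):
    """Python equivalent of WHERE_CLAUSE: keep features whose code starts with 'S'."""
    return [f for f in features if str(f.get(code_field, "")).startswith("S")]


all_lines = [  # invented sample features
    {"FULLNAME": "Olsen Blvd", "MTFCC": "S1400"},
    {"FULLNAME": "Union Pacific RR", "MTFCC": "R1011"},
    {"FULLNAME": "Carters Crk", "MTFCC": "H3010"},
]
print([f["FULLNAME"] for f in streets_only(all_lines)])  # ['Olsen Blvd']
```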
Attribute Fields in TIGER Street Centerlines: - Left from address. - Left to address. - Right from address. - Right to address. - Left and right zip codes.
Directionality: Each segment has a start point (from-node) and an end point (to-node). The geocoder uses the "from" and "to" address values to determine which side of the street (left or right) an address falls on and where along the segment to place the point feature.
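The placement logic can be sketched as linear interpolation along the segment. This is a simplified version under stated assumptions: each side of the street carries only odd or only even numbers, addresses are spaced evenly along the line, and the optional `offset` pushes the point off the centerline toward its side. The `interpolate_address` function and its parameters are hypothetical, not an ArcGIS API.

```python
import math


def interpolate_address(coords, left_range, right_range, house_number, offset=0.0):
    """Place a house number along a centerline digitized from-node -> to-node.

    coords: (x, y) vertices in digitized order (start of segment first).
    left_range, right_range: (from_address, to_address) for each side.
    offset: distance to push the point off the centerline toward its side.
    Returns (x, y, side) with side "L" or "R", or None if no range fits.
    """
    if len(coords) < 2:
        return None
    # Pick the side whose range contains the number and shares its parity.
    for side, (from_addr, to_addr) in (("L", left_range), ("R", right_range)):
        low, high = min(from_addr, to_addr), max(from_addr, to_addr)
        if low <= house_number <= high and house_number % 2 == from_addr % 2:
            break
    else:
        return None
    # Fraction of the way from the from-node; ranges may run in either direction.
    span = to_addr - from_addr
    fraction = 0.5 if span == 0 else (house_number - from_addr) / span
    lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    remaining = fraction * sum(lengths)
    for i, length in enumerate(lengths):
        if remaining <= length or i == len(lengths) - 1:
            (x1, y1), (x2, y2) = coords[i], coords[i + 1]
            t = remaining / length if length else 0.0
            x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
            # Unit normal pointing to the LEFT of the direction of travel.
            nx, ny = ((y1 - y2) / length, (x2 - x1) / length) if length else (0.0, 0.0)
            sign = 1.0 if side == "L" else -1.0
            return (x + sign * offset * nx, y + sign * offset * ny, side)
        remaining -= length


x, y, side = interpolate_address([(0, 0), (100, 0)], (101, 199), (100, 198), 150, offset=10)
print(round(x, 2), round(y, 2), side)  # 51.02 -10.0 R
```

Because the fraction is measured from the "from" value, the same code works when a range decreases along the digitized direction.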
Troubleshooting and Matching Issues
Common Causes of Unmatched Results: - Name Mismatches: Street names in the address table differ from those in the reference data (e.g., "William Joel Bryan" vs. "FM 158"). The fix is an "alternate name table." - Data Entry Errors: Typos, misspellings, or words run together. - Place Name vs. Address: The table contains a place name, but the reference data only has addresses. The fix is an "alias table" (e.g., linking "Texas A&M University" to an address on Bizzell St). - PO Boxes: These have no physical street location and will not geocode to a street segment.
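The three fixes described here can be sketched as a preprocessing pass. The dictionaries stand in for an alternate name table and an alias table, with contents taken from the examples in the text; `prepare` and the PO Box pattern are hypothetical helpers. In ArcGIS these tables are supplied to the locator rather than applied in Python.

```python
import re

# Hypothetical stand-ins for an alternate name table and an alias table.
ALTERNATE_NAMES = {"WILLIAM JOEL BRYAN": "FM 158"}   # alternate -> reference name
ALIASES = {"TEXAS A&M UNIVERSITY": "BIZZELL ST"}      # place name -> address

PO_BOX = re.compile(r"^P\.?\s*O\.?\s*BOX\b")


def prepare(description):
    """Return (text_to_geocode, note), or (None, reason) if it cannot be placed."""
    key = " ".join(description.upper().split())
    if PO_BOX.match(key):
        return None, "PO Box: no physical street location"
    if key in ALIASES:
        return ALIASES[key], "alias"
    for alternate, reference_name in ALTERNATE_NAMES.items():
        if alternate in key:
            return key.replace(alternate, reference_name), "alternate name"
    return key, "unchanged"


for text in ["1000 William Joel Bryan", "Texas A&M University", "P.O. Box 55"]:
    print(prepare(text))
```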
Standardization Best Practices: - Use standardized abbreviations for directions (N, S, E, W, and NW for Northwest).
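A minimal standardization pass might look like this. The mapping and the `standardize_directions` function are illustrative; note that this naive version would also abbreviate a direction word that is really part of a street name (e.g., a street named "North St"), the same ambiguity the parser faces.

```python
DIRECTION_ABBREVIATIONS = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW",
}


def standardize_directions(address):
    """Uppercase, drop periods, and abbreviate spelled-out direction words."""
    tokens = address.upper().replace(".", "").split()
    return " ".join(DIRECTION_ABBREVIATIONS.get(t, t) for t in tokens)


print(standardize_directions("401 north Main St."))  # 401 N MAIN ST
print(standardize_directions("12 N.W. Park Dr"))     # 12 NW PARK DR
```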
ArcGIS World Geocoding Service and Pro Implementation
Esri Geocoding Service: - Accessible through ArcGIS Online. - Requires an ArcGIS license and consumes "credits" (one-year licenses usually include an allotment of credits). - Building your own address locator is free of charge.
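Credit use for a batch job is simple arithmetic. The default of 40 credits per 1,000 addresses is an assumption based on Esri's published batch-geocoding rate and may change; pass the current rate explicitly. The `estimate_credits` helper is hypothetical.

```python
def estimate_credits(n_addresses, credits_per_1000=40.0):
    """Estimate ArcGIS Online credits for batch geocoding n_addresses.

    credits_per_1000 is an assumed rate; confirm Esri's current published rate.
    """
    return n_addresses * credits_per_1000 / 1000


print(estimate_credits(250))     # 10.0
print(estimate_credits(10_000))  # 400.0
```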
Walkthrough of ArcGIS Pro Geocoding (Address and Place Layer tool): - Step 1: Select the locator (e.g., ArcGIS World Geocoding Service). - Step 2: Choose the input address table and indicate whether addresses are stored in one field or in several. - Step 3: Map fields (the tool attempts to auto-map columns such as "Street," "City," "State," and "Zip"). - Step 4: Define the output (the path for the new point feature class). - Step 5: Select the country (e.g., United States) to improve accuracy. - Step 6: Select the category (e.g., Address).
Final Review: - Before running, the tool estimates how many credits the job will consume. - Results are labeled "Matched" (success), "Unmatched" (failure), or "Tie" (several equally scored candidate locations). - The final output is a point feature class ready for visualization and overlay with other GIS layers.
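Once the output exists, the outcome counts can be summarized into a match rate. This assumes the output `Status` field uses the single-letter codes `M`, `U`, and `T`; check the field in your own output. The `summarize` helper is hypothetical.

```python
from collections import Counter

# Assumed single-letter codes in the output Status field.
STATUS_LABELS = {"M": "Matched", "U": "Unmatched", "T": "Tie"}


def summarize(statuses):
    """Count results by outcome and compute the match rate as a percentage."""
    counts = Counter(STATUS_LABELS.get(s, "Other") for s in statuses)
    total = sum(counts.values())
    rate = 100 * counts["Matched"] / total if total else 0.0
    return dict(counts), round(rate, 1)


print(summarize(["M", "M", "U", "T", "M"]))
# ({'Matched': 3, 'Unmatched': 1, 'Tie': 1}, 60.0)
```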