Week 6.2
GEOGRAPHY 280: THINKING SPATIALLY IN A DIGITAL WORLD
Course Code: GEOG 280 L01 - (Fall 2025)
University of Calgary
BASIC DATA QUALITY CHECKING (QC)
Introduction to open data and project introductions (OE3).
RECAP OF TOPICS COVERED
Summary of key processes:
Travel areas
Finding nearest locations
Creation of centroids
Discussion on ESRI credit management and data filtering.
LATITUDE/LONGITUDE PRECISION
Importance of Coordinate Precision
Coordinates can indicate various spatial details:
28°N, 80°W indicates a general area.
28.5°N, 80.6°W specifies a city.
28.52°N, 80.68°W points to a neighborhood.
28.523°N, 80.683°W refers to a suburban cul-de-sac.
28.5234°N, 80.6830°W indicates a particular corner of a house.
28.5234571°N, 80.6830941°W implies roughly centimetre-level precision, enough to target an exact point within a room.
Excessive Precision Consequences
Going beyond practical precision adds digits without adding information:
E.g., coordinates precise enough to locate individual atoms are impractical and unnecessary.
Reference: XKCD comic on extreme levels of coordinate precision.
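The link between decimal places and ground distance can be sketched in Python. This is a rough illustration only: it assumes one degree of latitude spans about 111.32 km and that east-west distance shrinks with the cosine of latitude. The function name ground_resolution_m is made up for this example.

```python
import math

# Assumption: one degree of latitude covers roughly 111.32 km anywhere on Earth.
METRES_PER_DEGREE = 111_320.0


def ground_resolution_m(decimal_places, latitude_deg=0.0):
    """Return the (north-south, east-west) size in metres of the last decimal digit."""
    step_deg = 10.0 ** -decimal_places
    north_south = step_deg * METRES_PER_DEGREE
    # Lines of longitude converge toward the poles, so the east-west
    # size shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for places in range(0, 8):
    ns, ew = ground_resolution_m(places, latitude_deg=28.5)
    print(f"{places} decimal places: about {ns:,.3f} m north-south, {ew:,.3f} m east-west")
```

Under these assumptions, the last digit at four decimal places is worth about 11 m, and at seven decimal places about 1 cm, which is usually finer than the data collection method can justify.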
DATA QUALITY CHECKING
Analysis Considerations
Requirements for effective analysis:
Features in specific shapes (point, line, polygon).
No blanks or zeros.
Minimum record limits for processing.
Error-free topology; for example, route measurements require features to be near a road network.
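Some of these requirements can be checked before an analysis is run. The sketch below is hypothetical: it assumes each record is a Python dictionary with shape and value keys, and the minimum record count is a made-up threshold, not a documented ArcGIS Online limit.

```python
def check_layer(features, expected_shape, min_records):
    """Return a list of human-readable problems found in a layer."""
    problems = []
    if len(features) < min_records:
        problems.append(f"only {len(features)} records; need at least {min_records}")
    for index, feature in enumerate(features):
        shape = feature.get("shape")
        if shape != expected_shape:
            problems.append(f"record {index}: shape is {shape!r}, expected {expected_shape!r}")
        value = feature.get("value")
        # Blanks (None or empty text) and zeros are both flagged for review.
        if value is None or value == "" or value == 0:
            problems.append(f"record {index}: blank or zero value")
    return problems


# Hypothetical layer with one blank value and one wrong shape.
sample_layer = [
    {"shape": "point", "value": 12},
    {"shape": "point", "value": None},
    {"shape": "line", "value": 7},
]

for problem in check_layer(sample_layer, expected_shape="point", min_records=5):
    print(problem)
```

An empty list means the layer passed these three checks; it says nothing about topology, which still needs to be inspected on the map.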
CENTROID ANALYSIS
Input Features
Feature selection for centroids:
Users must specify whether each centroid is forced to fall within its feature or placed at the true geometric centroid, which can lie outside an irregularly shaped feature.
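The difference between the two options shows up with a U-shaped polygon, whose true geometric centroid lands in the gap of the U. The sketch below uses the standard shoelace formula for the centroid and a ray-casting test for containment. The point_within fallback is an illustrative assumption, not the method ArcGIS Online uses, and the polygon coordinates are invented.

```python
def polygon_centroid(vertices):
    """True geometric centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def edge_crossings(vertices, y):
    """X positions where a horizontal line at height y crosses the polygon edges."""
    xs = []
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            xs.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
    return sorted(xs)


def contains(vertices, point):
    """Ray casting: a point is inside if a ray to its right crosses an odd number of edges."""
    x, y = point
    return sum(1 for xc in edge_crossings(vertices, y) if x < xc) % 2 == 1


def point_within(vertices):
    """Centroid if it is inside; otherwise the middle of the first interior span at that height."""
    centre = polygon_centroid(vertices)
    if contains(vertices, centre):
        return centre
    xs = edge_crossings(vertices, centre[1])
    return (xs[0] + xs[1]) / 2.0, centre[1]


# Invented U-shaped polygon: a 6 x 6 square with a notch cut from the top.
u_shape = [(0, 0), (6, 0), (6, 6), (4, 6), (4, 2), (2, 2), (2, 6), (0, 6)]

true_centroid = polygon_centroid(u_shape)
print("True centroid:", true_centroid, "inside?", contains(u_shape, true_centroid))
print("Point forced within:", point_within(u_shape))
```

For census districts with concave or multi-part shapes, the same effect can put a true centroid in a neighbouring district, which matters if the centroids are later joined back to polygons.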
Example Data
Input layer:
Census districts - Count of features: 306
Output requirements:
Result layer naming and saving protocols.
IMPORTING CSV FILES INTO ARCGIS ONLINE
CSV: Comma-Separated Values format.
Opened in a text editor, the file shows commas separating each attribute;
opened in Excel, the data appear as a table.
Requirements for importing:
Inclusion of latitude and longitude columns.
Attribute headers must have no spaces or special characters except underscores.
Importance of cleaning data before upload:
Avoid extra spaces and ensure concise names.
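Header cleaning can be done before upload with a few lines of Python. This is a minimal sketch using only the standard library; the sample file contents are hypothetical, and the assumption that the coordinate columns are literally named latitude and longitude is for illustration only.

```python
import csv
import io
import re


def clean_header(name):
    """Trim a header and replace spaces or special characters with underscores."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip())
    return cleaned.strip("_")


def has_coordinates(headers):
    """Check for latitude and longitude columns (the exact names are an assumption)."""
    lowered = {header.lower() for header in headers}
    return {"latitude", "longitude"} <= lowered


# Hypothetical file contents; the coordinate values are illustrative.
raw_csv = " Site Name ,Latitude,Longitude,Temp (C)\nBow River,51.0447,-114.0719,12.5\n"

reader = csv.reader(io.StringIO(raw_csv))
headers = [clean_header(name) for name in next(reader)]
print(headers)
print("Has coordinate columns:", has_coordinates(headers))
```

To clean a real file, replace io.StringIO(raw_csv) with an opened file object and write the cleaned headers and the remaining rows back out with csv.writer.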
TIPS IF ANALYSIS DOES NOT WORK
Steps for Troubleshooting:
Verify input layers.
Ensure data model conformity (vector vs raster).
Confirm correct shapes (point, line, polygon).
Check for blanks/zeros.
Review documentation for analysis limitations (e.g., minimum/maximum data points).
Experiment with different parameters.
Attempt re-adding layers.
Log out of ArcGIS Online and log back in.
Consult online resources.
VIEWING ATTRIBUTE DATA
Steps for Evaluating Data:
Read the metadata to understand what each column means and how the data were collected.
Check data units.
Sort columns to identify outliers or anomalies.
Identify any blanks or null values.
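Sorting a column and counting blanks can also be done outside the attribute table. The sketch below assumes the rows have already been read into dictionaries of text values, as csv.DictReader would produce; the site names and elevations are invented.

```python
def summarize_column(rows, column):
    """Return (blank_count, sorted_numeric_values) for one column."""
    blanks = 0
    numbers = []
    for row in rows:
        text = (row.get(column) or "").strip()
        if text == "":
            blanks += 1
        else:
            numbers.append(float(text))
    return blanks, sorted(numbers)


# Invented records; the last value is far above the others.
rows = [
    {"site": "A", "elevation_m": "1045"},
    {"site": "B", "elevation_m": ""},
    {"site": "C", "elevation_m": "1100"},
    {"site": "D", "elevation_m": "3438"},
]

blanks, values = summarize_column(rows, "elevation_m")
print("Blank cells:", blanks)
print("Lowest:", values[0], "Highest:", values[-1])
```

A value that sits far from the rest after sorting, like the last record here, is a prompt to check the metadata for units (for example, feet entered in a metres column) rather than proof of an error.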
UNDERSTANDING NULLS, ZEROS, AND PLACEHOLDER NUMBERS
Definitions:
Null Values: Indicate missing data, not zeros. For instance, in temperature data, a missed measurement should be entered as null, because 0 would be read as a real temperature.
Zero Values: Represent the absence of something measurable, e.g., zero animal crossings means none occurred.
Placeholder Numbers: Stand in for missing values when software requires a numeric entry (e.g., -9999).
Potential Analysis Errors:
Nulls, zeros, and placeholder numbers can lead to different analysis outcomes:
Tools may skip the affected records or fail to run.
Results can be misinterpreted if the meaning of each value is not clearly understood.
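The effect on a simple average can be shown with a short example. This is a sketch with invented temperature readings; it assumes nulls are represented as None and that -9999 is the placeholder, which should always be confirmed in the metadata.

```python
PLACEHOLDER = -9999  # assumed sentinel for "missing"; check the metadata for the real one


def naive_mean(values):
    """Treat nulls as zero and keep placeholders: the mistake to avoid."""
    filled = [0 if v is None else v for v in values]
    return sum(filled) / len(filled)


def careful_mean(values, placeholder=PLACEHOLDER):
    """Skip nulls and placeholders but keep genuine zeros."""
    valid = [v for v in values if v is not None and v != placeholder]
    return sum(valid) / len(valid) if valid else None


# Invented readings in degrees Celsius: one null, one real zero, one placeholder.
temperatures_c = [4.0, None, 0.0, -9999, 8.0]

print("Naive mean:", naive_mean(temperatures_c))
print("Careful mean:", careful_mean(temperatures_c))
```

The genuine 0.0 reading stays in the careful average because it is a real measurement, while the null and the placeholder are both excluded.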
IMPORTANCE OF ATTRIBUTE SELECTION
Quality datasets are crucial. Typos or format differences complicate analysis.
Example variations in names stored in databases:
Different forms of the name “Timothy Hunt” lead to complications in filtering compared with a single standardized form.
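Simple format differences can be standardized before filtering. The sketch below is illustrative: the variant spellings are invented, and the rules (collapse spaces, reorder "Last, First", apply title case) are one possible convention rather than a general solution.

```python
def normalize_name(raw):
    """Standardize spacing, order, and capitalization of a personal name."""
    text = " ".join(raw.split())  # collapse repeated and leading/trailing spaces
    if "," in text:
        # Convert "Last, First" into "First Last".
        last, first = [part.strip() for part in text.split(",", 1)]
        text = f"{first} {last}"
    return text.title()


# Invented variants of the same name as they might appear in a database.
variants = ["Timothy Hunt", "timothy hunt", "  TIMOTHY   HUNT ", "Hunt, Timothy"]

standardized = {normalize_name(name) for name in variants}
print(standardized)
```

Rules like these cannot resolve nicknames, initials, or typos, so those cases still need manual review.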
MEASUREMENT TOOL USAGE
Utilize the Measurement tool on maps to confirm distance/area-related analyses.
Ask whether the result makes intuitive sense.
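A straight-line distance can also be checked independently with the haversine formula. This is a minimal sketch that assumes a spherical Earth with a mean radius of 6371 km; the city-centre coordinates for Calgary and Edmonton are approximate and included only as an example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates (illustrative values).
calgary = (51.0447, -114.0719)
edmonton = (53.5461, -113.4938)

distance = haversine_km(*calgary, *edmonton)
print(f"Calgary to Edmonton: about {distance:.0f} km in a straight line")
```

A straight-line figure will always be shorter than a driving route, so a network-based travel distance that comes out smaller than this value would be a sign that something went wrong.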
MIXING AND AGGREGATING DATASETS
Be cautious when merging datasets from diverse sources:
Cohesion is better maintained in a single governmental dataset than when combining multiple provincial datasets.
Differing standards and collection methods of multiple contributors can affect final results.
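One quick check before merging is to compare the field names of the two datasets. The sketch below is hypothetical: the field lists are invented to show two sources recording the same measurement in different units.

```python
def schema_differences(first_fields, second_fields):
    """Return the fields found only in the first dataset and only in the second."""
    first, second = set(first_fields), set(second_fields)
    return sorted(first - second), sorted(second - first)


# Invented field lists from two hypothetical provincial datasets.
dataset_a_fields = ["site_id", "date", "depth_cm"]
dataset_b_fields = ["site_id", "date", "depth_in"]

only_a, only_b = schema_differences(dataset_a_fields, dataset_b_fields)
print("Only in dataset A:", only_a)
print("Only in dataset B:", only_b)
```

Matching field names do not guarantee matching collection methods, so the metadata for each source still needs to be read before the records are combined.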
SYLLABUS AND COURSE OUTCOMES
Course Learning Outcomes:
Understanding how maps represent spaces.
Describing knowledge production through geospatial technologies.
Recognizing applications of geospatial technologies.
Applying industry-standard software for mapping tasks.
Generating an applied mapping project.
YOUR GEOSPATIAL PROJECT
OE/ICE 3 Project Guidelines:
Initial proposal using Open Calgary and Living Atlas data.
Components include:
Title and research question.
Dataset table with source information & relevance.
Importance paragraph for research question.
Analysis approach paragraph for geospatial data.
Linkages to Tutorials 4 and 5 facilitate progressive learning.
OUTCOMES FROM THE PROJECT:
Skills gained:
Writing research proposals
Working with open data
Data quality checking
Project management
Geospatial analysis practice.
Resulting in a public-facing Storymap to showcase work.
FINDING PROJECT DATA
Required steps:
Locate data, evaluate data quality, and analyze data.
Repetition improves skill in navigating data challenges.
OPEN DATA EXPLANATION
Definition:
Open data can be freely used and modified, typically subject only to conditions such as crediting the source. It is usually sourced from government entities, public institutions, and select non-profits.
Example search strategy: “Open data Calgary.”
METADATA IMPORTANCE
Metadata: Data about a dataset, describing its structure, source, update frequency, and relationships to other datasets, which helps users understand it.
Example usage: Government of Canada's metadata portals for clarifying dataset particulars.
GOVERNMENT OPEN DATA SOURCES
Resources:
Government of Canada - Open Government
Government of Alberta - Open Government Program
City of Calgary - Open Data initiatives.
ADDITIONAL DATA SOURCES
OpenStreetMap
Community-created map data that can be freely used when credited.
Contributors map features using local knowledge and GPS technology.
DATA ACCESS CHALLENGES
Datasets can be very large and may vary in collection quality and completeness.
Consider how government-collected datasets may differ from stakeholder-reported ones.
DATA IN THE PUBLIC DOMAIN
Comprises information available online, which can vary in quality. It can be accessed by transcription, downloading, or web scraping.