Types of data
Open data
Commercial data
Open data
Geospatial data that are freely available to the public
Government data
Collected to deliver and plan public services, to inform policy around planning and conservation
E.g., transport networks, land use, environmental monitoring, census demographics
Volunteered geographic information
Collected by individuals and communities rather than institutions to record local knowledge
E.g., OpenStreetMap (OSM) → collaborative mapping project including features of varying scale
Benefits & Limitations of Open Data
Benefits:
Data accessibility
Promotes transparency around decision-making
Limitations:
Variability in quality and spatial and temporal coverage
Privacy concerns (data not aggregated)
Commercial data
Geospatial data that are typically created and managed by private companies (e.g., Esri, or some datasets in Scholars GeoPortal)
Data are often sold under licensing agreements, limiting how and where data can be used
Benefits & Limitations of Commercial Data
Benefits:
High accuracy and quality data
Regularly updated and verified by specialists (technical support)
Limitations:
Costs limit accessibility
Licenses limit terms of use (e.g., modifying or sharing data)
Ways of accessing geospatial data
Application Programming Interface (API)
Data Portal
Web Map Services
Application Programming Interface (API)
Data are programmatically retrieved, often in real-time, without having to be downloaded (e.g., traffic data)
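A minimal sketch of programmatic retrieval, using only the Python standard library. The endpoint URL, the `bbox` and `limit` parameters, and the `features` key in the response are all hypothetical placeholders, not a real service; an actual API documents its own parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint used for illustration only; not a real service.
BASE_URL = "https://example.com/api/traffic"


def build_request_url(base_url, bbox, limit=100):
    """Encode a bounding box (min_lon, min_lat, max_lon, max_lat) as query parameters."""
    params = {"bbox": ",".join(str(v) for v in bbox), "limit": limit}
    return f"{base_url}?{urlencode(params)}"


def fetch_features(url, timeout=10):
    """Request the URL and return the 'features' list from the JSON response.

    Assumes the (hypothetical) service answers with a JSON object that has
    a 'features' key.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)["features"]


url = build_request_url(BASE_URL, (-79.6, 43.6, -79.1, 43.9), limit=5)
```

Calling `fetch_features(url)` would request the data at run time, so each call can return the latest values rather than a saved copy.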
Data Portal
Data are accessible via a web interface and can be downloaded to a local machine (e.g., Scholars GeoPortal, City of Toronto's Open Data Portal)
Web Map Services
Data are served over the internet and, although they may be viewed in GIS software, are not stored on a local machine (e.g., Google Maps, ArcGIS Pro basemap)
Common vector file types
Shapefile
GeoPackage
Shapefiles
Collection of files that work in unison
.shp stores geometry data
.shx indexes the data to speed up searching and rendering
.dbf stores attribute data in tabular format
Benefits:
Openly documented format supported by most GIS software
Limitations:
Collection of files can be difficult to share
Risk of corruption if one file is misplaced
Not suitable for large datasets (each component file is limited to 2 GB)
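Because the component files must travel together, a quick check before sharing can catch a missing one. The sketch below is an illustration only; the function name `missing_shapefile_parts` is made up here, and it checks just the three components listed in these notes.

```python
from pathlib import Path

# The three components listed in the notes: geometry, index, attributes.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS if not base.with_suffix(ext).exists()]
```

For example, `missing_shapefile_parts("roads.shp")` returns an empty list when `roads.shp`, `roads.shx`, and `roads.dbf` all sit in the same folder.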
GeoPackage
Stores multiple layers in a single SQLite database file with a .gpkg extension
Benefits:
Open standard format supported by most GIS software
Can store both raster and vector data
Handles large-scale projects and datasets
Limitations:
More complex structure and lower use can create a steeper learning curve
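A GeoPackage is a SQLite database, so its layers can be listed with Python's built-in `sqlite3` module. This sketch reads the `gpkg_contents` table that the GeoPackage specification defines; the function name is an illustrative choice, and in practice GIS software or a library usually handles this step.

```python
import sqlite3


def list_geopackage_layers(gpkg_path):
    """Return (table_name, data_type) pairs from the gpkg_contents table.

    data_type is typically 'features' for vector layers and 'tiles' for
    raster tile sets.
    """
    conn = sqlite3.connect(gpkg_path)
    try:
        rows = conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()
    return rows
```

Seeing vector and raster entries side by side in one table is what allows a single .gpkg file to replace a folder of separate files.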
Common Raster File Types
GeoTIFF
ASCII
GeoTIFF
Standard TIFF image file with embedded georeferencing information
Benefits:
Data can be compressed without loss, preserving image quality
Multiple bands (layers) of raster data may be stored (e.g., different spectral bands from satellite imagery)
Limitations:
Stored in binary format, so files must be opened with GIS or image software and are not human-readable
Large file sizes and complex structures
ASCII
Plain text files containing each cell value, which can be opened with a text editor
Benefits:
Easily shared and easy to read by humans
Limitations:
Inefficient data storage
Requires further data processing to georeference and visualize data
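To show how readable the format is, here is a small parser sketch. It assumes the common Esri-style ASCII grid layout (keyword header lines such as `ncols`, `nrows`, `xllcorner`, `yllcorner`, `cellsize`, `NODATA_value`, followed by rows of cell values); the sample grid is made up.

```python
def parse_ascii_grid(text):
    """Parse an Esri-style ASCII grid into (header dict, list of value rows)."""
    header = {}
    rows = []
    for line in text.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0][0].isalpha():
            # Header line: a keyword followed by one numeric value.
            header[tokens[0].lower()] = float(tokens[1])
        else:
            # Data line: one row of cell values.
            rows.append([float(t) for t in tokens])
    return header, rows


# Made-up 3 x 2 grid for illustration.
sample = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 10
NODATA_value -9999
1 2 3
4 -9999 6
"""

header, rows = parse_ascii_grid(sample)
```

The header gives the grid origin and cell size but names no coordinate reference system, which is one reason extra processing is needed before the grid can be mapped correctly.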
Enumeration units
Geographic units used to group (aggregate) data
Postal Codes
First 3 characters represent the Forward Sortation Area (FSA)
Last 3 characters represent the Local Delivery Unit (LDU)
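A short sketch of splitting a Canadian postal code into its FSA and LDU. The regular expression only checks the letter-digit-letter, digit-letter-digit pattern; it does not confirm that the code is actually in use. The function name is illustrative.

```python
import re

# Canadian postal codes follow the pattern A1A 1A1 (space optional here).
POSTAL_CODE = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")


def split_postal_code(code):
    """Return (FSA, LDU) for a postal code, or raise ValueError if malformed."""
    match = POSTAL_CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"Not a valid postal code pattern: {code!r}")
    return match.group(1), match.group(2)


fsa, ldu = split_postal_code("m5s 1a1")
```

Grouping records by the FSA alone is a common way to aggregate postal-code data to a coarser unit.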
Census Geographic Units
Census data are collected from households every 5 years
Short form
Every household
Age, gender, marital status, mother tongue, relationships between household members
Long form
Random sample of 1 in 4 households
Short form questions + daily activities, education, income, home value
Nested Hierarchy
Dissemination Block (DB)
City block
Only select data are released because of privacy concerns
Dissemination Area (DA)
Population of 400-700
Smallest unit of area with all information released
Census Tract (CT)
Population of 2,500-8,000
Neighbourhoods (only in metropolitan areas)
Census Subdivision (CSD)
Municipalities
Census Division (CD)
Counties
Limitations of Census Geographic Units
Census units & postal codes do not align
Information attributed to a postal code may not be correct
Consequences for marketing research (target audience not reached)
Ecological fallacy
Error that arises when an aggregate value for an area is assigned to an individual
Modifiable Areal Unit Problem (MAUP)
Data from individual points yield different results when aggregated to spatial units of different shapes and sizes
Can cause differences in the analytical results of the same input data
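The effect can be sketched in one dimension: the same points are averaged within zones of different widths and starting positions. The point values are made up for illustration, and `zone_means` is a hypothetical helper, not a standard GIS function.

```python
from collections import defaultdict

# Made-up point observations: (x position, value).
points = [(1, 10), (2, 14), (4, 30), (6, 34), (7, 12), (9, 8)]


def zone_means(points, zone_width, origin=0):
    """Average point values within equal-width zones that start at origin."""
    groups = defaultdict(list)
    for x, value in points:
        zone = (x - origin) // zone_width  # index of the zone containing x
        groups[zone].append(value)
    return {zone: sum(vals) / len(vals) for zone, vals in sorted(groups.items())}


wide = zone_means(points, zone_width=5)                # {0: 18.0, 1: 18.0}
narrow = zone_means(points, zone_width=3)              # {0: 12.0, 1: 30.0, 2: 23.0, 3: 8.0}
shifted = zone_means(points, zone_width=5, origin=-3)  # {0: 10.0, 1: 26.0, 2: 10.0}
```

With wide zones the area looks uniform, while narrower zones (scale effect) or shifted boundaries (zoning effect) reveal a high-value cluster in the middle, even though the input points never change.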
Converting non-spatial data to spatial data
Plotting data based on XY coordinates
Requires XY coordinate pair data (e.g., longitude + latitude)
X and Y values must be stored in separate columns (e.g., CSV, Excel spreadsheet, DBF files)
Coordinate reference system is required to map points to correct position
Spatial data import tools in GIS software create a point data file
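What an import tool does can be sketched with the standard library: read a table with separate X and Y columns and emit point features. The output here is GeoJSON, whose coordinates are longitude and latitude in WGS 84, so the sketch assumes the input columns are already in that coordinate reference system. The column names and sample rows are made up.

```python
import csv
import io
import json


def csv_to_point_features(csv_text, x_field="longitude", y_field="latitude"):
    """Turn rows with separate X and Y columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is x (longitude), then y (latitude).
            "geometry": {"type": "Point", "coordinates": [x, y]},
            # Remaining columns become the point's attributes.
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample table.
sample = "name,longitude,latitude\nSite A,-79.40,43.66\nSite B,-79.38,43.65\n"
collection = csv_to_point_features(sample)
geojson_text = json.dumps(collection)
```

Swapping the X and Y columns, or assuming the wrong coordinate reference system, would place the points in the wrong location without raising any error.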
Geocoding
Converts a non-spatial description of a location (e.g., address or place name) into point data
Reference data → contains location information and spatial representation of data
Event table → contains description of location, but no spatial reference information
Online geocoding services, such as ArcGIS Online, host reference data on a server
Quality of geocoding depends on completeness of event and reference data (e.g., issues when multiple addresses are similar)
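A toy geocoder illustrates how an event table is matched against reference data. Real services use far more sophisticated matching; this sketch only normalizes spelling before an exact lookup, and every address and coordinate below is made up.

```python
def normalize(address):
    """Uppercase and replace punctuation with spaces so similar spellings match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.upper())
    return " ".join(cleaned.split())


# Reference data: made-up addresses with coordinates (longitude, latitude).
reference = {
    normalize("100 Main St"): (-79.400, 43.660),
    normalize("250 King St W"): (-79.390, 43.645),
}

# Event table: location descriptions only, no coordinates.
events = [
    {"id": 1, "address": "100 main st."},
    {"id": 2, "address": "250 King St. W"},
    {"id": 3, "address": "999 Unknown Rd"},
]


def geocode(events, reference):
    """Split events into matched points and unmatched records."""
    matched, unmatched = [], []
    for event in events:
        coords = reference.get(normalize(event["address"]))
        if coords is None:
            unmatched.append(event)
        else:
            matched.append({**event, "lon": coords[0], "lat": coords[1]})
    return matched, unmatched


matched, unmatched = geocode(events, reference)
```

The unmatched list is worth reviewing: it shows where incomplete event or reference data lowered the match rate.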
Linking non-spatial data to spatial data (Non-spatial/Tabular Join)
Tabular data can be joined to point, line, or polygon data based on a common field
Shared attribute is typically a geographic identifier (e.g., census tract number, postal code, or place name)
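A tabular join can be sketched with plain dictionaries. The field name `ctuid`, the income values, and the placeholder geometries are all made up; the identifiers are kept as text so that leading zeros survive.

```python
# Spatial layer records (geometry shown as a placeholder string).
tracts = [
    {"ctuid": "0001.00", "geometry": "polygon A"},
    {"ctuid": "0002.00", "geometry": "polygon B"},
    {"ctuid": "0003.00", "geometry": "polygon C"},
]

# Non-spatial table keyed by the same geographic identifier.
income_table = [
    {"ctuid": "0001.00", "median_income": 52000},
    {"ctuid": "0003.00", "median_income": 61000},
]


def tabular_join(features, table, key):
    """Left join: keep every feature and attach table columns where the key matches."""
    lookup = {row[key]: row for row in table}
    joined = []
    for feature in features:
        extra = lookup.get(feature[key], {})
        joined.append({**feature, **extra})
    return joined


joined = tabular_join(tracts, income_table, key="ctuid")
```

Tract `0002.00` has no row in the table, so it keeps its geometry but gains no income value; unmatched records like this are a common sign of mismatched identifiers.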
Selection Query
Retrieves specific records from a table based on defined criteria
Selected records can be counted, deleted, or updated
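GIS attribute queries generally follow SQL-style criteria, so an in-memory SQLite table makes a reasonable stand-in. The `parcels` table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER, land_use TEXT, area_ha REAL)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "residential", 0.4), (2, "park", 3.2),
     (3, "residential", 0.9), (4, "industrial", 5.0)],
)

# Select: retrieve the records that meet the criteria.
selected = conn.execute(
    "SELECT id FROM parcels WHERE land_use = ? AND area_ha > ? ORDER BY id",
    ("residential", 0.5),
).fetchall()

# Quantify: count the records that meet a criterion.
count = conn.execute(
    "SELECT COUNT(*) FROM parcels WHERE land_use = 'residential'"
).fetchone()[0]

# Update, then delete, the records that meet a criterion.
conn.execute("UPDATE parcels SET land_use = 'open space' WHERE land_use = 'park'")
conn.execute("DELETE FROM parcels WHERE area_ha > 4")
remaining = conn.execute("SELECT id, land_use FROM parcels ORDER BY id").fetchall()
```

Each statement pairs an action (select, count, update, delete) with a `WHERE` clause that defines which records are affected.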
Metadata
Information about data
Needed to determine if a dataset is suited to a specific task
Where did it originate? How was it collected? By whom? At what time and scale? Using what coordinate reference system?
Helps make informed decisions about the data
Data dictionary → describes the available data fields; other indicators (e.g., units used) allow the quality of the data to be evaluated
For data creators, metadata builds transparency and enables data discovery (e.g., by adding keywords)