York Region Open Data Session 2: Advanced Data Interactivity and Technical Applications
Recap of Regional Governance and Data Infrastructure
Upper Tier Municipality: York Region functions as an upper-tier municipality, which implies a specific level of responsibility for shared services and infrastructure across its local municipalities.
Regional Services and Taxation: Government services are extensive, and data is used to determine where tax dollars are allocated and how programs across transit, water quality, and childcare are administered.
History of GIS at York Region: Geographic Information Systems (GIS) and spatial data have a long history within the region. Modeling "geographic phenomena"—such as roads, buildings, and human populations—is essential for running government programs.
The York Info Partnership: This collaboration is vital for sharing data beyond the region's internal departments. It establishes a framework for data value exchange between the region, local municipalities, and educational institutions.
Defining Open Data: Standards and Characteristics
Core Definition: Open data must be free, accessible, and subject to very few restrictions. Use of the data is governed by the Open Data License.
Machine-Readable Formats: Unlike static formats like Word documents or PDFs, open data must be in a format that computers can process, such as tabular (Excel/CSV) or spatial (Feature Layers) formats. This allows users to "plug" the data into Excel or a GIS system.
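To illustrate what "machine-readable" means in practice, here is a minimal sketch of a program reading tabular data directly, something that is not possible with a PDF or Word document without extra extraction work. The column names and values are hypothetical placeholders, not taken from any York Region dataset.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and counts are illustrative only.
SAMPLE_CSV = """municipality,facility_count
Vaughan,12
Markham,9
Richmond Hill,7
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(SAMPLE_CSV)

# Because each value sits in a named column, a program can compute on it directly.
total = sum(int(row["facility_count"]) for row in rows)
print(total)
```

The same rows could be opened in Excel or loaded into a GIS tool, which is the "plug in" property described above.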
Contrast with Proprietary Data: Proprietary data is licensed and often involves significant costs, whereas open data is available to the general public without financial barriers.
Metadata (Data About the Data): Metadata describes the dataset's origin, creation purpose, and content details. The speaker used the metaphor of a grocery store aisle where cereal boxes have no labels; without metadata, a user cannot distinguish one dataset (e.g., Cheerios) from another.
Navigation of the Open Data Portal and Catalog Search
The Data Catalog: A searchable engine where users can filter by keywords and content types.
Filtering by Feature Layer: This is a critical tip for students wishing to create visualizations. At the time of the recording:
* Total results in the catalog:
* Results after filtering for "Feature Layer":
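The content-type filter can be thought of as a simple selection over catalog entries. The sketch below is only an illustration of that idea: the `catalog` list, its `type` labels, and the `filter_by_type` helper are hypothetical and do not reflect the portal's actual search API or the real classification of each dataset.

```python
# Hypothetical catalog entries. The titles echo datasets named in these notes;
# the "type" labels are assumptions made for illustration.
catalog = [
    {"title": "Active Development Application Boundaries", "type": "Feature Layer"},
    {"title": "York Region Business Directory (2019)", "type": "Microsoft Excel"},
    {"title": "Building Footprints", "type": "Feature Layer"},
    {"title": "Waste Diversion Statistics", "type": "Table"},
]

def filter_by_type(items, content_type):
    """Return the entries whose content type matches, ignoring letter case."""
    wanted = content_type.casefold()
    return [item for item in items if item["type"].casefold() == wanted]

feature_layers = filter_by_type(catalog, "Feature Layer")
for item in feature_layers:
    print(item["title"])
```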
Visualization Capabilities: Feature layers are the only datasets that allow for direct online visualization and integration into platforms like ArcGIS and Story Maps.
Raw Data vs. Visualizations: Many datasets provide both the raw underlying data and "Data Stories" or dashboards that summarize the information.
Case Studies in Specific Datasets
Active Development Application Boundaries:
* Content: Location, size, type, status, name, and description of current land developments.
* Categories: Includes housing, commercial, and institutional developments.
* Geographic Concentration: The highest number of active applications is found in Vaughan, followed by Markham and Richmond Hill.
* Contextual Links: The metadata includes links to york.ca pages explaining the land development process for users needing deeper context.
York Region Business Directory (2019):
* Format: Published as an Excel file, not as a map-centric feature layer.
* Scale: A very large dataset containing approximately records.
* Fields: Detailed descriptions of business types, ranging from manufacturing and retail to tattoo parlors and nail salons.
Building Footprints:
* Content: The area of all buildings in York Region, down to individual houses and garages.
* Scale: Contains records.
* Application: Used by the federal government to create a national building dataset, using satellite imagery to identify gaps in local digital captures.
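A footprint is stored as a polygon, and its area can be derived from the polygon's vertices. The sketch below uses the standard shoelace formula and assumes the coordinates are already in a planar (projected) system measured in metres; it is not a description of how the region computes its own area values, and the sample shapes are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    `vertices` is a list of (x, y) pairs in order around the outline.
    Assumes planar coordinates; raw latitude/longitude would need projecting first.
    """
    total = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]  # wrap around to close the outline
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical footprints: a rectangular garage and an L-shaped house.
garage = [(0, 0), (10, 0), (10, 8), (0, 8)]
house = [(0, 0), (6, 0), (6, 2), (2, 2), (2, 5), (0, 5)]

print(polygon_area(garage))
print(polygon_area(house))
```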
Waste Diversion Statistics:
* Content: Total tonnage by year and type (Garbage, Blue Box, Yard Waste).
* Limitation: This specific dataset lacks geographic markers (like municipality names), making it unsuitable for mapping in its current table format.
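The limitation above can be made concrete with a small sketch: a table of this shape supports totals by year, but a check for geographic columns comes back empty. The rows, tonnage values, field names, and the `GEOGRAPHIC_FIELDS` set are all hypothetical and chosen only to mirror the structure described.

```python
from collections import defaultdict

# Hypothetical rows shaped like the table described above; values are made up.
waste_rows = [
    {"year": 2019, "type": "Garbage", "tonnes": 100.0},
    {"year": 2019, "type": "Blue Box", "tonnes": 60.0},
    {"year": 2019, "type": "Yard Waste", "tonnes": 40.0},
    {"year": 2020, "type": "Garbage", "tonnes": 90.0},
    {"year": 2020, "type": "Blue Box", "tonnes": 70.0},
    {"year": 2020, "type": "Yard Waste", "tonnes": 40.0},
]

# Assumed examples of columns that would let a row be placed on a map.
GEOGRAPHIC_FIELDS = {"municipality", "address", "postal_code", "latitude", "longitude"}

def total_by_year(records):
    """Sum tonnage across all waste types for each year."""
    totals = defaultdict(float)
    for record in records:
        totals[record["year"]] += record["tonnes"]
    return dict(totals)

def has_geography(records):
    """True if any record carries at least one geographic field."""
    return any(GEOGRAPHIC_FIELDS & set(record) for record in records)

yearly_totals = total_by_year(waste_rows)
mappable = has_geography(waste_rows)
print(yearly_totals, mappable)
```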
Advanced Products: Data Stories, Insights, and York Maps
Data Stories: Interactive story maps that narrate data findings.
* Example: "Lake to Lake" Story Map, a collaboration between York Region and the City of Toronto for a proposed bike path from Virginia Beach at Lake Simcoe (Georgina) to Lake Ontario.
* Current count of available Story Maps:
Data Insights (Tableau Public): The region's platform for dashboards and summaries.
* Drinking Water Dashboard: Simplifies complex raw data regarding chemicals (parts per million) like Chlorine, Fluoride, Sodium, and Lead.
* Census Profile Dashboards: Focus on immigration, unemployment, and labor force statistics.
York Maps: Highly polished products described as "Google Maps for government open data."
* Early Years Community Profile: Uses the Early Development Instrument (EDI) to show how children (ages to ) score in emotional maturity and literacy according to kindergarten teacher surveys.
Questions and Discussion
Integrating Excel Data into Maps:
* Question: Is there a way to put Excel data onto a map?
* Response: Yes, through geocoding addresses or postal codes into latitude/longitude dots. However, address data can often be "dirty" (e.g., apostrophes in names like "Apostrophe's"), which can disrupt geocoding services.
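One common way to reduce geocoding failures is to clean address text before sending it to a geocoder. The sketch below shows only that cleaning step and does not call any geocoding service. The helper names, the choice to drop apostrophes, and the sample values are assumptions for illustration; a given geocoder may prefer different rules.

```python
import re

# Canadian postal codes follow a letter-digit-letter digit-letter-digit pattern.
POSTAL_CODE = re.compile(r"^([A-Z]\d[A-Z])\s?(\d[A-Z]\d)$")

def clean_address(raw, drop_apostrophes=True):
    """Normalize typographic apostrophes and collapse stray whitespace."""
    text = raw.replace("\u2019", "'").replace("\u2018", "'")
    if drop_apostrophes:
        # Assumption: the target geocoder copes better without apostrophes.
        text = text.replace("'", "")
    return " ".join(text.split())

def normalize_postal_code(raw):
    """Return the code formatted as 'A1A 1A1', or None if it does not match."""
    match = POSTAL_CODE.match(raw.strip().upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"

# Hypothetical inputs.
print(clean_address("  Apostrophe\u2019s   Cafe, 123 Main St "))
print(normalize_postal_code("l3y6z1"))
```

Rows whose postal code comes back as `None` can be set aside for manual review instead of being passed to the geocoder.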
Public Data Literacy and Outreach:
* Question: Is York Region teaching the public (specifically older generations) about open data?
* Response: There is no direct outreach, but the COVID-19 pandemic significantly increased data literacy. Citizens followed COVID data on Facebook, summarized it for others, and compared York Region's statistics with Toronto's in the comments.
AI and Open Data:
* Question: How does open data boost AI research?
* Response: The Building Footprints dataset is a prime example. The federal government uses AI image detection on satellite imagery to find gaps between real-world buildings and the region's digital records, improving data quality.
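The gap-finding idea can be sketched as a comparison between shapes detected in imagery and shapes already on record. The example below is a deliberately simplified stand-in, not the federal government's method: it assumes buildings are reduced to axis-aligned bounding boxes in a shared planar coordinate system, and all coordinates are hypothetical.

```python
def boxes_overlap(a, b):
    """True if two boxes (min_x, min_y, max_x, max_y) share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def find_unrecorded(detected, recorded):
    """Return detected boxes that overlap no recorded footprint."""
    return [
        box for box in detected
        if not any(boxes_overlap(box, known) for known in recorded)
    ]

# Hypothetical data: two footprints on record, two detections from imagery.
recorded = [(0, 0, 10, 10), (20, 0, 30, 10)]
detected = [(1, 1, 9, 9), (40, 0, 50, 10)]

missing = find_unrecorded(detected, recorded)
print(missing)
```

A detection with no matching record is a candidate gap that a person could then verify before the dataset is updated.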
Technical Troubleshooting (Excel vs. Google Sheets):
* Scenario: A student (Connie) was unable to open census data from because it defaulted to WordPad.
* Solution: Since Excel is proprietary software, students without it can download the .ZIP file, extract the .XLS file, and import it into Google Sheets. XML files within these packages contain field-level metadata but are primarily meant to be read by software, not by people.
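The unzip-and-inspect step can also be done programmatically. The sketch below builds a small in-memory ZIP so it can run on its own; the file names, the XML layout, and the field descriptions are hypothetical, and a CSV stands in for the .XLS file because Python's standard library does not read Excel files.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

def build_sample_package():
    """Create an in-memory ZIP standing in for a downloaded data package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("census_sample.csv", "field_a,field_b\n1,2\n")
        archive.writestr(
            "metadata.xml",
            "<metadata>"
            "<field name='field_a'>Example description A</field>"
            "<field name='field_b'>Example description B</field>"
            "</metadata>",
        )
    buffer.seek(0)
    return buffer

def list_members(package):
    """List the file names stored inside the ZIP package."""
    with zipfile.ZipFile(package) as archive:
        return archive.namelist()

def read_field_metadata(package, member="metadata.xml"):
    """Map each field name to its description from the (assumed) XML layout."""
    with zipfile.ZipFile(package) as archive:
        root = ET.fromstring(archive.read(member))
    return {field.get("name"): field.text for field in root.iter("field")}

package = build_sample_package()
members = list_members(package)
field_info = read_field_metadata(package)
print(members)
print(field_info)
```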
The Data Publication Process:
* Question: How does the region choose which data to make open?
* Response: It is a rigorous process involving legal and privacy teams. High-priority data often meets a specific public need (e.g., childcare subsidies, COVID stats) or reduces frequent public inquiries.
* Exclusions: Certain data is withheld for security reasons, such as the exact locations of underground water and wastewater pipes (pumps, fittings, plants).