York Region Open Data Session 2: Advanced Data Interactivity and Technical Applications

Recap of Regional Governance and Data Infrastructure

  • Upper Tier Municipality: York Region is an upper-tier municipality, meaning it is responsible for services and infrastructure shared across its local municipalities.

  • Regional Services and Taxation: The region delivers a wide range of services, and data is used to decide where tax dollars are allocated and how programs such as transit, water quality, and childcare are administered.

  • History of GIS at York Region: Geographic Information Systems (GIS) and spatial data have a long history within the region. Modeling "geographic phenomena"—such as roads, buildings, and human populations—is essential for running government programs.

  • The York Info Partnership: This collaboration is vital for sharing data beyond the region's internal departments. It provides a framework for exchanging data, and the value that data creates, between the region, local municipalities, and educational institutions.

Defining Open Data: Standards and Characteristics

  • Core Definition: Open data must be free, accessible, and subject to very few restrictions. Use of the data is governed by the Open Data License.

  • Machine-Readable Formats: Unlike static formats like Word documents or PDFs, open data must be in a format that computers can process, such as tabular (Excel/CSV) or spatial (Feature Layers) formats. This allows users to "plug" the data into Excel or a GIS system.

  • Contrast with Proprietary Data: Proprietary data is licensed and often involves significant costs, whereas open data is available to the general public without financial barriers.

  • Metadata (Data About the Data): Metadata describes a dataset's origin, the purpose for which it was created, and its contents. The speaker compared data without metadata to a grocery store aisle of unlabeled cereal boxes: just as a shopper could not tell the Cheerios from any other cereal, a user without metadata cannot tell one dataset from another.
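To make "machine-readable" concrete, here is a minimal Python sketch that reads a small CSV table with the standard-library `csv` module and totals one column by category. The column names (`year`, `type`, `tonnes`), the values, and the function name `total_by_type` are hypothetical, loosely modelled on a tonnage-by-year-and-type table; they are not real York Region figures. The point is that a program can address each field by name, which a table embedded in a PDF or Word document does not allow.

```python
import csv
import io

# Hypothetical stand-in for a downloaded CSV file (values are illustrative only).
SAMPLE_CSV = """year,type,tonnes
2020,Garbage,100.5
2020,Blue Box,80.25
2021,Garbage,110.0
2021,Blue Box,79.75
"""


def total_by_type(text: str) -> dict:
    """Sum the 'tonnes' column for each distinct value of the 'type' column."""
    totals = {}
    # DictReader uses the header row as keys, so fields are accessed by name.
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["type"]] = totals.get(row["type"], 0.0) + float(row["tonnes"])
    return totals


if __name__ == "__main__":
    print(total_by_type(SAMPLE_CSV))  # {'Garbage': 210.5, 'Blue Box': 160.0}
```

The same few lines work unchanged on any CSV with those headers, which is what lets users "plug" open data into their own tools.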

Navigation of the Open Data Portal and Catalog Search

  • The Data Catalog: A searchable engine where users can filter by keywords and content types.

  • Filtering by Feature Layer: This is a critical tip for students who want to create visualizations. At the time of the recording:
      * Total results in the catalog: 276
      * Results after filtering for "Feature Layer": 70

  • Visualization Capabilities: Feature layers are the only datasets that allow for direct online visualization and integration into platforms like ArcGIS and Story Maps.

  • Raw Data vs. Visualizations: Many datasets provide both the raw underlying data and "Data Stories" or dashboards that summarize the information.
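The keyword and content-type filtering described above can be sketched in a few lines of Python. This is not the portal's actual search implementation or API: `CatalogEntry`, `search`, the content-type labels, and the keywords are all hypothetical, and only the dataset titles come from this session.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical, simplified stand-in for one search result in a data catalog."""
    title: str
    content_type: str  # hypothetical labels, e.g. "Feature Layer", "Spreadsheet"
    keywords: Tuple[str, ...] = ()


def matches_keyword(entry: CatalogEntry, keyword: str) -> bool:
    """True if the keyword appears in the title or equals one of the entry's keywords."""
    kw = keyword.lower()
    return kw in entry.title.lower() or any(kw == k.lower() for k in entry.keywords)


def search(entries: List[CatalogEntry], keyword: Optional[str] = None,
           content_type: Optional[str] = None) -> List[CatalogEntry]:
    """Return entries that pass both optional filters, preserving catalog order."""
    return [
        e for e in entries
        if (content_type is None or e.content_type == content_type)
        and (keyword is None or matches_keyword(e, keyword))
    ]


# Titles from the session; content types and keywords are illustrative assumptions.
CATALOG = [
    CatalogEntry("Active Development Application Boundaries", "Feature Layer",
                 ("land development", "planning")),
    CatalogEntry("York Region Business Directory 2019", "Spreadsheet", ("business",)),
    CatalogEntry("Building Footprints", "Feature Layer", ("buildings",)),
    CatalogEntry("Waste Diversion Statistics", "Spreadsheet", ("waste", "recycling")),
]

if __name__ == "__main__":
    for entry in search(CATALOG, content_type="Feature Layer"):
        print(entry.title)
```

Filtering on content type first mirrors the tip above: it narrows the catalog to the entries that can be visualized directly, before any keyword is applied.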

Case Studies in Specific Datasets

  • Active Development Application Boundaries:
      * Content: Location, size, type, status, name, and description of current land developments.
      * Categories: Includes housing, commercial, and institutional developments.
      * Geographic Concentration: Vaughan has the highest number of active applications, followed by Markham and Richmond Hill.
      * Contextual Links: The metadata includes links to york.ca pages explaining the land development process for users who need deeper context.

  • York Region Business Directory (2019):
      * Format: Published as an Excel file, not as a map-centric feature layer.
      * Scale: A very large dataset containing approximately 33,000 records.
      * Fields: Detailed descriptions of business types, ranging from manufacturing and retail to tattoo parlors and nail salons.

  • Building Footprints:
      * Content: The area of all buildings in York Region, down to individual houses and garages.
      * Scale: Contains 346,059 records.
      * Application: Used by the federal government to create a national building dataset, using satellite imagery to identify gaps in local digital captures.

  • Waste Diversion Statistics:
      * Content: Total tonnage by year and type (Garbage, Blue Box, Yard Waste).
      * Limitation: This dataset lacks geographic markers (such as municipality names), so it cannot be mapped in its current table format.
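One quick way to see why a table like the waste diversion statistics cannot be mapped directly is to check whether any of its columns could place a row on a map. The sketch below assumes a hypothetical set of geographic column names (`GEOGRAPHIC_FIELDS`) and hypothetical column headers for each dataset; real field names should be taken from each dataset's metadata.

```python
# Hypothetical column names that could place a row on a map: directly (coordinates),
# via geocoding (address, postal code), or via a join to boundaries (municipality).
GEOGRAPHIC_FIELDS = {"municipality", "latitude", "longitude", "address", "postal_code"}


def geographic_columns(columns):
    """Return, in input order, the columns whose names suggest a geographic field."""
    return [c for c in columns
            if c.strip().lower().replace(" ", "_") in GEOGRAPHIC_FIELDS]


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    waste_columns = ["Year", "Type", "Tonnes"]
    development_columns = ["Name", "Type", "Status", "Municipality"]
    print(geographic_columns(waste_columns))        # []
    print(geographic_columns(development_columns))  # ['Municipality']
```

An empty result, as in the waste example, means the table would need a geographic field before it could be placed on a map.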

Advanced Products: Data Stories, Insights, and York Maps

  • Data Stories: Interactive story maps that narrate data findings.
      * Example: The "Lake to Lake" Story Map, a collaboration between York Region and the City of Toronto on a proposed bike path from Virginia Beach at Lake Simcoe (Georgina) to Lake Ontario.
      * Current count of available Story Maps: 14

  • Data Insights (Tableau Public): The region's platform for dashboards and summaries.
      * Drinking Water Dashboard: Simplifies complex raw data on chemical concentrations (in parts per million) such as chlorine, fluoride, sodium, and lead.
      * Census Profile Dashboards: Focus on immigration, unemployment, and labor force statistics.

  • York Maps: Highly polished products described as "Google Maps for government open data."
      * Early Years Community Profile: Uses the Early Development Instrument (EDI), based on kindergarten teacher surveys, to show how children (ages 0 to 6) score in emotional maturity and literacy.

Questions and Discussion

  • Integrating Excel Data into Maps:
      * Question: Is there a way to put Excel data onto a map?
      * Response: Yes, by geocoding addresses or postal codes into latitude/longitude points. However, address data is often "dirty" (for example, stray apostrophes in names), which can disrupt geocoding services.

  • Public Data Literacy and Outreach:
      * Question: Is York Region teaching the public (specifically older generations) about open data?
      * Response: There is no direct outreach, but the COVID-19 pandemic significantly increased data literacy. Citizens followed COVID data on Facebook, summarized it for others, and compared York Region's statistics with Toronto's in the comments.

  • AI and Open Data:
      * Question: How does open data boost AI research?
      * Response: The Building Footprints dataset is a prime example. The federal government uses AI image detection on satellite imagery to find gaps between real-world buildings and the region's digital records, improving data quality.

  • Technical Troubleshooting (Excel vs. Google Sheets):
      * Scenario: A student (Connie) could not open census data from 2006 because the file opened in WordPad by default.
      * Solution: Because Excel is proprietary software, students without it can download the .ZIP file, extract the .XLS file, and import it into Google Sheets. The XML files in these packages contain field-level metadata but are intended primarily to be read by software rather than by people.

  • The Data Publication Process:
      * Question: How does the region choose which data to make open?
      * Response: It is a rigorous process involving legal and privacy teams. High-priority data typically meets a specific public need (e.g., childcare subsidies, COVID statistics) or reduces frequent public inquiries.
      * Exclusions: Certain data is withheld for security reasons, such as the exact locations of underground water and wastewater infrastructure (pipes, pumps, fittings, and plants).
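For the geocoding question, addresses are often cleaned before they are sent to a geocoding service so that stray punctuation does not trip it up. The following is a minimal sketch assuming that removing apostrophes and normalizing whitespace and postal-code formatting suits your geocoder; whether stripping apostrophes helps or hurts depends on the specific service. The function names are hypothetical, and the postal code pattern only checks the Canadian `A1A 1A1` shape, not which letters are actually in use.

```python
import re
from typing import Optional

# Simplified Canadian postal code shape: letter-digit-letter, optional space,
# digit-letter-digit. It does not check which letters Canada Post actually assigns.
POSTAL_CODE = re.compile(r"([A-Z]\d[A-Z])\s?(\d[A-Z]\d)")


def clean_address(raw: str) -> str:
    """Normalize a free-text address line before sending it to a geocoder."""
    text = raw.replace("\u2019", "'")  # curly apostrophe -> straight apostrophe
    text = text.replace("'", "")       # assumption: this geocoder copes better without apostrophes
    text = re.sub(r"\s+", " ", text)   # collapse runs of whitespace into one space
    return text.strip()


def clean_postal_code(raw: str) -> Optional[str]:
    """Return the postal code formatted as 'A1A 1A1', or None if it lacks that shape."""
    match = POSTAL_CODE.fullmatch(raw.strip().upper())
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


if __name__ == "__main__":
    print(clean_address("12 Main St,  O\u2019Brien's   Plaza "))  # 12 Main St, OBriens Plaza
    print(clean_postal_code("a1a1a1"))  # A1A 1A1
    print(clean_postal_code("12345"))   # None
```

Rows where `clean_postal_code` returns `None` are good candidates for manual review rather than being passed to the geocoder as-is.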
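The ZIP-extraction step in Connie's workaround can also be scripted with Python's standard `zipfile` module. The function name `extract_spreadsheets`, the suffix list, and the file names in the demo are hypothetical; the extracted `.xls` file would still be imported into Google Sheets by hand, and the XML metadata is simply left inside the archive.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical choice of which archive members count as spreadsheets.
SPREADSHEET_SUFFIXES = (".xls", ".xlsx", ".csv")


def extract_spreadsheets(zip_path, dest_dir):
    """Extract only spreadsheet-like members of a ZIP archive and return their paths.

    Other members (for example XML metadata files) are not extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(SPREADSHEET_SUFFIXES):
                # ZipFile.extract returns the path of the file it wrote.
                extracted.append(Path(archive.extract(name, dest)))
    return extracted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small stand-in archive; the member names are hypothetical.
        zip_path = Path(tmp) / "census_2006.zip"
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("census_2006.xls", b"placeholder spreadsheet bytes")
            archive.writestr("census_2006_metadata.xml", "<metadata/>")
        for path in extract_spreadsheets(zip_path, Path(tmp) / "out"):
            print(path.name)  # census_2006.xls
```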