AA

Module 9 - Social/Temporal SQL and Organizational Intelligence

Page 1: Introduction

  • Course Title: MIST 7510E - Database Management Social/Temporal SQL and Organizational Intelligence

  • Instructor: Nikhil Srinivasan

  • Affiliation: The University of Georgia TERRY COLLEGE OF BUSINESS

Page 2: MySQL Spatial Extensions

  • Specifications: Follow Open Geospatial Consortium standards

  • Implementation: Subset of proposed extensions

Page 3: Spatial Data Management

  • Overview: Management of spatially-referenced data via Geographic Information Systems (GIS)

  • Key Concepts:

    • Theme: Spatial representation of an entity (e.g., river, road)

    • Map: A representation of themes on paper or screen

    • Geographic Object: Instance of a theme

Page 4: Generic Spatial Data Types

  • Data Types:

    • Point: 0 dimensions (e.g., scenic lookout)

    • Line: 1 dimension (e.g., river)

    • Region: 2 dimensions (e.g., county)

Page 5: Data Model for Political Units

  • Structure:

    • POLITICAL BOUNDARY UNIT:

      • boundid

      • unitname

      • boundpath

      • unitcode

      • unitpopulation

    • CITY:

      • cityname

      • citylocation

Page 6: SQL/MM Spatial

  • Definition: ISO standard for managing spatial data in RDBMS

  • Naming Conventions: Prefix ST_ for tables, views, and functions

  • Original Meaning: Spatial and Temporal (now separated)

Page 7: MySQL Geometric Data Types

  • Types and Descriptions:

    • Point: POINT(x y) - represents a point in space (e.g., a city)

    • LineString: LINESTRING(x1 y1,x2 y2,…) - linear interpolation between points (e.g., a road)

    • Polygon: POLYGON((x1 y1,x2 y2,…)) - represents a single exterior boundary (e.g., boundaries)

Page 8: Creating Tables

  • SQL Commands:

    • CREATE TABLE political_unit (...);

    • CREATE TABLE boundary (...);

    • CREATE TABLE city (...);

Page 9: Mapping Representation

  • Distance Scale: 0-100 Miles / 0-150 Kilometers visual representation

  • Cities in Northern Ireland: Londonderry, Belfast, Sigo, Galway, Dublin, Cork

Page 10: Inserting Data into Database

  • SQL INSERT Commands:

    • INSERT INTO political_unit VALUES ('Republic of Ireland','ie', 3.9);

    • INSERT INTO boundary VALUES (1,ST_GeomFromText('polygon(...)'),'ie');

    • Other cities being inserted with their respective geographic data

Page 11: Reviewing Boundary Data

  • SQL Command:

    • Retrieve the boundary:SELECT boundpath FROM boundary where boundid = 2;

    • Result Example: POLYGON ((7 11,9 11,...))

Page 12: MySQL Geometry Functions

  • Key Functions:

    • ST_X(Point): x-coordinate of a point

    • ST_Y(Point): y-coordinate of a point

    • ST_Length(LineString): length of a linestring

    • ST_Area(Polygon): area of a polygon

    • ST_Distance(Point, Point): distance between two points

Page 13: Area Calculation

  • SQL Query for Area:

    • SELECT ST_Area(boundpath)*1406 as "Area (km^2)"...

    • Calculated Area: 71706 km²

Page 14: Distance Calculation

  • SQL Query Example:

    • SELECT ST_Distance(orig.cityloc,dest.cityloc)*37.5 AS "Distance (kms)"...

    • Result: Distance between Sligo and Dublin is 167.71 km

Page 15: Finding Closest City

  • SQL Query:

    • Retrieve the closest city to Limerick with subquery for minimum distance

    • Result: Tipperary

Page 16: Identifying Westernmost City

  • SQL Query:

    • Determine the westernmost city using conditional querying

    • Results: Limerick, Galway

Page 17: Exercise

  • Question: What is the easternmost city in Northern Ireland?

Page 18: Geometry Collections

  • Definition: Data type for one or more geometries

  • Types:

    • MULTIPOINT: collection of points

    • MULTILINESTRING: collection of line strings

    • MULTIPOLYGON: collection of polygons

    • GEOMETRYCOLLECTION: collection of geometries

Page 19: MultiPoint Type

  • Definition: Collection of points

  • Example: MULTIPOINT(9.0 6.1, 8.9 6.0)

Page 20: MultiLineString Type

  • Definition: Collection of line strings

  • Example:

    • MULTILINESTRING((9 6, 4 6), (9 6, 5 2))

Page 21: MultiPolygon Type

  • Definition: Collection of polygons

  • Example:

    • MULTIPOLYGON(((0 0,10 0,...)))

Page 22: GeometryCollection Type

  • Definition: A collection including various geometries

  • Example:

    • GEOMETRYCOLLECTION(LINESTRING(15 15, 20 20), POINT(10 10),...)

Page 23: Inserting Data into Geometry Collections

  • Example SQL Command:

    • INSERT INTO table VALUES ST_GeomCollFromText(...)

Page 24: Temporal Data

  • Description: Data associated with a time frame, indicating when valid and stored

  • Importance: Vital for tracking changes over time in databases

Page 25: Types of Time

  • Transaction Time: When data entered

  • Valid Time: When the data value is true

Page 26: Temporal Data Types

  • Types:

    • Instant

    • Anchored Interval

    • Unanchored

Page 27: Modeling Temporal Data

  • Structure:

    • SHARE:

      • Attributes: shrcode, shrname

    • TRANSACTION:

      • Attributes: shrprice, shrqty, shrdiv

Page 28: TSQL

  • Purpose: Enhances SQL with features for data definition, constraints, and manipulation

  • Temporal Structured Query Language: Designed specifically for temporal data

Page 29: Conclusions on Data Management

  • Trend: Increasing need for spatial data maintenance due to location-based services

  • Future Needs: Common management of temporal data for comprehensive historical records

  • New Functions: Requirement for managing new data types effectively

Page 30: Organizational Intelligence Technologies

  • Quote by Machiavelli: Types of intelligence in organizations; superior understanding is the most valuable

Page 31: Organizational Intelligence Definition

  • Outcome: Collection, storage, processing, and interpretation of internal and external data

  • Importance: Central for effective decision-making and information distribution

Page 32: Information Systems Cycle

  • Loops:

    • Remembering the past (databases, data warehouses)

    • Present Handling (BI, data mining, DSS)

    • Preparing for the future (new business systems)

Page 33: The Problem of Fragmented Memory

  • Issues:

    • Organizational memory is fragmented: different systems, technologies, and locations

    • Underused systems lead to missed insights

Page 34: The Data Warehouse

  • Definition: A centralized repository for organizational data

  • Scale: Can contain petabytes of data

  • Components: Management tools, analysis tools, transformation engines

Page 35: Managing the Data Warehouse

  • Processes:

    • Extraction, Transformation, Cleaning, Loading, Scheduling, Metadata management

Page 36: Data Extraction Challenges

  • Complexity: Difficulties with pulling data from existing systems; typically time-consuming and ongoing

  • Operational Systems: May not facilitate straightforward data extraction

Page 37: Data Transformation

  • Common Transformations:

    • Encoding: AB to A/B

    • Units: inches to centimeters

    • Field Adjustments: Date formatting changes

Page 38: Data Cleaning

  • Challenges:

    • Duplicate records across departments

    • Inaccurate data entry leading to multiple entries

    • Misuse of data fields

Page 39: Scheduling Data Updates

  • Trade-offs:

    • Too frequent = costly

    • Infrequent = outdated data

Page 40: Metadata Definition

  • Functions: Data dictionary with additional facts about stored data

    • Description, format, coding standards, source info, frequency of extracts

Page 41: Warehouse Architectures

  • Types:

    • Centralized

    • Federated

    • Tiered

Page 42: Centralized Data Warehouse Structure

  • Components: Corporate data warehouse serving various analysts across multiple departments

Page 43: Federated Data Warehouse

  • Structure: Collaborative environment across various departments with shared corporate data warehouse

Page 44: Tiered Data Warehouse

  • Levels:

    • Tier 3: Detailed data

    • Tier 2: Summarized data

    • Tier 1: Highly summarized data

Page 45: Hardware/Software Choices

  • Default Technologies:

    • Hadoop for file management

    • MapReduce for processing

    • Commodity nodes for operations

Page 46: Data Store Utilization

  • Functions: Verification and discovery of patterns through data mining and OLAP technologies

Page 47: Data Verification and Discovery

  • Questions:

    • Average sales metrics, college graduation predictors, and sales insights based on varying factors

Page 48: OLAP Overview

  • Purpose: Complement RDBMS with multidimensional data views for synthesis and analysis

Page 49: TPS vs. OLAP Comparison

  • Key Differences:

    • TPS is optimized for transactions; OLAP for data analysis

    • Real-time vs. batch updates

Page 50: ROLAP Explanation

  • Concept: Multidimensional model imposed on a relational database structure

  • Efficiency: Less optimal compared to dedicated OLAP systems

Page 51: Star Schema Structure

  • Diagram Structure: Central fact table connected to dimensional tables with single joins

Page 52: Snowflake Schema Structure

  • Extension of Star Schema: Handles larger dimensional tables requiring multiple joins for data retrieval

Page 53: Rotation and Sales Data

  • Example Data: Summary metrics for sales across regions and years

Page 54: Sales Variance Analysis

  • Drill Down Analysis: Variance in sales by region and specific nations

Page 55: Hypercube Visualization

  • Concept: Three-dimensional representation of sales data metrics

Page 56: Three-Dimensional Hypercube Display

  • Example Data: Columns of sales data arranged by region and year

Page 57: Six-Dimensional Hypercube Structure

  • Example Dimensions: Brand, store, customer segment, product group, time frame, units sold

Page 58: Six-Dimensional Hypercube Display

  • Sales and Revenue Colors: Differentiated types of data visualized over different dimensions

Page 59: Link Between RDBMS and MDDB

  • Interactions: Data extraction/aggregation from RDBMS leads to drill-downs in MDDBs

Page 60: MDDB Design Concepts

  • Identifying Dimensions: Define what is tracked; utilize identifiers for tracking sales data

Page 61: Prompts for Identifying Dimensions

  • Examples: Qualified identifiers for time, location, product, and sales outcomes

Page 62: Variables and Identifiers

  • Example Data: Dimensions for tracking sales over specified time intervals

Page 63: Exercise on Multidimensional Database Design

  • Task: Identify dimensions for a hotel chain's marketing database design

Page 64: Analysis and Variable Types

  • Variables: Discussion of types of dimensions and statistical analysis types applicable to sales data

Page 65: Multidimensional Expressions (MDX)

  • Purpose: Language for data reporting in multidimensional databases, similar to SQL

Page 66: Pentaho Overview

  • Description: Open source Business Intelligence project bridging various BI tools

Page 67: Data Mining Applications

  • Use Cases: Analysis for predictions in marketing, loans, and scientific research

Page 68: Data Mining Functions

  • Types of Functions: Associations, patterns, classifications, cluster analyses, predictive analytics

Page 69: Data Mining Technologies

  • Techniques: Decision trees, genetic algorithms, neural networks, data visualization approaches

Page 70: SQL-99 and OLAP Interaction

  • Complexity in Queries: Multiple queries may be necessary for comprehensive revenue reporting

Page 71: SQL-99 Extensions

  • GROUP BY Enhancements: Features like GROUPING SETS, ROLLUP, and CUBE

Page 72: GROUPING SETS Example

  • SQL Example: Aggregating revenue figures across various channels and locations

Page 73: GROUPING SETS Results

  • Summary: Revenue figures by channel and location

Page 74: ROLLUP Functionality

  • SQL Example: Aggregate results summarized through hierarchical levels

Page 75: ROLLUP Results

  • Results Summary: Breakdown of revenues across multiple channels

Page 76: CUBE Functionality

  • SQL Example: Comprehensive grouping for revenue analysis by multiple dimensions

Page 77: CUBE Results Overview

  • Results Summary: Total revenue across dimensions in a complete listing

Page 78: MySQL ROLLUP Implementation

  • SQL Queries: Examples of applying formatting to aggregate results with rollup

Page 79: Exercises Using ClassicModels

  • Tasks: Revenue computations using ROLLUP variations

Page 80: SQL OLAP Extensions Significance

  • Comparison: Less powerful than specialized MDDB tools

Page 81: Concluding Remarks on Data Management

  • Evolution of Discipline: Responsibility of data managers now includes supporting organizational intelligence technologies.