Module 9 - Social/Temporal SQL and Organizational Intelligence
Page 1: Introduction
Course Title: MIST 7510E - Database Management Social/Temporal SQL and Organizational Intelligence
Instructor: Nikhil Srinivasan
Affiliation: The University of Georgia TERRY COLLEGE OF BUSINESS
Page 2: MySQL Spatial Extensions
Specifications: Follow Open Geospatial Consortium standards
Implementation: Subset of proposed extensions
Page 3: Spatial Data Management
Overview: Management of spatially-referenced data via Geographic Information Systems (GIS)
Key Concepts:
Theme: Spatial representation of an entity (e.g., river, road)
Map: A representation of themes on paper or screen
Geographic Object: Instance of a theme
Page 4: Generic Spatial Data Types
Data Types:
Point: 0 dimensions (e.g., scenic lookout)
Line: 1 dimension (e.g., river)
Region: 2 dimensions (e.g., county)
Page 5: Data Model for Political Units
Structure:
POLITICAL BOUNDARY UNIT:
boundid
unitname
boundpath
unitcode
unitpopulation
CITY:
cityname
citylocation
Page 6: SQL/MM Spatial
Definition: ISO standard for managing spatial data in RDBMS
Naming Conventions: Prefix ST_ for tables, views, and functions
Original Meaning: Spatial and Temporal (now separated)
Page 7: MySQL Geometric Data Types
Types and Descriptions:
Point:
POINT(x y)
- represents a point in space (e.g., a city)LineString:
LINESTRING(x1 y1,x2 y2,…)
- linear interpolation between points (e.g., a road)Polygon:
POLYGON((x1 y1,x2 y2,…))
- represents a single exterior boundary (e.g., boundaries)
Page 8: Creating Tables
SQL Commands:
CREATE TABLE political_unit (...);
CREATE TABLE boundary (...);
CREATE TABLE city (...);
Page 9: Mapping Representation
Distance Scale: 0-100 Miles / 0-150 Kilometers visual representation
Cities in Northern Ireland: Londonderry, Belfast, Sigo, Galway, Dublin, Cork
Page 10: Inserting Data into Database
SQL INSERT Commands:
INSERT INTO political_unit VALUES ('Republic of Ireland','ie', 3.9);
INSERT INTO boundary VALUES (1,ST_GeomFromText('polygon(...)'),'ie');
Other cities being inserted with their respective geographic data
Page 11: Reviewing Boundary Data
SQL Command:
Retrieve the boundary:
SELECT boundpath FROM boundary where boundid = 2;
Result Example:
POLYGON ((7 11,9 11,...))
Page 12: MySQL Geometry Functions
Key Functions:
ST_X(Point)
: x-coordinate of a pointST_Y(Point)
: y-coordinate of a pointST_Length(LineString)
: length of a linestringST_Area(Polygon)
: area of a polygonST_Distance(Point, Point)
: distance between two points
Page 13: Area Calculation
SQL Query for Area:
SELECT ST_Area(boundpath)*1406 as "Area (km^2)"...
Calculated Area: 71706 km²
Page 14: Distance Calculation
SQL Query Example:
SELECT ST_Distance(orig.cityloc,dest.cityloc)*37.5 AS "Distance (kms)"...
Result: Distance between Sligo and Dublin is 167.71 km
Page 15: Finding Closest City
SQL Query:
Retrieve the closest city to Limerick with subquery for minimum distance
Result: Tipperary
Page 16: Identifying Westernmost City
SQL Query:
Determine the westernmost city using conditional querying
Results: Limerick, Galway
Page 17: Exercise
Question: What is the easternmost city in Northern Ireland?
Page 18: Geometry Collections
Definition: Data type for one or more geometries
Types:
MULTIPOINT: collection of points
MULTILINESTRING: collection of line strings
MULTIPOLYGON: collection of polygons
GEOMETRYCOLLECTION: collection of geometries
Page 19: MultiPoint Type
Definition: Collection of points
Example:
MULTIPOINT(9.0 6.1, 8.9 6.0)
Page 20: MultiLineString Type
Definition: Collection of line strings
Example:
MULTILINESTRING((9 6, 4 6), (9 6, 5 2))
Page 21: MultiPolygon Type
Definition: Collection of polygons
Example:
MULTIPOLYGON(((0 0,10 0,...)))
Page 22: GeometryCollection Type
Definition: A collection including various geometries
Example:
GEOMETRYCOLLECTION(LINESTRING(15 15, 20 20), POINT(10 10),...)
Page 23: Inserting Data into Geometry Collections
Example SQL Command:
INSERT INTO table VALUES ST_GeomCollFromText(...)
Page 24: Temporal Data
Description: Data associated with a time frame, indicating when valid and stored
Importance: Vital for tracking changes over time in databases
Page 25: Types of Time
Transaction Time: When data entered
Valid Time: When the data value is true
Page 26: Temporal Data Types
Types:
Instant
Anchored Interval
Unanchored
Page 27: Modeling Temporal Data
Structure:
SHARE:
Attributes: shrcode, shrname
TRANSACTION:
Attributes: shrprice, shrqty, shrdiv
Page 28: TSQL
Purpose: Enhances SQL with features for data definition, constraints, and manipulation
Temporal Structured Query Language: Designed specifically for temporal data
Page 29: Conclusions on Data Management
Trend: Increasing need for spatial data maintenance due to location-based services
Future Needs: Common management of temporal data for comprehensive historical records
New Functions: Requirement for managing new data types effectively
Page 30: Organizational Intelligence Technologies
Quote by Machiavelli: Types of intelligence in organizations; superior understanding is the most valuable
Page 31: Organizational Intelligence Definition
Outcome: Collection, storage, processing, and interpretation of internal and external data
Importance: Central for effective decision-making and information distribution
Page 32: Information Systems Cycle
Loops:
Remembering the past (databases, data warehouses)
Present Handling (BI, data mining, DSS)
Preparing for the future (new business systems)
Page 33: The Problem of Fragmented Memory
Issues:
Organizational memory is fragmented: different systems, technologies, and locations
Underused systems lead to missed insights
Page 34: The Data Warehouse
Definition: A centralized repository for organizational data
Scale: Can contain petabytes of data
Components: Management tools, analysis tools, transformation engines
Page 35: Managing the Data Warehouse
Processes:
Extraction, Transformation, Cleaning, Loading, Scheduling, Metadata management
Page 36: Data Extraction Challenges
Complexity: Difficulties with pulling data from existing systems; typically time-consuming and ongoing
Operational Systems: May not facilitate straightforward data extraction
Page 37: Data Transformation
Common Transformations:
Encoding: AB to A/B
Units: inches to centimeters
Field Adjustments: Date formatting changes
Page 38: Data Cleaning
Challenges:
Duplicate records across departments
Inaccurate data entry leading to multiple entries
Misuse of data fields
Page 39: Scheduling Data Updates
Trade-offs:
Too frequent = costly
Infrequent = outdated data
Page 40: Metadata Definition
Functions: Data dictionary with additional facts about stored data
Description, format, coding standards, source info, frequency of extracts
Page 41: Warehouse Architectures
Types:
Centralized
Federated
Tiered
Page 42: Centralized Data Warehouse Structure
Components: Corporate data warehouse serving various analysts across multiple departments
Page 43: Federated Data Warehouse
Structure: Collaborative environment across various departments with shared corporate data warehouse
Page 44: Tiered Data Warehouse
Levels:
Tier 3: Detailed data
Tier 2: Summarized data
Tier 1: Highly summarized data
Page 45: Hardware/Software Choices
Default Technologies:
Hadoop for file management
MapReduce for processing
Commodity nodes for operations
Page 46: Data Store Utilization
Functions: Verification and discovery of patterns through data mining and OLAP technologies
Page 47: Data Verification and Discovery
Questions:
Average sales metrics, college graduation predictors, and sales insights based on varying factors
Page 48: OLAP Overview
Purpose: Complement RDBMS with multidimensional data views for synthesis and analysis
Page 49: TPS vs. OLAP Comparison
Key Differences:
TPS is optimized for transactions; OLAP for data analysis
Real-time vs. batch updates
Page 50: ROLAP Explanation
Concept: Multidimensional model imposed on a relational database structure
Efficiency: Less optimal compared to dedicated OLAP systems
Page 51: Star Schema Structure
Diagram Structure: Central fact table connected to dimensional tables with single joins
Page 52: Snowflake Schema Structure
Extension of Star Schema: Handles larger dimensional tables requiring multiple joins for data retrieval
Page 53: Rotation and Sales Data
Example Data: Summary metrics for sales across regions and years
Page 54: Sales Variance Analysis
Drill Down Analysis: Variance in sales by region and specific nations
Page 55: Hypercube Visualization
Concept: Three-dimensional representation of sales data metrics
Page 56: Three-Dimensional Hypercube Display
Example Data: Columns of sales data arranged by region and year
Page 57: Six-Dimensional Hypercube Structure
Example Dimensions: Brand, store, customer segment, product group, time frame, units sold
Page 58: Six-Dimensional Hypercube Display
Sales and Revenue Colors: Differentiated types of data visualized over different dimensions
Page 59: Link Between RDBMS and MDDB
Interactions: Data extraction/aggregation from RDBMS leads to drill-downs in MDDBs
Page 60: MDDB Design Concepts
Identifying Dimensions: Define what is tracked; utilize identifiers for tracking sales data
Page 61: Prompts for Identifying Dimensions
Examples: Qualified identifiers for time, location, product, and sales outcomes
Page 62: Variables and Identifiers
Example Data: Dimensions for tracking sales over specified time intervals
Page 63: Exercise on Multidimensional Database Design
Task: Identify dimensions for a hotel chain's marketing database design
Page 64: Analysis and Variable Types
Variables: Discussion of types of dimensions and statistical analysis types applicable to sales data
Page 65: Multidimensional Expressions (MDX)
Purpose: Language for data reporting in multidimensional databases, similar to SQL
Page 66: Pentaho Overview
Description: Open source Business Intelligence project bridging various BI tools
Page 67: Data Mining Applications
Use Cases: Analysis for predictions in marketing, loans, and scientific research
Page 68: Data Mining Functions
Types of Functions: Associations, patterns, classifications, cluster analyses, predictive analytics
Page 69: Data Mining Technologies
Techniques: Decision trees, genetic algorithms, neural networks, data visualization approaches
Page 70: SQL-99 and OLAP Interaction
Complexity in Queries: Multiple queries may be necessary for comprehensive revenue reporting
Page 71: SQL-99 Extensions
GROUP BY Enhancements: Features like GROUPING SETS, ROLLUP, and CUBE
Page 72: GROUPING SETS Example
SQL Example: Aggregating revenue figures across various channels and locations
Page 73: GROUPING SETS Results
Summary: Revenue figures by channel and location
Page 74: ROLLUP Functionality
SQL Example: Aggregate results summarized through hierarchical levels
Page 75: ROLLUP Results
Results Summary: Breakdown of revenues across multiple channels
Page 76: CUBE Functionality
SQL Example: Comprehensive grouping for revenue analysis by multiple dimensions
Page 77: CUBE Results Overview
Results Summary: Total revenue across dimensions in a complete listing
Page 78: MySQL ROLLUP Implementation
SQL Queries: Examples of applying formatting to aggregate results with rollup
Page 79: Exercises Using ClassicModels
Tasks: Revenue computations using ROLLUP variations
Page 80: SQL OLAP Extensions Significance
Comparison: Less powerful than specialized MDDB tools
Page 81: Concluding Remarks on Data Management
Evolution of Discipline: Responsibility of data managers now includes supporting organizational intelligence technologies.