Lecture 4B
SQL and Data Querying
Overview of Utility Company Database
A utility company maintains a database for managing service calls.
The call center processes "tickets" that track jobs based on reported issues.
Customer Scenarios
Scenario 1: Customer (ID 001234) queries about a faulty boiler repair.
Scenario 2: A worker inquires about their next job assignment.
Database Design
Tables for Database:
Customer: custId, name, phone, address
Worker: workId, name, phone
Ticket: ticketId, timeReported, problem, status, custId, workId
SQL Queries
Identify Worker Handling Customer Complaint:
SELECT workId, name FROM Worker, Ticket WHERE (Worker.workId = Ticket.workId) AND (custId = 001234);
Find Customer Address with Open Ticket:
SELECT address FROM Ticket, Customer WHERE Ticket.custId = Customer.custId AND status = 'open' ORDER BY timeReported;
Referential Integrity and Database Normalisation
Referential Integrity
Ensures consistency between related tables in a relational database.
A foreign key in one table must reference a primary key in another table.
Normalisation vs Denormalisation
Normalisation: Data stored in multiple related tables to reduce redundancy.
Denormalisation: Consolidates data into fewer tables for performance optimization.
Example scenario governing the decision to denormalise based on organization’s activities.
Challenges with Large Databases
Common Issues
Managing multiple users and frequently accessed data across large R-tables.
Potential performance bottlenecks if tables are stored in various locations.
Denormalisation Advantages:
Optimizes query performance by allowing data redundancy based on real usage patterns.
Data Warehouse Concepts
Overview of Data Warehouse
Stores consolidated, denormalized historical data for analytical purposes.
Transaction data only loaded after activities are complete.
Data Mining
Process of analyzing large datasets to identify patterns and relationships.
Examples of queries:
Sales volume analysis during product promotions.
Correlation between products bought together.
Application of Data Mining
Domains Utilized
Banking: Fraud detection and customer loyalty analysis.
Healthcare: Hospital visit predictions based on past behaviors.
Supply Chain Management: Supplier reliability assessment.
Key Data Mining Applications
Consumer Clustering: Identifying characteristics of similar customers.
Churn Prediction: Detecting reasons customers switch to competitors.
Fraud Detection: Establishing parameters for likely fraudulent transactions.
Web Concepts
Surface Web vs Deep Web
Surface Web: Indexable and visible through search engines.
Deep Web: Not indexed; includes protected or uncrawlable content.
The deep web is estimated to contain 96% of the internet’s content, including databases and organizational repositories.
Dynamic Web Pages
Generated based on user interactions and access times, leading to different results for the same search at different times.
Client/Server Architecture
Highlights the interaction between client, server, and database through web applications.
Takeaway Messages
SQL is essential for data projection, joins, and restrictions.
Data warehouses facilitate historical data management and decision-making.
Data mining discovers hidden relationships and patterns in datasets.
Dynamic web pages rely on databases to tailor content to user needs.