Lecture 4B

SQL and Data Querying

Overview of Utility Company Database

  • A utility company maintains a database for managing service calls.

  • The call center processes "tickets" that track jobs based on reported issues.

Customer Scenarios

  • Scenario 1: Customer (ID 001234) queries about a faulty boiler repair.

  • Scenario 2: A worker inquires about their next job assignment.

Database Design

  • Tables for Database:

    • Customer: custId, name, phone, address

    • Worker: workId, name, phone

    • Ticket: ticketId, timeReported, problem, status, custId, workId

SQL Queries

  • Identify Worker Handling Customer Complaint:

    SELECT workId, name FROM Worker, Ticket  
    WHERE (Worker.workId = Ticket.workId)  
    AND (custId = 001234);  
  • Find Customer Address with Open Ticket:

    SELECT address FROM Ticket, Customer  
    WHERE Ticket.custId = Customer.custId  
    AND status = 'open'  
    ORDER BY timeReported;  

Referential Integrity and Database Normalisation

Referential Integrity

  • Ensures consistency between related tables in a relational database.

  • A foreign key in one table must reference a primary key in another table.

Normalisation vs Denormalisation

  • Normalisation: Data stored in multiple related tables to reduce redundancy.

  • Denormalisation: Consolidates data into fewer tables for performance optimization.

  • Example scenario governing the decision to denormalise based on organization’s activities.

Challenges with Large Databases

Common Issues

  • Managing multiple users and frequently accessed data across large R-tables.

  • Potential performance bottlenecks if tables are stored in various locations.

  • Denormalisation Advantages:

    • Optimizes query performance by allowing data redundancy based on real usage patterns.

Data Warehouse Concepts

Overview of Data Warehouse

  • Stores consolidated, denormalized historical data for analytical purposes.

  • Transaction data only loaded after activities are complete.

Data Mining

  • Process of analyzing large datasets to identify patterns and relationships.

  • Examples of queries:

    • Sales volume analysis during product promotions.

    • Correlation between products bought together.

Application of Data Mining

Domains Utilized

  • Banking: Fraud detection and customer loyalty analysis.

  • Healthcare: Hospital visit predictions based on past behaviors.

  • Supply Chain Management: Supplier reliability assessment.

Key Data Mining Applications

  • Consumer Clustering: Identifying characteristics of similar customers.

  • Churn Prediction: Detecting reasons customers switch to competitors.

  • Fraud Detection: Establishing parameters for likely fraudulent transactions.

Web Concepts

Surface Web vs Deep Web

  • Surface Web: Indexable and visible through search engines.

  • Deep Web: Not indexed; includes protected or uncrawlable content.

  • The deep web is estimated to contain 96% of the internet’s content, including databases and organizational repositories.

Dynamic Web Pages

  • Generated based on user interactions and access times, leading to different results for the same search at different times.

Client/Server Architecture

  • Highlights the interaction between client, server, and database through web applications.

Takeaway Messages

  • SQL is essential for data projection, joins, and restrictions.

  • Data warehouses facilitate historical data management and decision-making.

  • Data mining discovers hidden relationships and patterns in datasets.

  • Dynamic web pages rely on databases to tailor content to user needs.