In-Depth Notes on Database Systems and Big Data

Database Systems, Data Marts, and Data Warehouses

Learning Outcomes

  • Define a database and a database management system (DBMS).
  • Identify the components of a DBMS.
  • Explain logical database design and the relational database model.
  • Summarise recent trends in database design and use.
  • Explain the components and functions of a data warehouse.
  • Describe the functions of a data mart.

What is a Database?

  • A database is an organized collection of data stored and accessed electronically.
  • A Database Management System (DBMS) is software that enables efficient database management.
  • A Database Administrator (DBA) manages database operations.
  • Importance of databases in organizations:
    • Foundation for information and decision support systems.
    • Essential for accurate and efficient systems development projects.

Hierarchical Organization of Data

  • Data hierarchy includes:
    • Character: Eight bits forming a byte.
    • Field: A unit consisting of characters.
    • Record: A collection of related fields.
    • File: A collection of related records.
    • Database: A collection of integrated and related files.
  • Benefits of hierarchical data organization:
    • Increased information generation from same data.
    • Improved handling of complex queries.
    • Simplified management of data relationships.
    • Enhanced security and reduced redundancy.

Access Methods in Databases

  1. Sequential Access Method:
    • Data is accessed in the order entered.
    • Slower with large records; typically used for backups.
  2. Random-Access Method:
    • Allows direct access to records in any order.
    • Faster than sequential access; used with magnetic disks.
  3. Indexed Sequential Access Method (ISAM):
    • Combines sequential and random access methods using an index key.

Components of a DBMS

  • Database Engine: Core component for data storage, manipulation, and retrieval.
  • Data Definition: Data dictionary defining database structure using Data Definition Language (DDL).
  • Data Manipulation: Involves adding, deleting, modifying, and retrieving records using SQL or Query By Example (QBE).
  • Application Generation: Allows developers to create user-friendly interfaces.
  • Data Administration: Tasks include backup, recovery, security, and user permissions management.

Duties of a DBA

  • Designing and setting up databases.
  • Managing user access rights and security measures.
  • Developing recovery procedures for data loss or corruption.
  • Monitoring database performance and optimizing functions.

Defining a Data Model

  • Represents the first step in database design, specifying organization and relationships among records.
Types of Data Models
  • Relational: Data in two-dimensional tables.
  • Multidimensional: Supports data analysis in multiple dimensions.
  • Object-oriented: Combines data descriptions with procedures.
  • Relational object-oriented: Hybrid of relational and object-oriented models.

Relational Model Details

  • Data is stored in tables where:
    • Each row (record/tuple) contains data entries.
    • Columns are referred to as fields/attributes.
  • Data Dictionary:
    • Contains field names, types, default values, and validation rules.
  • Primary Key: Uniquely identifies each record.
    • Foreign Key: Links tables by referencing primary keys from other tables.

Data Operations in Relational Databases

  • Select Operation: Retrieves data from tables. Example:
    SELECTextCustomerName,CustomerSurnameextFROMCustomerSELECT ext{Customer Name, Customer Surname} ext{ FROM Customer}
  • Conditional Select: Retrieves based on conditions. Example:
    SELECTextCustomerName,CustomerSurnameextFROMCustomerWHERECustomerNumber=Cust02SELECT ext{Customer Name, Customer Surname} ext{ FROM Customer WHERE Customer Number = 'Cust02'}
  • Join Operation: Combines data from multiple tables. Example:
    SELECTextCustomerName,CustomerSurname,TotalCostofOrderextFROMCustomer,OrderWHERECustomer.CustomerNumber=Order.CustomerNumberSELECT ext{Customer Name, Customer Surname, Total Cost of Order} ext{ FROM Customer, Order WHERE Customer.Customer Number = Order.Customer Number}

Trends in Database Design and Use

  • Data-Driven Web Sites: Websites controlled by database content, ideal for e-commerce and news sites.
  • Distributed Databases: Data across multiple servers enhance alignment with organizational structure and reduce failure impacts.
  • Object-Oriented Databases: Encapsulate data with procedures facilitating efficient data manipulation and accommodating multimedia applications.

Data Warehouses

  • A centralized collection of data from various sources for decision-making support.
Characteristics
  • Subject-oriented, history-rich, aggregated, and analytical.

Components of a Data Warehouse

  • Input: Data from databases, ERP, and CRM systems.
  • ETL Process:
    • Extraction: Collects data in supported formats.
    • Transformation: Selects and formats needed data.
    • Loading: Transfers cleaned data into the warehouse.
  • Storage: Raw data, summary data, and metadata.
  • Output: Supports analysis and report generation.

Data Marts

  • Subsets of data warehouses targeting specific business areas.
Advantages
  • Faster access, quicker response times, lower development costs, and focused services.

Big Data

  • Era of large data volumes challenging traditional processing methods.
Dimensions
  • Volume: Measured in petabytes or exabytes.
  • Variety: Structured vs. unstructured data.
  • Velocity: Speed of data collection and processing.

Beneficiaries of Big Data

  • Retail: Customized offerings using data analysis.
  • Financial Services: Improved decision-making and fraud detection.
  • Advertising: Enhanced campaigns through rich data insights.
  • Government: Support for security efforts using big data.

Tools and Technologies of Big Data

  • Apache Hadoop, NoSQL, Cassandra, SAP Big Data Analytics, Tableau, SAS Big Data Analytics.

Big Data Privacy Risks

  • Risks include discrimination, privacy breaches, loss of anonymity, and limited legal protections.