In-Depth Notes on Database Systems and Big Data
Database Systems, Data Marts, and Data Warehouses
Learning Outcomes
- Define a database and a database management system (DBMS).
- Identify the components of a DBMS.
- Explain logical database design and the relational database model.
- Summarise recent trends in database design and use.
- Explain the components and functions of a data warehouse.
- Describe the functions of a data mart.
What is a Database?
- A database is an organized collection of data stored and accessed electronically.
- A Database Management System (DBMS) is software that enables efficient database management.
- A Database Administrator (DBA) manages database operations.
- Importance of databases in organizations:
- Foundation for information and decision support systems.
- Essential for accurate and efficient systems development projects.
Hierarchical Organization of Data
- Data hierarchy includes:
- Character: Eight bits forming a byte.
- Field: A unit consisting of characters.
- Record: A collection of related fields.
- File: A collection of related records.
- Database: A collection of integrated and related files.
- Benefits of hierarchical data organization:
- Increased information generation from same data.
- Improved handling of complex queries.
- Simplified management of data relationships.
- Enhanced security and reduced redundancy.
Access Methods in Databases
- Sequential Access Method:
- Data is accessed in the order entered.
- Slower with large records; typically used for backups.
- Random-Access Method:
- Allows direct access to records in any order.
- Faster than sequential access; used with magnetic disks.
- Indexed Sequential Access Method (ISAM):
- Combines sequential and random access methods using an index key.
Components of a DBMS
- Database Engine: Core component for data storage, manipulation, and retrieval.
- Data Definition: Data dictionary defining database structure using Data Definition Language (DDL).
- Data Manipulation: Involves adding, deleting, modifying, and retrieving records using SQL or Query By Example (QBE).
- Application Generation: Allows developers to create user-friendly interfaces.
- Data Administration: Tasks include backup, recovery, security, and user permissions management.
Duties of a DBA
- Designing and setting up databases.
- Managing user access rights and security measures.
- Developing recovery procedures for data loss or corruption.
- Monitoring database performance and optimizing functions.
Defining a Data Model
- Represents the first step in database design, specifying organization and relationships among records.
Types of Data Models
- Relational: Data in two-dimensional tables.
- Multidimensional: Supports data analysis in multiple dimensions.
- Object-oriented: Combines data descriptions with procedures.
- Relational object-oriented: Hybrid of relational and object-oriented models.
Relational Model Details
- Data is stored in tables where:
- Each row (record/tuple) contains data entries.
- Columns are referred to as fields/attributes.
- Data Dictionary:
- Contains field names, types, default values, and validation rules.
- Primary Key: Uniquely identifies each record.
- Foreign Key: Links tables by referencing primary keys from other tables.
Data Operations in Relational Databases
- Select Operation: Retrieves data from tables. Example:
SELECTextCustomerName,CustomerSurnameextFROMCustomer - Conditional Select: Retrieves based on conditions. Example:
SELECTextCustomerName,CustomerSurnameextFROMCustomerWHERECustomerNumber=′Cust02′ - Join Operation: Combines data from multiple tables. Example:
SELECTextCustomerName,CustomerSurname,TotalCostofOrderextFROMCustomer,OrderWHERECustomer.CustomerNumber=Order.CustomerNumber
Trends in Database Design and Use
- Data-Driven Web Sites: Websites controlled by database content, ideal for e-commerce and news sites.
- Distributed Databases: Data across multiple servers enhance alignment with organizational structure and reduce failure impacts.
- Object-Oriented Databases: Encapsulate data with procedures facilitating efficient data manipulation and accommodating multimedia applications.
Data Warehouses
- A centralized collection of data from various sources for decision-making support.
Characteristics
- Subject-oriented, history-rich, aggregated, and analytical.
Components of a Data Warehouse
- Input: Data from databases, ERP, and CRM systems.
- ETL Process:
- Extraction: Collects data in supported formats.
- Transformation: Selects and formats needed data.
- Loading: Transfers cleaned data into the warehouse.
- Storage: Raw data, summary data, and metadata.
- Output: Supports analysis and report generation.
Data Marts
- Subsets of data warehouses targeting specific business areas.
Advantages
- Faster access, quicker response times, lower development costs, and focused services.
Big Data
- Era of large data volumes challenging traditional processing methods.
Dimensions
- Volume: Measured in petabytes or exabytes.
- Variety: Structured vs. unstructured data.
- Velocity: Speed of data collection and processing.
Beneficiaries of Big Data
- Retail: Customized offerings using data analysis.
- Financial Services: Improved decision-making and fraud detection.
- Advertising: Enhanced campaigns through rich data insights.
- Government: Support for security efforts using big data.
- Apache Hadoop, NoSQL, Cassandra, SAP Big Data Analytics, Tableau, SAS Big Data Analytics.
Big Data Privacy Risks
- Risks include discrimination, privacy breaches, loss of anonymity, and limited legal protections.