Database Management Systems Notes

  • Database Modeling Phases

    • Conceptual data modeling: High-level, abstract representation of the database.

    • Logical data modeling: Transforms the conceptual model into a detailed, implementation-independent structure using tables and constraints.

    • Physical data modeling: Transforms the logical model into a specific implementation optimized for the chosen DBMS and hardware/software environment.

    Conceptual Models
    • Business Rules

      Business rules define and restrict aspects when designing a Database Management System, derived from policies, procedures, events, and functions within an organization. Key characteristics include description, applicability, examples, documentation, and understandability. A good business rule is declarative, precise, atomic, consistent, expressible, distinct, and business-oriented.

    • Entities and Attributes

      • Entities: Objects or real-world concepts represented and stored in a database.

      • Strong entity: Exists independently with a unique identifier (PI).

      • Weak entity: Depends on a strong entity and lacks its own primary identifier.

      • Associative entity: Represents many-to-many relationships.

      • Attributes: Properties that describe an entity.

      • Required vs. Optional: Must have a value vs. may or may not have a value.

      • Simple vs. Composite: Atomic vs. made up of sub-attributes.

      • Single-Valued vs. Multivalued: One value vs. multiple values.

      • Stored vs. Derived: Directly stored vs. calculated from other attributes.

      • Identifier Attributes: Uniquely identify entity instances (primary keys).

    • Relationships

      Relationships are associations between entities in an Entity-Relationship Diagram (ERD).

      • Unary Relationship: An entity related to itself.

      • Binary Relationship: Involves two entities.

      • Ternary Relationship: Involves three or more entities.

      • Cardinality and Constraints

      • Cardinality: Maximum number of related instances (1:1, 1:N, M:N). Cardinality specifies the possible relationships and numerical constraints between entities. For instance, in a 1:N relationship, one instance of entity A can relate to multiple instances of entity B, but each instance of entity B relates to only one instance of entity A.

      • Constraints: Minimum number of related instances (zero/optional or one/mandatory). Constraints dictate whether participation in a relationship is optional or mandatory. An optional constraint (zero) means an entity instance may not participate in the relationship. A mandatory constraint (one) means an entity instance must participate in the relationship.

    From Conceptual Models to Logical Models
    • Attributes Considerations in the Logical Model

      • Required vs. Optional: Defined as nullable or not null.

      • Simple vs. Composite: Composite attributes use round brackets e.g. employeeAddress (unit, streetAddress, streetName, suburb, areaCode, region)

      • Single-Valued vs. Multivalued: Multivalued attributes use squiggly brackets e.g. staffEmail vs {otherEmails}

      • Stored vs. Derived: Derived attributes shown in square brackets e.g. seasonName, [numberOfEpisodes]

      • Identifier Attributes: Shown with PI or WPI next to attributes e.g. (strong or weak) PI vs WPI

    • Transitioning from Conceptual to Logical Modeling

      • Relations (tables) correspond to entities.

      • Rows correspond to values.

      • Columns correspond to attributes.

    • Key Fields

      • Primary Keys: Uniquely identify records within a table.

      • Foreign Keys: Establish connections between related tables, referencing primary keys of other tables.

      • Simple vs. Composite Keys: Single field vs. multiple fields forming a unique identifier.

    • Integrity Constraints

      • Entity Integrity: No primary key attribute can be null.

      • Referential Integrity: Foreign key values must match primary key values in the parent table or be null.

    • Delete Rules

      • Restrict: Prevents deletion of a parent record if related records exist in a dependent table.

      • Cascade: Deletes all related records in the dependent table when a parent record is deleted.

      • Set-to-Null: Sets foreign key values in the dependent table to null when a parent record is deleted.

    • Transforming ER Diagrams into Relations (Steps)

      1. Strong entities: Create a relation for each entity, including all simple attributes. Identify PI (Primary key).

      2. Weak entities: Create a relation for each, including all simple attributes. PK (Primary key) is a combination of the primary key/s of the ‘owner’ and the main attribute of the weak entity itself.

      3. 1:1 Relationship: Choose the PK of the Relation (usually the one with mandatory constraint) and make it as the foreign key of the other relation.

      4. 1:* (1:M) Relationship: Include as the foreign key in the relation that holds the N side, the primary key of the other entity (that holds the 1 side).

      5. M:N (M:M): Create a new relation to represent the relationship e.g. associative entities. The primary key of the new relation is the combination of the primary keys of the two connected entities. Include attributes on the relationship, include them.

      6. Multi-valued Attributes: Create a new relation containing the multivalued attribute and the primary key of the entity.

      7. Associative Entities: For each n-ary relationship create a new relation to represent the relationship. The primary key of the new relation is the combination of the primary keys of the participating entities that hold the N (many) side.

    From Logical Models to Physical Models
    • Physical Database Design

      Converts the logical data model into technical specifications for data storage and retrieval.

      • Key Objectives: Performance, Integrity, Security, Recoverability

      • Designing Fields: Involves decisions on data representation and management; choose appropriate data types, coding, compression, encryption, and controlling data integrity.

    • Indexes and Query Optimization

      Indexes enhance data retrieval speed.

    SQL Introduction
    • SQL: Standardized language for relational database queries.

    • Data Types: Define the type of value a column can hold.

    SQL
    • Data Definition Language (DDL): CREATE, DROP, ALTER, TRUNCATE

    • Data Manipulation Language (DML): INSERT, UPDATE, DELETE

    • Data Query Language (DQL): SELECT

    • MySQL Workbench

      Example commands (on local connection NOT ASH, create schema)
      • CREATE SCHEMA nameofschema;

      • USE nameofschema

      • CREATE TABLE table_name (thing CHAR(8), thing VARCHAR(30), thing DATE, thing BOOLEAN, thing INT, thing DECIMAL(5,2), PRIMARY KEY (thing));

      • INSERT INTO table_name VALUES (list of things);

      • SELECT * FROM table_name;

      • ALTER TABLE tablename ADD COLUMN columnname VARCHAR(30);

      • UPDATE tablename SET columnname = ‘thing’ WHERE DirectorID = ‘D1’;

      • ALTER TABLE tablename DROP COLUMN columnname;

      • DROP TABLE table_name;

      • SELECT * FROM table_name;

    • Keys (Integrity)

      • Entity Integrity (PK): Ensures no duplicate records, enforced through the primary key constraint.

      • Referential Integrity (FK): Ensures foreign key values correctly reference the source (parent) table.

    • Foreign Keys

      Foreign keys are defined using the following SQL syntax:

      • Composite Foreign Keys: A foreign key can consist of multiple columns:

      • Column Name Differences: The name of the foreign key column does not necessarily have to match the name of the column in the source table.

    • Join Types

      • Inner Joins: Returns rows when there is a match in both tables. Excludes non-matching records.

      • Outer Joins: Includes all rows from one table and matching rows from the other table; NULL values used for non-matching columns.

      • Left Join (Left Outer Join): Includes all records from the left table and matching records from the right table.

      • Right Join (Right Outer Join): Includes all records from the right table and matching records from the left table.

      • Cross Join (Cartesian Product): Each record from one table is matched with each record from the other table.

    • Aggregate Queries

      Compute summary values from a set of rows.

      • Without GROUP BY: Provides overall summary of all records.

      • With GROUP BY: Provides group summaries.

    • Nested Queries (sub-queries)

      A join can be converted to a nested query, but not all nested queries can be converted into a join.

    Functional Dependencies in Data and Normalisation
    • Normalization

      Reduces redundant/duplicate data, minimizing potential problems.

      • Anomalies: Unexpected behavior caused by an action.

      • Insertion anomaly: adding new rows forces the creation of duplicate data.

      • Deletion anomaly: deleting rows may cause a loss of data that would be needed for other future rows.

      • modification /update anomaly: changing data in a row forces changes to other rows because of duplication.

    • Functional Dependency

      Is the relation between attributes.

    Database Applications
    • Database Application Development

      Database Applications are software used to facilitate data collection, storage, management, and retrieval.

    • Database in Client-Server Architecture Models

      • Client: requests and uses a service

      • Server: provides a service Note: a server would act as a client when requesting service from another server.

      • Front-end functions: functions to interact with the end-users

      • Back-end functions: functions to interact with the servers

    • System Tasks

      Presentation Logic

      Handles the input and output. Such as getting inputs from keyboard and mouse, displaying the outputs on-screen, and rendering the graphical user interface.

      Processing Logic

      Handles the processing of input and output, ensuring that the business rules are followed, and managing the data.

      Storage Logic

      Handles the storage and retrieval of data.

    • Two-tier Architecture Model

      In a two-tier architecture, all back-end applications reside in one server.

      Advantages

      • Easy to Maintain

      • Simpler modification

      • Faster Communication

      Disadvantages

      • Performance degraded as more users added

      • Cost in-effective Clients

    • Clients

      Thick (Fat) Client

      A fat client (or thick client) performs most of the processing itself, relying on the server mainly for data storage. In a database system, this means that queries, calculations, and business logic are handled on the client-side. Example: A desktop application that runs complex analytics locally but fetches raw data from the server.

      Thin Client

      A thin client depends heavily on the server for processing, with minimal functionality on the client-side. The client typically only handles the user interface, while all data processing and business logic occur on the server. Example: A web-based application where the browser sends requests, and the server processes queries and returns results.

      Distributed Architecture

      A distributed system spreads processing across multiple servers or clients. Instead of relying solely on a single server, tasks are distributed, improving performance and fault tolerance. Example: A cloud-based database that balances query loads across multiple servers.

    • Three-Tier and N-Tier Architecture Model

      Processing logic will be at a separate server (three-tier), or distributed amongst multiple servers (n-tier) to share the processing tasks.

      Example of a four-tier architecture
      • Database server – hosts the DBMS e.g., Oracle, SQL Server, Informix, MS Access, MySql

      • Application server – software building blocks for creating dynamic web sites e.g., MS ASP .NET framework, Java EE, ColdFusion, PHP

      • Web server – receives and responds to browser requests using HTTP protocol e.g., Apache, Internet Information Services (IIS)

      • Web browser – client program that sends web requests and receives web pages e.g., Internet Explorer, Firefox, Safari, Google Chrome

    • The advantages and disadvantages of three-tier and n-tier architecture

      Advantages
      • Scalability

      • Technological flexibility

      • Long-term cost reduction

      • Better match of systems to business needs

      • Improved customer service

      • Competitive advantage

      • Reduced risk

      • High Performance as the load is reduced

      • Improved security

      • Good Performance

      • Easy to maintain

      Disadvantages
      • Increased complexity

      • Increased effort

    • Sustainability (Green IS and IT)

      Green IT is the consideration of the environmental impact of the development and the usage of IT components (hardware, software, and data).

      Green IS includes Green IT but takes more holistic consideration from various components' integration to ensure sustainable business processes.

    • Data Ethics

      It professionals are bound by the Australian Computer Society Code of Professional Ethics, where honesty, trustworthiness, and respect are the pillars. When dealing with data, these [moral] guidelines are followed:

      • Consent: Organisations should seek permission to collect an individual's personal data.

      • Fairness: Data shouldn't be used in ways that perpetuate biases or lead to harm orг discrimination.

      • Intention: Data use should serve society and shouldn't disadvantage the individual.

      • Integrity: Data use should be accurate and high quality, and not misrepresent facts or lead to unethical decisions.

      • Stewardship: Data collection and management should be done in a protected and secure environment.

    Data Warehouses + Data Lakes + Big Data + Reporting (Data Theory)
    • Data Warehouse

      A central repository of data that allows for informed decisions. It is subject-oriented, integrated, time-variant, and non-updatable. It is optimized for complex queries and data analysis with historical, summarized data. Users include business analysts, managers, and decision-makers.

    • Data Mart

      A data mart is a limited-scope data warehouse subset that provides fast access to relevant data for specific user groups, such as Sales or Finance.

    • Star Schema

      A common dimensional model for decision support, the objectives of which are detailed, aggregate, and distributed data. A fact table contains keys and measures, while dimension tables contain descriptive attributes.

    • Data Quality

      Essential for business success; bad data leads to failure. Characteristics include completeness, accuracy, consistency, validity, uniqueness, and relevance. Causes include external sources, redundant storage, poor data entry, and lack of commitment. Techniques for improvement: Consolidation (ETL), data federation (EII), data propagation (EAI and EDR).

    • Data vs. Database Administration

      • Data Administration: Overall management of data resources, policies, and standards.

      • Database Administration: Technical function for physical database design, security, performance, and recovery.

    • Data Security

      Protection against accidental or intentional loss, destruction, or misuse. Threats include human error, software/hardware failure, theft, privacy loss, integrity loss, and availability loss. Implementation strategies are authorisations, encryption, and authentication (passwords, multi-factor).

    • Encryption

      Encryption is used to protect sensitive information. Basic two-key encryption: Public key (encryption) and Private Key (decryption).

    • Data Recovery

      A mechanism for restoring a database after loss or damage. Facilities include backup,journalising (transaction and database change logs), and checkpoint. Procedures involve disk mirroring, restore/rerun, and transaction integrity (rollback/roll forward).

    • Big Data

      High volume, variety, and velocity data requiring advanced processing. Key characteristics are volume, variety, velocity, veracity, and value. Challenges include handling large amounts of data, capturing information,