Transaction Processing

Transaction Processing - CS-4347 Database Systems

Introduction to Transaction Processing

  • Definition of a Transaction:

    • A transaction is defined as a logical unit of database processing that includes one or more access operations, which can be either read (retrieval) or write (insert, update, delete).

    • Characteristics of a transaction include:

    • Stand-alone specification in high-level languages like SQL.

    • Submission can be interactive or embedded within a program.

  • Transaction Boundaries:

    • Indicated by Begin and End transaction.

    • An application program may comprise several transactions demarcated by these boundaries.

Transaction Processing Hierarchy

  • Hierarchy Structure:

    • Schedule

    • Composed of multiple Transactions

      • Each transaction consists of one or more Operations

      • Operations are further divided into Sub-operations.

Simple Model of a Database

  • For the purpose of discussing transactions:

    • A database is conceptualized as a collection of named data items.

    • Granularity of Data:

    • Can be a field, a record, or a whole disk block.

    • Basic operations are encapsulated by read and write:

    • read_item(X): Reads a database item named X into a program variable, which simplifies notation as assuming the variable name is also X.

    • write_item(X): Writes the value from the program variable named X into the database item named X.

READ AND WRITE OPERATIONS

  • Basic Data Transfer:

    • The fundamental unit of data transfer from disk to computer memory is one block.

    • Typically, a data item consists of a field from a record in the database, although it may be larger (e.g., a record or an entire block).

  • read_item(X) Implementation Steps:

    • Identify the address of the disk block containing item X.

    • Transfer that disk block to a buffer in main memory (if it isn’t already present).

    • Move item X from the buffer to the program variable named X.

  • write_item(X) Implementation Steps:

    • Identify the address of the disk block containing item X.

    • Transfer that disk block to a memory buffer (if it isn’t present).

    • Move item X from the program variable into the corresponding spot in the buffer.

    • Store the updated block back to disk (either immediately or later).

Two Sample Transactions

  • Transaction T₁:

    • Operations:

    • read_item(X)

    • X ← X - N

    • write_item(X)

  • Transaction T₂:

    • Operations:

    • read_item(Y)

    • Y ← Y + N

    • write_item(Y)

Why Concurrency Control is Needed

  • Lost Update Problem:

    • Occurs when operations from two transactions interleave in a way that the value of some database item becomes incorrect.

  • Temporary Update (Dirty Read) Problem:

    • Arises when a transaction updates an item and then fails; another transaction reads the updated item before it is reverted to the original value.

  • Incorrect Summary Problem:

    • Happens when one transaction calculates an aggregate summary while other transactions modify relevant records, leading to inconsistent summary values.

  • Unrepeatable Read Problem:

    • Occurs when a transaction reads the same item twice, but another transaction modifies it between these reads, causing different values to be returned.

    • Example: In an airline reservation system, a customer may check seat availability, and upon making a selection, the confirmed seat count may differ by the time they finalize the reservation.

Problems from Uncontrolled Concurrent Execution

  • Example of Lost Update Problem:

    • Transaction T₁ updates item X, but T₂ overwrites that update.

  • Example of Temporary Update Problem:

    • T₁ updates item X, but then fails, leading T₂ to read an incorrect temporary value.

  • Example of Incorrect Summary Problem:

    • T₁ calculates a summary using X after it was decremented and Y before it was incremented, leading to an incorrect final summary.

Need for Recovery

  • Why Recovery is Needed:

    • Ensures that when a transaction is executed, either all operations succeed and modify the database permanently or none do (complete rollback).

  • Commit vs. Abort:

    • A transaction is Committed if successful and permanently recorded.

    • A transaction is Aborted if it fails and has no lasting effects.

Types of Failures

  • Failures are categorized as follows:

    1. Computer Failure (System Crash):

    • Hardware/software errors causing loss of internal memory contents.

    1. Transaction or System Error:

    • Errors such as division by zero or logic errors, may also occur if the user interrupts the transaction.

    1. Local Errors:

    • Errors detected within a transaction prompting cancellation, e.g., insufficient account balance preventing withdrawal.

    1. Concurrency Control Enforcement:

    • Sometimes transactions are aborted to maintain serializability or avoid deadlocks.

    1. Disk Failure:

    • Data loss from a malfunction during read/write operations.

    1. Physical Problems and Catastrophes:

    • Various issues like power failure, theft, or errors due to human mistakes.

Transaction and System Concepts

  • Definition of a Transaction:

    • An atomic unit of work that must either complete in its entirety or not at all.

  • Importance of Tracking:

    • The system must keep track of when a transaction starts, terminates, commits, or aborts.

  • Transaction States:

    • Active State: Transaction is currently executing.

    • Partially Committed State: Operations are completed; awaiting validation.

    • Committed State: All operations are completed, and the database reflects this.

    • Failed State: The transaction has failed, and operations are being rolled back.

    • Terminated State: Transaction has concluded.

Recovery Manager Operations

  • Operations tracked by Recovery Manager:

    • BEGIN_TRANSACTION: Marks the start of transaction execution.

    • READ or WRITE: Specifies operations executed as part of a transaction.

    • END_TRANSACTION: Marks the conclusion of transaction actions, indicating the need to check for commit or abort status.

    • COMMIT_TRANSACTION: Signals successful completion, allowing permanent changes to the database.

    • ROLLBACK (or ABORT): Signals unsuccessful completion, prompting a rollback of applicable changes.

    • UNDO: Similar to rollback but focuses on a single operation.

    • REDO: Ensures that certain operations of a committed transaction are re-applied successfully.

The System Log

  • Definition of a Log or Journal:

    • Records all transaction operations affecting database values, essential for recovery.

    • Kept on disk, immune to failures except for catastrophic disk issues.

    • Periodically backed up to archival storage to mitigate risks.

  • Transaction Identifier (T):

    • A unique transaction-id generated to identify each transaction.

Types of System Log Entries

  • Log Entry Types:

    • [start_transaction, T]: Indicates that transaction T has started execution.

    • [write_item, T, X, old_value, new_value]: Records that T changed the value of X from old to new.

    • [read_item, T, X]: Records that T has read the value of X.

    • [commit, T]: Records successful completion of T, affirming permanent database effect.

Desirable Properties of Transactions (ACID)

  • ACID Properties:

    • Atomicity: A transaction is atomic; either fully completed or not at all.

    • Consistency Preservation: Must take the database from one consistent state to another.

    • Isolation:Transaction changes should only be visible upon commitment, resolving temporary update problems.

    • Durability: Once changes are committed, they must persist, not lost to future failures.

Characterizing Schedules Based on Serializability

  • Definitions:

    • Serial Schedule: All operations of a transaction T in a schedule are executed consecutively.

    • Non-serial Schedule: Any operation of T that does not execute consecutively.

    • Result Equivalent: Two schedules with identical final database states.

    • Serializable Schedule: Equivalent to a serial schedule of the same transactions.

    • Conflict Equivalent: Schedules with the same order of conflicting operations.

Schedule Equivalence and Tests

  • Conflict Serializable: Schedule that matches a conflict equivalent serial schedule.

  • Complexity:

    • Serializability is complex to check due to unpredictable interleaving in a scheduler.

  • Testing for Conflict Serializability:

    • Utilizing a precedence graph to ensure no cycles exist, allowing for serialization.

Precedence Graphs and Cycles

  • Graph Construction:

    • Nodes represent transactions; directed edges represent conflicting operations.

    • No cycles indicate serializability while cycles indicate non-serializability.

Conclusion on Transaction Processing

  • Transactions must be well-formed to ensure database integrity and handle concurrent operations through defined protocols for managing states.

  • Strong recovery mechanisms are conceptually crucial for upholding the logic and structures of databases amid failures and user interactions.

This document encapsulates expansive insights into transaction processing including the fundamental operations, challenges, and properties essential for the successful execution and management of transactions within a database system.