Transaction Concept
- A transaction is a logical unit of work in database processing.
- It includes one or more database access operations.
- It can be defined as a series of actions performed by a user or application to access database contents (retrieval, insertion, deletion, modification).
- A transaction is a program unit that may change the contents of a database.
- A transaction must be either fully completed or aborted.
- Much of the complexity of a DBMS is hidden behind the transaction interface.
Consistency in Transactions
- Transaction execution preserves the consistency of the database before and after its execution.
- A database transaction must be atomic (all operations are completed or none are completed).
- A consistent database state satisfies all data integrity constraints.
- Each transaction should access shared data without interfering with other transactions.
- When a transaction completes, its changes should be permanent.
- If a transaction fails, it should have no effect on the stored database.
- Example: Customer paying a bill online.
Read and Write Operations of a Transaction
- A transaction is a sequence of READ and WRITE actions.
- Whenever data is read from or written to the database, a transaction is created.
- A transaction may consist of a simple SELECT operation or a series of related UPDATE command sequences.
Read Operations
Read_item(X)
: Reads a database item named X into a program variable Y.
- Steps:
- Find the address of the disk block containing item X.
- Copy that disk block into a buffer in main memory.
- Copy item X from the buffer to the program variable Y.
Write Operations
Write_item(X)
: Writes the value of a program variable Y into database item named X.
- Steps:
- Find the address of the disk block containing item X.
- Copy that disk block into a buffer in main memory.
- Copy item X from the program variable Y into its current location in the buffer.
- Store the updated block from the buffer back to disk.
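The read and write steps above can be sketched in Python; the dictionary-based "disk", the buffer pool, and the helper names are illustrative assumptions, not a real buffer manager.

```python
# Minimal sketch of Read_item/Write_item over disk blocks, assuming a
# dictionary-based "disk" and buffer pool (not a real DBMS buffer manager).
disk = {0: {"X": 100}}   # block address -> items stored in that block
buffer_pool = {}         # main-memory copies of disk blocks

def block_of(item):
    """Step 1: find the address of the disk block containing the item."""
    return next(addr for addr, items in disk.items() if item in items)

def read_item(name):
    addr = block_of(name)
    buffer_pool[addr] = dict(disk[addr])   # step 2: copy block into a buffer
    return buffer_pool[addr][name]         # step 3: copy item into program variable

def write_item(name, value):
    addr = block_of(name)
    buffer_pool[addr] = dict(disk[addr])   # copy block into a buffer
    buffer_pool[addr][name] = value        # update the item in the buffer
    disk[addr] = dict(buffer_pool[addr])   # store the updated block back to disk

y = read_item("X")       # program variable Y now holds 100
write_item("X", y + 50)  # item X on "disk" is now 150
```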
Example Transaction
BEGIN TRANSACTION_1:
READ (TABLE=T1, ROW=15, OBJECT=COL1);
:COL1 = COL1 + 500;
WRITE (TABLE=T1, ROW=15, OBJECT=COL1, VALUE=:COL1);
READ (TABLE=T2, ROW=30, OBJECT=COL2);
:COL2 = COL2 + 500;
WRITE (TABLE=T2, ROW=30, OBJECT=COL2, VALUE=:COL2);
READ (TABLE=T3, ROW=45, OBJECT=COL3);
:COL3 = COL3 + 500;
WRITE (TABLE=T3, ROW=45, OBJECT=COL3, VALUE=:COL3);
END OF TRANSACTION_1;
Transaction Execution and Potential Problems
- A transaction may contain a single or a sequence of database operations that transforms the database from one consistent state to another.
- The simplest transaction processing system executes transactions serially, which is impractical for large multi-user databases.
- Mechanisms are needed to enable multiple transactions to execute concurrently without causing conflicts or inconsistencies.
- A transaction that successfully completes its execution is said to have been committed.
Transaction States
- Active State: After the transaction starts its operation.
- Partially Committed: When the last operation is reached.
- Aborted: When normal execution can no longer be performed.
- Committed: After successful completion of the transaction.
Example Transaction with SQL
Committing changes using SQL:
UPDATE EMPLOYEE
SET EMP-LOAN-BAL=EMP-LOAN-BAL-10000
WHERE EMP-ID='106519';
UPDATE PROJECT
SET PROJ-COST=PROJ-COST+40000
WHERE PROJ-ID='PROJ-1';
COMMIT;
ACID Properties
To ensure a database remains in a stable state after a transaction, it must possess the following ACID properties:
- Atomicity: All or nothing; ensured by the recovery subsystem of the DBMS.
- Consistency: Generally the responsibility of the programmer or DBMS module that enforces integrity constraints.
- Isolation: Enforced by the concurrency control subsystem of the DBMS.
- Durability: Responsibility of the recovery subsystem of the DBMS.
Transaction Log (or Journal)
- DBMS maintains a transaction record of every change made to the database in a log.
- Helps the DBMS to recover from failures that affect transactions.
- The DBMS automatically updates the transaction log while executing transactions that modify the database.
- It stores before and after data about the database and any tables, rows, and attribute values that participated in the transaction.
- Use of the transaction log increases the processing overhead and overall cost of the system.
- Because the log contains critical data, implementations support periodic backups of it on several disks or tapes to reduce the risk of loss in a system failure.
Data Recorded in Transaction Log
- A start of transaction marker.
- Transaction Identifier.
- Record Identifier.
- Operation Performed on Records.
- Previous value of the modified data (undo log).
- Updated value of the modified record (redo part of the log).
- A commit transaction marker.
- Log is written before any updates are made to the database (write-ahead log strategy).
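The log fields and the write-ahead rule above can be sketched as follows; the dictionary-based database and record layout are illustrative assumptions, not a real DBMS log format.

```python
# Append-only transaction log following the write-ahead rule:
# the log record is appended BEFORE the database itself is updated.
log = []
db = {"A": 10}

def logged_write(txn_id, record_id, new_value):
    log.append({
        "txn": txn_id,
        "record": record_id,
        "op": "UPDATE",
        "before": db[record_id],   # previous value (undo part of the log)
        "after": new_value,        # updated value (redo part of the log)
    })                             # log first...
    db[record_id] = new_value      # ...then apply the update to the database

log.append({"txn": "T1", "op": "START"})    # start-of-transaction marker
logged_write("T1", "A", 25)
log.append({"txn": "T1", "op": "COMMIT"})   # commit-transaction marker
```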
Concurrency Control
- Concurrency control manages simultaneous execution of transactions in a multiprocessing database system without interference.
- Allows many transactions to access the same database at the same time without interfering with each other.
- Ensures the serializability of transaction executions in a multi-user database environment.
- Simultaneous execution of transactions can create data integrity and consistency problems.
Problems of Concurrency Control
The three main problems of concurrency control are:
- Lost Updates
- Dirty Read (or uncommitted data)
- Unrepeatable Read (or inconsistent retrievals)
Lost Update Problem
- Occurs when two transactions access the same database items and their operations make the value of some database item incorrect.
- If transactions T1 and T2 both read a record and then update it, the effects of the first update will be overwritten by the second update.
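The interleaving described above can be traced in a small sketch; sequential code stands in for the interleaved schedule, and the values are illustrative.

```python
# Lost update: T1 and T2 both read X before either writes it back.
X = 100

t1_local = X        # T1 reads X  -> 100
t2_local = X        # T2 reads X  -> 100 (before T1 writes)
t1_local += 50      # T1 computes 150
t2_local -= 30      # T2 computes 70
X = t1_local        # T1 writes 150
X = t2_local        # T2 writes 70: T1's update is lost

# A serial execution would give 100 + 50 - 30 = 120, but X ends at 70.
```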
Dirty Read Problem
- Occurs when one transaction updates a database item and then fails.
- The updated database item is accessed by another transaction before it is changed back to the original value.
- A transaction T1 updates a record, which is read by transaction T2, then T1 aborts, and T2 now has values that never formed part of the stable database.
Unrepeatable Read Problem
- Occurs when a transaction calculates some summary function over a set of data while other transactions are updating the data.
- A transaction might read some data before they are changed and other data after they are changed, thereby yielding inconsistent results.
- Transaction T1 reads a record, then transaction T2 updates the record. If T1 rereads the record, the new value will be inconsistent with the previous value.
Degree of Consistency
Gray (1976) defined four levels of transaction consistency:
- Level 0 Consistency:
- Transactions are not recoverable.
- May have interactions with the external world that cannot be undone.
- Transaction T does not overwrite another transaction’s dirty (or uncommitted) data.
- Level 1 Consistency:
- Minimum consistency requirement that allows transactions to be recovered in the event of system failure.
- Transaction T does not overwrite another transaction’s dirty data.
- Transaction T does not make any of its updates visible before it commits.
- Level 2 Consistency:
- Isolates the transaction from the updates of other transactions.
- Transaction T does not overwrite another transaction’s dirty (or uncommitted) data.
- Transaction T does not make any of its updates visible before it commits.
- Transaction T does not read another transaction’s dirty data.
- Level 3 Consistency:
- Adds consistent reads so that successive reads of a record will always give the same values.
- Transaction T does not overwrite another transaction’s dirty data.
- Transaction T does not make any of its updates visible before it commits.
- Transaction T does not read another transaction’s dirty data.
- Transaction T can perform consistent reads: no other transaction can update data read by T before T has committed.
Permutable Actions
- An action is a unit of processing that is indivisible from the DBMS’s perspective.
- Actions are typically read-page and write-page.
- A pair of actions on the same granule may or may not be permutable; a pair of actions on different granules is always permutable.
- A pair of actions (Ai, Aj) is permutable if every execution of Ai followed by Aj gives the same result as the execution of Aj followed by Ai on the same granule.
- Read-Read: Permutable
- Read-Write: Not Permutable
- Write-Write : Not Permutable
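The compatibility rules above can be expressed as a small predicate; the tuple representation of an action is an assumption for illustration.

```python
# An action is (operation, granule). Two actions are permutable unless they
# touch the same granule and at least one of them is a write.
def permutable(a1, a2):
    (op1, g1), (op2, g2) = a1, a2
    if g1 != g2:
        return True                      # different granules: always permutable
    return op1 == "read" and op2 == "read"

assert permutable(("read", "P1"), ("read", "P1"))        # Read-Read
assert not permutable(("read", "P1"), ("write", "P1"))   # Read-Write
assert not permutable(("write", "P1"), ("write", "P1"))  # Write-Write
assert permutable(("write", "P1"), ("write", "P2"))      # different granules
```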
Schedule
- A schedule is a sequence of actions or operations that is constructed by merging the actions of a set of transactions.
- DBMS has built-in software called a scheduler, which determines the correct order of execution.
- It establishes the order in which operations within concurrent transactions are executed.
- It bases its actions on a concurrency control algorithm.
- Complete Schedule
- Serial Schedule
- Non-Serial Schedule
Serializable Schedule
- The objective of concurrency control is to arrange or schedule the execution of transactions to avoid any interference.
- This can be achieved by executing and committing one transaction at a time, in serial order.
- A Serializable schedule is a schedule that allows a set of transactions to execute in some order such that the effects are equivalent to executing them in some serial order.
- Serializability describes the concurrent execution of several transactions.
- The objective of serializability is to find non-serial schedules that allow transactions to execute concurrently without interfering with one another, thereby producing a database state that could be produced by a serial execution.
- The order of Read and Write operations is important in serializability.
Precedence Graph
- Serializability can also be depicted by constructing a precedence graph.
- A precedence relationship, transaction T1 precedes transaction T2, exists if there are two non-permutable actions A1 and A2 such that A1 is executed by T1 before A2 is executed by T2.
- Vertices: Set of transactions
- Arc: T1 precedes T2
- A schedule is serializable if its precedence graph is acyclic.
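A sketch of building the precedence graph from a schedule and testing it for a cycle; the encoding of a schedule as (transaction, operation, granule) tuples is an assumption for illustration.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Arc Ti -> Tj for each pair of non-permutable actions where Ti acts first."""
    arcs = set()
    for i, (ti, op_i, g_i) in enumerate(schedule):
        for tj, op_j, g_j in schedule[i + 1:]:
            # Non-permutable: same granule, different transactions, at least one write.
            if ti != tj and g_i == g_j and "write" in (op_i, op_j):
                arcs.add((ti, tj))
    return arcs

def has_cycle(arcs):
    graph = defaultdict(set)
    nodes = set()
    for u, v in arcs:
        graph[u].add(v)
        nodes.update((u, v))
    visiting, done = set(), set()
    def dfs(n):
        visiting.add(n)
        for m in graph[n]:
            if m in visiting or (m not in done and dfs(m)):
                return True          # back edge found: cycle
        visiting.discard(n)
        done.add(n)
        return False
    return any(n not in done and dfs(n) for n in nodes)

serial_ok = [("T1", "read", "X"), ("T1", "write", "X"),
             ("T2", "read", "X"), ("T2", "write", "X")]      # only T1 -> T2: acyclic
interleaved = [("T1", "read", "X"), ("T2", "write", "X"),
               ("T2", "write", "Y"), ("T1", "write", "Y")]   # T1 -> T2 and T2 -> T1
```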
Locking Methods for Concurrency Control
- A lock is a variable associated with a data item that describes the status of the item with respect to possible operations.
- It prevents access to a database record by a second transaction until the first transaction has completed all of its actions.
- Locks are used to synchronize access by concurrent transactions to database items.
- Locking schemes aim to allow the concurrent execution of compatible operations.
- Locks are granted and released by a lock manager.
- The principal data structure of the lock manager is the lock table.
- In a lock table, an entry consists of a transaction identifier, a granule identifier, and a lock type.
- Two types of locks:
- S locks (Shared or Read Lock Type)
- X locks (Exclusive or Write Lock)
Lock Granularity
- A database is represented as a collection of named data items.
- Granularity can be a field in a record, a record, or a whole disk block.
- A granule is the unit of data individually controlled by the concurrency control subsystem.
- Lock granularity indicates the level of lock use.
Locking Levels
Locking can take place at the following levels:
- Database Level Locking
- Table Level Locking
- Page Level Locking
- Row Level Locking
- Attribute (or Field) Level Locking
Lock Types
DBMS mainly uses the following types of locking technique:
- Binary Locking
- Exclusive Locking
- Shared Locking
- Two-Phase Locking
- Three-Phase Locking
Binary Locking
- There are two states of locking:
- Locked (or 1)
- Unlocked (or 0)
- If an object (a database table, page, tuple, or attribute) is locked by a transaction, no other transaction can use that object.
- A distinct lock is associated with each database item.
- If the value of the lock on a data item is 1, the item cannot be accessed by a database operation that requires it.
- Any database operation requires that the affected objects be locked, and a transaction must unlock the object after termination.
Shared Exclusive (or Read/Write) Locking
- Uses multiple-mode locks.
- Three locking operations:
- Read_lock (X)
- Write_lock (X)
- Unlock (X)
- A read-locked item is also called share-locked, because other transactions are allowed to read the item.
- A write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.
- An exclusive lock exists when access is specifically reserved for the transaction that locked the object.
- An exclusive lock is used when a transaction wants to write (update) a data item and no locks are currently held on that data item by any other transaction.
- A shared lock exists when concurrent transactions are granted READ access on the basis of a common lock.
- It produces no conflict as long as the concurrent transactions are Read-Only.
- A shared lock is used when a transaction wants to read data from the database and no exclusive lock is held on that data item.
Two-Phase Locking
- Two-phase locking is a method or a protocol for controlling concurrent processing in which all locking operations precede the first unlocking operation.
- Guarantees serializability, which means that transactions can be executed in such a way that their results are the same as if each transaction’s actions were executed in sequence without interruption.
- Used to maintain level 3 consistency.
- Defines how transactions acquire and relinquish locks.
- After a transaction has released a lock, it may not obtain any further locks.
- It has the following two phases:
- Growing Phase
- Shrinking Phase
- Rules:
- Two transactions cannot have conflicting locks.
- No unlock operation can precede a lock operation in the same transaction.
- No data are affected until all locks are obtained (transaction is in its locked point).
- Two-phase locking does not prevent deadlocks.
- It is used in conjunction with a deadlock prevention technique.
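A minimal sketch of only the two-phase rule itself; lock compatibility, multiple transactions, and deadlock handling are omitted, and the class name is an assumption for illustration.

```python
# Once a transaction releases any lock (shrinking phase begins),
# it may not acquire further locks.
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False      # flips to True at the first unlock

    def lock(self, granule):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(granule)      # growing phase

    def unlock(self, granule):
        self.shrinking = True       # shrinking phase begins
        self.held.discard(granule)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")         # still in the growing phase: allowed
t.unlock("A")       # shrinking phase begins
try:
    t.lock("C")     # violates the two-phase rule
    violated = False
except RuntimeError:
    violated = True
```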
Deadlocks
- A deadlock is a condition in which two or more transactions in a set are waiting simultaneously for locks held by some other transaction in the set.
- An impasse that may result when two or more transactions are each waiting for locks to be released that are held by the other.
- Also called a circular waiting condition, where two transactions (directly or indirectly) are waiting for each other.
- Two transactions are mutually excluded from accessing the next record required to complete their transaction; also called deadly embrace.
Deadlock Detection & Prevention
- Deadlock detection is a periodic check by the DBMS to determine if the waiting line for some resource exceeds a predetermined limit.
- Detecting deadlocks relies on a wait-for graph, in which a cycle indicates a deadlock.
- The frequency of deadlock is primarily dependent on the query load and the physical organization of the database.
- The following three schemes are used to detect and prevent deadlock:
- Never Allow Deadlock (Deadlock Prevention)
- Detect Deadlock whenever a transaction is blocked (Deadlock Detection)
- Detect Deadlock Periodically (Deadlock Avoidance)
- The best deadlock control technique depends on the database environment.
- In the case of a low probability of deadlock, a deadlock detection technique is recommended.
- If the probability of deadlock is high, a deadlock prevention technique is recommended.
- If the response time is not high on the system priority list, the deadlock avoidance technique might be employed.
Time Stamping Methods for Concurrency Control
- A timestamp is a unique identifier created by the DBMS to identify the relative starting time of a transaction.
- Timestamp values are assigned in the order in which transactions are submitted to the system.
- Time stamping is a method of concurrency control in which each transaction is assigned a transaction timestamp.
- A transaction timestamp is a monotonically increasing number, often based on the system clock.
- Timestamps can be generated by incrementing a logical counter every time a new transaction starts.
- Timestamp must have the two properties:
- Uniqueness: assures that no two transactions are assigned the same timestamp value.
- Monotonicity: assures that timestamp values always increase.
- Read and Write operations of the database within the same transaction must have the same timestamp.
- The goal of timestamp ordering is to order transactions globally in such a way that older transactions get priority in the event of conflict.
Granule Timestamp
- A granule timestamp is a record of the timestamp of the last transaction to access that granule.
- Each granule accessed by an active transaction must have a granule timestamp.
- A separate record of last read and write accesses may be kept.
- Granule timestamps may cause additional write operations for read accesses if they are stored with the granules.
- Can be avoided by maintaining granule timestamp as an in-memory table.
- Entry consists of granule identifier and transaction timestamp.
Timestamp Ordering
- Total timestamp ordering
- Partial timestamp ordering
- Multiversion timestamp ordering
Conflict Resolution in Timestamps
- Wait-Die:
- The older transaction waits for the younger if the younger has accessed the granule first.
- The younger transaction is aborted and restarted if it tries to access a granule after an older concurrent transaction.
- This strategy ensures that older transactions are given priority over younger transactions, avoiding potential deadlocks.
- Wound-Wait:
- The older transaction wounds (aborts and restarts) the younger transaction if the younger has accessed a granule that both want.
- The younger transaction waits for the older one to commit if it tries to access a granule after an older concurrent transaction.
- This strategy allows older transactions to wound or preempt younger ones, ensuring progress and avoiding potential deadlocks.
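The two rules can be summarised as decision functions, assuming a smaller timestamp means an older transaction and that `requester` asks for a granule currently held by `holder` (names are illustrative).

```python
def wait_die(requester_ts, holder_ts):
    # Wait-Die: an older requester waits; a younger requester dies
    # (is aborted and restarted, keeping its original timestamp).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts, holder_ts):
    # Wound-Wait: an older requester wounds (aborts) the younger holder;
    # a younger requester waits for the older holder.
    return "wound" if requester_ts < holder_ts else "wait"

assert wait_die(1, 5) == "wait"     # older T(1) waits for younger T(5)
assert wait_die(5, 1) == "die"      # younger T(5) aborts rather than wait
assert wound_wait(1, 5) == "wound"  # older T(1) preempts younger T(5)
assert wound_wait(5, 1) == "wait"   # younger T(5) waits for older T(1)
```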
Optimistic Methods for Concurrency Control
- Optimistic method of concurrency control is based on the assumption that the conflicts of the database operations are rare.
- It is better to let a transaction run to completion and check for conflicts only before it commits.
- Also known as validation or certification method.
- No checking is done while a transaction is executing.
- Each transaction moves through the following phases:
- Read phase
- Validation or certification phase
- Write phase
Validation Phase
- The transaction is validated to assure that the changes made will not affect the integrity and consistency of the database.
- If the test is positive, it moves to the write phase.
- If it is negative, the transaction is restarted and changes are discarded.
- The validation algorithm must check that the transaction has:
- Seen all modifications of transactions committed before it started.
- Not read granules updated by a transaction committed after its start.
Write Phase
- Changes are permanently applied to the database.
- Updated granules are made public.
- If validation failed, the updates are instead discarded and the transaction is restarted.
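A simplified sketch of the validation check (backward validation in the style of Kung and Robinson); the set-based transaction records and field names are assumptions for illustration.

```python
def validate(txn, committed):
    """txn: {'start': timestamp, 'read_set': set of granules}.
    committed: records of committed transactions, each with a
    'commit' timestamp and a 'write_set' of updated granules."""
    for other in committed:
        # Fail if txn read a granule updated by a transaction
        # that committed after txn started.
        if other["commit"] > txn["start"] and txn["read_set"] & other["write_set"]:
            return False    # restart txn and discard its changes
    return True             # proceed to the write phase

t = {"start": 10, "read_set": {"A", "B"}}
safe = [{"commit": 5, "write_set": {"A"}}]       # committed before t started
unsafe = [{"commit": 12, "write_set": {"B"}}]    # overwrote B after t started
```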