WEB LECTURE 17 Database Indexing, Transactions, and Concurrency Control

Indexing

Indices are crucial in Database Management Systems (DBMS) for significantly speeding up database operations. An index is a data structure that facilitates fast lookups of records, typically created as a separate file. Indices can be sparse or dense. A sparse index contains index records for only some search key values, while a dense index, which is more common, contains an index record for every search key value in the file.

Primary Keys are indexed by default in most DBMS to speed up joins. Indices are created using the CREATE INDEX statement in SQL. For example:

CREATE INDEX c_name_idx ON customer (name);

Transactions

A transaction is an action or series of actions performed by a user or application to access or change the contents of a database. A database must be in a consistent state before and after a transaction. Transactions are essential due to multiple concurrent accesses.

Transactions are logical, indivisible units of processing. An example:

Transferring a sum between two bank accounts:

START TRANSACTION;
  -- debit account1 - read, update, write
  -- credit account2 - read, update, write
COMMIT;

If a second process reads/writes the same data items during the transaction, problems may arise. By default, the MySQL system issues an automatic commit after each command unless you specify START TRANSACTION. ROLLBACK reverses all actions up to the last start of the transaction.

Quick Quiz 1

Consider these scenarios:

  • A lecturer updates the marks of one student, then changes the study status of another student to “intermitting,” and finally commits the transaction.
  • The maximum marks on a module must add up to 100. A lecturer needs to change assessment 1 from 50 to 40 and assessment 2 from 50 to 60. The lecturer first changes assessment 1 to 40, then commits before changing assessment 2 to 60 and committing again.
  • The deadline on several assessments for a single student has been extended by a week. The lecturer makes the change to all assessments, then commits.

Concurrent DB Access

Problems that can occur during concurrent database access include:

  • Dirty Read
  • Lost update (Dirty Write)
  • Inconsistent Analysis
  • Out of date or inconsistent reads

Read of uncommitted data (dirty read)

If a transaction is aborted, its intermediate state should not be visible to other transactions.

T1: UPDATE t SET x=x+100 WHERE id = 01;
ROLLBACK;

T2: SELECT * FROM t WHERE id = 01;

Lost update (dirty write)

Occurs when two transactions update the same item.

T1: UPDATE t SET x=x+100 WHERE id = 01;
T2: UPDATE t SET x=x-50 WHERE id = 01;

Inconsistent analysis

Can occur when one transaction is performing an aggregate operation while another is updating data.

T1: SELECT SUM(x) FROM t;
T2: UPDATE t SET x=x+50 WHERE id = 01;

Serializability

Two transactions are serializable if, when interleaved, they produce the same result as if they were executed in a serial manner and do not interfere with each other. A good concurrency control system attempts to schedule transactions so that concurrent transactions are serializable. Schedules that are not serializable can generate inconsistent results.

Desirable properties of transactions (ACID)

  • Atomic: The set of actions that form a transaction are indivisible; the transaction is either entirely completed or completely aborted.
  • Consistent: The transaction always results in a consistent database state after the operation if it was in a consistent state before the transaction.
  • Isolated: Transactions must be independent of each other (serializable). Partial effects must not be visible to other transactions.
  • Durable: The effects of a committed transaction must be permanent.

Quick Quiz 2

Which of the ACID properties do each of the following break?

  • A user sets a primary key to be null
  • A transaction completes, making an update to a table. However, the update is not retained in the database because of hardware failure followed by reversion to a backup
  • A transaction accesses an update made by a second transaction before the second transaction commits
  • A transaction makes a change to the database, system failure then prevents the transaction from committing, leaving the change in the database permanently stored

Note that none of these should happen in a well-run DBMS!

Concurrency control techniques

  • Pessimistic methods
    • Locking: Widely used; data items are ‘locked’ during access to deny access to other transactions.
    • Timestamping: Transactions are ordered by start time. If a conflict occurs, the latest is aborted. Does not scale well for Internet-scale problems.
  • Optimistic methods
    • No checks are performed until the transaction is completed. Differences are resolved when detected.

MySQL uses locking and has varied granularity.

  • InnoDB has row-level locking (default storage engine).
  • MyISAM, MEMORY, and MERGE have table-level locking.

Locking

A lock manager maintains a list of locks. Before each granule may be accessed, a lock is requested:

  • Read lock – shared lock
    • If T1 has a read lock on granule x, T2 may be granted a read lock on x.
    • If T1 has a read lock on granule x, T2 may NOT be granted a write lock on x.
  • Write lock – exclusive lock
    • If T1 has a write lock on granule x, T2 may NOT be granted either a read lock or a write lock on x.

Quick Quiz 3

Show how locking prevents the previous identified issues:

  • Dirty read (slide 7)
  • Dirty write (slide 8)
  • Inconsistent analysis (slide 9)

Problems with Locking

  • Deadlock: A circular waiting condition is caused by two or more transactions, each waiting for another to release its lock on a granule.
    • Deadlock Solutions
      • Deadlock prevention (expensive)
      • Deadlock detection followed by rollback and restarting of one transaction (manageable). This is the typical approach in current commercial DBMS.
  • Livelock: A transaction waits indefinitely for a lock despite the state of the transactions changing.

Deadlock

Two transactions need to access the same resource, and neither transaction can complete.

T1: read x; set x=x+100; read y; …
T2: read y; set y=y-1, read x; …

Summary

  • Indexing speeds up queries.
  • Transactions ensure data consistency.
  • Concurrent database access requires control mechanisms.
  • Locking is commonly applied to avoid inconsistencies.