Concurrency control is a fundamental mechanism in database management systems that allows multiple transactions to execute simultaneously without interfering with each other, thereby preserving the isolation and consistency properties of the ACID model. In a multi-user environment, uncontrolled concurrent access can lead to serious problems such as lost updates, dirty reads, and inconsistent analysis. The primary goal of concurrency control is to produce a schedule of concurrent transaction executions that is serializable, meaning its outcome is equivalent to running the transactions one after another in some sequential order. Techniques for achieving this are broadly classified into two categories, pessimistic (e.g., locking) and optimistic (e.g., timestamping), each with distinct strategies for managing access to shared data.
The Need for Concurrency Control
- To Prevent Lost Update Problems
A lost update occurs when two transactions read the same data and then update it based on the initial value, causing one transaction’s update to be overwritten and lost. For example, if two clerks concurrently update a product’s quantity in stock, the second update will blindly overwrite the first, making it appear as if the first sale never happened. Concurrency control is needed to serialize such write operations, ensuring that all updates are applied sequentially and accurately reflect the total business activity, thereby maintaining the correctness of the data.
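As a concrete illustration, the short sketch below deterministically replays this anomaly in plain Python (no real DBMS involved; the item name and amounts are made up): both "transactions" read the stock level before either writes its result back.

```python
# Deterministic interleaving that reproduces the lost-update anomaly.
# Two clerks each record a sale against the same product, but both
# read the original quantity before either writes its result back.

stock = {"widget": 100}

def lost_update_schedule():
    t1_read = stock["widget"]        # T1: read(quantity) -> 100
    t2_read = stock["widget"]        # T2: read(quantity) -> 100
    stock["widget"] = t1_read - 30   # T1: write(quantity = 70)
    stock["widget"] = t2_read - 20   # T2: write(quantity = 80); T1's sale is lost

lost_update_schedule()
print(stock["widget"])  # 80, although 100 - 30 - 20 = 50 was expected
```

Under a serial (or serializable) schedule, T2 would read 70 rather than 100, and the final quantity would be the correct 50.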
- To Ensure Data Consistency and Correctness
Concurrency control is fundamental to preserving logical data consistency across multiple related data items. Without it, a transaction might read a database in an inconsistent state. A classic example is a funds transfer where one transaction reads the account balances after the withdrawal but before the deposit. This temporary inconsistency, if exposed, leads to incorrect reporting and decision-making. Concurrency control mechanisms enforce isolation, ensuring that transactions see the database only in consistent states, either before or after other transactions, but never during their intermediate, inconsistent phases.
- To Resolve Read-Write and Write-Read Conflicts
Conflicts arise when one transaction reads data that another is writing, or vice versa. A dirty read occurs when a transaction reads uncommitted data from another transaction that may later roll back. An unrepeatable read occurs when a transaction gets different values on successive reads of the same item because another transaction modified it in between. Concurrency control is needed to manage these conflicts by locking data or using timestamps to control visibility, guaranteeing that reads are predictable and are not contaminated by uncommitted or intermediate changes.
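The dirty-read case can be replayed deterministically in the same style as before (an illustrative sketch; the account name and amounts are invented): T2 reads a value that T1 wrote but never committed.

```python
# Deterministic sketch of a dirty read: T2 reads a value written by T1
# before T1 commits; T1 then rolls back, leaving T2 holding data that
# logically never existed.

db = {"balance": 500}

def dirty_read_schedule():
    before_image = db["balance"]
    db["balance"] = before_image + 200   # T1: uncommitted write -> 700
    t2_sees = db["balance"]              # T2: dirty read of 700
    db["balance"] = before_image         # T1: rollback restores 500
    return t2_sees

seen = dirty_read_schedule()
print(seen, db["balance"])  # 700 500 -- T2 acted on a value that was undone
```

Any decision T2 makes based on the value 700 (an overdraft check, a report total) is based on state the database never committed.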
- To Maximize System Throughput and Performance
While it may seem that serial execution is safest, it leads to poor resource utilization and low throughput. The CPU and disk would sit idle while a transaction is waiting for user input or performing a slow I/O operation. Concurrency control is needed to allow multiple transactions to execute simultaneously, interleaving their operations to keep the system resources busy. This parallel execution dramatically increases the number of transactions processed per second and reduces average response time, which is essential for supporting a large number of users in high-performance OLTP systems.
- To Enable High Availability and User Productivity
In a multi-user environment like a banking, reservation, or e-commerce system, data must be available for access and update around the clock. If users were forced to wait in a queue for their turn to access the database, productivity would plummet. Concurrency control is needed to provide the illusion that each user is the only one operating on the system, all while safely managing the simultaneous access in the background. This ensures high availability, a responsive user experience, and supports the collaborative, real-time nature of modern business operations.
Lock-Based Protocols
Lock-based protocols are a pessimistic approach to concurrency control. They require a transaction to acquire a lock on a data item before it can read or write it, preventing other transactions from modifying the item simultaneously. There are two primary lock modes: shared (S) locks for read operations, which allow multiple transactions to read the same item concurrently, and exclusive (X) locks for write operations, which are incompatible with every other lock. A protocol's core is defined by its rules for lock acquisition and release. Two-phase locking (2PL) guarantees conflict serializability by dividing each transaction into a growing phase, in which locks are only acquired, and a shrinking phase, in which locks are only released. The strict variant holds all locks until the transaction commits or aborts, which additionally ensures recoverability and prevents cascading rollbacks.
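A minimal sketch of the lock-compatibility rules and the release-at-commit discipline of strict 2PL might look like the following (class and method names are illustrative, not any real DBMS API; waiting and deadlock handling are left out, so a conflicting request simply fails):

```python
# Minimal strict-2PL lock table sketch. Shared (S) locks are mutually
# compatible; an exclusive (X) lock is compatible with nothing. All of
# a transaction's locks are released only at commit/abort.

class LockTable:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holder transaction ids)

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":   # S is compatible with S
            holders.add(txn)
            return True
        if holders == {txn}:                   # sole holder may upgrade
            self.locks[item] = ("X" if "X" in (mode, held_mode) else "S", holders)
            return True
        return False                           # conflict: caller must wait

    def release_all(self, txn):                # called at commit or abort
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))  # True: first reader
print(lt.acquire("T2", "A", "S"))  # True: shared locks coexist
print(lt.acquire("T2", "A", "X"))  # False: T1 still holds S on A
lt.release_all("T1")               # T1 commits
print(lt.acquire("T2", "A", "X"))  # True: T2 can now upgrade to X
```

A real lock manager would block the requester (or enqueue it) on conflict rather than returning False, which is exactly where the deadlock scenarios discussed below come from.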
Timestamp-Based Protocols
Timestamp-based protocols are an optimistic alternative to locking. Each transaction is assigned a unique, monotonically increasing timestamp when it starts, and the protocol orders conflicting operations by these timestamps to ensure serializability. For every data item, the system maintains a read-timestamp (the timestamp of the youngest transaction that read it) and a write-timestamp (the timestamp of the youngest transaction that wrote it). When a transaction issues a read or write, the protocol compares the transaction's timestamp with the item's timestamps. If a transaction attempts to read an item already overwritten by a younger transaction, or to write an item already read or written by a younger transaction, it is rolled back and restarted with a new timestamp. This method avoids locks, and therefore deadlocks, but incurs overhead from restarts.
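The read and write rules of basic timestamp ordering can be sketched as follows (an illustrative toy, assuming integer timestamps and ignoring the Thomas write rule, which would silently discard some late writes instead of aborting):

```python
# Sketch of basic timestamp ordering (TO). Each item tracks the largest
# timestamps that have read and written it; an operation arriving "too
# late" aborts the issuing transaction, which restarts with a new TS.

class Aborted(Exception):
    pass

class Item:
    def __init__(self):
        self.value = 0
        self.read_ts = 0   # largest TS that has read this item
        self.write_ts = 0  # largest TS that has written this item

def read(item, ts):
    if ts < item.write_ts:       # item already overwritten by a younger txn
        raise Aborted(f"T{ts} reads too late")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:  # younger txn already saw it
        raise Aborted(f"T{ts} writes too late")
    item.value, item.write_ts = value, ts

a = Item()
write(a, ts=2, value=10)     # T2 writes A
print(read(a, ts=3))         # 10: T3 reading an older txn's value is fine
try:
    write(a, ts=1, value=5)  # T1 arrives too late: T3 has already read A
except Aborted as e:
    print(e)                 # T1 writes too late -> rolled back, restarted
```

The aborted T1 would be restarted with a fresh (larger) timestamp, after which its write is no longer "late" with respect to the serialization order.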
Deadlocks
A deadlock is a specific concurrency problem where two or more transactions are stuck in a circular wait, each holding a lock that another transaction needs while waiting for a lock held by another transaction in the circle. For example, transaction T1 holds a lock on item A and waits for item B, while transaction T2 holds a lock on B and waits for A; neither can proceed. Deadlock handling follows two broad strategies: prevention, and detection and recovery. Prevention protocols (e.g., wait-die, wound-wait) use transaction timestamps to rule out circular waits. Detection maintains a wait-for graph and periodically checks it for cycles; if a cycle is found, the system selects a victim transaction to roll back, breaking the deadlock.
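Cycle detection in the wait-for graph is a standard depth-first search; the sketch below shows it on the T1/T2 example above (graph representation and transaction names are illustrative):

```python
# Deadlock detection on a wait-for graph: an edge T -> U means T is
# waiting for a lock held by U. A cycle means deadlock; the system
# would then pick one transaction on the cycle as the rollback victim.

def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None."""
    visited, on_stack, stack = set(), set(), []

    def dfs(t):
        visited.add(t); on_stack.add(t); stack.append(t)
        for u in wait_for.get(t, ()):
            if u in on_stack:                  # back edge: cycle found
                return stack[stack.index(u):]
            if u not in visited:
                cycle = dfs(u)
                if cycle:
                    return cycle
        on_stack.discard(t); stack.pop()
        return None

    for t in list(wait_for):
        if t not in visited:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None

# T1 waits for T2, T2 waits for T1 (the example above); T3 merely waits.
graph = {"T1": ["T2"], "T2": ["T1"], "T3": ["T1"]}
print(find_cycle(graph))  # ['T1', 'T2'] -- roll back a victim to break it
```

Victim selection is a policy decision; common heuristics prefer the transaction that has done the least work or holds the fewest locks, and they guard against starvation by not picking the same transaction repeatedly.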
