Testing of Serializability of Schedules
The serializability of a schedule is tested using a serialization graph.
Assume a schedule S1. For S1, a graph called the precedence graph is constructed. This graph is a pair G = (V, E), where V is the set of vertices and E is the set of edges. The vertices are the transactions participating in the schedule. The edge set contains an edge Ti → Tj whenever one of the following three conditions holds:
Create an edge Ti → Tj if Ti executes write (Q) before Tj executes read (Q).
Create an edge Ti → Tj if Ti executes read (Q) before Tj executes write (Q).
Create an edge Ti → Tj if Ti executes write (Q) before Tj executes write (Q).
(Figure: Schedule S1 and its precedence graph)
An edge Ti → Tj means that, in any equivalent serial schedule, all the instructions of Ti must be executed before the first instruction of Tj.
If the precedence graph for schedule S contains a cycle, then S is not conflict serializable. If the graph is acyclic, then S is conflict serializable.
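The rules above can be sketched in Python. The encoding of a schedule as (transaction, action, item) tuples and the sample schedule are illustrative assumptions, not taken from the text; the sketch builds the precedence graph and tests it for cycles with a depth-first search.

```python
# Build a precedence graph for a schedule and test it for cycles.
# A schedule is modelled as a list of (transaction, action, item) tuples;
# the sample schedule below is hypothetical.

from collections import defaultdict

def precedence_graph(schedule):
    """Return adjacency sets: an edge Ti -> Tj for each conflicting pair
    (different transactions, same item, at least one write)."""
    edges = defaultdict(set)
    for i, (ti, op1, item1) in enumerate(schedule):
        for tj, op2, item2 in schedule[i + 1:]:
            if ti != tj and item1 == item2 and "W" in (op1, op2):
                edges[ti].add(tj)  # W-R, R-W, or W-W conflict
    return edges

def has_cycle(edges):
    """Depth-first search for a back edge (i.e. a cycle)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def visit(u):
        color[u] = GREY
        for v in edges[u]:
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False
    return any(color[u] == WHITE and visit(u) for u in list(edges))

# T1 reads A, T2 writes A, T2 reads B, T1 writes B: edges both ways, a cycle
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]
g = precedence_graph(s)
print(dict(g))       # {'T1': {'T2'}, 'T2': {'T1'}}
print(has_cycle(g))  # True -> not conflict serializable
```

A schedule whose graph comes back acyclic is conflict serializable, and a topological sort of that graph gives an equivalent serial order.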
Explanation
Read(A): In T1, no subsequent writes to A, so no new edges
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads of A, so no new edges
Write(C): In T1, no subsequent reads of C, so no new edges
Write(B): In T3, no subsequent reads of B, so no new edges
Precedence Graph for S1
This graph contains a cycle (T1 → T2 → T3 → T1), so schedule S1 is not serializable.
Explanation
Read(A): In T4, no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges
Precedence Graph for S2
The precedence graph for S2 contains no cycle, so S2 is serializable.
Serializable Schedule:
A transaction schedule is serializable if its outcome equals the outcome of its transactions executed serially, i.e., sequentially without overlapping in time. A serializable schedule always leaves the database in a consistent state. A serial schedule is always serializable, because a new transaction starts only after the previous one has finished execution.
Conflict & view serializable schedule
| Conflict Serializability | View Serializability |
| --- | --- |
| Two schedules are conflict equivalent if all the conflicting operations in both schedules are executed in the same order. A schedule that is conflict equivalent to a serial schedule is called a conflict serializable schedule. | Two schedules are view equivalent if they agree on the initial reads, the final writes, and on which write each read observes. A schedule that is view equivalent to a serial schedule is called a view serializable schedule. |
| Every conflict serializable schedule is also view serializable. | A view serializable schedule may or may not be conflict serializable. |
| Conflict serializability is easy to test: build the precedence graph and check it for cycles. | View serializability is hard to test; deciding whether a schedule is view serializable is NP-complete. |
| If a transaction T1 writes a value of A that no other transaction reads (because a later transaction, say T2, overwrites A with its own value), conflict equivalence cannot move W(A) to a position where it is never read. | Under view equivalence, such a blind write W(A) can be placed at any position of the schedule where it is never read. |
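As a sketch of conflict equivalence, the helper below collects the ordered conflicting pairs of a schedule (same item, different transactions, at least one write) and compares two schedules. The (transaction, action, item) tuple encoding and the sample schedules are illustrative assumptions.

```python
# Two schedules over the same operations are conflict equivalent when every
# conflicting pair of operations occurs in the same relative order in both.
# Schedules are hypothetical lists of (transaction, action, item) tuples.

def conflict_pairs(schedule):
    """Ordered pairs (a, b) of conflicting operations, with a before b."""
    pairs = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1]):
                pairs.add((a, b))
    return pairs

def conflict_equivalent(s1, s2):
    # Same multiset of operations, and the same ordering of conflicts.
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

serial = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "B"), ("T2", "W", "B")]
# Interleaving ops that touch different items reorders only non-conflicting
# operations, so conflict equivalence is preserved.
interleaved = [("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A"), ("T2", "W", "B")]
print(conflict_equivalent(serial, interleaved))  # True
```

Swapping two writes on the same item, by contrast, changes a conflicting pair's order and breaks the equivalence.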
Recoverability
Non-Serializable Schedules:
A non-serial schedule that is not serializable is called a non-serializable schedule.
A non-serializable schedule is not guaranteed to produce the same effect as some serial schedule on every consistent database.
Characteristics:
Non-serializable schedules:
- May or may not be consistent
- May or may not be recoverable
Irrecoverable Schedules:
If, in a schedule,
- A transaction performs a dirty read operation from an uncommitted transaction, and
- It commits before the transaction it read from commits,
then such a schedule is known as an irrecoverable schedule.
Recoverable Schedules:
If, in a schedule,
- A transaction performs a dirty read operation from an uncommitted transaction, and
- Its commit operation is delayed until the transaction it read from either commits or rolls back,
then such a schedule is known as a recoverable schedule.
Here,
- The commit operation of the transaction that performs the dirty read is delayed.
- This ensures that it still has a chance to recover if the uncommitted transaction fails later.
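The two definitions above can be turned into a small checker. The schedule encoding, (txn, action, item) tuples with "C" marking a commit, is an assumption for illustration.

```python
# Classify a schedule as recoverable or irrecoverable: a schedule is
# irrecoverable when a transaction commits after dirty-reading a value
# written by a transaction that has not yet committed.
# Ops are hypothetical (txn, action, item) tuples, action in {R, W, C}.

def is_recoverable(schedule):
    last_writer = {}   # item -> txn that last wrote it
    committed = set()
    reads_from = {}    # reader txn -> txns it dirty-read from
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            w = last_writer.get(item)
            if w is not None and w != txn and w not in committed:
                reads_from.setdefault(txn, set()).add(w)  # dirty read
        elif action == "C":
            # Irrecoverable if txn dirty-read from a still-uncommitted txn.
            if any(src not in committed for src in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True

# T2 dirty-reads A from T1 and commits before T1: irrecoverable.
bad = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
# Delaying T2's commit until after T1 commits makes it recoverable.
good = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
print(is_recoverable(bad))   # False
print(is_recoverable(good))  # True
```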
Backup
Database backup is the process of backing up the operational state, architecture and stored data of database software. It enables the creation of a duplicate instance or copy of a database in case the primary database crashes, is corrupted or is lost.
Volatile storage such as RAM holds the active logs, disk buffers, and related data, along with the transactions currently being executed. If such volatile storage crashes abruptly, it takes away all the logs and the active copies of the database, making recovery almost impossible, since everything required to recover the data is lost.
The following techniques may be adopted in case of loss of volatile storage:
- We can have checkpoints at multiple stages so as to save the contents of the database periodically.
- A state of active database in the volatile memory can be periodically dumped onto a stable storage, which may also contain logs and active transactions and buffer blocks.
- A <dump> marker can be written to the log file whenever the database contents are dumped from volatile memory to stable storage.
Recovery
- When the system recovers from a failure, it can restore the latest dump.
- It can maintain a redo-list and an undo-list, as is done with checkpoints.
- It can recover the system by consulting undo-redo lists to restore the state of all transactions up to the last checkpoint.
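The redo/undo recovery described above can be sketched as follows. The log-record tuples are a hypothetical encoding; a real DBMS would also consult checkpoint records to bound the portion of the log scanned.

```python
# Sketch of redo/undo recovery from a log. Log records are hypothetical:
#   ("START", txn), ("WRITE", txn, item, old_value, new_value), ("COMMIT", txn)

def recover(db, log):
    """Restore db (a dict) to a consistent state using the log."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}  # redo-list
    # Redo pass: reapply the new values of committed transactions, in order.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    # Undo pass: roll back uncommitted transactions, newest record first.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, item, old, new = rec
            db[item] = old
    return db

log = [
    ("START", "T1"), ("WRITE", "T1", "A", 100, 150), ("COMMIT", "T1"),
    ("START", "T2"), ("WRITE", "T2", "B", 200, 250),   # T2 never commits
]
# Crash left B holding T2's uncommitted value; recovery redoes T1, undoes T2.
print(recover({"A": 100, "B": 250}, log))  # {'A': 150, 'B': 200}
```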
Recovery from transaction failures
A catastrophic failure is one in which a stable secondary storage device is corrupted, and all the valuable data stored on it is lost with it. There are two strategies for recovering data from such a catastrophic failure:
- Remote backup: Here a backup copy of the database is stored at a remote location from where it can be restored in case of a catastrophe.
- Alternatively, database backups can be taken on magnetic tapes and stored at a safer place. This backup can later be transferred onto a freshly installed database to bring it to the point of backup.
Large databases are too bulky to be backed up frequently. In such cases, a database can be restored just from its logs, so all that is needed is to back up the logs at frequent intervals. The database itself can be backed up once a week, while the logs, being very small, can be backed up daily or even more frequently.
Remote Backup
Remote backup provides a sense of security in case the primary location where the database is located is destroyed. A remote backup can be offline or online (real-time). If it is offline, it is maintained manually.
Log based recovery
- The log is a sequence of records. The log of each transaction is maintained in stable storage so that, if a failure occurs, the database can be recovered from it.
- Every operation performed on the database is recorded in the log.
- The log record must reach stable storage before the corresponding change is applied to the database (the write-ahead logging rule).
Types:
Deferred database modification:
- In the deferred modification technique, the transaction does not modify the database until it has committed.
- All log records are created and stored in stable storage, and the database is updated only when the transaction commits.
Immediate database modification:
- In the immediate modification technique, the database may be modified while the transaction is still active.
- The database is modified immediately after every operation, so both the old and the new values must be logged, allowing uncommitted changes to be undone after a failure.
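The deferred technique can be sketched as a tiny class; the class and method names are illustrative, not a real DBMS API.

```python
# Sketch of deferred database modification: writes are recorded in the log
# but applied to the database only when the transaction commits; an abort
# simply discards the pending log records, so there is nothing to undo.

class DeferredTxn:
    def __init__(self, db):
        self.db = db
        self.pending = []        # log of (item, new_value), not yet applied

    def write(self, item, value):
        self.pending.append((item, value))   # log first; database untouched

    def commit(self):
        for item, value in self.pending:     # apply logged writes at commit
            self.db[item] = value
        self.pending.clear()

    def abort(self):
        self.pending.clear()     # db was never changed, so no undo is needed

db = {"A": 10}
t = DeferredTxn(db)
t.write("A", 99)
print(db)     # {'A': 10}  -- unchanged until commit
t.commit()
print(db)     # {'A': 99}
```

An immediate-modification sketch would instead update `db` inside `write` and keep the old value in the log so `abort` could restore it.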
Checkpoints
The checkpoint is used to declare a point before which the DBMS was in a consistent state and all transactions had committed. It works like a bookmark: during transaction execution, checkpoints are marked, and a log record is written for each step of the transactions.
- When a checkpoint is reached, the updates recorded in the log are applied to the database; the log records up to that point are then removed from the log file and stored permanently on the storage disk.
- A new log is then maintained for the operations that follow, and it is updated until the next checkpoint, and the process continues.
Advantages of using Checkpoints:
- It speeds up the data recovery process.
- Most DBMS products checkpoint themselves automatically.
- Checkpoint records in the log file are used to prevent unnecessary redo operations.
- Since dirty pages are flushed out continuously in the background, checkpointing has very low overhead and can be done frequently.