Recovery

Recovery algorithms are techniques to ensure database consistency, transaction atomicity, and durability despite failures. Recovery algorithm has two parts: actions during normal transaction, actions after a failure to recover the database.The key primitives are UNDO, the process of removing the effects of an incomplete or aborted transaction. REDO: the process of re-instating the effects of a committed transaction for durability.

Storage can be volatile storage, data will be lost after powering off. Non-Volatile have data persistent after powering off. Stable storage never lose data. It could achieve approximately by using multiple storages.

Failure can be transaction failure: transaction cannot complete due to some internal error and DBMS must terminate an active transaction due to an error condition. System failure is DBMS fails and system is crashed, non-volatile storage are not corrupted. Storage media failure is a disk failure and destroy parts of non-volatile storage.

Buffer pool management steal policies will decide whether the DBMS allow an uncommitted transaction to overwrite the most recent committed value of an object in non-volatile storage. No-steal policy will not write uncommitted transaction value back to disk, steal policy, allows the system to write modified blocks to disk even if the transactions that made those modifications have not all committed, could steal other transaction's memory. Force policy ensures that all updates made by a transaction are reflected on non-volatile storage before the transaction is allowed to commit. No-force is not enforced to do this. NO-STEAL + FORCE means no redo: all committed transactions' changes are reflected in disk, no undo: all aborted transactions' changes are not written to disk. Limitation is memory because of no-steal policy.

Shadow paging means updates are only made in the shadow copy. When a transaction commits, atomically switch the shadow to become the new master. Disadvantages: Copying the entire page table is expensive and the commit overhead is high. Organize the database pages in a tree structure where the root is a single disk page. The root points to the master copy, updates are applied to the shadow copy. To install updates, overwrite the root so it points to the shadow, thereby swapping the master and shadow. For undo, it remove the shadow pages. Leave master and the DB root pointer alone. Do not need redo.

Write-Ahead Logging means DBMS records all changes made to the db in log file before changes is made to a disk page. The log contains information to perform undo and redo. It has fast runtime performance but slow recovery time. Log records are written to disk before update is allowed to be written on disk. Transaction is committed until all its log records have been written to stable storage.Write BEGIN in the beginning, COMMITTED to make sure all log records are flushed. Log records contains tid, object id, before value(UNDO), after value(REDO). If we use NO-STEAL policy, we don't need original value, but in that way we could not undo for aborted transaction.

DBMS can periodically takes a checkpoint where it flushes all buffers out to disk. The DBMS stops accepting new transactions and waits for all active transactions to complete. Flush all log records and dirty blocks currently residing in main memory to stable storage. Write a entry to the log and flush to stable storage.

Logging schemes could be physical logging: record the changes made to a specific location in the database. Logical logging records the high operations executed by transactions. Physiological logging means log records target a single page but do not specify data organization of the page.

ARIES

Algorithms for Recovery and Isolation Exploiting Semantics.

WAL
Repeat history in redo
logging changes during undo

In WAL, each log record has a global unique log sequence number. Each data page contains a pageLSN, the LSN of the most recent update to that page. prevLSN: The previous LSN for the transaction. System keeps track of flushedLSN: the max LSN flushed so far. Before page i can be written to disk, we must flush log at least to the point where $pageLSN_i$ ≤ flushedLSN.

Transaction commit will first write COMMIT record to log. All log records to to transaction’s COMMIT record are flushed to disk. When the commit succeeds, write a special TXN-END record to log.

Transaction abort just an UNDO operation. Use prevLSN to undo transaction. Compensation Log Record, CLR describes the actions taken to undo the actions of a previous update record. It has all the fields of an update log record plus the undoNext pointer. CLRs are added to the log like any other record but they never need to be undone.

First write ABORT record to log.
Then play back updates in reverse order to remove their effects. For each update, write a CLR entry and restore old value.
At end, write a TXN-END log record.

Blocking checkpoints will halt the start of any new transactions and wait until all active transactions finish executing, flush dirty pages on disk.

Better one will halt new transaction and just pause transactions while the DBMS takes the checkpoint. It uses Active Transaction Table (ATT) to record active transaction and their lastLSN(most recent lsn written by transaction). Entry is removed when transaction commits or aborts. In Dirty Page Table (DPT), it keeps track of pages in the buffer pool contain changes from uncommitted transactions. And there is recLSN field, the LSN of the log record that first caused the page to be dirty.

Fuzzy checkpoints allows other transactions to continue to run. Add new log records to track checkpoint boundaries, : Indicates the start of the checkpoint, : Contains the ATT + DPT.

There are three phases in ARIES.

Analysis: Read the WAL to identify dirty pages in the buffer pool and active transactions at the time of the crash.

Scan log forward from the checkpoint.
If you find a TXN-END record, remove its transaction from ATT.
All other records, add transaction to ATT with status UNDO, and on commit, change transaction status to COMMIT.
For UPDATE records, if page P not in DPT, add P to DPT and set its recLSN=LSN

REDO: repeat history to reconstruct state at the moment of the crash. Reapply all updates (even aborted transactions) and redo CLRs:

Scan forward from log record containing smallest recLSN in PDT.
For each update log record or CLR with a given LSN, redo the action unless: – Affected page is not in the DPT, or – Affected page is in DPT but that record’s LSN is greater than smallest recLSN, or – Affected pageLSN (on disk) ≥ LSN.
To redo an action: – Reapply logged action. – Set pageLSN to log records LSN. – At the end of the redo phase, write TXN-END log records for all transactions with status “C” and remove them from the ATT.

Undo Phase:

Undo All transactions active at the time of crash
These are all transactions with “U” status in the ATT after the Analysis phase
Process them in reverse LSN order using the lastLSN to speed up traversal
Write a CLR for every modification

Aries