ONJava.com -- The Independent Source for Enterprise Java

Atomic File Transactions, Part 2

Appendix: Correctness of the Algorithm

Here I provide detailed justifications of the correctness (with respect to crashes) of my atomicity algorithm. For convenience, the steps of the algorithm are repeated.



Execution

E1. Write the action's name, its arguments (e.g. the file to delete) and any other necessary undo information (e.g. the name of the backup file) into the journal.

E2. Flush the journal's contents to disk.

E3. Write any backup files needed to undo the action.

E4. Perform the action.

E5. If there is an error performing the action, erase its entry from the journal.
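As a rough illustration, steps E1 through E5 might look like the following for a single delete action. This is only a sketch under stated assumptions: the class and method names, the tab-separated entry format, and the choice to back up by copying are all hypothetical and are not taken from the package itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of steps E1-E5 for one "delete" action.
class ExecuteDeleteSketch {

    /** Journals, backs up, and deletes target. Returns true if the delete succeeded. */
    static boolean deleteWithJournal(RandomAccessFile journal, Path target, Path backup)
            throws IOException {
        long entryStart = journal.length();
        journal.seek(entryStart);

        // E1: record the action, its argument, and the undo information.
        // The entry format here is an assumption made for illustration.
        byte[] entry = ("DELETE\t" + target + "\t" + backup + "\n")
                .getBytes(StandardCharsets.UTF_8);
        journal.write(entry);

        // E2: force the entry to disk before touching any other file.
        journal.getFD().sync();

        try {
            // E3: write the backup needed to undo the delete.
            Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
            // E4: perform the action.
            Files.delete(target);
            return true;
        } catch (IOException e) {
            // Assumption: discard any backup this attempt produced.
            Files.deleteIfExists(backup);
            // E5: the action failed, so erase its entry by truncating the journal.
            journal.setLength(entryStart);
            return false;
        }
    }
}
```

Truncating with setLength relies on the same assumption the text makes for step E5, namely that setting the length of a RandomAccessFile is atomic.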

Crash points

During E1 or E2. If a crash occurs during the first two steps, the action's entry may be partially written to the journal. Recovery will detect that the final entry is not complete and will ignore it.
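One way recovery could detect an incomplete final entry is to prefix each entry with its length. The sketch below assumes such a format (a 4-byte length followed by a UTF-8 payload); the article does not say how the package actually encodes its journal, so the names and layout here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical journal format: [4-byte length][UTF-8 payload] per entry.
class JournalReaderSketch {

    /** Returns every complete entry. A truncated final entry is ignored. */
    static List<String> readCompleteEntries(byte[] journalBytes) {
        List<String> entries = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(journalBytes);
        while (buf.remaining() >= Integer.BYTES) {
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                break; // the crash interrupted this entry, so ignore it
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            entries.add(new String(payload, StandardCharsets.UTF_8));
        }
        return entries;
    }

    /** Encodes one entry as a 4-byte length followed by the payload. */
    static byte[] encode(String entry) {
        byte[] payload = entry.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }
}
```

A length prefix only catches entries that were cut short. A per-entry checksum would be a stronger check, at the cost of a slightly larger journal.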

After E2 and before E4. Backup files may have been created, but the action never occurred. Recovery will detect this (see the Rollback section below) and will delete the backups.

During E4. Each filesystem action supported by the package is either atomic (in which case we need not worry about a crash during the action), or it is possible to determine from the state of the filesystem whether the action began or not, and to undo its partial effects.

After E4 and before E5. Recovery takes into account that the last action in the journal may not have succeeded.

During E5. Erasing the action from the journal is atomic, on the assumption that setting the length of a RandomAccessFile is atomic.

Commit

C1. Write in the journal that the transaction has committed.

C2. Delete the backup files for the transaction.

C3. Delete the journal.
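The commit steps could be sketched as follows. The marker text, the class and method names, and the explicit sync after writing the marker are assumptions made for this example and are not details confirmed by the article.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of steps C1-C3.
class CommitSketch {

    static void commit(Path journalPath, List<Path> backups) throws IOException {
        try (RandomAccessFile journal = new RandomAccessFile(journalPath.toFile(), "rw")) {
            // C1: append the commit marker (assumed format) and force it to disk.
            journal.seek(journal.length());
            journal.write("COMMIT\n".getBytes(StandardCharsets.UTF_8));
            journal.getFD().sync();
        }
        // C2: delete the backups. deleteIfExists keeps this step repeatable,
        // so recovery can finish the job after a crash partway through.
        for (Path backup : backups) {
            Files.deleteIfExists(backup);
        }
        // C3: delete the journal.
        Files.delete(journalPath);
    }
}
```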

Crash points

Before or during C1. If a crash occurs before or while writing the commit marker, the commit fails. Recovery will notice the existence of the journal file, see that there is either no commit marker or a partial one at the end of the file, and roll back the transaction.

After C1 and before C3. Some backup files may not be deleted. The recovery process, noticing that the transaction is committed, will finish the job, reading the journal to find the locations of the backups.

During C3. Can't happen -- we assume delete is atomic.
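Taken together, the commit crash points suggest a simple decision rule for recovery. The sketch below is a hypothetical model of it: the journal is represented as a list of complete entries (null when no journal file exists), and the marker text "COMMIT" is an assumption.

```java
import java.util.List;

// Hypothetical model of how recovery chooses what to do after a crash.
class RecoveryDecisionSketch {

    enum Decision { NOTHING_TO_DO, FINISH_COMMIT, ROLL_BACK }

    static final String COMMIT_MARKER = "COMMIT"; // assumed marker text

    /**
     * journalEntries holds the complete entries read from the journal,
     * or is null when no journal file exists.
     */
    static Decision decide(List<String> journalEntries) {
        if (journalEntries == null) {
            return Decision.NOTHING_TO_DO; // no transaction was in progress
        }
        boolean committed = !journalEntries.isEmpty()
                && COMMIT_MARKER.equals(journalEntries.get(journalEntries.size() - 1));
        // A complete marker means C1 finished: redo C2 and C3.
        // No marker, or a partial one already discarded by the reader: roll back.
        return committed ? Decision.FINISH_COMMIT : Decision.ROLL_BACK;
    }
}
```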

The Local Undo Property

Before looking at the correctness of recovery/rollback in the face of crashes, we must discuss a requirement on actions that is central to the correctness argument. The requirement states that for each action, it must be possible to restore the system to the state before the action was attempted, even if the action failed to occur, partially occurred, or was already undone. This requirement is implicit in the first part of step R1.1: "If the action needs to be undone...". Detecting whether an action needs to be undone is difficult to achieve, in general, but the algorithm is designed so that the detecting code can assume that no other actions on the relevant files have intervened. Let's call this requirement on actions the local undo property.

We need the local undo property for two reasons. First, after a crash, it is possible that the last entry recorded in the journal represents an action that never happened. That could occur if the system crashed between step E2, when the journal entry was flushed to disk, and step E4, when the action would have been performed. In that case, recovery should not undo the last action. Since the journal entry claims the action happened, we must look to the filesystem to see whether it really did or not. For instance, to test whether a delete of file F occurred, we can check whether F exists. If it does, the delete never happened, and we do not attempt to restore F from backup.
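For a delete action, the check described above might be sketched like this. The names are hypothetical, and the sketch assumes the backup is restored by renaming it back into place.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-undo check for a journaled "delete target" action.
class LocalUndoCheckSketch {

    /** The delete needs undoing only if target is gone and its backup is still there. */
    static boolean deleteNeedsUndo(Path target, Path backup) {
        return !Files.exists(target) && Files.exists(backup);
    }

    /** Restores target from backup by renaming, but only when the check says so. */
    static void undoDelete(Path target, Path backup) throws IOException {
        if (deleteNeedsUndo(target, backup)) {
            Files.move(backup, target);
        }
    }
}
```

This check is only trustworthy when nothing else has touched the files since the action was journaled, which is the condition the local undo property states.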

The second reason that we need the local undo property concerns crashes that occur during rollback itself. The subsequent restart and recovery may attempt to undo an action a second time. If the undo operation for an action is idempotent -- that is, it has the same effect whether it happens once or many times -- then repeated undo attempts aren't a problem. But typical undo operations are not idempotent. For example, the way to undo a file creation is to delete the file. Attempting the delete a second time after it succeeded the first time would result in an error, since the file doesn't exist anymore. Making delete idempotent would seem to be easy -- just test for the presence of the file before deleting it. Unfortunately, true idempotence is difficult to achieve by looking only at the state of the filesystem (i.e., which files exist), because such information is ambiguous. An example should help make this clear.

Consider an action that creates a file named F. The corresponding undo operation is deleting F, if it exists. This would seem to be idempotent, since it always leaves the filesystem in the same state: F does not exist. The problem is that it may delete the wrong file F.

To see this, assume rollback did not record undone actions (i.e., step R1.2 were not present). Say that initially there is a file named F, and that the transaction

Delete F
Create F

is carried out, but fails before commit. When recovery runs, it rolls back the transaction by first deleting F -- that is, the undo operation for "Create F" -- and then restoring the original F from a backup via renaming, thereby undoing the "Delete F" action. Recovery then crashes. At this point, a file named F exists. When recovery is run again, "Create F" will be undone again, deleting F. We have just deleted the original file F present before the transaction, a grave error. The backup is gone, so we cannot restore from it.
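The scenario can be replayed with a small in-memory model. Here a Map from file name to contents stands in for the filesystem, and the names (including the backup name F.bak) are hypothetical. The sketch deliberately omits undo markers, as the text assumes.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory replay of the "Delete F, Create F" example without undo markers.
class NaiveUndoSketch {

    /** Undo of "Create name": delete the file if it exists. */
    static void undoCreate(Map<String, String> fs, String name) {
        fs.remove(name);
    }

    /** Undo of "Delete name": rename the backup back, if there is one to restore. */
    static void undoDelete(Map<String, String> fs, String name, String backupName) {
        if (fs.containsKey(backupName) && !fs.containsKey(name)) {
            fs.put(name, fs.remove(backupName));
        }
    }

    /** Rollback that keeps no record of which actions it has already undone. */
    static void naiveRollback(Map<String, String> fs) {
        undoCreate(fs, "F");
        undoDelete(fs, "F", "F.bak");
    }

    /** Returns the filesystem state after two recovery attempts. */
    static Map<String, String> demo() {
        Map<String, String> fs = new HashMap<>();
        fs.put("F", "original");

        // The transaction: Delete F (backed up by renaming), then Create F.
        fs.put("F.bak", fs.remove("F"));
        fs.put("F", "new");

        naiveRollback(fs); // first recovery: F is the original again, then a crash
        naiveRollback(fs); // second recovery: deletes the original F
        return fs;
    }
}
```

After the second call the map is empty: the original F is gone and no backup remains to restore it from.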

One fix would be to restore the backup by copying rather than renaming. But similar problems arise with other undo operations, where there is no obvious backup to copy.

For instance, the obvious way to undo a rename of F to G is to rename G back to F, provided G exists and F does not. But consider the transaction

Create F
Delete G
Rename F to G

in an initial state where F does not exist but G does. If the transaction crashes just before commit, then recovery will rename G to F, restore the backup of the original G, and finally delete F. If this first recovery attempt then crashes before deleting the journal, then the filesystem contains G but not F, so the next recovery attempt begins by renaming G to F. But as before, this G is the original G. When it is deleted as a result of undoing the Create F action, the original file will be lost.

These examples demonstrate that there is no obvious way to make undo operations idempotent by looking only at the existence of files; the problem is that there may be different versions of the files. The best solution is to record in the journal the fact that the undo action took place, so it will not be attempted in the event of a crash and restart. The only time an undo operation will be executed more than once is if the crash occurs after the operation happens but before the undo marker is written into the journal. But in that case, no other operations will intervene, so testing the existence of files will give the right results. This is exactly what I have called the local undo property: an action's undo operation must be able to detect whether the action needs to be undone from the state of the filesystem, provided no other actions on the relevant files have occurred in between.

Rollback

R1. For each journal entry that has not already been undone, in reverse order:

R1.1 If the action needs to be undone, undo it.

R1.2 Record in the journal that the action has been undone.

R2. Delete the backup files for the transaction.

R3. Delete the journal.
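A minimal in-memory sketch of these steps follows. As assumptions for the example, the filesystem is a Map from file name to contents, backups are named with a .bak suffix, and the undo markers are modeled as a counter. None of these names come from the package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of rollback with undo markers (R1-R3).
class RollbackSketch {

    /** A journaled action that satisfies the local undo property. */
    interface Action {
        boolean needsUndo(Map<String, String> fs);
        void undo(Map<String, String> fs);
    }

    /** Stand-in for the journal: entries plus a count of undo markers. */
    static final class Journal {
        final List<Action> entries = new ArrayList<>();
        int undoMarkers = 0;     // entries already undone, counted from the end
        boolean deleted = false;
    }

    static Action create(String name) {
        return new Action() {
            public boolean needsUndo(Map<String, String> fs) { return fs.containsKey(name); }
            public void undo(Map<String, String> fs) { fs.remove(name); }
        };
    }

    static Action delete(String name) {
        String backup = name + ".bak";
        return new Action() {
            public boolean needsUndo(Map<String, String> fs) {
                return !fs.containsKey(name) && fs.containsKey(backup);
            }
            public void undo(Map<String, String> fs) { fs.put(name, fs.remove(backup)); }
        };
    }

    /** Runs R1, then R2 and R3 unless crashBeforeCleanup simulates a crash after the loop. */
    static void rollback(Journal journal, Map<String, String> fs, boolean crashBeforeCleanup) {
        // R1: visit entries in reverse order, skipping those already marked undone.
        for (int i = journal.entries.size() - 1 - journal.undoMarkers; i >= 0; i--) {
            Action action = journal.entries.get(i);
            if (action.needsUndo(fs)) {   // R1.1
                action.undo(fs);
            }
            journal.undoMarkers++;        // R1.2
        }
        if (crashBeforeCleanup) {
            return;
        }
        fs.keySet().removeIf(name -> name.endsWith(".bak")); // R2
        journal.deleted = true;                               // R3
    }
}
```

Replaying the "Delete F, Create F" transaction with this model, a second rollback after a simulated crash finds both undo markers, skips the loop, and leaves the original F in place.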

Crash points

Before R1. In this case, the system crashed before rollback began -- in other words, while the transaction was in progress. Recovery, failing to see a commit marker at the end of the journal, will call rollback to undo the transaction's partial effects. There is one tricky point: if the crash happened after an action failed but before its record could be erased from the journal (that is, between steps E4 and E5), then the last journal entry represents a failed action, perhaps one that partially completed. Since recovery is the first thing that should happen after a crash, there will have been no intervening activity, so the local undo property will guarantee that the last journal entry is treated appropriately.

During R1.1. If the undo operation is atomic, then this can't happen. Otherwise, the local undo assumption says that the original state of the system can be recovered, even if the action was partial. A crash during undo will look like a partially-completed action, so it will be correctly recovered when the system restarts. The local undo property assumes that the files aren't touched between the crash and the subsequent recovery attempt. This is achieved by using the undo markers written in step R1.2. When recovery runs after a crash, it will use the undo markers to avoid any attempt to undo actions that have already been undone.

Between R1.1 and R1.2. The action has been undone, but that fact is not recorded in the journal. Recovery will call rollback, which will start again with this action. The local undo assumption covers this case. No intervening activity has occurred since the crash, so the system will detect that the action has been undone -- this is the same as detecting that the action has not occurred -- and will not attempt to undo it again.

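During R1.2. Writing the last byte of the undo marker is atomic, so the marker is either complete or treated as absent. (See the discussion above on crashing before or during C1.) If it is absent, the situation is the same as a crash between R1.1 and R1.2.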

After R1.2 and before R1.1. Since these steps are in a loop, we must consider a crash after R1.2 runs, but before the next iteration begins. In this case, recovery will cause rollback to resume exactly where it left off, because there is an undo marker in the journal for each undone action.

After the last R1.2 and before R3. Since the fact that every action is undone is recorded in the journal, the next time rollback runs it will proceed immediately to step R2, picking up where it left off, and then to R3.

During R3. Deleting the journal is atomic.

