Transaction
A transaction is a sequence of operations that are carried out together and form a single unit. It must either complete in full or be undone completely.
Transactions (TX) should have 4 properties, known as the ACID principle.
- Atomic. The transaction is never half-done.
- Consistent. TX changes DB from one consistent state into another consistent state.
- Isolated. Data updates within a TX should not be visible to other TXs until this TX is completed.
- Durable. When a TX is done, it really is done and the updates will not disappear.
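As a small illustration of atomicity on a single database, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the account names, and the helper functions make_db, transfer, and balances are all made up for this example. The credit is applied first on purpose, so that a failing debit shows the earlier update being undone.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if src cannot pay.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

If the debit fails, the credit that was already applied is rolled back with it, so the transfer is never half-done.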
In the context of distributed systems, completing a TX may involve multiple servers. To ensure atomicity, we have to coordinate the behaviour of all the servers involved (e.g. roll back everywhere on failure), and for this we use a 2 phase commit mechanism.
2 Phase Commit
A 2 phase commit procedure requires 1 coordinator (manager) and several participants (cohorts). Note that “commit” here means making the results of already executed operations final, which is different from executing them. The procedure is divided into 2 phases:
- Commit-Request (Voting) Phase.
  - The coordinator asks all participants whether they have executed successfully and are ready to commit. After sending the requests, the coordinator waits for the responses.
  - Participants execute the operations issued before the request and record undo/redo information in their logs.
  - If a participant finishes execution successfully, it responds “OK” to the coordinator.
  - Otherwise, it responds “Not OK”.
- Commit Phase. The coordinator collects the responses from all participants.
  - If all responses are “OK”, then
    - the coordinator sends “Commit” to all participants.
    - upon receiving “Commit”, each participant finalizes its operations and releases resources, then replies with “TX Complete”.
    - after receiving “TX Complete” from all participants, the coordinator finalizes the transaction.
  - If not all responses are “OK”, some participant failed to finish its operations, and the transaction has to be rolled back.
    - the coordinator sends “Abort” to all participants.
    - upon receiving “Abort”, each participant rolls back using the previously recorded undo log and releases resources, then sends “Rollback Complete” to the coordinator.
    - after receiving “Rollback Complete” from all participants, the coordinator finalizes the cancellation.
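The two phases can be sketched as a small in-process simulation. This is only an illustration with no networking, no timeouts, and no durable storage. The Participant class, its will_succeed switch, and the two_phase_commit function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Participant:
    name: str
    will_succeed: bool = True  # hypothetical switch to simulate a failed execution
    state: str = "idle"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        """Voting phase: execute, record undo/redo info, then vote."""
        self.log.append("undo/redo recorded")
        self.state = "prepared" if self.will_succeed else "failed"
        return self.will_succeed  # True means "OK", False means "Not OK"

    def commit(self) -> str:
        """Commit phase: make the executed operations final."""
        self.state = "committed"
        return "TX Complete"

    def abort(self) -> str:
        """Commit phase: undo the work using the recorded undo log."""
        self.state = "rolled back"
        return "Rollback Complete"


def two_phase_commit(participants) -> str:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote is "OK", otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single “Not OK” vote is enough to roll back every participant, including those that voted “OK”.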
Error Handling
- If a participant fails before voting, the coordinator just needs to abort the procedure.
  To recover, the participant needs to find out what happened. If it has simply forgotten the transaction, the overall effect is the same as if it had rolled the transaction back, so there is no problem. But if the participant knows about the transaction and wants to learn the outcome, a long-lived log of outcomes has to be kept so that the participant can look it up. The coordinator is responsible for maintaining this log.
- What if a participant responds with “OK” and then fails? In this case the participant will not receive “Commit” or “Abort” from the coordinator, but since it has prepared the transaction, after recovering it can find the outcome in the coordinator's log.
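The recovery lookup described above might look roughly like the sketch below. The OutcomeLog class and the recover function are hypothetical, and a plain dictionary stands in for what would have to be durable storage on the coordinator. Treating a missing entry as "keep waiting" is an assumption of this sketch. It applies only to transactions the participant had already prepared.

```python
class OutcomeLog:
    """Hypothetical outcome log kept by the coordinator: tx_id -> outcome."""

    def __init__(self):
        self._outcomes = {}

    def record(self, tx_id, outcome):
        self._outcomes[tx_id] = outcome

    def lookup(self, tx_id):
        return self._outcomes.get(tx_id)  # None if no outcome is logged yet


def recover(prepared_tx_ids, outcome_log):
    """Decide what a restarted participant does with each prepared transaction."""
    actions = {}
    for tx_id in prepared_tx_ids:
        outcome = outcome_log.lookup(tx_id)
        if outcome == "commit":
            actions[tx_id] = "redo and commit"
        elif outcome == "abort":
            actions[tx_id] = "undo and roll back"
        else:
            # Assumption: no logged outcome means the decision is still pending.
            actions[tx_id] = "keep waiting"
    return actions
```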
Implication
The coordinator must log the outcome before sending the “Commit” or “Abort” message.
- If any participant fails during the commit-request phase, the coordinator will time out and abort.
- If the coordinator fails during the commit-request phase, participants will time out and mark the transaction aborted.
- If the coordinator fails during the commit phase, participants will time out while still holding locks. In this case, a heuristic abort can be used.
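The first two points, logging before sending and treating a missing vote as a failure, can be sketched as follows. The functions decide and finish, the durable_log list, and the send callback are hypothetical names. A real coordinator would write the decision to stable storage and use a real timeout instead of being handed the votes that arrived in time.

```python
def decide(expected, votes):
    """Return "commit" only if every expected participant voted OK in time.

    `votes` maps participant name -> True ("OK") or False ("Not OK") and only
    contains replies that arrived before the coordinator's timeout.
    """
    all_ok = all(votes.get(name) is True for name in expected)
    return "commit" if all_ok else "abort"


def finish(tx_id, expected, votes, durable_log, send):
    """Log the outcome, then notify every participant."""
    outcome = decide(expected, votes)
    durable_log.append((tx_id, outcome))  # 1. record the decision first
    for name in expected:
        send(name, tx_id, outcome)        # 2. only then tell the participants
    return outcome
```

Because the log entry is written before any message goes out, a participant that asks about the transaction later will always find the decision it may already have been told about.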
