Consensus Protocols
Motivations, Concepts and Principles
graph LR A[Fault Tolerance] --> B[Availability] A --> C[Reliability] A --> D[Safety] A --> E[Maintainability]
| Term | Content |
|---|---|
| Availability | Operating correctly at any given moment; Available to perform functions. |
| Reliability | System can run continuously without interruption or failure for long time. |
| Safety | When a system temporarily fails, no catastrophic event happens. |
| Maintainability | How easily a failed system can be repaired, automatically, if possible |
Goal: Robustness
- Fault 的特点
- Transient
- Intermittent
- Permanent
| Type of Failure | Server Behaviour |
|---|
Synchrony Assumptions
对于分布式系统而言,为什么我们需要不同的 Communication Models (aka. Synchrony Assumptions)?
- 如何描述一种通信模型?
- Communication uncertainty is captured by an adversary that controls the
message delay in the network under the assumed communication model - Keypoint: Communication model defines the limit of the power of such an adversary
- Communication uncertainty is captured by an adversary that controls the
Synchrony Model
Asynchrony Model
Partial Synchrony Model
Consensus
Consensus Problem
Conditions of Consensus
- Agreement: No two honest nodes decide on different values at the end.
- 否则说明 Bad Node 污染了 Honest Nodes,证明 Protocol 不够 Fault-Tolerance 或者设计有缺陷.
- 没有一致性,系统的正确性就没意义(e.g. 用户 A 看到“转账成功”,用户 B 看到“转账失败”)
- Validity: If all honest nodes have input
, then must be the decision value. - 防止 Protocol 总是输出 trivial、无意义的输出
- 没有有效性,协议可能给出“伪造”的结果,与输入完全无关(即使大家都说 0,它输出 1)。这会破坏系统的公平性和合理性
- Termination: Honest nodes must eventually decide on a value in
and halt. - 没有终止性,系统可能一直等下去不决定(比如所有人卡死在等待别人消息),这就失去了“可用性”
Paxos
Raft
PBFT and Byzantine General Problem
Byzantine General Problem
所有人最终必须得出一个决定. 此外存在
答案:
证明
CAP Theorem
When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three.
- Consistency: shared, replicated data appears as a single, up-to-date copy
- Availability: updates will always be eventually executed on the system
- Partition tolerance: the system will be tolerant to the partitioning of the process group due to failed network
FLP Impossibility
Any protocol