February 28, Wednesday
12:00 – 14:00

Self-Stabilizing Distributed Storage and Self-Stabilizing Consensus
Computer Science seminar
Lecturer : Mr. Ronen Kat
Lecturer homepage : http://www.cs.bgu.ac.il/~kat/
Affiliation : CS, BGU
Location : 202/37
Host : Dr. Michael Elkin
The talk summarizes my Ph.D. thesis and presents various distributed storage solutions, from distributed file systems to long-term, large-scale peer-to-peer storage networks. I will focus on how to achieve asynchronous consensus among processors in the presence of crashes (using a failure detector).

Self-stabilizing algorithms can cope with transient faults. A transient fault can move the system to an arbitrary state and hence cause a temporary violation of the algorithm's correctness. Our algorithms can be started in an arbitrary state and still converge to their designed behavior. The talk will focus on a suite of self-stabilizing algorithms: a failure detector, asynchronous consensus, and a replicated state machine. We define new requirements for consensus that fit the ongoing nature of self-stabilizing algorithms. The wait-free consensus (and replicated state-machine) algorithm is a classic combination of a failure detector and a (memory-bounded) rotating-coordinator consensus, and it satisfies both eventual safety and eventual liveness. Several new techniques and paradigms are introduced. The bounded-memory failure detector abstracts away synchronization assumptions using bounded heartbeat counters combined with a balance-unbalance mechanism. The practically infinite paradigm is introduced in the scope of self-stabilization, where an execution of, say, 2^64 sequential steps is regarded as (practically) infinite. Finally, we present the first self-stabilizing wait-free reset mechanism that ensures eventual safety and can be used in other scopes.
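
To give a feel for why 2^64 sequential steps can be treated as practically infinite, the short Python sketch below estimates how long a single processor would need to execute that many steps. The step rates (one million and one billion steps per second) and the helper name years_to_exhaust are illustrative assumptions, not figures or names from the thesis.

```python
# Back-of-the-envelope arithmetic for the "practically infinite" paradigm.
# The step rates below are assumed for illustration only.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, in seconds


def years_to_exhaust(steps: int, steps_per_second: float) -> float:
    """Return the number of years needed to run `steps` sequential steps."""
    return steps / steps_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    steps = 2 ** 64
    for rate in (1e6, 1e9):  # hypothetical steps per second
        years = years_to_exhaust(steps, rate)
        print(f"{rate:.0e} steps/s: about {years:,.0f} years")
```

Even at an assumed rate of one step per nanosecond, exhausting a 64-bit counter takes roughly 584 years, far longer than the expected lifetime of any deployed system.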