ZooKeeper-based implementation of a CyclicBarrier.
Wherever possible, this implementation is consistent with the semantics of {@link java.util.concurrent.CyclicBarrier}, and it adheres to as many of the same constraints as can reasonably be expected in a distributed setting.
A CyclicBarrier causes every thread that calls one of the {@code await()} methods to wait until the barrier has been completed across all participating nodes. Because the barrier is cyclic, the same instance can be reused by calling the {@link #reset()} method.
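For comparison, the in-process semantics that this class aims to mirror can be observed with {@link java.util.concurrent.CyclicBarrier} itself. The sketch below uses only the JDK and does not touch ZooKeeper or menagerie; {@code LocalBarrierDemo} and its methods are hypothetical names chosen for illustration. Several threads call {@code await()} on one barrier instance over several generations, and a thread left waiting is released with a {@code BrokenBarrierException} when {@code reset()} is called. Note that the JDK barrier rearms itself automatically after each trip; its {@code reset()} breaks any current waiters and returns the barrier to its initial state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of the single-JVM CyclicBarrier semantics (JDK only, no ZooKeeper). */
class LocalBarrierDemo {

    /**
     * Reuses ONE barrier for several generations; in each generation, {@code parties}
     * threads call await(). Returns how many times the barrier tripped.
     */
    static int countTrips(int parties, int generations) throws InterruptedException {
        AtomicInteger trips = new AtomicInteger();
        // The barrier action runs once per trip, in the last thread to arrive.
        CyclicBarrier barrier = new CyclicBarrier(parties, trips::incrementAndGet);
        for (int g = 0; g < generations; g++) {
            List<Thread> threads = new ArrayList<>();
            for (int p = 0; p < parties; p++) {
                Thread t = new Thread(() -> {
                    try {
                        barrier.await(); // blocks until all parties of this generation arrive
                    } catch (InterruptedException | BrokenBarrierException e) {
                        throw new IllegalStateException(e);
                    }
                });
                threads.add(t);
                t.start();
            }
            for (Thread t : threads) {
                t.join(); // wait for this generation to finish before starting the next
            }
        }
        return trips.get();
    }

    /** Returns true if reset() broke a waiting thread and left the barrier usable again. */
    static boolean resetBreaksWaiter() throws InterruptedException {
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicBoolean sawBroken = new AtomicBoolean(false);
        Thread waiter = new Thread(() -> {
            try {
                barrier.await(); // the second party never arrives
            } catch (BrokenBarrierException e) {
                sawBroken.set(true); // reset() released us with a broken-barrier signal
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        while (barrier.getNumberWaiting() == 0) {
            Thread.onSpinWait(); // wait until the thread is actually parked in await()
        }
        barrier.reset();
        waiter.join();
        // After reset() the barrier starts a fresh, unbroken generation.
        return sawBroken.get() && !barrier.isBroken();
    }
}
```

With three parties and four generations, {@code countTrips(3, 4)} should report four trips, one per generation.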
Node Failure Considerations
Unlike the single-process {@link java.util.concurrent.CyclicBarrier}, this implementation is sensitive to node failure. Within a single process, it is possible to guarantee that every party will either enter the barrier or break it, regardless of what else may occur. In a distributed system, external events such as network partitions and node failures can cause parties to leave a barrier prematurely and uncleanly.
To account for these scenarios, this implementation allows the caller to choose whether or not to tolerate external system failures.
If the caller chooses to tolerate external system failures, then barrier entrance is considered permanent: even if a party subsequently fails (in the ZooKeeper ordering), all other members of the barrier (including members yet to enter) will still see that party as entered. This allows the surviving members of the barrier to proceed, but may result in the barrier being considered complete when it should not have been.
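The tolerant behavior can be illustrated with a small in-memory model. This is a hypothetical sketch, not the menagerie implementation: {@code TolerantBarrierModel}, {@code enter}, {@code fail} and {@code isFalseCompletion} are invented names, and modeling an entry as a record that outlives its owner (much like a persistent znode) is an assumption about how permanence could be achieved.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical in-memory model of the failure-TOLERANT mode (not the menagerie API).
 * Entering writes a permanent marker, so a party that later fails is still counted.
 */
class TolerantBarrierModel {
    private final int parties;
    private final Set<String> entered = new HashSet<>(); // permanent entries
    private final Set<String> failed = new HashSet<>();  // parties whose sessions died

    TolerantBarrierModel(int parties) {
        this.parties = parties;
    }

    void enter(String party) {
        entered.add(party);
    }

    /** In tolerant mode a failure does NOT remove the party's entry. */
    void fail(String party) {
        failed.add(party);
    }

    /** Complete once enough entries exist, whether or not their owners are still alive. */
    boolean isComplete() {
        return entered.size() >= parties;
    }

    /** True if the barrier completed while counting at least one failed party. */
    boolean isFalseCompletion() {
        if (!isComplete()) {
            return false;
        }
        for (String p : entered) {
            if (failed.contains(p)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, with three parties, if A enters and fails and then B and C enter, the model reports the barrier as complete even though A is gone: the surviving members proceed, at the cost of a false completion.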
If the caller chooses not to tolerate external system failures, then barrier entrance is considered conditional: if a party enters the barrier and a system failure then occurs, another member will notice the failure and break the barrier. However, this depends on another barrier member being present when the party fails. This leaves open the possibility of deadlock: if a party enters the barrier and then fails (its ZooKeeper session times out) before any other party enters, subsequent parties will never learn that the failed party existed, so the count of parties will always be less than what is necessary to proceed. This scenario requires a relatively long gap between parties entering the barrier: at least enough time must elapse between the first and second parties entering for a ZooKeeper session to time out. If this occurs regularly, consider increasing the session timeout of the ZooKeeper clients.
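Both outcomes described above, the barrier being broken by an observer and the unobserved failure that leads to deadlock, can be reproduced in a minimal in-memory model. This is a hypothetical sketch, not the menagerie implementation: the class and method names are invented, and treating an entry as living only as long as its owner's session (much like an ephemeral znode) is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical in-memory model of the failure-INTOLERANT mode (not the menagerie API).
 * An entry lives only as long as its owner's session; a failure breaks the barrier
 * only if some other party is present to observe it.
 */
class IntolerantBarrierModel {
    private final int parties;
    private final Set<String> present = new HashSet<>(); // entries whose owners are alive
    private boolean broken;

    IntolerantBarrierModel(int parties) {
        this.parties = parties;
    }

    void enter(String party) {
        if (!broken) {
            present.add(party);
        }
    }

    /** Session expiry: the entry disappears, and any remaining member breaks the barrier. */
    void fail(String party) {
        if (!present.remove(party)) {
            return; // the party never entered, so there is nothing to observe
        }
        if (!present.isEmpty()) {
            broken = true; // a surviving member saw the entry vanish
        }
        // Otherwise nobody was watching: the entry silently disappears.
    }

    boolean isBroken() {
        return broken;
    }

    boolean isComplete() {
        return !broken && present.size() >= parties;
    }
}
```

In a two-party barrier where A enters and fails before B arrives, B then enters a barrier that is neither broken nor complete, and (since A has failed) it will never complete: the deadlock described above.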
If even a small risk of deadlock is unacceptable, and false barrier completions are acceptable, then callers should construct instances with {@link #ZkCyclicBarrier(long, org.menagerie.ZkSessionManager, String, java.util.List, boolean)} and {@code tolerateFailures = true}. If a small risk of deadlock is acceptable, or the ZooKeeper session timeout is guaranteed to be long enough that the deadlock risk is not present, then callers may use any of the default constructors, or call {@link #ZkCyclicBarrier(long, org.menagerie.ZkSessionManager, String, java.util.List, boolean)} with {@code tolerateFailures = false}.
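The guidance above can be condensed into a small decision helper. This is a hypothetical sketch and not part of menagerie: {@code ToleranceChoice}, its parameters, and the use of a "longest expected gap between the first and second party entering" as the deadlock test are assumptions drawn from the failure discussion. When neither mode is acceptable, the helper simply reports that the session timeout should be increased, as suggested above.

```java
/** Hypothetical helper (not part of menagerie) for choosing the tolerateFailures argument. */
final class ToleranceChoice {
    private ToleranceChoice() {
    }

    /**
     * @param deadlockAcceptable        whether a small risk of deadlock is acceptable
     * @param falseCompletionAcceptable whether a false barrier completion is acceptable
     * @param sessionTimeoutMs          ZooKeeper session timeout of the clients, in milliseconds
     * @param maxEntryGapMs             longest expected delay between the first and second
     *                                  party entering the barrier, in milliseconds
     * @return the value to pass as tolerateFailures
     */
    static boolean tolerateFailures(boolean deadlockAcceptable,
                                    boolean falseCompletionAcceptable,
                                    long sessionTimeoutMs,
                                    long maxEntryGapMs) {
        // Deadlock needs a session to expire before the second party enters.
        boolean deadlockPossible = maxEntryGapMs >= sessionTimeoutMs;
        if (!deadlockPossible || deadlockAcceptable) {
            return false; // conditional entrance: observed failures break the barrier
        }
        if (falseCompletionAcceptable) {
            return true; // permanent entrance: no deadlock, but false completions possible
        }
        throw new IllegalArgumentException(
                "Neither mode fits; consider increasing the ZooKeeper session timeout");
    }
}
```

For instance, a 30-second session timeout with parties expected to arrive within 5 seconds of one another yields {@code false}, because the deadlock window cannot open.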
@author Scott Fines
@version 1.0
@see java.util.concurrent.CyclicBarrier