A Replicating Multi-Master HashMap
Each remote hash map mirrors its changes over to the other remote hash maps. No hash map is considered the master store of data; instead, each hash map uses timestamps to reconcile changes. We refer to an instance of a remote hash map as a node. A node can be connected to any number of other nodes, although for the first implementation the maximum number of nodes will be fixed. The data stored locally in each node will become eventually consistent, so a change made to one node, for example by calling put(), will be replicated over to the other nodes. To achieve a high level of performance and throughput, the call to put() won’t block. With ConcurrentHashMap, it is typical to check the return value of some methods, for example remove(), to obtain the old value. Due to the loose coupling and lock-free nature of this multi-master implementation, this return value will only be the old value in the node's local data store. In other words, the nodes are only concurrent locally. It's worth realising that another node performing exactly the same operation may return a different value. However, reconciliation will ensure that the maps themselves become eventually consistent.
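The point about local return values can be illustrated with a minimal sketch. It assumes two plain ConcurrentHashMap instances standing in for two nodes, with a put() on one that has not yet been replicated to the other; the class and variable names are hypothetical and are not part of this implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LocalReturnValueDemo {

    /**
     * Performs the same remove() on two independent maps (stand-ins for two
     * nodes) and returns the old value reported by each one.
     */
    public static String[] removeOnBothNodes() {
        ConcurrentMap<String, String> nodeA = new ConcurrentHashMap<>();
        ConcurrentMap<String, String> nodeB = new ConcurrentHashMap<>();

        // The put reaches nodeA only; assume replication to nodeB is still pending.
        nodeA.put("key", "a");

        // Same operation on both nodes, different "old value" on each.
        String oldOnA = nodeA.remove("key"); // the value nodeA held locally
        String oldOnB = nodeB.remove("key"); // null, nodeB never saw the put

        return new String[] {oldOnA, oldOnB};
    }

    public static void main(String[] args) {
        String[] old = removeOnBothNodes();
        System.out.println("nodeA returned " + old[0] + ", nodeB returned " + old[1]);
    }
}
```

The return value reflects only what the local store held at the time of the call, so callers should not treat it as the old value of the map as a whole.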
Reconciliation
Suppose two (or more) nodes receive a change to their maps for the same key but with different values, say by a user of the maps calling put(key, value). Initially, each node will update its local store, and each local store will hold a different value. The aim of multi-master replication, however, is to provide eventual consistency across the nodes. So, with multi-master replication, whenever a node is changed it will notify the other nodes of its change. We refer to this notification as an event. The event holds a timestamp indicating the time the change occurred, and it also holds the state transition, which in this case is a put with a key and a value. Eventual consistency is achieved by looking at the timestamp from the remote node: if, for a given key, the remote node's timestamp is newer than the local node's timestamp, the event from the remote node is applied to the local node; otherwise the event is ignored.
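The timestamp rule described above can be sketched as follows. This is an illustrative sketch rather than this class's actual code: the TimestampReconciler and Stamped types and the applyRemotePut method are hypothetical names, keys and values are fixed to String for brevity, and timestamps are assumed to be long values where a larger value means newer.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TimestampReconciler {

    /** Hypothetical holder: a value together with the time it was written. */
    public static final class Stamped {
        final String value;
        final long timestamp;

        Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final ConcurrentMap<String, Stamped> store = new ConcurrentHashMap<>();

    /**
     * Applies a put event received from a remote node. The event is applied
     * only if the key is absent locally or the remote timestamp is strictly
     * newer than the local one. Returns true if the event was applied.
     */
    public boolean applyRemotePut(String key, String value, long remoteTimestamp) {
        Stamped incoming = new Stamped(value, remoteTimestamp);
        // merge() runs atomically per key: keep whichever entry is newer.
        Stamped winner = store.merge(key, incoming,
                (local, remote) -> remote.timestamp > local.timestamp ? remote : local);
        return winner == incoming;
    }

    /** Returns the locally held value for the key, or null if there is none. */
    public String get(String key) {
        Stamped s = store.get(key);
        return s == null ? null : s.value;
    }
}
```

In this sketch an event whose timestamp equals the local timestamp is ignored, which is not sufficient on its own when two nodes write at exactly the same time.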
However, there is an edge case that we have to concern ourselves with. If two nodes update their maps at the same time with different values, we have to deterministically resolve which update wins, because eventual consistency requires both nodes to end up holding the same data locally. Although it is rare, two remote nodes could receive an update to their maps at exactly the same time for the same key, so it is important not to rely on timestamps alone to reconcile the updates. Typically the update with the newest timestamp should win, but in this case both timestamps are the same, and the decision made on one node must be identical to the decision made on the other. We resolve this dilemma by using a node identifier: each node has a unique identifier, and when the timestamps are equal the update from the node with the smallest identifier wins.
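The tie-break can be sketched in a few lines. The sketch below is illustrative only: the Event and Node classes, the supersedes method, and the use of int identifiers are assumptions made for the example, and a plain HashMap is used because the demonstration is single-threaded.

```java
import java.util.HashMap;
import java.util.Map;

public class TieBreakDemo {

    /** Hypothetical event: a put, stamped with its time and originating node. */
    public static final class Event {
        final String key;
        final String value;
        final long timestamp;
        final int nodeId;

        Event(String key, String value, long timestamp, int nodeId) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
            this.nodeId = nodeId;
        }
    }

    /** Newer timestamp wins; on equal timestamps the smaller node id wins. */
    public static boolean supersedes(Event remote, Event local) {
        if (remote.timestamp != local.timestamp) {
            return remote.timestamp > local.timestamp;
        }
        return remote.nodeId < local.nodeId;
    }

    /** Hypothetical node holding, per key, the last winning event. */
    public static final class Node {
        final int id;
        final Map<String, Event> store = new HashMap<>();

        public Node(int id) {
            this.id = id;
        }

        /** Local put: updates the local store and returns the event to send on. */
        public Event put(String key, String value, long timestamp) {
            Event e = new Event(key, value, timestamp, id);
            store.put(key, e);
            return e;
        }

        /** Applies a remote event only if it beats what is held locally. */
        public void apply(Event remote) {
            Event local = store.get(remote.key);
            if (local == null || supersedes(remote, local)) {
                store.put(remote.key, remote);
            }
        }

        public String get(String key) {
            Event e = store.get(key);
            return e == null ? null : e.value;
        }
    }

    public static void main(String[] args) {
        Node node1 = new Node(1);
        Node node2 = new Node(2);
        // Both nodes write the same key at the same instant.
        Event fromNode1 = node1.put("k", "from-1", 100L);
        Event fromNode2 = node2.put("k", "from-2", 100L);
        // Each node then receives the other's event.
        node1.apply(fromNode2);
        node2.apply(fromNode1);
        System.out.println(node1.get("k") + " " + node2.get("k"));
    }
}
```

Because both nodes evaluate the same ordering on (timestamp, node identifier), each reaches the same decision independently, without any coordination between them.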
@param <K> the entries' key type
@param <V> the entries' value type