A discussion of the recent work to transition Cassandra from its naive 1-partition-per-node distribution, to a proper virtual nodes implementation.
A content-addressable system works by creating a deterministic mapping between unique key and a name-space partition. How these partitions are distributed throughout a cluster of machines is key to a well functioning system.
The most naive approach to a one-hop DHT, uses a single unique token from within the key name-space assigned to each node. The range beginning with the next lowest sorting token and ending in the nodes token, makes up that nodes partition. This single partition per node distribution works, but suffers from a number of problems, not least of which is the small number of localized nodes involved in fail-over and repair, and the operational complexity of expanding the cluster without creating imbalance.
In the white-paper entitled Dynamo: Amazon's Highly Available Key-Value Store, Amazon's solution to partition distribution was to randomly assign many randomly allocated tokens to each machine in the cluster. This approach, referred to as "virtual nodes", permits cluster-wide parallelism during during fail-over and repair, and requires no manual intervention in maintaining a high degree of balance.
Despite the inspiration that Dynamo provided for Cassandra, an early design decision eschewed virtual nodes in favor of a simpler implementation. This talk will compare and contrast the differing approaches to distribution, the properties they provide, and will discuss the transition of Cassandra from 1-token-per-node, to a complete virtual nodes implementation.