Why is consistent hashing effective for distributed systems?

Tomoharu Tsutsumi
Dec 21, 2024

Understanding Consistent Hashing: A Comparison with Modular Hashing

In distributed systems, efficient distribution of data and tasks across multiple nodes is critical for performance and scalability. Two common strategies for achieving this are modular hashing and consistent hashing. While modular hashing is straightforward and intuitive, consistent hashing offers greater flexibility and resilience in dynamic environments. This article explores the key differences between the two approaches, highlighting the advantages of consistent hashing from both a theoretical and practical perspective.

What is Modular Hashing?

Modular hashing is a simple technique where a key is assigned to a node using the modulus operation. Given N nodes and a key K, the node responsible for the key is determined as:

Node = hash(K) % N
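
As a minimal sketch of the formula above: the helper names hashKey and modularNode are hypothetical, and taking the first 4 bytes of a SHA-256 digest is only an assumption for illustration. Any well-mixed hash function would do.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashKey maps a string to a uint32 using the first 4 bytes of its
// SHA-256 digest (an illustrative choice, not a requirement).
func hashKey(key string) uint32 {
	sum := sha256.Sum256([]byte(key))
	return binary.BigEndian.Uint32(sum[:4])
}

// modularNode returns the index (0..n-1) of the node responsible for key,
// i.e. hash(K) % N. It assumes n > 0.
func modularNode(key string, n int) int {
	return int(hashKey(key) % uint32(n))
}

func main() {
	for _, key := range []string{"user:1", "user:2", "user:3"} {
		fmt.Printf("%s -> node %d of 4\n", key, modularNode(key, 4))
	}
}
```

The same key always lands on the same node as long as N stays fixed, which is the property the rest of this section relies on.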

Advantages of Modular Hashing:

Simplicity:

  • Easy to implement and computationally inexpensive.

Uniform Distribution:

  • Keys are evenly distributed across nodes if the hash function is well-designed.

Challenges of Modular Hashing:

Poor Scalability:

  • Adding or removing a node changes N, causing most keys to remap to different nodes. This requires extensive data migration and disrupts system operations.

High Reassignment Cost:

  • When the node count changes from N to N+1, roughly N/(N+1) of all keys map to a different node (about 80% when growing from 4 to 5 nodes), making modular hashing unsuitable for dynamic environments.
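
The reassignment cost described above can be measured with a short experiment. This is a sketch under stated assumptions: the synthetic key names, the key count, and the helper remapFraction are all hypothetical, and the hash is the first 4 bytes of SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashKey maps a string to a uint32 using the first 4 bytes of its SHA-256 digest.
func hashKey(key string) uint32 {
	sum := sha256.Sum256([]byte(key))
	return binary.BigEndian.Uint32(sum[:4])
}

// remapFraction returns the share of numKeys synthetic keys whose node index
// changes when the node count goes from oldN to newN under modular hashing.
func remapFraction(numKeys, oldN, newN int) float64 {
	moved := 0
	for i := 0; i < numKeys; i++ {
		h := hashKey(fmt.Sprintf("key-%d", i))
		if h%uint32(oldN) != h%uint32(newN) {
			moved++
		}
	}
	return float64(moved) / float64(numKeys)
}

func main() {
	fmt.Printf("4 -> 5 nodes: %.1f%% of keys move\n", 100*remapFraction(10000, 4, 5))
}
```

With a well-mixed hash, the printed share should be close to four fifths, since a key keeps its node only when its hash leaves the same remainder under both moduli.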

What is Consistent Hashing?

Consistent hashing addresses the limitations of modular hashing by reducing the amount of remapping required when nodes are added or removed. In consistent hashing, both nodes and keys are placed on a logical hash ring, and a key is assigned to the first node encountered while traversing the ring clockwise.

Key Characteristics of Consistent Hashing:

Minimal Key Movement:

  • When a node is added or removed, only the keys in the ring segments immediately preceding that node's positions need to be reassigned; all other keys stay on their current nodes.

Virtual Nodes:

  • To ensure even distribution, each physical node is represented by multiple virtual nodes, which are spread across the ring.

Dynamic Scaling:

  • New nodes can be seamlessly integrated without significant disruption.
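
To make the minimal-movement property concrete, here is a small sketch that adds a fourth node to a three-node ring and checks where the moved keys go. It assumes one ring position per node (no virtual nodes, to keep it short); the names point, buildRing, lookup, and movedKeys are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// point is one position on the ring, owned by a node.
type point struct {
	hash uint32
	node string
}

func hashKey(s string) uint32 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint32(sum[:4])
}

// buildRing places one point per node on the ring and sorts by hash.
func buildRing(nodes []string) []point {
	ring := make([]point, 0, len(nodes))
	for _, n := range nodes {
		ring = append(ring, point{hashKey(n), n})
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].hash < ring[j].hash })
	return ring
}

// lookup returns the first node clockwise from the key's hash.
// The ring must be non-empty.
func lookup(ring []point, key string) string {
	h := hashKey(key)
	i := sort.Search(len(ring), func(i int) bool { return ring[i].hash >= h })
	if i == len(ring) {
		i = 0 // wrap around to the start of the ring
	}
	return ring[i].node
}

// movedKeys counts how many of numKeys synthetic keys changed owner between
// the two rings, and how many of those now belong to newNode.
func movedKeys(before, after []point, newNode string, numKeys int) (moved, toNew int) {
	for i := 0; i < numKeys; i++ {
		key := fmt.Sprintf("key-%d", i)
		a, b := lookup(before, key), lookup(after, key)
		if a != b {
			moved++
			if b == newNode {
				toNew++
			}
		}
	}
	return moved, toNew
}

func main() {
	before := buildRing([]string{"NodeA", "NodeB", "NodeC"})
	after := buildRing([]string{"NodeA", "NodeB", "NodeC", "NodeD"})
	moved, toNew := movedKeys(before, after, "NodeD", 10000)
	fmt.Printf("moved %d of 10000 keys; %d of them went to NodeD\n", moved, toNew)
}
```

The two counts should match: a key changes owner only if the new node now sits between the key and its previous owner, so no key is shuffled between the existing nodes.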

Advantages of Consistent Hashing:

Scalability:

  • Ideal for systems where nodes are frequently added or removed, such as distributed caching systems or cloud environments.

Load Balancing:

  • Virtual nodes distribute the load more evenly, preventing hotspots.

Fault Tolerance:

  • If a node fails, lookups for its keys fall through to the next node clockwise on the ring, while keys owned by the surviving nodes stay where they are.
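
The load-balancing effect of virtual nodes can be observed by varying the number of ring positions per physical node. The sketch below is illustrative only: the replica counts, the synthetic keys, and the helpers buildRing and maxShare are assumptions, not part of any particular library.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
	"strconv"
)

// point is one virtual node on the ring.
type point struct {
	hash uint32
	node string
}

func hashKey(s string) uint32 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint32(sum[:4])
}

// buildRing places `replicas` virtual nodes per physical node on the ring.
func buildRing(nodes []string, replicas int) []point {
	ring := make([]point, 0, len(nodes)*replicas)
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			ring = append(ring, point{hashKey(n + "#" + strconv.Itoa(i)), n})
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].hash < ring[j].hash })
	return ring
}

// maxShare returns the largest fraction of numKeys synthetic keys
// owned by any single physical node.
func maxShare(ring []point, numKeys int) float64 {
	counts := map[string]int{}
	for i := 0; i < numKeys; i++ {
		h := hashKey("key-" + strconv.Itoa(i))
		idx := sort.Search(len(ring), func(j int) bool { return ring[j].hash >= h })
		if idx == len(ring) {
			idx = 0 // wrap around to the start of the ring
		}
		counts[ring[idx].node]++
	}
	busiest := 0
	for _, c := range counts {
		if c > busiest {
			busiest = c
		}
	}
	return float64(busiest) / float64(numKeys)
}

func main() {
	nodes := []string{"NodeA", "NodeB", "NodeC"}
	for _, r := range []int{1, 10, 200} {
		share := maxShare(buildRing(nodes, r), 30000)
		fmt.Printf("replicas=%3d  busiest node holds %.1f%% of keys\n", r, 100*share)
	}
}
```

With three nodes, a perfectly balanced ring would give each node one third of the keys. As the replica count grows, the busiest node's share should move closer to that figure, at the cost of a larger ring to sort and search.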

Challenges of Consistent Hashing:

Complexity:

  • Implementation is more intricate compared to modular hashing.

Performance Overhead:

  • Managing virtual nodes and maintaining a sorted hash ring require additional computational resources.

Key Differences Between Modular and Consistent Hashing

| Feature | Modular Hashing | Consistent Hashing |
|---|---|---|
| Key Distribution | Depends on the modulus; sensitive to changes in N. | Evenly distributed with minimal remapping. |
| Scalability | Poor; adding/removing nodes disrupts all keys. | Excellent; only affects neighboring keys. |
| Load Balancing | Uniform but sensitive to N. | Improved with virtual nodes. |
| Fault Tolerance | Limited; node failure disrupts many keys. | High; automatic redistribution. |
| Implementation | Simple and efficient. | More complex and resource-intensive. |

Real-World Use Cases

Modular Hashing:

Static Environments:

  • Systems with a fixed number of nodes, such as small-scale distributed storage.

Simple Applications:

  • Lightweight use cases where node additions or removals are rare.

Consistent Hashing:

Distributed Caching:

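  • Used by Memcached client libraries (for example, those implementing the ketama algorithm) to distribute cached data across servers. Redis Cluster takes a related but different approach, mapping keys to a fixed set of 16,384 hash slots.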

Content Delivery Networks (CDNs):

  • Ensures smooth distribution of requests to edge servers.

Cloud and Microservices:

  • Handles dynamic scaling of nodes with minimal disruption.

Why Consistent Hashing is Superior for Dynamic Systems

Adaptability:

  • In modern cloud-based architectures, nodes frequently scale up or down to handle varying workloads. Consistent hashing accommodates this dynamism efficiently.

Reduced Operational Costs:

  • By minimizing data migration, consistent hashing reduces the overhead associated with rebalancing keys across nodes.

Improved User Experience:

  • In systems like distributed databases or caches, consistent hashing keeps most keys on their existing nodes during node changes, so most requests continue to be served without interruption, providing a better end-user experience.
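
The migration savings can be put in rough numbers. The following is a back-of-the-envelope comparison rather than an exact result: it assumes the hash function spreads keys and node positions uniformly, and it considers growing a cluster from N to N+1 nodes.

```latex
\[
\begin{aligned}
\text{Modular hashing:}\quad
  & \Pr[\text{key stays}] = \Pr\bigl[h \bmod N = h \bmod (N+1)\bigr] \approx \frac{1}{N+1},
  \quad\text{so about } \frac{N}{N+1} \text{ of keys move.} \\[4pt]
\text{Consistent hashing:}\quad
  & \mathbb{E}[\text{fraction of keys moved}] \approx \frac{1}{N+1}
  \quad\text{(the new node's share of the ring).} \\[4pt]
\text{Example, } N = 4:\quad
  & \frac{4}{5} = 80\% \ \text{of keys move with modular hashing versus}\ \frac{1}{5} = 20\% \ \text{with consistent hashing.}
\end{aligned}
\]
```

The gap widens as the cluster grows: at N = 99, modular hashing moves about 99% of keys, while consistent hashing moves about 1%.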

Detailed Code in Go

package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
	"strconv"
)

type VirtualNode struct {
	Hash uint32
	Node string
}

// HashRing represents a consistent hash ring with virtual nodes
type HashRing struct {
	nodes    []VirtualNode
	replicas int
}

// NewHashRing creates a new hash ring
func NewHashRing(replicas int) *HashRing {
	return &HashRing{
		nodes:    []VirtualNode{},
		replicas: replicas,
	}
}

// hashFunc computes a uint32 hash from a string
func hashFunc(data string) uint32 {
	hash := sha256.Sum256([]byte(data))
	return binary.BigEndian.Uint32(hash[:4]) // Use the first 4 bytes
}

// AddNode adds a node with virtual nodes
func (hr *HashRing) AddNode(node string) {
	for i := 0; i < hr.replicas; i++ {
		virtualNodeKey := node + "#" + strconv.Itoa(i)
		hash := hashFunc(virtualNodeKey)
		hr.nodes = append(hr.nodes, VirtualNode{Hash: hash, Node: node})
	}

	// Sort the virtual nodes by their hash
	sort.Slice(hr.nodes, func(i, j int) bool {
		return hr.nodes[i].Hash < hr.nodes[j].Hash
	})
}

// RemoveNode removes a node and its virtual nodes
func (hr *HashRing) RemoveNode(node string) {
	filtered := []VirtualNode{}
	for _, vn := range hr.nodes {
		if vn.Node != node {
			filtered = append(filtered, vn)
		}
	}
	hr.nodes = filtered
}

// GetNode finds the appropriate node for a given key
func (hr *HashRing) GetNode(key string) string {
	if len(hr.nodes) == 0 {
		return ""
	}

	keyHash := hashFunc(key)
	idx := sort.Search(len(hr.nodes), func(i int) bool {
		return hr.nodes[i].Hash >= keyHash
	})

	// Wrap around to the first node if necessary
	if idx == len(hr.nodes) {
		idx = 0
	}
	return hr.nodes[idx].Node
}

func main() {
	// Create a hash ring with 3 virtual nodes per physical node
	ring := NewHashRing(3)

	// Add nodes
	ring.AddNode("NodeA")
	ring.AddNode("NodeB")
	ring.AddNode("NodeC")

	// Get nodes for keys
	keys := []string{"key1", "key2", "key3", "key4", "key5"}
	for _, key := range keys {
		node := ring.GetNode(key)
		fmt.Printf("Key: %s -> Node: %s\n", key, node)
	}

	// Remove a node and recheck
	fmt.Println("\nRemoving NodeB...")
	ring.RemoveNode("NodeB")
	for _, key := range keys {
		node := ring.GetNode(key)
		fmt.Printf("Key: %s -> Node: %s\n", key, node)
	}
}

Conclusion

While modular hashing is a good choice for static and simple systems, its limitations make it unsuitable for dynamic and large-scale distributed environments. Consistent hashing, with its ability to minimize disruption and balance load effectively, has become the go-to solution for modern distributed systems.

By incorporating techniques like virtual nodes and efficient hashing algorithms, consistent hashing ensures scalability, reliability, and fault tolerance, making it indispensable for cloud computing, distributed databases, and other cutting-edge applications.

Feel free to reach out to me on LinkedIn, which you can find below. Looking forward to connecting!

https://www.linkedin.com/in/tomoharu-tsutsumi-56051a126/
