🔗 Share this topic View All Subjects

How HashSet Works: HashMap Internals, Load Factor, and Rehashing

अगर आप Java में programming कर रहे हैं, तो आपने HashSet का नाम ज़रूर सुना होगा। लेकिन क्या आपने कभी सोचा है कि HashSet internally काम कैसे करता है? असल में HashSet के पीछे जो main powerhouse है, वो है HashMap। आज हम step-by-step समझेंगे कि HashSet data को store कैसे करता है, HashMap की क्या भूमिका होती है, Load Factor और Rehashing क्या होते हैं, और इनका performance पर क्या असर पड़ता है।

HashSet Overview

HashSet Java Collection Framework का हिस्सा है जो unique elements को store करता है। इसका मतलब है कि कोई भी duplicate element HashSet में नहीं रखा जा सकता। यह class internally HashMap का use करती है elements को store करने के लिए।

Key Points about HashSet

Duplicate values को allow नहीं करता।
Elements unordered होते हैं (insertion order maintain नहीं होती)।
Internally HashMap का use करता है।
Null value को accept करता है (लेकिन सिर्फ एक बार)।

Internal Working of HashSet

जब आप HashSet में कोई element add करते हैं, तो internally वो element एक HashMap की key के रूप में store होता है। HashSet के अंदर HashMap create होता है, और जो भी element आप add करते हैं, वो HashMap की key बनता है, जबकि value के लिए एक constant dummy object (PRESENT) store किया जाता है।

Example of Internal Storage


HashSet<String> set = new HashSet<>();

set.add("Java");

set.add("Python");

set.add("Java"); // Duplicate element

ऊपर के example में “Java” दो बार add किया गया है, लेकिन HashSet duplicate को allow नहीं करता। क्योंकि internally जब “Java” दूसरी बार add होता है, तो HashMap check करता है कि क्या ये key पहले से मौजूद है या नहीं। अगर है, तो वो simply ignore कर देता है।

Role of HashMap in HashSet

HashSet internally एक private HashMap instance रखता है। हर बार जब हम HashSet में कोई element add करते हैं, HashSet उस element को HashMap की key के रूप में डाल देता है और एक constant dummy value रख देता है।

Internal Structure Example


private static final Object PRESENT = new Object();

private transient HashMap<E,Object> map;



public boolean add(E e) {

    return map.put(e, PRESENT) == null;

}

इसका मतलब है कि HashSet में जो भी value आप डालते हैं, वो HashMap की key बन जाती है, और value के रूप में “PRESENT” object assign हो जाता है। इस तरीके से duplicate key (यानी duplicate element) HashSet में store नहीं हो पाता।

How Hash Function Works

आपका अगला टॉपिक पढ़े Adding, Removing, and Testing Membership – O(1) Average Time Complexity

जब आप किसी element को HashSet में डालते हैं, HashMap पहले उस element का hashCode() calculate करता है। फिर hash value के आधार पर उसे एक specific bucket में place करता है। अगर दो keys का hashCode same है, तो collision होता है।

Collision Handling in HashMap

HashMap पहले singly linked list का use करता था (Java 7 तक)।
Java 8 से, अगर किसी bucket में 8 से ज्यादा elements हो जाएँ, तो HashMap उस list को balanced tree में बदल देता है।
इससे performance O(n) से घटकर O(log n) हो जाती है।

इसलिए, HashSet की speed भी HashMap की efficiency पर depend करती है, खासकर hashing और collision handling के तरीके पर।

What is Load Factor in HashSet

आपका अगला टॉपिक पढ़े Iteration Order, Null Elements, and Thread Safety Considerations

Load Factor वो ratio होता है जो बताता है कि HashMap (और indirectly HashSet) कितनी capacity तक भर सकता है, उसके बाद resize किया जाएगा। Default Load Factor Java में 0.75 होता है।

इसका मतलब — अगर HashMap की capacity 16 है, तो जब वो 75% (यानि 12 entries) तक भर जाएगा, तब rehashing (resize) process शुरू हो जाएगी।

Load Factor Formula

Term	Meaning
Capacity	HashMap में total available buckets
Load Factor	Resize होने की threshold limit
Threshold	capacity × load factor

Example:


Capacity = 16

Load Factor = 0.75

Threshold = 16 × 0.75 = 12

इसका मतलब जब 12 elements add हो जाएंगे, HashSet (via HashMap) automatically size बढ़ा देगा।

Rehashing Process Explained

Rehashing वो process है जिसमें HashMap अपनी capacity बढ़ाता है और सभी existing elements को नए buckets में reassign करता है। ये तब होता है जब threshold value cross हो जाती है।

Steps in Rehashing

HashMap अपनी current capacity को double कर देता है।
हर key का hashCode फिर से calculate किया जाता है।
सभी elements को नए buckets में redistribute किया जाता है।

Rehashing performance improve करती है, लेकिन memory usage बढ़ा देती है। इसलिए initial capacity और load factor को सही choose करना ज़रूरी है।

Rehashing Example


HashSet<String> set = new HashSet<>(16, 0.75f);

for(int i = 1; i <= 13; i++) {

  set.add("Data" + i);

}

// जब 13वां element add होगा, rehashing trigger होगी

Performance Analysis of HashSet

HashSet की performance average case में बहुत अच्छी होती है क्योंकि ये constant time complexity provide करता है — यानी O(1) average complexity add, remove और contains operations के लिए। लेकिन worst case में ये O(n) भी हो सकती है अगर बहुत सारे collisions हों।

Operation	Average Time	Worst Case
add()	O(1)	O(n)
remove()	O(1)	O(n)
contains()	O(1)	O(n)

Factors Affecting Performance

Load Factor और initial capacity का सही चयन।
Efficient hashCode() और equals() implementation।
Low collision rate।

Importance of equals() and hashCode()

HashSet में duplicate check करने के लिए Java दो methods use करता है — hashCode() और equals()। अगर दो objects का hashCode same है, तो equals() check करता है कि क्या दोनों equal हैं या नहीं।


class Student {

  int id;

  String name;


  public int hashCode() {

    return id;

  }


  public boolean equals(Object o) {

    if (this == o) return true;

    if (!(o instanceof Student)) return false;

    Student s = (Student) o;

    return id == s.id;

  }

}

अगर आप ये methods override नहीं करते, तो HashSet सही तरीके से duplicates को पहचान नहीं पाएगा।

Advantages and Disadvantages

Advantages

Unique data store करने के लिए perfect।
Fast searching, insertion, deletion।
Automatic resizing via rehashing।

Disadvantages

Order maintain नहीं होता।
Rehashing में performance temporarily slow हो सकती है।
Memory consumption थोड़ी ज़्यादा होती है।

Real-Life Usage of HashSet

HashSet का use बहुत सारे real-world applications में किया जाता है जहाँ duplicate data को avoid करना होता है।

Student IDs या Employee IDs store करने में।
Unique email list बनाते समय।
Visited URLs या cached items track करने में।
Duplicate entries remove करने में।

Best Practices for Using HashSet

जब data बड़ा हो, तो initial capacity manually define करें।
hashCode() और equals() methods हमेशा override करें।
अगर order ज़रूरी है, तो LinkedHashSet use करें।
अगर thread safety चाहिए, तो Collections.synchronizedSet() या ConcurrentHashMap use करें।

Important Notes for Exam

HashSet internally HashMap का use करता है।
Load Factor default = 0.75 होता है।
Rehashing तब होती है जब capacity × load factor cross हो जाए।
Average time complexity O(1) होती है।
Duplicates allow नहीं होते, order maintain नहीं होता।
Efficient performance के लिए proper hashCode() और equals() जरूरी हैं।
HashSet memory-efficient और fast collection है।

How HashSet Works: HashMap Internals, Load Factor, and Rehashing

How HashSet Works: HashMap Internals, Load Factor, and Rehashing

HashSet Overview

Key Points about HashSet

Internal Working of HashSet

Example of Internal Storage

Role of HashMap in HashSet

Internal Structure Example

How Hash Function Works

Collision Handling in HashMap

What is Load Factor in HashSet

Load Factor Formula

Example:

Rehashing Process Explained

Steps in Rehashing

Rehashing Example

Performance Analysis of HashSet

Factors Affecting Performance

Importance of equals() and hashCode()

Advantages and Disadvantages

Advantages

Disadvantages

Real-Life Usage of HashSet

Best Practices for Using HashSet

Important Notes for Exam

Author Name : Arpit Nageshwar

Advance+Java notes in hindi

बताएं हम और बेहतर क्या कर सकते हैं

How HashSet Works: HashMap Internals, Load Factor, and Rehashing

How HashSet Works: HashMap Internals, Load Factor, and Rehashing

HashSet Overview

Key Points about HashSet

Internal Working of HashSet

Example of Internal Storage

Role of HashMap in HashSet

Internal Structure Example

How Hash Function Works

Collision Handling in HashMap

What is Load Factor in HashSet

Load Factor Formula

Example:

Rehashing Process Explained

Steps in Rehashing

Rehashing Example

Performance Analysis of HashSet

Factors Affecting Performance

Importance of equals() and hashCode()

Advantages and Disadvantages

Advantages

Disadvantages

Real-Life Usage of HashSet

Best Practices for Using HashSet

Important Notes for Exam

Author Name : Arpit Nageshwar

Advance+Java notes in hindi