Collision Resolution in Hashing
๐ What is a Collision in Hashing?
A collision occurs when two different keys produce the same hash value (i.e., they map to the same index in a hash table).
๐งช Example:
If h(25) = 5
and h(35) = 5
, then both values are trying to go to index 5
.
๐ง Collision Resolution Techniques
There are two primary ways to resolve collisions:
-
Open Addressing (Closed Hashing)
-
Chaining (Open Hashing)
1️⃣ Open Addressing
In open addressing, when a collision occurs, the algorithm searches for the next available slot in the hash table until an empty slot is found.
The entire table is treated as a single array. Only one key is stored at each index.So at any point, the size of the table must be greater than or equal to the total number of keys (Note that we can increase table size by copying old data if needed). This approach is also known as closed hashing.
๐ General Formula:
Where:
-
h(k)
is the base hash function -
probe(i)
determines how we look for the next slot -
i
is the probe number (attempt number) -
m
is the table size
๐ Types of Open Addressing
a) Linear Probing
Check the next slot (sequentially) until an empty spot is found.
๐งช Example:
If h(25) = 5
, and index 5
is full, check 6
, then 7
, etc.
๐ Advantages
- Simple and Easy to implement
❌Drawback: Primary Clustering
- Prone to clustering, where consecutive slots get filled, potentially leading to further collisions.
b) Quadratic Probing
Search for empty slots using a quadratic function.
๐งช Example:
Try h(k)
, h(k)+1²
, h(k)+2²
, etc.
✅ Advantage:
-
Reduces clustering
❌ Drawback:
-
May not examine all slots (table size should be prime).May still result in clustering, and the quadratic function needs to be chosen carefully to avoid infinite loops.
c) Double Hashing
Uses a second hash function to compute the probe step size.
Where:
-
h1(k)
is the primary hash -
h2(k)
is the secondary hash, not equal to 0
✅ Advantage:
-
Better distribution, minimal clustering
❌Disadvantage:
- Disadvantages: Requires a good choice of the second hash function, and if not carefully designed, it may still lead to clustering.
1️⃣ Insertion in Open Addressing
Algorithm:- Compute hash index: index = h(k)
- If the slot is empty, insert it there.
- If occupied, probe the next slot using linear probing: (index + 1) % m, and so on.
- Stop when an empty slot is found or the table is full.
2️⃣ Search in Open Addressing
Algorithm:- Compute hash index: index = h(k)
- Check if the key is at that index.
- If not, probe the next slot: (index + 1) % m until:
- You find the key → Success
- You find an empty slot → Key not found
- You loop back to the starting index → Key not found
3️⃣ Deletion in Open Addressing
Problem:
You can’t just make the slot empty (None
or _
) because it breaks the probing chain for future searches.
✅ Solution:
Use a special marker like "DELETED"
to indicate that the slot is available for insertion but not empty for search purposes.
Algorithm:
-
Search for the key using probing.
-
If found, replace it with
"DELETED"
. -
For insertion, you may reuse
"DELETED"
slots.
⚠️ Notes on Open Addressing:
Deletion is tricky: Requires special markers (e.g., “DELETED” flags)
-
Table must not be full for successful insertion
-
Load factor (number of elements / table size) must be kept low (typically < 0.7)
2️⃣ Chaining (Open Hashing)
In chaining, each slot in the hash table points to a linked list (or chain) of entries that hash to the same index.
๐งช Example:
If h(25) = 5
and h(35) = 5
, both are stored in a list at index 5
.
✅ Advantages of Chaining:
-
No limit on the number of keys stored (as long as memory allows)
-
Deletions are simple
-
Works well even if the load factor > 1
❌ Disadvantages of Chaining:
-
Extra memory needed for pointers
-
May result in longer search time if chains are long
1️⃣ Insertion in Chaining
๐ง Idea:
-
Compute the hash index
-
Insert the key at the beginning (or end) of the linked list at that index
๐ Algorithm:
Note: Inserting at the beginning is faster in linked lists (O(1))
2️⃣ Search in Chaining
๐ง Idea:
-
Compute the hash index
-
Traverse the list at that index to find the key
๐ Algorithm:
3️⃣ Deletion in Chaining
๐ง Idea:
-
Compute hash index
-
Traverse the list at that index
-
Remove the node if found
๐ Algorithm:
This is standard linked list deletion (requires pointer manipulation)
๐ Summary Table
Operation | Time Complexity (Average) | Time Complexity (Worst) |
---|---|---|
Insertion | O(1) | O(n) (if poorly distributed) |
Search | O(1) | O(n) |
Deletion | O(1) | O(n) |
๐ Summary Table
Technique | Storage Structure | Handles Full Table? | Deletion Complexity | Clustering |
---|---|---|---|---|
Open Addressing | Same array | No | Difficult | Can be an issue |
Chaining | Array + Linked Lists | Yes | Easy | Less likely |
๐ก When to Use What?
Use Case | Best Method |
---|---|
Low memory system | Open Addressing |
Unpredictable number of keys | Chaining |
Simpler deletion needed | Chaining |
Faster search when load is low | Open Addressing |
๐ Hashing Techniques: Chaining vs Open Addressing
Feature | Chaining (Separate Chaining) | Open Addressing |
---|---|---|
Storage Structure | Uses linked lists at each index | All elements are stored within the table itself |
Collision Handling | New elements are added to the list at the index | Uses probing (linear/quadratic/double hashing) to find next slot |
Table Size | Can handle more elements than table size | Limited to table size (n ≤ m ) |
Memory Usage | Extra memory for pointers in linked lists | No extra memory (in-place storage) |
Insertion Time (avg) | O(1) | O(1) if load factor is low |
Search Time (avg) | O(1) | O(1) if load factor is low |
Worst-case Time | O(n) – if all keys hash to same index | O(n) – if table is full or many collisions occur |
Deletion Complexity | Easier – just remove from the linked list | Harder – needs special handling (e.g., tombstones) |
Load Factor (ฮฑ) | Can go beyond 1 (ฮฑ = n/m can be > 1) | Must be ≤ 1 (since all data stored in array) |
Cache Performance | Poor (due to linked list dereferencing) | Better (elements are in contiguous memory) |
Implementation Complexity | Requires linked list or dynamic memory management | Simple array-based implementation |
Ideal For | Unpredictable or large data sets | Small or memory-constrained environments |
✅ Summary:
-
Chaining is more flexible and simpler for deletion.
-
Open Addressing is space-efficient but needs careful probing and deletion logic.
Example Problems
Let the size of a hash table is 10. The index of the hash table varies from 0 to 9. Assume the keys 73, 54, 15, 48, 89, 66, 37, 18, 41, 22, 62 are mapped using modulo operator.
Show how the keys are distributed using chaining method.
Apply the hash function h(x) = x mod 7 for linear probing on the data 2341, 4234,2839, 430, 22, 397, 3920 and show the resulting hash table
Comments
Post a Comment