Collision Resolution in Hashing

 

🔍 What is a Collision in Hashing?

A collision occurs when two different keys produce the same hash value (i.e., they map to the same index in a hash table).

🧪 Example:

If h(25) = 5 and h(35) = 5, then both values are trying to go to index 5.


🚧 Collision Resolution Techniques

There are two primary ways to resolve collisions:

  1. Open Addressing (Closed Hashing)

  2. Chaining (Open Hashing)


1️⃣ Open Addressing

In open addressing, when a collision occurs, the algorithm searches for the next available slot in the hash table until an empty slot is found.

The entire table is treated as a single array, and only one key is stored at each index. So at any point, the size of the table must be greater than or equal to the total number of keys (the table can be grown by rehashing the old data into a larger array if needed). This approach is also known as closed hashing.



📌 General Formula:


h(k, i) = (h(k) + probe(i)) % m

Where:

  • h(k) is the base hash function

  • probe(i) determines how we look for the next slot

  • i is the probe number (attempt number)

  • m is the table size


🔄 Types of Open Addressing

a) Linear Probing

Check the next slot (sequentially) until an empty spot is found.


h(k, i) = (h(k) + i) % m

🧪 Example:

If h(25) = 5, and index 5 is full, check 6, then 7, etc.

✅ Advantage:

  • Simple and easy to implement

❌ Drawback: Primary Clustering

  • Prone to clustering, where consecutive slots get filled, potentially leading to further collisions.
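The linear probing scheme above can be sketched in Python. This is a minimal illustration (the base hash h(k) = k mod m and the table size are assumed, not taken from a specific library):

```python
def linear_probe_insert(table, key, m):
    """Insert key into table (a list of size m, None = empty) using linear probing."""
    index = key % m                  # base hash h(k) = k mod m (assumed)
    for i in range(m):
        slot = (index + i) % m       # h(k, i) = (h(k) + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * 10
for key in [25, 35, 15]:             # all three hash to index 5
    linear_probe_insert(table, key, 10)
print(table)   # [None, None, None, None, None, 25, 35, 15, None, None]
```

Note how the three colliding keys fill consecutive slots 5, 6, 7: exactly the primary clustering the drawback describes.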

b) Quadratic Probing

Search for empty slots using a quadratic function.


h(k, i) = (h(k) + i²) % m

🧪 Example:

Try h(k), h(k)+1², h(k)+2², etc.

✅ Advantage:

  • Reduces clustering

❌ Drawback:

  • May not examine every slot unless the table size is prime (so the table size should be a prime number). Secondary clustering can still occur, and the quadratic function must be chosen carefully to avoid infinite probe loops.
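The "may not examine all slots" drawback is easy to demonstrate by listing the probe sequence (a quick sketch; h(k) = k mod m is assumed):

```python
def quadratic_probe_sequence(key, m, attempts):
    """Slots examined by quadratic probing: h(k, i) = (h(k) + i^2) mod m."""
    h = key % m
    return [(h + i * i) % m for i in range(attempts)]

# With m = 8 (not prime), only a few distinct slots are ever probed:
print(sorted(set(quadratic_probe_sequence(0, 8, 8))))   # [0, 1, 4]
# With a prime m = 7, more of the table becomes reachable:
print(sorted(set(quadratic_probe_sequence(0, 7, 7))))   # [0, 1, 2, 4]
```

Even with a prime table size, only about half the slots are reachable by i², which is why the load factor must be kept low for quadratic probing.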


c) Double Hashing

Uses a second hash function to compute the probe step size.


h(k, i) = (h1(k) + i * h2(k)) % m

Where:

  • h1(k) is the primary hash

  • h2(k) is the secondary hash, which must never evaluate to 0

✅ Advantage:

  • Better distribution, minimal clustering

❌ Disadvantage:

  • Requires a good choice of the second hash function; if poorly designed, it may still lead to clustering.
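A double-hashing probe sequence can be sketched the same way. The secondary hash h2(k) = 1 + k mod (m − 1) used here is one common textbook choice (an assumption, not the only option); it guarantees h2(k) is never 0:

```python
def double_hash_sequence(key, m, attempts):
    """Slots probed by double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m."""
    h1 = key % m                 # primary hash (assumed)
    h2 = 1 + key % (m - 1)       # secondary hash; never 0
    return [(h1 + i * h2) % m for i in range(attempts)]

# key 25, m = 7: h1 = 4, h2 = 2, so the step size is 2
print(double_hash_sequence(25, 7, 7))   # [4, 6, 1, 3, 5, 0, 2]
```

Because m is prime and the step h2(k) is nonzero, the sequence visits all seven slots: no two keys with different h2 values follow the same probe path, which is why clustering is minimal.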

1️⃣ Insertion in Open Addressing

Algorithm:
  1. Compute hash index: index = h(k)
  2. If the slot is empty, insert the key there.
  3. If occupied, probe the next slot using linear probing: (index + 1) % m, and so on.
  4. Stop when an empty slot is found or the table is full.

2️⃣ Search in Open Addressing

Algorithm:
  1. Compute hash index: index = h(k)
  2. Check if the key is at that index.
  3. If not, probe the next slot: (index + 1) % m until:
  • You find the key → Success
  • You find an empty slot → Key not found
  • You loop back to the starting index → Key not found
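The search steps above can be sketched as follows (a minimal version using linear probing with h(k) = k mod m assumed; it returns the slot index or −1):

```python
def probe_search(table, key, m):
    """Search an open-addressed table with linear probing."""
    index = key % m
    for i in range(m):
        slot = (index + i) % m
        if table[slot] is None:      # empty slot -> key cannot be further along
            return -1
        if table[slot] == key:
            return slot
    return -1                        # looped over the whole table

table = [None] * 10
table[5], table[6] = 25, 35          # 35 collided at 5 and was placed at 6
print(probe_search(table, 35, 10))   # 6
print(probe_search(table, 45, 10))   # -1 (probe stops at empty slot 7)
```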

3️⃣ Deletion in Open Addressing

Problem:

You can't simply mark the slot empty (e.g., set it back to None), because that breaks the probe chain for future searches: a later search would stop at the emptied slot and miss keys placed beyond it.

✅ Solution:

Use a special marker like "DELETED" to indicate that the slot is available for insertion but not empty for search purposes.

Algorithm:

  1. Search for the key using probing.

  2. If found, replace it with "DELETED".

  3. For insertion, you may reuse "DELETED" slots.
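The tombstone scheme above can be sketched like this (the string "DELETED" as the marker is an assumption; any sentinel distinct from None and from valid keys works):

```python
DELETED = "DELETED"                   # tombstone marker (assumed sentinel)

def probe_delete(table, key, m):
    """Delete with linear probing; leaves a tombstone so searches keep working."""
    index = key % m
    for i in range(m):
        slot = (index + i) % m
        if table[slot] is None:
            return False              # hit an empty slot: key not present
        if table[slot] == key:
            table[slot] = DELETED     # mark, don't empty
            return True
    return False

table = [None] * 10
table[5], table[6], table[7] = 25, 35, 15   # all three hashed to index 5
probe_delete(table, 35, 10)
print(table[6])                       # DELETED
```

A search for 15 still succeeds afterwards, because slot 6 holds the tombstone rather than None, so probing continues past it to slot 7.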

⚠️ Notes on Open Addressing:

  • Deletion is tricky: Requires special markers (e.g., “DELETED” flags)

  • Table must not be full for successful insertion

  • Load factor (number of elements / table size) must be kept low (typically < 0.7)

2️⃣ Chaining (Open Hashing)

In chaining, each slot in the hash table points to a linked list (or chain) of entries that hash to the same index.



🧪 Example:

If h(25) = 5 and h(35) = 5, both are stored in a list at index 5.


Index 5 → 25 → 35 → ...

✅ Advantages of Chaining:

  • No limit on the number of keys stored (as long as memory allows)

  • Deletions are simple

  • Works well even if the load factor > 1


❌ Disadvantages of Chaining:

  • Extra memory needed for pointers

  • May result in longer search time if chains are long


1️⃣ Insertion in Chaining

🧠 Idea:

  • Compute the hash index

  • Insert the key at the beginning (or end) of the linked list at that index

📌 Algorithm:


INSERT(key):
  1. index ← h(key)
  2. if key is not already in chain[index]:
       insert key at the beginning of chain[index]

Note: Inserting at the beginning is faster in linked lists (O(1))

 

2️⃣ Search in Chaining

🧠 Idea:

  • Compute the hash index

  • Traverse the list at that index to find the key

📌 Algorithm:


SEARCH(key):
  1. index ← h(key)
  2. Traverse the linked list at chain[index]
  3. If key is found, return True
  4. Else return False



3️⃣ Deletion in Chaining

🧠 Idea:

  • Compute hash index

  • Traverse the list at that index

  • Remove the node if found

📌 Algorithm:


DELETE(key):
  1. index ← h(key)
  2. Traverse the linked list at chain[index]
  3. If key is found, delete it from the list

This is standard linked list deletion (requires pointer manipulation)
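The three chaining operations above can be combined into one small sketch. Python lists stand in for the linked lists here (an implementation convenience, not a requirement of the technique):

```python
class ChainedHashTable:
    """Separate chaining; each slot holds a chain of keys that hash to it."""

    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def insert(self, key):
        chain = self.chains[key % self.m]     # index = h(key) = key mod m
        if key not in chain:
            chain.insert(0, key)              # insert at the head, O(1) in a linked list

    def search(self, key):
        return key in self.chains[key % self.m]

    def delete(self, key):
        chain = self.chains[key % self.m]
        if key in chain:
            chain.remove(key)                 # standard list/linked-list deletion

t = ChainedHashTable(10)
for k in (25, 35, 15):                        # all hash to index 5
    t.insert(k)
print(t.chains[5])      # [15, 35, 25]  (head insertion reverses arrival order)
t.delete(35)
print(t.search(35))     # False
```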

๐Ÿ“ Summary Table

Operation    Time Complexity (Average)    Time Complexity (Worst)
Insertion                O(1)        O(n) (if poorly distributed)
Search                O(1)            O(n)
Deletion                O(1)        O(n)

📊 Summary Table

Technique          Storage Structure       Handles Full Table?   Deletion Complexity   Clustering
Open Addressing    Same array              No                    Difficult             Can be an issue
Chaining           Array + Linked Lists    Yes                   Easy                  Less likely

💡 When to Use What?

Use Case                          Best Method
Low memory system                 Open Addressing
Unpredictable number of keys      Chaining
Simpler deletion needed           Chaining
Faster search when load is low    Open Addressing

๐Ÿ” Hashing Techniques: Chaining vs Open Addressing

Feature                      Chaining (Separate Chaining)                       Open Addressing
Storage Structure            Uses linked lists at each index                    All elements stored within the table itself
Collision Handling           New elements are added to the list at the index    Probing (linear/quadratic/double hashing) to find the next slot
Table Size                   Can handle more elements than table size           Limited to table size (n ≤ m)
Memory Usage                 Extra memory for pointers in linked lists          No extra memory (in-place storage)
Insertion Time (avg)         O(1)                                               O(1) if load factor is low
Search Time (avg)            O(1)                                               O(1) if load factor is low
Worst-case Time              O(n) – if all keys hash to the same index          O(n) – if the table is full or many collisions occur
Deletion Complexity          Easier – just remove from the linked list          Harder – needs special handling (e.g., tombstones)
Load Factor (α)              Can go beyond 1 (α = n/m can be > 1)               Must be ≤ 1 (all data stored in the array)
Cache Performance            Poor (due to linked list dereferencing)            Better (elements are in contiguous memory)
Implementation Complexity    Requires linked lists / dynamic memory             Simple array-based implementation
Ideal For                    Unpredictable or large data sets                   Small or memory-constrained environments

✅ Summary:

  • Chaining is more flexible and simpler for deletion.

  • Open Addressing is space-efficient but needs careful probing and deletion logic.


Example Problems

University Questions

Let the size of a hash table be 10. The index of the hash table varies from 0 to 9. Assume the keys 73, 54, 15, 48, 89, 66, 37, 18, 41, 22, 62 are mapped using the modulo operator.

Show how the keys are distributed using the chaining method.


Apply the hash function h(x) = x mod 7 for linear probing on the data 2341, 4234, 2839, 430, 22, 397, 3920 and show the resulting hash table.

Note: a colliding key is placed in the next free position found by the linear probing technique.
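The resulting table can be checked with the same linear probing insertion sketched earlier, now with m = 7 and h(x) = x mod 7:

```python
def linear_probe_insert(table, key, m):
    """Insert key with linear probing; None marks an empty slot."""
    index = key % m                  # h(x) = x mod 7
    for i in range(m):
        slot = (index + i) % m
        if table[slot] is None:
            table[slot] = key
            return

m = 7
table = [None] * m
for x in [2341, 4234, 2839, 430, 22, 397, 3920]:
    linear_probe_insert(table, x, m)
print(table)   # [397, 22, 3920, 2341, 2839, 430, 4234]
```

For example, 430 mod 7 = 3 collides with 2341, probes index 4 (taken by 2839), and lands at index 5; 397 and 3920 similarly wrap around to indices 0 and 2.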

Example:
Hash the following keys using the open chaining method and the closed linear probing method. The size of the table is 7 and the hash function is H(K) = K mod 7. Keys = {16, 21, 23, 50, 19, 26}

Linear probing

Chaining


Example:
Hash the following keys using the open chaining method and the closed linear probing method. The size of the table is 11 and the hash function is H(K) = K mod 11.
Keys = {17, 22, 34, 23, 19, 66}

Linear probing

Chaining

