Friday, May 18, 2012

Why HashSet / HashMap in java does not maintain the insertion order?


Why HashSet / HashMap in java does not maintain the insertion order? example code

Tough interview questions like "Why HashSet / HashMap in java does not maintain the insertion order? OR Why HashSet / HashMap does not make guarantee that the iteration order will remain constant over time?" may be asked .When you run the following code with default hash map capacity (16) , you may get the output as gvien below
    
import java.util.*;
import java.util.*;
class HashMap2
{
public static void main(String args[]) {
HashMap hm=new HashMap();
hm.put("545", "Kumar");
hm.put("5143", "Jegan");
hm.put("767", "Akshu");
hm.put("1766", "Lakshan");
    Iterator it = hm.entrySet().iterator();
System.out.println("\nIteration Order when using HashMap");
    while (it.hasNext()) {
        Map.Entry keyValue= (Map.Entry)it.next();
        System.out.println(keyValue.getKey() + " -> " + keyValue.getValue());
    }
HashSet<String> set = new HashSet<String>(16);
set.add("Apple");
set.add("Banana");
set.add("Orange");
set.add("Pine Apple");
System.out.println("\nIteration Order when using HashSet");
  for (String fruit : set){
         System.out.println(fruit); }
}
}


Insertion Order in HashMap : 545 , 5143, 767, 1766 . In HashSet : Apple, Banana, Orange, Pine Apple. The Iteration order is given below.
Output:

In the above output , the display order (iteration order) is different from the order in which the elements are inserted. Let us see how does this happen ?
HashSet stores elements in a hash table (actually through a HashMap instance , where element in the HashSet is the key to HashMap and dummy object is passed as a value). So the below answer is applicable to both HashMap and HashSet . Hashing determines where the object has to be placed in a HashMap. In Detail : HashMap internally uses an array to store key and value pairs (i.e the reference to HashMap.Entry objects where HashMap.Entry objects have the fields to store hash, key ,value , link to next HashMap.Entry object - a linked list implementation) . The index to above array is calculated from the key using hash function. The hash function accepts the hashcode of the key (key.hashCode()) and calculates the hash using formula given below and returns hash. The index is calculated based on the hash value and the length of array (Capacity) with the formula ( h & (length-1) , h -> hash value, length -> array capacity) so that the elements (key-value pairs) can be distributed in the array within the given range (capacity) . Array can be resized if required but the length should be a power of two. So whenever the length (capacity) of the array is resized to 2^n , the index of the element is also changed .Thus the iteration order is changed.
Function to calculate hash using the hashcode of key.
    
//fromJava Source code . This function ensures that hashCodes that differ only by constant multiples at each bit position have a bounded  number of collisions (approximately 8 at default load factor).
  static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

The below code prints the index which is calculated using the hashcode and capacity using hash function .
    
import java.util.*;
class HashMap3
{
public static void main(String args[]) {
int length=128; // capacity
HashMap hm=new HashMap(length);
hm.put("545", "Kumar");
hm.put("5143", "Jegan");
hm.put("767", "Akshu");
hm.put("1766", "Lakshan");
    Iterator it = hm.entrySet().iterator();
System.out.println("\nIteration Order when using HashMap with capacity : "+ length);
    while (it.hasNext()) {
        Map.Entry keyValue= (Map.Entry)it.next();
        System.out.println(  "Index : " + (hash(keyValue.getKey().hashCode()) & (length -1) )+ " " +  keyValue.getKey() + " -> " + keyValue.getValue());
    }

HashSet<String> set = new HashSet<String>(length);
set.add("Apple");
set.add("Banana");
set.add("Orange");
set.add("Pine Apple");

System.out.println("\nIteration Order when using HashSet with capacity : "+ length);
    for (String fruit : set){
System.out.println("Index : " + (hash(fruit.hashCode()) & (length -1) )+ " Element : " + fruit);
}
}

static int hash(int h) {
         h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
}

Output :

Two keys with different hashCode can also generate the same index (collisions) . In the above output, Orange and Banana has same index 12. The following two lines of code adds new elements and also solves collision.
Entry e = table[index];
table[index] = new Entry(hash, key, value, e);

if e==null means , there is no collision. Collision can be resolved by many techniques like Chaining

Picture : HashMap with Collision

Change the length to 128 and run the above program . The index / iteration order is changed.

Final conclusion : The location of the element in the HashMap is determined by the Hashcode of key and capacity of the HashMap.
Note: To maintain insertion order , you can use LinkedHashSet / LinkedHashMap

1 comment:

  1. Good Post
    But first example which you explained insertion of order of fruits with same hash code or indexfor orange & banana is 12, so first we inserted Banana after that orange right, while iterating that time first will print Banaa after orange am i right why its in reverse?

    My assumption same hash code for two keys will store in a same bucket with a LinkedList first node is banana linked to orange right?

    ReplyDelete