Tuesday, 12 January 2016

Map & Set Tracking

1) How HashSet internally work ??

HashSet uses HashMap internally to store it’s objects. 
HashSet is developed on the top of HashMap.

> Whenever you create a HashSet object, one HashMap object associated with it is also created. 
> This HashMap object is used to store the elements you enter in the HashSet. 
> The elements you add into HashSet are stored as keys of this HashMap object. 
> The value associated with those keys will be a constant named as "PRESENT". 
> Every constructor of HashSet class internally creates one HashMap object.

2) How HashMap internally work ??

static class Entry<K,V> implements Map.Entry<K,V> 
{
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
        //Some methods are defined here
}

key : It stores the key of an element and its final.

value : It holds the value of an element.

next : It holds the pointer to next key-value pair. This attribute makes the key-value pairs stored as a linked list.

hash : It holds the hashcode of the key.

These Entry objects are stored in an array called table[]. This array is initially of size 16. It is defined like below.

     /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */

    transient Entry<K,V>[] table;

To summarize the whole HashMap structure, each key-value pair is stored in an object of Entry<K, V> class. This class has an attribute called next which holds the pointer to next key-value pair. This makes the key-value pairs stored as a linked list. All these Entry<K, V> objects are stored in an array called table[]. 

3) What Is Hashing?

The whole HashMap data structure is based on the principle of Hashing. Hashing is nothing but the function or algorithm or method which when applied on any object/variable returns an unique integer value representing that object/variable. This unique integer value is called hash code. Hash function or simply hash said to be the best if it returns the same hash code each time it is called on the same object. Two objects can have same hash code.

Whenever you insert new key-value pair using put() method, HashMap blindly doesn’t allocate slot in the table[] array. Instead it calls hash function on the key. HashMap has its own hash function to calculate the hash code of the key. 

4) How race condition works in Java Hashmap?

Hashmap could run into race condition if it would be modified by two thread simultaneous and one thread tries to resize or rehash the map because of capacity crossing threshold value. since hashmap maintains a linked list of element in bucket and while copying from one hashmap to other or old to new order of linked list got reversed, which could result in infinite loop if two threads are doing resizing at same time.

 there are potential race conditions:
  • when resizing an HashMap by two threads at the same time
  • when collisions happens. Collision can happen when two elements map to the same cell even if they have a different hashcode. During the conflict resolution, there can be a race condition and one added key/value pair could be overwritten by another pair inserted by another thread.
To explain better what I mean on the second point, I was looking at the source code of HashMap in OpenJdk 7
389        int hash = hash(key.hashCode());
390        int i = indexFor(hash, table.length);
First it calculates an Hash of your key (combining two hash functions), then it maps to a cell with indexFor, then it checks if that cell contains the same key or is already occupied by another one. If it's the same key, it just overwrite the value and there is no problem here.
If it's occupied it looks at the next cell and then the next until it finds an empty position and call addEntry(), which could even decide to resize the array if the array is more loaded than a certainloadFactor.
Our table containing the entries is just a vector of Entry which holds key and value.
146    /**
147     * The table, resized as necessary. Length MUST Always be a power of two.
148     */
149    transient Entry[] table;
In a concurrent environment, all sort of evil things can happen, for instance one thread gets a collision for cell number 5 and looks for the next cell (6) and finds it empty.
Meanwhile another thread gets an index of 6 as a result of indexFor and both decide to use that cell at the same time, one of the two overwriting the other.

5) How does get() method of HashMap works, if two keys has same hashCode?

Hashmap works on the principle of hashing, we have put(key, value) and get(key) method for storing and retrieving Objects from HashMap. When we pass Key and Value object to put() method on Java HashMap, HashMap implementation calls hashCode method on Key object and applies returned hashcode into its own hashing function to find a bucket location for storing Entry object, important point to mention is that HashMap in Java stores both key and value object as Map.Entry in a bucket which is essential to understand the retrieving logic. 


Important Discussions on HashMap : Please go through it.

How Hashmap works in Java

HashMap works on the principle of Hashing .  
To understand Hashing, we should understand the three terms first 1) Hash Function 2) Hash Value 3) Bucket.

What is Hash Function, Hash Value and Bucket ?

hashCode() function  which returns an integer value is the Hash function. The important point to note that,  this method is present in Object class (Mother of all class).

This is the code for the hash function(also known as hashCode method) in Object Class :
    public native int hashCode();

The most important point to note from the above line :  hashCode method return  int value.

So the Hash value is the int value returned by the hash function .

A bucket is used to store key value pairs . 
A bucket can have multiple key-value pairs . In hash map, bucket used simple linked list to store objects .

After understanding the terms we are ready to move next step , How hash map works in java or How get() works internally in java .

Code inside Java Api (HashMap class internal implementation) for HashMap get(Obejct key) method 

1.  Public  V get(Object key)
   {
2.     if (key ==null)
3.     //Some code
    
4.     int hash = hash(key.hashCode());
    
5.     // if key found in hash table then  return value
6.     //    else return null
   }

Hash map works on the principle of hashing 

HashMap get(Key k) method calls hashCode method on the key object and applies returned hashValue to its own static hash function to find a bucket location(backing array) where keys and values are stored in form of a nested class called Entry (Map.Entry) . So you have concluded that from the previous line that Both key and value is stored in the bucket as a form of  Entry object . So thinking that Only value is stored  in the bucket is not correct and will not give a good impression on the interviewer .

* Whenever we call get( Key k )  method on the HashMap object . First it checks that whether key is null or not .  Note that there can only be one null key in HashMap .  

If key is null , then Null keys always map to hash 0, thus index 0.

If key is not null then , it will call hashfunction on the key object , see line 4 in above method i.e. key.hashCode()  ,so after key.hashCode() returns hashValue , line 4 looks like

4) int hash = hash(hashValue) and now it applies returned hashValue into its own hashing function .

We might wonder why we are calculating the hashvalue again using hash(hashValue). Answer is, It defends against poor quality hash functions.

Now step 4 final  hashvalue is used to find the bucket location at which the Entry object is stored . Entry object stores in the bucket like this (hash,key,value,bucketindex) .  


Interviewer: What if when two different keys have the same hashcode ?

Solution, equals() method comes to rescue. Here candidate gets puzzled. Since bucket is one and we have two objects with the same hashcode .Candidate usually forgets that bucket is a simple linked list.

The bucket is the linked list effectively. Its not a LinkedList as in a java.util.LinkedList - It's a separate (simpler) implementation just for the map .

So we traverse through linked list , comparing keys in each entries using keys.equals() until it return true.  Then the corresponding entry object Value is returned .


How hashmap works internally in java

One of  our readers Jammy  asked a very good  question 

When the functions 'equals' traverses through the linked list does it traverses from start to end one by one...in other words brute method. Or the linked list is sorted based on key and then it traverses? 

Answer is when an element is added/retrieved, same procedure follows:

 
a. Using key.hashCode() [ see above step 4],determine initial hashvalue for the key

b. Pass intial hashvalue as hashValue  in    hash(hashValue) function, to calculate the final hashvalue.

c. Final hash value is then passed as a first parameter in the indexFor(int ,int )method .
    The second parameter is length which is a constant in HashMap Java Api , represented by                             DEFAULT_INITIAL_CAPACITY

    The default  value of DEFAULT_INITIAL_CAPACITY is 16 in HashMap Java Api .

 indexFor(int,int) method  returns the first entry in the appropriate bucket. The linked list in the bucket is then iterated over - (the end is found and the element is added or the key is matched and the value is returned )


Explanation about indexFor(int,int) is below :

/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
    return h & (length-1);
}


The above function indexFor() works because Java HashMaps always have a capacity, i.e. number of buckets, as a power of 2.
 Let's work with a capacity of 256,which is 0x100, but it could work with any power of 2. Subtracting 1
from a power of 2 yields the exact bit mask needed to bitwise-and with the hash to get the proper bucket index, of range 0 to length - 1.
256 - 1 = 255
0x100 - 0x1 = 0xFF
E.g. a hash of 257 (0x101) gets bitwise-anded with 0xFF to yield a bucket number of 1.


Interviewer:    What if  when two  keys are same and have the same hashcode ?

If key needs to be inserted and already inserted hashkey's hashcodes are same, and keys are also same(via reference or using equals() method)  then override the previous key value pair with the current key value pair.

The other important point to note is that in Map ,Any class(String etc.) can serve as a key if and only if it overrides the equals() and hashCode() method .


Interviewer:  How will you measure the performance of HashMap?

According to Oracle Java docs,  

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor.  The capacity is the number of buckets in the hash table( HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.), and the initial capacity is simply the capacity at the time the hash table is created. 


The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

In HashMap class, the default value of load factor is (.75) .

Interviewer : What is the time complexity of Hashmap get() and put() method ?

The hashmap implementation provides constant time performance for (get and put) basic operations
i.e the complexity of get() and put() is O(1) , assuming the hash function disperses the elements properly among the buckets. 


How to implement duplicate key for a HashMap?

Example :

Map data = new HashMap(); 
data.put("a","America"); 
data.put("a","Africa"); 
data.put("b","Bangladesh");
Need to have both "America" and "Africa" as value against key "a". Is there any mechanism to achieve this scenario.

By using :

Map<String,List> data = new HashMap();
{a:[America,Africa] ,b:[Bangaldesh]}


No comments:

Post a Comment

JSP interview questions and answers

Q1. What is JSP and why do we need it? JSP stands for JavaServer Pages. JSP is java server side technology to create dynamic web pages. J...