**HashMap** & **HashSet ** are not only one of the frequently used data structures, but also one of the popular interview topics.

Q1. How does a HashMap store data?

A1. As key/value pairs. You can store and retrieve values with the keys.

Q2. What is the HashMap lookup time in Big O notation?

A2. It is O(**n**) = O(k * n). On average it is O(1) if the hashCode() method spreads out the buckets as discussed below.

Q3 How does a HashMap internally store data?

A3. A backing array for the buckets and a **linked list** (until Java 8) and a “**binary tree**” (Java 8 onwards) to store the values..

Backing Array with buckets: as shown below.

**1) **When you put an object into a map with a key and a value, **hashCode()** method is implicitly invoked, and hash code value say 123 is returned. Two different keys can return the same hash value. A good hashing algorithm spreads out the numbers. In the above example, let’s assume that (“John”,01/01/1956) key and (“Peter”, 01/01/1995) key return the same hash value of **123**.

**2) **Now when a hashCode value of say 123 is returned and the initial HashMap capacity is say 10, how does it know which index of the backing array to store it in? This is where the HashMap internally invokes the **hash(int )** & **indexFor(int h, int length)** methods. This is known as the **hashing** function. This function in a very simple explanation is like

1 2 3 4 5 6 | hashCode() % capacity 123 % 10 = 3 456 % 10 = 6 |

This means the “hashcode = 123” gets stored in index 3 in the backing array. So you get numbers between **0** and **9 **for the capacity of 10. Once the HashMap reaches its 75% of the capacity indicated by the default hash factor of 0.75, the backing array’s capacity is doubled and the **rehashing** takes place to reallocate the buckets for the new capacity of 20.

1 2 3 4 5 6 | hashCode() % capacity 123 % 20 = 3 456 % 20 = 16 |

Now the above modulus operator approach of rehashing has a flaw. What if the hashCode is a negative number? You don’t want a negative index. Hence, an improved hashing formula would be to shift out the sign bit and then calculate the remainder with the modulus (i.e. %) operator.

1 2 3 4 | (123 & 0x7FFFFFFF) % 20 = 3 (456 & 0x7FFFFFFF) % 20 = 16 |

This ensures that you get a positive index value. If you look at the Java 8 HashMap source code, its implementation uses

**a). **Protect against poorer hashes by only extracting the important lower bits.

1 2 3 4 5 6 7 8 9 | static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); } |

**b). **Determine the index based on the **hashCode** and **capacity**.

1 2 3 4 5 | static int indexFor(int h, int length) { return h & (length-1); } |

Actual name value pairs are stored as a key/value pair LinkedList

As shown in the diagram, the key/value pairs are stored as linked list. It is important to understand that two different keys can result in the same hashcode say 123, and get stored in the same bucket. For example, “John, 01/01/1956” & “Peter, 01/01/1995” in the above example. So, how do you retrieve just “John, 01/01/1956”? This is where you key class’s **equals() **method gets invoked. It loops through the items in the LinkedList bucket “123” to retrieve the key “John, 01/01/1956” by using the equals() method to find a match. This is why it is important to implement the **hashCode() **& **equals()** method in your key class. If you are using an existing wrapper class like Integer or the String as a key, it is already implemented. If you write your own key class like “MyKey” with name and DOB as the attributes as in “John, 01/01/1956”, you are responsible for properly implementing these methods.

Q5. Why is it a good practice to appropriately set the initial capacity of a HashMap?

A5. To minimize the occurrences of **rehashing**.

Q6. How does a HashSet internally store data?

A6. A HashSet internally uses a HashMap. It stores the element as both key and value.

Q7. What is the effect of implementing a poor hashcode() for an object?

A7. The invocation of hashCode() method should return different values for different objects. If it returns same values for different objects then this results in more key/value pairs getting stored against the same bucket. This **adversely impacts performance** of HashMaps & HashSets.

### Relevant topics

**1)** 5 Java Object class methods interview questions & answers.

**2)** HashMap from scratch Java example

**3)** Stack from scratch Java example

#### Latest posts by Arulkumaran Kumaraswamipillai (see all)

- 15: Spark joins with Dataframes & SQLContext - December 17, 2017
- 14: Spark joins with SQLContext & JavaPairRDD - December 16, 2017
- 13: Spark inner & outer joins in Java with JavaPairRDDs - December 16, 2017
- CAP theorem interview Q&As - December 16, 2017
- 00: ♦ Creating a Tree from a list & flattening it back to a list in Java - December 13, 2017