06: HashMap & HashSet and how do they internally work? What is a hashing function?

HashMap & HashSet are not only one of the frequently used data structures, but also one of the popular interview topics.

Q1. How does a HashMap store data?
A1. As key/value pairs. You can store and retrieve values with the keys.

Q2. What is the HashMap lookup time in Big O notation?
A2. It is O(n) = O(k * n). On average it is O(1) if the hashCode() method spreads out the buckets as discussed below.

Q3 How does a HashMap internally store data?
A3. A backing array for the buckets and a linked list (until Java 8) and a “binary tree” (Java 8 onwards) to store the values..

Backing Array with buckets: as shown below.

hashCode() returns 1

hashCode() returns 1, 45, etc

1) When you put an object into a map with a key and a value, hashCode() method is implicitly invoked, and hash code value say 123 is returned. Two different keys can return the same hash value. A good hashing algorithm spreads out the numbers. In the above example, let’s assume that (“John”,01/01/1956) key and (“Peter”, 01/01/1995) key return the same hash value of 123.

Java equals vs hashCode

Java equals vs hashCode

2) Now when a hashCode value of say 123 is returned and the initial HashMap capacity is say 10, how does it know which index of the backing array to store it in? This is where the HashMap internally invokes the hash(int ) & indexFor(int h, int length) methods. This is known as the hashing function. This function in a very simple explanation is like

This means the “hashcode = 123” gets stored in index 3 in the backing array. So you get numbers between 0 and 9 for the capacity of 10. Once the HashMap reaches its 75% of the capacity indicated by the default hash factor of 0.75, the backing array’s capacity is doubled and the rehashing takes place to reallocate the buckets for the new capacity of 20.

Now the above modulus operator approach of rehashing has a flaw. What if the hashCode is a negative number? You don’t want a negative index. Hence, an improved hashing formula would be to shift out the sign bit and then calculate the remainder with the modulus (i.e. %) operator.

This ensures that you get a positive index value. If you look at the Java 8 HashMap source code, its implementation uses

a). Protect against poorer hashes by only extracting the important lower bits.

b). Determine the index based on the hashCode and capacity.

Actual name value pairs are stored as a key/value pair LinkedList

As shown in the diagram, the key/value pairs are stored as linked list. It is important to understand that two different keys can result in the same hashcode say 123, and get stored in the same bucket. For example, “John, 01/01/1956” & “Peter, 01/01/1995” in the above example. So, how do you retrieve just “John, 01/01/1956”? This is where you key class’s equals() method gets invoked. It loops through the items in the LinkedList bucket “123” to retrieve the key “John, 01/01/1956” by using the equals() method to find a match. This is why it is important to implement the hashCode() & equals() method in your key class. If you are using an existing wrapper class like Integer or the String as a key, it is already implemented. If you write your own key class like “MyKey” with name and DOB as the attributes as in “John, 01/01/1956”, you are responsible for properly implementing these methods.

Q5. Why is it a good practice to appropriately set the initial capacity of a HashMap?
A5. To minimize the occurrences of rehashing.

Q6. How does a HashSet internally store data?
A6. A HashSet internally uses a HashMap. It stores the element as both key and value.

Q7. What is the effect of implementing a poor hashcode() for an object?
A7. The invocation of hashCode() method should return different values for different objects. If it returns same values for different objects then this results in more key/value pairs getting stored against the same bucket. This adversely impacts performance of HashMaps & HashSets.

Relevant topics

1) 5 Java Object class methods interview questions & answers.

2) HashMap from scratch Java example

3) Stack from scratch Java example

300+ Java Interview FAQs

Java & Big Data Tutorials