9 Java Data structures best practices

#1: Choose the right type of Java data structure based on usage patterns

like fixed size or required to grow, duplicates allowed or not, ordering is required to be maintained or not, traversal is forward only or bi-directional, inserts at the end only or any arbitrary position, more inserts or more reads, concurrently accessed or not, modification is allowed or not, homogeneous or heterogeneous collection, etc. Also, keep multi-threading, atomicity, memory usage and performance considerations discussed earlier in mind.

#2: Favor unmodifiable data structures where applicable

Use an immutable collection (aka unmodifiable collection) if you don’t want to allow accidental addition or removal of elements once created. The objects stored in a collection needs to implement its own immutability if required to be prevented from any accidental modifications.

Care should be taken not to unintentionally expose the collection fields to the caller. The caller may not perform any necessary validation.

Bad Approach:

Good Approach:

#3: Thread-safety considerations of the Java data structures

If accessed by a single thread, synchronization is not required, and arrays, lists, sets, stacks, etc can be used as a local variable. If your collections are used as a local variables, the synchronization is a complete overkill, and degrades performance considerably. On the contrary, it is a bad practice to assume that the application is always going to be single threaded and use data structures in a thread unsafe manner, for example declaring it as an instance or static variable. What if the application needs to scale to handle concurrent access from multiple threads?

If accessed by multiple threads, prefer a concurrent collection like a copy-on-write lists and sets, concurrent maps, etc over a synchronized collection for a more optimized concurrent access. Stay away from the legacy classes like Vectors and HashTables. In a multi-threaded environment, some operations may need to be atomic to produce correct results. This may require appropriate synchronizations (i.e. locks). Improper implementation in a multi-threaded environment can cause unexpected behaviors and results.

#4: Performance and memory considerations

The choices you make for a program’s data structures and algorithms affect that program’s memory usage (for data structures) and CPU time (for algorithms that interact with those data structures). Sometimes you discover an inverse relationship between memory usage and CPU time. For example, a one-dimensional array occupies less memory than a doubly linked list that needs to associate links with data items to find each data item’s predecessor and successor. This requires extra memory. In contrast, a one-dimensional array’s insertion/deletion algorithms are slower than a doubly linked list’s equivalent algorithms because inserting a data item into or deleting a data item from a one-dimensional array requires data item movement to expose an empty element for insertion or close an element made empty by deletion. Here are some points to keep in mind.

a) The most important thing to keep in mind is scalability. Assuming that a collection will always be small is a dangerous thing to do, and it is better to assume that it will be big. Don’t just rely on the general theories (e.g. Big-O theory) and rules. Profile your application to identify any potential memory or performance issues for a given platform and configuration in a production or production-like (aka QA) environment.

b) Initialize your collection with an appropriate initial capacity to minimize the number of times it has to grow for lists and sets, and number of times it has to grow and rehash for maps.

c) Favor concurrent structures like ConcurrentHashMap, CopOnWrite Lists/Sets, etc that allow concurrent reads. Use copy-on-write classes and concurrent maps for better scalability. It also prevents ConcurrentModificationException being thrown while preserving thread safety.

#5: Code to interface & use generics

Easier to swap implementation from ArrayList to LinkedList if you code to interface. Generics improve readability.



#6: Properly implement equals( … ) and hashCode( )

methods for the objects stored in the Java data structures. Learn more What are the implications of implementing them incorrectly? — In short, excess debug and rework time. Incorrect implementations of equals( ) and hashCode( ) methods can result in data being lost in HashMaps and Sets. You can also get intermittent data related problems that are harder to consistently reproduce over time.

#7: Use immutable keys

Generally you use java.lang.Integer or java.lang.String class as the key, which are immutable Java objects. If you define your own key class, then it is a best practice to make the key class an immutable object. If you want to insert a new key, then you will always have to instantiate a new object as you cannot modify an immutable object. If the keys were made mutable, you could accidentally modify the key after adding to a HashMap, which can result in you not being able to access the object later on. The object will still be in the HashMap, but you will not be able to retrieve it as you have the wrong key (i.e. a mutated key).

#8: Return zero length collections or arrays

as opposed to returning a null in the context of the fetched list is actually empty. Returning a null instead of a zero length collection is more error prone, since the programmer writing the calling method might forget to handle a return value of null.

#9: If using Java 8, use Stream

the above code is more concise & more readable.

Java & Big Data Interview FAQs

Java Key Areas Interview Q&As

800+ Java Interview Q&As

Java & Big Data Tutorials