01: 11 Java low latency interview questions & answers

Have you seen job advertisements requiring Java candidates to work on low latency, high throughput, real-time and distributed systems with a shared-nothing architecture? Wondering what questions you will be asked? If you are an experienced Java developer targeting these high-paying roles, then it pays to get a good handle on Java low latency interview questions & answers.

You will be quizzed on the low latency application you have recently worked on, especially the outcomes in terms of latencies, response times, and throughput, along with the challenges you faced.

Q1. What do you understand by the term latency?
A1. Latency is the time required to perform an action or to produce a result. Latency is measured in units of time like seconds, milliseconds, microseconds, nanoseconds, etc. What counts as “low” latency depends on the context – low latency over the internet might be 200ms, whereas low latency in a trading application (e.g. pricing or order matching engines) using FIX or custom protocols over TCP/IP might be 2µs. Trading systems typically target anywhere from 100 nanoseconds to 100 milliseconds, depending on the use case.
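For example, a minimal sketch of timing a single operation in Java with System.nanoTime() (doWork() is a hypothetical stand-in for the real operation being measured):

public class LatencyDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();   // monotonic clock, nanosecond resolution
        doWork();                         // the operation being measured
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("latency = %.1f µs%n", elapsedNanos / 1_000.0);
    }

    // Hypothetical stand-in for the real operation (e.g. pricing an order).
    private static void doWork() {
        java.util.Arrays.sort(new java.util.Random(42).ints(100_000).toArray());
    }
}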

Q2. What do you understand by the term throughput?
A2. Throughput is the number of such actions executed or results produced per unit of time. It is measured in units like requests per second. The term “memory bandwidth” is sometimes used to specify the throughput of memory systems.
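A naive sketch of measuring throughput by counting completed operations over a fixed window (doWork() is again a hypothetical stand-in; for serious measurements use a benchmarking harness like JMH):

public class ThroughputDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long ops = 0;
        double sink = 0;                                      // keeps the JIT from eliding the work
        while (System.nanoTime() - start < 1_000_000_000L) {  // run for roughly one second
            sink += doWork();
            ops++;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("throughput = %.0f ops/sec (sink=%.1f)%n", ops / seconds, sink);
    }

    // Hypothetical stand-in for the real action being measured.
    private static double doWork() {
        return Math.sqrt(Math.random());
    }
}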

Example: Hadoop

Hadoop master node services

Hadoop Distributed File System (HDFS) & Apache Spark/MapReduce are about throughput. Each slave (i.e. worker) node has both data storage & compute capabilities, so throughput scales with the size of the cluster: a 100 GB file can be processed across a 20 node cluster in 2 minutes, and a 1 TB file across a 100+ node cluster in roughly the same 2 minutes. HDFS scales out very well and uses cheaper commodity hardware. You can start with 20 nodes and then horizontally scale to 100+ nodes to analyse terabytes of data.

Latency Vs Throughput

It is often a trade-off between latency (i.e. how soon a record can be processed) and throughput (i.e. the number of records processed per second). Stream processors like Apache Kafka, Amazon Kinesis, Apache Storm, Apache Flink, the low-latency Continuous Processing Mode in Structured Streaming in Apache Spark 2.3, etc are optimised for latency, and batch processors like Apache Spark are optimised for throughput. Latency should be as low as possible whilst throughput should be as high as possible. It is difficult to achieve both at the same time, so strive for a good balance that meets the SLAs (i.e. Service Level Agreements).

Some insights, like fraud detection, have much higher value shortly after the event has happened, and that value diminishes very fast with time. Stream processing targets such scenarios, and can provide insights faster, often within milliseconds to seconds.

Example: Stream Processing

Apache Kafka – Kappa Architecture (Source: https://www.cloudywithachanceofbigdata.com/the-streaming-data-warehouse-kappa-architecture-and-data-warehousing-re-imagined/)

Stream processing is a natural fit for time series data and for detecting patterns over time. For example, fraud detection by filtering suspicious records, system alerts/alarms, IoT sensor data, user web sessions, etc. If you are trying to detect the length of a web session in a never-ending stream, it is very hard to do with batches, as some sessions will fall across two batches. Streaming handles never-ending data streams gracefully and naturally. If the processing needs multiple passes through the full data set or needs random access, then streaming is tricky to use.

Q. How do you achieve stream processing in JVM languages like Java, Scala, etc?
A. You can build your own application with a Message Oriented Middleware (i.e. MOM) like WebSphere MQ, ActiveMQ, RabbitMQ, Kafka, etc, where you write code to receive events from topics on the broker (i.e. event streams), compute the results, and then publish the results back to the broker. Alternatively, and preferably, you can save time by using a stream processing framework like Apache Storm, Apache Flink, Apache Spark streaming, etc. The framework will do the heavy lifting: collecting the data, delivering it to each processor, making sure the processors run in the right order, collecting the results, scaling across nodes if the load is high, and handling failures by retrying.
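As a minimal sketch of the first approach, here is a consume-compute-produce loop using the kafka-clients API. The topic names and the uppercase “computation” are hypothetical placeholders; a production version would also deal with offset commits, serialisation, and error handling:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StreamLoop {
    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "demo");
        cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
            consumer.subscribe(List.of("orders"));              // hypothetical input topic
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(100))) {
                    String result = rec.value().toUpperCase();  // stand-in for the real computation
                    producer.send(new ProducerRecord<>("results", rec.key(), result));
                }
            }
        }
    }
}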

Q3. What is the difference between the terms latency and response time?
A3. Latency is the time elapsed between when a request was sent to the server and when the first byte of the response is received.

Response time is the time elapsed between when a request was sent to the server and when the response has been fully received. In a web application this includes the time the browser needs to load assets like the DOM tree, images, CSS, and the JavaScript scripts.

So, the response time will always be >= the latency. In other words, response time = latency + the time taken to transfer the rest of the response.
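A rough client-side sketch measuring both (the URL is hypothetical, and the first-byte timing is approximate, since getInputStream() already blocks until the response headers arrive):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class LatencyVsResponseTime {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("https://example.com/").openConnection();
        long sent = System.nanoTime();
        try (InputStream in = conn.getInputStream()) {
            in.read();                                        // first byte of the response body
            long latency = System.nanoTime() - sent;
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* drain the rest of the response */ }
            long responseTime = System.nanoTime() - sent;
            System.out.printf("latency ≈ %.1f ms, response time ≈ %.1f ms%n",
                    latency / 1e6, responseTime / 1e6);
        }
    }
}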

Low latency is a sum of many things, and the two most important ones are:

1. Network Latency, which is the time taken on the network to send/receive a message/event &

2. Processing Latency, which is the time taken by your application to act on a message/event.

If you are building a “trade order matching” engine in Java, the “network latency” is the time taken, in say microseconds, for the engine to receive an order matching request from a client app plus the time taken for the client app to receive the first byte of the response message from the engine. The “processing latency” is the time elapsed, in microseconds or milliseconds, for the engine to match the order and build the response to be sent back to the client app.

Q4. What latency will you be targeting for your applications?
A4. It depends on the context of the application. For example,

#1. A trading system placing buy/sell equity or FX orders on the market will target a latency of under 20ms.

#2. A standard web application will target a latency of 200ms to 800ms.

#3. A gaming application or a more complex web application will target a latency of 500ms to 1000ms.

Example 1: An EFTPOS system

EFTPOS latencies example

Example 2: An Online Trading System

Latency: industrial strength example

Q5. How will you go about improving the latency for a more complex web site?
A5.

#1. Process requests asynchronously by submitting them to a queue and getting the results later via a client pull or server push.

#2. Reduce the complexity of the page by dividing up the tasks with a view to a better user experience. This also means smaller payloads transferred between the clients & the servers, for better network latency.

#3. Produce less garbage, even if it means bending OO principles, by favouring primitive data types, and apply the flyweight design pattern to improve reuse (see the sketch after this list).

#4. Profile the application with tools like VisualVM to identify and fix bottlenecks in terms of CPU, memory usage, Garbage Collection pauses, etc.

#5. See the 15 key considerations to write low latency applications in Java.
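As a minimal sketch of point #3, assuming a hot path that accumulates a million prices: a primitive double[] is a single allocation, whereas a List<Double> creates a million boxed Double objects for the GC to collect later:

import java.util.ArrayList;
import java.util.List;

public class GarbageDemo {
    public static void main(String[] args) {
        // Boxed: every element is a separate heap-allocated Double (more garbage).
        List<Double> boxed = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            boxed.add(i * 0.5);
        }

        // Primitive: one allocation, no per-element garbage, better cache locality.
        double[] primitive = new double[1_000_000];
        for (int i = 0; i < primitive.length; i++) {
            primitive[i] = i * 0.5;
        }
        System.out.println(boxed.size() + " boxed vs " + primitive.length + " primitive elements");
    }
}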

Q6. Is a latency of over 20ms considered fast or slow in a HFT (High Frequency Trading) application?
A6. Anything over 20ms will be considered slow. HFT trades are conducted using algorithms to buy, sell, and match huge volumes of trades. These ultra low latency applications were once written in “C”, and nowadays are increasingly being written in Java.

Q7. What throughput will you be aiming for in HFT (High Frequency Trading) applications?
A7. 50k to 200k orders or transactions per second. You will have multiple servers to process the requests, and the architecture needs to be scalable to cater for growing demand. You can learn more at Scalability interview questions & answers.

Q8. What do you understand by the terms real-time systems, latency, and scalability?
A8. Real-time and low-latency are distinctly separate subjects, although often related. Real-time is about being predictable rather than merely fast. Low latency systems need to be fast to meet SLAs (Service Level Agreements) measured in sub-milliseconds (e.g. microseconds).

Scalability means the ability of the system to handle growing demand by adding more CPUs, memory, nodes, servers, etc. See Scalability interview questions & answers.

Q09. What are some of the considerations in writing low latency applications in Java?
A09.

1) Parallel computing via a) multi-threading, b) non-blocking I/O (e.g. MINA, Netty, Grizzly, etc), and c) distributed systems (e.g. Apache Kafka, Apache Spark, etc) with shared-nothing architectures. A shared-nothing architecture is one where the application runs in parallel on 100+ nodes, each with its own dedicated CPU, memory, I/O, etc.

2) Streaming APIs like Apache Spark streaming, Apache Storm, and StAX (i.e. Streaming API for XML) to process data in real time or near real time.

3) Writing concurrent programs with Java multi-threading features such as executors, futures, completable futures, fork/join, concurrent data structures, etc.

4) Understanding the Java memory model & tuning memory & garbage collection in Java.

5) Using event based and non-blocking paradigms. For example, using frameworks like Apache MINA, Netty, Grizzly, and Akka.

6) MINA & Netty are lower-level frameworks than Akka, and have NIO (i.e. the New Java I/O) at their core. NIO enables an event-driven, non-blocking I/O paradigm.

7) Akka is a higher-level, general-purpose framework compared to MINA & Netty for building event-driven, scalable, and fault-tolerant applications. Akka is written in Scala, with language bindings provided for both Scala and Java. Akka uses the Actor model to hide all the thread-related code and gives you simple and helpful interfaces to implement a scalable and fault-tolerant system.

8) Even though you need a good handle on writing concurrent programs in Java, & interviewers like to quiz/test you on it, favour a framework like Akka, as writing complex concurrent programs is not a trivial task: you need to deal with threads, locks, race conditions & debugging. Writing concurrent programs without frameworks can be error-prone and can lead to code that is difficult to read, test, and maintain.

9) If you are working in the Big Data space, have a look at Apache Spark, which is written in Scala (and earlier versions used the Akka toolkit internally). Here is an example of a Spark master “Driver Application” creating tasks & scheduling them to run on the “Spark Executors”. Only two executors are shown here, but typical clusters will have 100+ nodes & executors.

SparkContext with Executors executing tasks

Executors are worker node processes in charge of running individual tasks in a given Spark job. Spark executors are launched at the beginning of a Spark application and typically run for the entire lifetime of the application. Once they have finished running their tasks, they send the results to the “Driver Application”. “Spark Executors” also provide in-memory storage for RDDs that are cached.
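As a minimal sketch, here is a word count in Spark’s Java API: the driver builds the job and schedules its tasks onto the executors (the input path is hypothetical):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt");  // hypothetical path
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);        // runs as tasks on the executors
            counts.take(10).forEach(t -> System.out.println(t._1() + " -> " + t._2()));
        }
    }
}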

Q10. What is the actor model in the Akka toolkit, which is sometimes compared to the reactor design pattern?
A10. The “actor model” is a design pattern for writing concurrent and scalable code that runs on distributed systems. It is an event driven (i.e. message passing) model that involves sending & receiving messages (i.e. events) among actors.

Scala Akka Order Placement Example

1) Instead of invoking an object directly, you construct a message and send it to a destination object called an actor.

2) Each actor has a specific job to do. The actor engine stores the message in the actor’s queue (i.e. its mailbox).

3) When a thread becomes available, the actor engine delivers that message to its destination actor object for processing.

4) When the actor completes its task, it sends a message back to the originating object, which is also considered an actor.

5) You can orchestrate which messages get passed to which actors under what conditions.

The akka-camel module allows “Untyped Actors” to receive and send messages over a great variety of protocols such as HTTP, SOAP, TCP, FTP, SMTP or JMS, with APIs provided for both Java & Scala.
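A minimal sketch of the order placement example using Akka’s classic Java API (the class, actor, and message names are hypothetical):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class OrderApp {
    // An actor that receives an order message, “processes” it, and replies to the sender.
    static class OrderActor extends AbstractActor {
        @Override
        public Receive createReceive() {
            return receiveBuilder()
                    .match(String.class, order -> {
                        // Messages are processed one at a time, so no locks are needed here.
                        getSender().tell("Placed: " + order, getSelf());
                    })
                    .build();
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("orders");
        ActorRef orderActor = system.actorOf(Props.create(OrderActor.class), "orderActor");
        // Fire-and-forget send; with no sender, the reply goes to dead letters.
        orderActor.tell("BUY 100 ACME @ 12.34", ActorRef.noSender());
        system.terminate();   // a real application would keep the system running
    }
}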

Q11. What do you understand by the term real-time application (RTA)?
A11. A real-time application (aka RTA) is an application where the content is pushed through “as it happens”, within a specified time frame. These time frames are defined in SLAs (Service Level Agreements). For example, in a Straight-Through Processing (STP) solution, you have trades flowing in real time between your front/middle office and the traders/stock exchange.

Real-time systems can be further divided into 1) Hard real-time & 2) soft real-time.

Hard real-time is when an action performed at the wrong time has no value, or possibly a negative effect. In other words, you must absolutely hit every deadline: it is not acceptable to say that 90% of the time we hit the response time of 100ms. Only some systems have this requirement, e.g. medical devices (pacemakers), defence systems, nuclear systems, avionics, etc.

Soft real-time is when an action performed slightly too early or too late still has a positive effect, although performing the task on time would have had greater value in terms of better customer experience, meeting the SLAs, etc. A soft real-time system can be a trading application with high throughput & low latency but without any hard response time guarantees: no catastrophe happens when, say, 5% of 100K requests miss their response times. Most systems fall into this category, like financial applications (e.g. placing trades, matching trades, etc), event processing, telecom, etc.

Q. Can the JVM be used for real-time applications, in the sense that it is guaranteed to react within a certain amount of time?
A. The answer is no for standard JVMs, but special JVMs that support the “Real-Time Specification for Java (RTSJ)” extensions can process in hard real-time. Standard JVMs only achieve “soft real-time”, mainly due to automatic garbage collection and the GC pauses associated with it. The RTSJ provides a subclass of RealtimeThread (RTT) called NoHeapRealtimeThread (NHRT). Instances of this subclass are protected from GC-induced pauses because NHRTs are NOT allowed to use the heap; instead they use the scoped memory and immortal memory features to allocate memory on a more predictable basis.

Q. Do you favour hard or soft real-time Java development guidelines in general?
A. Soft real-time is favoured unless there is a specific need for hard real-time, as soft real-time offers much better developer productivity & easier application maintenance.

Wondering what to learn or brush up on?

You need to have a good grasp of multi-threading, the Java Memory Model, GC, profiling, non-blocking I/O (aka NIO), Big-O notation, lock-free data structures like concurrent lists/maps, and strategies to produce less garbage. Here are tutorials & Q&As to build your skills to develop LOW LATENCY applications in Java.

1. 15 key considerations to write low latency applications in Java.

2. Reactive Programming (RP) in Java Interview Q&A.

3. Java GC tuning for low latency applications

4. Capture throughput & latencies with “Metrics Core” tutorial

5. Understanding Big O notations through Java examples

6. Java primitives & objects – memory consumption interview Q&A

7. ExecutorService Vs Fork/Join & Future Vs CompletableFuture Interview Q&As

8. Home assignment – Create a simple framework where work items can be submitted

9. Simple Akka tutorial in Java step by step.

Better throughput on BigData

1. Hadoop overview & architecture interview Q&As

2. Apache Spark interview Q&As

3. Apache Spark Tutorials


