Have you seen job advertisements requiring Java candidates to work on low latency, high throughput, real-time and distributed systems with shared-nothing architectures? Wondering what questions you will be asked? If you are an experienced Java developer targeting high-paying roles, then it pays to get a good handle on Java low latency interview questions & answers.
You will be quizzed on the low latency application you recently worked on, especially the outcomes in terms of latencies, response times, and throughput, along with the challenges you faced.
Q1. What do you understand by the term latency?
A1. Latency is the time required to perform an action or produce a result. It is measured in units of time like seconds, milliseconds, microseconds, nanoseconds, etc.
What counts as “low” latency depends on the context. Low latency over the internet might be 200ms, whereas low latency in a trading application (e.g. pricing or order-matching engines) using FIX (i.e. Financial Information eXchange) or proprietary protocols over TCP/IP might be 50µs to 100µs.
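As a concrete illustration, latency for a single operation can be measured in Java with System.nanoTime(), which reads a monotonic clock and so is not affected by wall-clock adjustments. This is a minimal sketch with a hypothetical doWork() method standing in for the operation being timed:

```java
import java.util.concurrent.TimeUnit;

public class LatencyDemo {
    // Hypothetical unit of work whose latency we want to measure
    static long doWork() {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();          // monotonic clock, not wall-clock
        doWork();
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("latency: %d ns (%d µs)%n",
                elapsedNanos, TimeUnit.NANOSECONDS.toMicros(elapsedNanos));
    }
}
```

In a real low latency system you would record many such samples and report percentiles (e.g. 99th), since a single measurement says nothing about jitter.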
Q2. What do you understand by the term throughput?
A2. Throughput is the number of such actions executed or results produced per unit of time. It is measured in units like requests per second. The term “memory bandwidth” is sometimes used for the throughput of memory systems.
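The arithmetic behind the definition is simple: work completed divided by elapsed time. A tiny sketch (the method name and figures are illustrative only):

```java
public class ThroughputDemo {
    // Throughput = actions completed / elapsed time
    static double requestsPerSecond(long requests, long elapsedMillis) {
        return requests * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // e.g. 120,000 requests completed in 2,000 ms -> 60,000 req/s
        System.out.println(requestsPerSecond(120_000, 2_000));
    }
}
```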
Apache Hadoop & Spark for throughput
Hadoop Distributed File System (HDFS) & Apache Spark are about throughput. Each slave or worker node has both data storage & compute capabilities, so aggregate throughput scales with the number of nodes. For example, with the same per-node throughput you could be
1) Processing a 100 GB file across a 20 slave node cluster in 2 minutes Vs.
2) Processing a 1 TB file across a 200 slave node cluster in the same 2 minutes.
HDFS (i.e. Hadoop Distributed File System) horizontally scales out very well and uses cheaper commodity hardware. You can start with 20 nodes and then horizontally scale to 1000s of nodes to analyse terabytes of data.
Optimising for Latency Vs Throughput
It is often a trade-off between latency (i.e. how soon a record can be processed) and throughput (i.e. number of records processed in 1 minute).
Stream processors like Amazon Kinesis, Apache Storm, Apache Flink, Apache Spark Streaming, etc. are optimised for latency (i.e. the average time to process a message after it is put on the queue), and
Batch processors like Apache Spark, Spring Batch, etc. are optimised for throughput (i.e. the volume of messages that can be processed within a set period of time).
Apache Kafka uses a distributed commit log to play a similar role to traditional message brokers, but it is optimised for different use cases: instead of concentrating on flexibility and delivery guarantees, it concentrates on scalability and throughput.
Latency should be as low as possible whilst throughput should be as high as possible. It is difficult to achieve both at the same time, but strive for a good balance that meets the SLAs (i.e. Service Level Agreements). It is worth noting that you can sometimes sacrifice latency to gain throughput by batching things together, and conversely improve latency by sending data immediately after it is created, sacrificing throughput.
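The batching trade-off can be sketched as a micro-batcher: collect up to N items, waiting a bounded time for the first one. A bigger batch size raises throughput (fewer flushes) at the cost of per-item latency. This is a minimal sketch, assuming string messages on an in-JVM queue; the class and parameter names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MicroBatcher {
    // Drain up to maxBatch items, waiting at most maxWaitMillis for the first one.
    static List<String> nextBatch(BlockingQueue<String> queue,
                                  int maxBatch, long maxWaitMillis) throws InterruptedException {
        List<String> batch = new ArrayList<>();
        String first = queue.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
        if (first == null) return batch;             // nothing arrived in time
        batch.add(first);
        queue.drainTo(batch, maxBatch - 1);          // grab whatever else is already queued
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("order-1"); queue.add("order-2"); queue.add("order-3");
        System.out.println(nextBatch(queue, 2, 10)); // [order-1, order-2]
    }
}
```

Setting maxBatch to 1 degenerates into send-immediately (lowest latency); a large maxBatch with a longer wait favours throughput.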
Some insights like fraud detection have much higher value shortly after they have happened and those values diminish very fast with time. Stream processing targets such scenarios, and can provide insights faster, often within milliseconds to seconds.
CPU for low-latency Vs GPU for throughput
A CPU (i.e. Central Processing Unit) is optimised for latency & a GPU (i.e. Graphics Processing Unit) is optimised for throughput.
CPUs have a smaller number of more powerful cores, for getting jobs with serial tasks done quickly.
GPUs have thousands of less powerful cores for parallel processing. GPUs are more suited for graphics processing & deep learning, where training a model can take days if not weeks. GPUs break complex problems into thousands or millions of separate tasks and work them out at once.
NOTE: Google has come up with TPUs (i.e. Tensor Processing Units) optimised for deep learning, which can outperform GPUs on those workloads.
Example: Stream Processing
Stream processing naturally fits time series data and detecting patterns over time. For example, fraud detection by filtering suspicious records, system alerts/alarms, IoT sensor data, user web sessions, etc. If you are trying to detect the length of a web session in a never-ending stream, it is very hard to do with batches, as some sessions will fall into two batches. Streaming handles never-ending data streams gracefully and naturally. If processing needs multiple passes through the full data, or needs random access, then streaming is tricky to use.
Q. How do you achieve stream processing in JVM languages like Java, Scala, etc?
A. You can
1) Build your own application in Java: with a Message Oriented Middleware (i.e. MOM) like WebSphere MQ, ActiveMQ, RabbitMQ, Kafka, etc. where you write code to asynchronously receive events via topics (i.e. event streams), compute the results and then publish the results back to another topic.
2) Alternatively, and preferably, you can use a stream processing framework like Apache Storm, Apache Flink, Apache Spark Streaming, etc. to save time. The framework will asynchronously receive the streamed events via a Message Oriented Middleware like Apache Kafka or Amazon Kinesis, and do the heavy lifting: collecting data, delivering it to each processor, making sure processors run in the right order, collecting results, scaling across nodes if the load is high, and handling failures by retrying.
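The receive–compute–publish loop from option 1 can be sketched with plain JVM queues standing in for MOM topics, so the example stays self-contained; in a real system the two queues would be broker topics and the names here (rawPrices, addVat, etc.) are purely illustrative:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PriceStreamProcessor {
    // In a real system these would be MOM topics (e.g. Kafka, ActiveMQ);
    // plain in-JVM queues keep the sketch self-contained.
    static final BlockingQueue<Double> rawPrices = new LinkedBlockingQueue<>();
    static final BlockingQueue<Double> vatPrices = new LinkedBlockingQueue<>();

    // The "compute" step applied to each event
    static double addVat(double price) { return price * 1.10; }

    public static void main(String[] args) throws InterruptedException {
        // Consumer thread: receive an event, compute, publish the result
        Thread consumer = new Thread(() -> {
            try {
                double price = rawPrices.take();   // asynchronously receive
                vatPrices.put(addVat(price));      // publish back to an output topic
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        rawPrices.put(100.0);                      // producer publishes an event
        System.out.println(vatPrices.take());      // prints the computed price (≈110.0)
        consumer.join();
    }
}
```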
Q3. What is the difference between the terms latency and response time?
A3. Latency is the time elapsed between when a request was sent to the server and when the first byte of the response has started to be received.
So, the response time will always be >= latency. In other words,
Response time = Latency (e.g. to the server, server processing, back to the client with the first byte) + Client-side processing time (e.g. rendering in the browser)
Low latency is a sum of many things, and the two most important are:
1. Network Latency, which is the time taken on the network to send/receive a message/event &
2. Processing Latency, which is the time taken by your application to act on a message/event.
If you are building a “trade order matching” engine in Java, the “network latency” is the time taken, in say microseconds, for the engine to receive an order matching request from a client app, plus the time taken for the client app to receive the first byte of the response message from the engine. The “processing latency” is the time elapsed, in microseconds or milliseconds, for the engine to match the order and build the response to be sent back to the client app.
Q4. What latency will you be targeting for your applications?
A4. It depends on the context of the application. For example,
#1. Trading system placing buy/sell equity or FX orders to the market will target a latency of under 20ms.
#2. A standard web application will target a latency of 200ms to 800ms.
#3. A gaming application or a more complex web application will target a latency of 500ms to 1000ms.
Example 1: An EFTPOS system
Example 2: An Online Trading System
Q5. How will you go about improving the latency for a more complex web site?
A5. See the 15 key considerations for writing low latency applications in Java.
Q6. Is a latency of over 20ms considered fast or slow in an HFT (High Frequency Trading) application?
A6. Anything over 20ms will be considered slow. HFT trades are conducted using algorithms to buy, sell, and match huge volumes of trades. These ultra-low-latency applications once used to be written in C, and nowadays are increasingly written in Java.
Q7. What throughput will you be aiming for in HFT (High Frequency Trading) applications?
A7. 50k to 200k orders or transactions per second. You will have multiple servers to process the requests, and the architecture needs to be scalable to cater for growing demand. You can learn more at Scalability interview questions & answers.
Q8. What do you understand by the terms real-time systems, latency, and scalability?
A8. Real-time and low-latency are distinct, although often related, subjects. Real-time is about being predictable rather than just fast. Low latency systems need to be fast to meet sub-millisecond SLAs (i.e. Service Level Agreements).
Systems built for throughput need to use disk & CPU efficiently. Systems built for latency need to deliver messages/events as they occur, in real-time, without batching. Whilst message oriented middleware like RabbitMQ delivers messages at very low latencies (i.e. it makes use of memory) at lower throughputs, Apache Kafka delivers the best throughput (i.e. it makes use of disk) at low end-to-end latencies.
Design principles to achieve low latency at high throughput
Kafka achieves low latencies at high throughput due to design principles such as: sequential I/O with an append-only, totally ordered data structure; “don’t copy” (binary data flows across producers, consumers & brokers without modification) and the zero-copy principle to reduce CPU cycles & memory bandwidth; batching of data to reduce network calls; compression of batches (not individual messages) using LZ4, Snappy or GZIP codecs; and horizontal scaling by adding more nodes.
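The batching and compression principles above surface directly in Kafka’s producer configuration. The property names below are Kafka’s real producer settings, but the values are illustrative only; a minimal sketch of the configuration (the full producer client is omitted so the snippet stays self-contained):

```java
import java.util.Properties;

public class KafkaThroughputConfig {
    // Producer settings that trade a little latency for throughput;
    // values here are illustrative, not recommendations.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("batch.size", "65536");        // batch up to 64 KB per partition
        props.put("linger.ms", "5");             // wait up to 5 ms to fill a batch
        props.put("compression.type", "lz4");    // compress whole batches, not messages
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps());
    }
}
```

Setting linger.ms to 0 sends records immediately (lower latency, smaller batches); raising it lets batches fill and compress better (higher throughput).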
Scalability means the ability of the system to handle growing demands by adding more CPUs, memory, disks, nodes, etc. Scalability interview questions & answers
Q9. What are some of the considerations in writing low latency applications in Java?
A9.
1) Parallel computing via a) multi-threading b) Non-blocking I/O (e.g. MINA, Netty, Grizzly, etc) c) distributed systems (e.g. Apache Kafka, Apache Spark, etc) with shared-nothing architectures. A shared-nothing architecture is one where applications run in parallel on 100+ nodes, each with its own dedicated CPU, memory, I/O, etc.
2) Streaming APIs like Apache Spark streaming, Apache Storm, StAX (i.e. Streaming API for XML) to process data in real-time or near real time.
3) Writing concurrent programs with Java multi-threading features such as executors, futures, completable futures, fork/join, concurrent data structures, etc.
4) Understanding the Java memory model and tuning memory and garbage collection in JVM.
5) Using event based and non-blocking paradigms. For example, using frameworks like Apache MINA, Netty, Grizzly, and Akka.
6) MINA & Netty are lower level frameworks than Akka, and have NIO (i.e. New Java I/O) at their core. NIO is an event-driven, non-blocking paradigm.
7) Akka is a higher level general purpose framework compared to MINA & Netty for building event-driven, scalable, and fault-tolerant applications. Akka is written in Scala, with language bindings provided for both Scala and Java. Akka uses the Actor model to hide all the thread-related code and gives you really simple and helpful interfaces to easily implement a scalable and fault-tolerant system.
8) Even though you need to have a good handle on writing concurrent programs in Java & interviewers like to quiz/test you on it, favor a framework like Akka as writing complex concurrent programs is not a trivial task, and you need to deal with threads, locks, race conditions & debugging. Writing concurrent programs without frameworks can be error-prone and can lead to code that is difficult to read, test, and maintain.
9) If you are working in the BigData space, have a look at Apache Spark, which is based on Scala & Akka Toolkit. Here is an example of a Spark master “Driver Application” creating tasks & scheduling them to be run on the “Spark Executors”. Only two executors are shown here, but typical clusters will have 100+ nodes & executors.
Executors are worker nodes’ processes in charge of running individual tasks in a given Spark job. Spark Executors are launched at the beginning of a Spark application and typically run for the entire lifetime of an application. Once they have finished running the tasks they send the results to the “Driver Application”. “Spark Executors” also provide in-memory storage for RDDs that are cached.
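Point 3 above mentions executors, futures and completable futures. A minimal sketch of composing two independent asynchronous lookups with CompletableFuture, without blocking the caller’s thread until the final get(); the service names and values are hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncPricing {
    // Run two independent lookups in parallel and combine their results
    static CompletableFuture<Double> totalCost(ExecutorService pool) {
        CompletableFuture<Double> price =
                CompletableFuture.supplyAsync(() -> 100.0, pool); // e.g. pricing service
        CompletableFuture<Double> fee =
                CompletableFuture.supplyAsync(() -> 2.5, pool);   // e.g. fee service
        return price.thenCombine(fee, Double::sum);               // join the results
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println(totalCost(pool).get()); // 102.5
        pool.shutdown();
    }
}
```

Because the two supplyAsync calls run on separate pool threads, the total wall-clock time is roughly the slower of the two lookups rather than their sum.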
Q10. What is the actor model in the Akka toolkit?
A10. The “actor model” is a design pattern for writing concurrent and scalable code that runs on distributed systems. It is an event driven (i.e. message passing) model that involves sending & receiving events among actors.
1) Instead of invoking an object directly, you construct a message and send it to a destination object called an actor.
2) Each actor has a specific job to do. The actor engine stores the message in a queue (the actor’s mailbox).
3) When a thread becomes available, the actor engine running the actor delivers that message to its destination actor object.
4) When the actor completes its task, it sends a message back to the originating object, which is also considered an actor.
5) You can orchestrate which messages get passed to which actors under what conditions.
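The steps above can be sketched in plain Java, a mailbox (queue) drained by a single worker thread per actor, so the actor’s state is never touched by two threads at once and no locks are needed. This is a toy illustration of the idea, not Akka’s API; all names here are hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MiniActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> replies = new LinkedBlockingQueue<>();
    // One thread drains the mailbox, so message handling is effectively serial
    private final Thread worker = new Thread(() -> {
        try {
            while (true) {
                String msg = mailbox.take();       // deliver the next message
                if (msg.equals("STOP")) return;    // poison pill ends the actor
                replies.put("processed:" + msg);   // reply to the originating actor
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    });

    public MiniActor() { worker.start(); }
    public void tell(String msg) { mailbox.add(msg); }   // asynchronous send
    public String nextReply() throws InterruptedException { return replies.take(); }

    public static void main(String[] args) throws InterruptedException {
        MiniActor actor = new MiniActor();
        actor.tell("order-42");
        System.out.println(actor.nextReply()); // processed:order-42
        actor.tell("STOP");
    }
}
```

Akka adds to this picture supervision, fault tolerance, dispatchers that share threads across many actors, and location transparency across nodes.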
The akka-camel module allows “Untyped Actors” to receive and send messages over a great variety of protocols such as HTTP, SOAP, TCP, FTP, SMTP or JMS, with APIs for both Java & Scala.
Q11. What do you understand by the term real-time application (RTA)?
A11. Real-time application (aka RTA) is an application where the content is pushed through “as it happens” within a specified time frame. These time frames are defined as SLAs (Service Level Agreements). For example, in a Straight-Through Processing (STP) solution, you have real-time trades flow between your front/middle office, and traders/stock exchange.
Real-time systems can be further divided into 1) Hard real-time & 2) soft real-time.
Hard real-time is when an action performed at the wrong time has no value, or even a negative effect. In other words, you must absolutely hit every deadline. It is not acceptable to say that 90% of the time we hit the response time of 100ms. Only some systems have this requirement: medical apps (e.g. pacemakers), defence systems, nuclear systems, avionics, etc.
Soft real-time is when an action performed either too early or too late still has a positive effect, although performing the task on time would have had greater value in terms of better customer experience, meeting the SLAs, etc. A soft real-time system can be a trading application with high throughput & low latency but without any hard response time guarantees. No catastrophe happens when response times are missed, say, 5% of the time when 100K requests are sent. Most systems fall into this category: financial applications (e.g. placing trades, matching trades, etc), event processing, telecom, etc.
Q. Can JVM be used for real-time applications in the sense that it’s guaranteed to react within a certain amount of time?
A. The answer is no for standard JVMs, but special JVMs that support the “Real-Time Specification for Java (RTSJ)” extensions can process in hard real-time. Standard JVMs achieve only “soft real-time”, mainly due to automatic garbage collection and the GC pauses associated with it. The RTSJ provides a subclass of RealtimeThread called NoHeapRealtimeThread (NHRT). Instances of this subclass are protected from GC induced pauses. NHRTs are NOT allowed to use the heap; they use the scoped memory and immortal memory features to allocate memory on a more predictable basis.
Q. Do you favour hard or soft real-time Java development guidelines in general?
A. Soft real-time is favoured unless there is a specific need for hard real-time, as soft real-time offers much better developer productivity & easier application maintenance.
More on Java Low latency interview Q&As