18 Java scenarios based interview Q&As for the experienced – Part 1

Q01. Scenario: You need to load stock exchange security codes from a database and cache them for performance. The security codes need to be refreshed, say, every 30 minutes. This cached data needs to be populated and refreshed by a single writer thread and read by several reader threads. How will you ensure that your read/write solution is scalable and thread safe?

A01. Solution:

Option 1: The java.util.concurrent.locks package provides classes that implement read/write locks, where the read lock can be held in parallel by multiple threads and the write lock can be held by only a single thread. The ReadWriteLock interface maintains a pair of associated locks, one for read-only operations and one for writing. The readLock( ) may be held simultaneously by multiple reader threads as long as there is no writer, while the writeLock( ) is exclusive. In general, this implementation improves performance and scalability compared to mutex locks (i.e. via the synchronized keyword) when:

1. Reads are more frequent and take longer than writes.

2. The machine can actually run the readers in parallel, for example on multi-core processors.
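As a minimal sketch of Option 1, assuming the codes fit in a simple in-memory map: the class name SecurityCodeCache, its method names, and the sample codes are hypothetical and only illustrate how the read and write locks are paired.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical cache of security codes guarded by a read/write lock. */
class SecurityCodeCache {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private Map<String, String> codes = new HashMap<>();

    /** Called by many reader threads; readers do not block each other. */
    String lookup(String code) {
        lock.readLock().lock();
        try {
            return codes.get(code);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Called by the single writer thread, e.g. every 30 minutes. */
    void refresh(Map<String, String> freshCodes) {
        // Copy outside the lock so the exclusive section stays short.
        Map<String, String> copy = new HashMap<>(freshCodes);
        lock.writeLock().lock();
        try {
            codes = copy;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

class SecurityCodeCacheDemo {
    public static void main(String[] args) {
        SecurityCodeCache cache = new SecurityCodeCache();
        // In a real application this data would come from the database.
        cache.refresh(Map.of("BHP", "BHP Group", "CBA", "Commonwealth Bank"));
        System.out.println(cache.lookup("BHP"));
    }
}
```

The periodic refresh itself could be driven by a ScheduledExecutorService with a single thread, which also gives you the "single writer" for free.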


The ConcurrentHashMap is another example where improved performance can be achieved when you have more reads than writes. The ConcurrentHashMap allows concurrent reads and locks only the buckets that are being modified or inserted into.
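A hedged sketch of the same cache built on ConcurrentHashMap, with no explicit locks in the calling code. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical security code cache backed by a ConcurrentHashMap. */
class ConcurrentCodeCache {
    private final Map<String, String> codes = new ConcurrentHashMap<>();

    /** Safe to call from many reader threads without explicit locking. */
    String lookup(String code) {
        return codes.get(code);
    }

    /** Called by the single writer thread on each refresh. */
    void refresh(Map<String, String> freshCodes) {
        codes.putAll(freshCodes);
        // Drop codes that are no longer present in the fresh data.
        codes.keySet().retainAll(freshCodes.keySet());
    }
}
```

Note the trade-off: each individual operation is thread safe, but the refresh as a whole is not atomic, so a reader may briefly see a mix of old and new entries. If readers must see one consistent snapshot, the read/write lock approach is the better fit.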

Option 2: Make use of a caching framework like EHCache, OSCache, etc. Caching frameworks take care of memory management with LRU (Least Recently Used) and FIFO (First In First Out) eviction strategies, disk overflow, data expiration, and many other optional advanced features, so you do not have to write your own.

Q02. Scenario: If you have a requirement to generate online reports or feed files by pulling out millions of historical records from a database, what questions will you ask, and how will you go about designing it?

A02.

Questions you need to ask:

Online Vs Offline? Should we restrict the online reports to only the last 12 months of data to minimize the report size and get better performance, and provide reports/feeds for data older than 12 months via offline processing?

— Should we generate the reports asynchronously? Reports can be generated asynchronously and once ready can be emailed or downloaded via a URL at a later time.

— Which report generation framework should we use, for example Jasper Reports, Open CSV, or XSL-FO with Apache FOP, depending on the required output formats?

— How do we handle exceptional scenarios? Send an error email, use a monitoring system like Tivoli or Nagios to raise production support tickets on failures, etc?

Security requirements. Are we sending feed/report with PII (i.e. Personally Identifiable Information) data via email? Do we need proper access control to restrict who can generate which online reports? Should we password protect the email attachments?

— Should we schedule the offline reports to run during off peak time?

Archival and purging of the older reports. What is the report retention period required for auditing and compliance purposes? How big are the files?

Solution: An online application with a requirement to produce time consuming reports or run a long business process (e.g. re-balancing accounts, aggregating hierarchical information, etc) could benefit from making these long running operations asynchronous. Once the report or the long running business process is completed, the outcome can be communicated to the user via email or by asynchronously refreshing the web page via techniques known as “server push (JEE Async Servlet)” or “client pull (Refresh meta tag)”. A typical example would be:

a) A user makes an online request for an aggregate report or a business process like re-balancing his/her portfolios.

b) The user request will be saved to a database table for a separate process to periodically pick it up and process it asynchronously.

c) The user could now continue to perform other functionality of the website without being blocked.

d) A separate process running on the same machine or different machine can periodically scan the table for any entries and produce the necessary reports or execute the relevant business process. This could be a scheduled job that runs once during off-peak or every 10 minutes. This depends on the business requirement.

e) Once the report or the process is completed, notify the user by email or by making the report available online for download.
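Steps a) to e) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not a production design: an in-memory queue and map stand in for the database table, and the names ReportRequestStore, submit, processPending, and statusOf are hypothetical.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the request table described in step b). */
class ReportRequestStore {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Queue<Long> pending = new ConcurrentLinkedQueue<>();
    private final Map<Long, String> names = new ConcurrentHashMap<>();
    private final Map<Long, String> status = new ConcurrentHashMap<>();

    /** Steps a) to c): save the request and return at once so the user is not blocked. */
    long submit(String reportName) {
        long id = nextId.getAndIncrement();
        names.put(id, reportName);
        status.put(id, "PENDING");
        pending.add(id);
        return id;
    }

    /** Step d): pick up everything that is pending; returns how many were processed. */
    int processPending() {
        int processed = 0;
        Long id;
        while ((id = pending.poll()) != null) {
            String reportName = names.get(id); // a real job would generate this report
            status.put(id, "COMPLETED");       // step e) would notify the user here
            processed++;
        }
        return processed;
    }

    String statusOf(long id) {
        return status.get(id);
    }
}

class ReportRequestDemo {
    public static void main(String[] args) throws InterruptedException {
        ReportRequestStore store = new ReportRequestStore();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Scan every 100 ms for the demo; the text suggests e.g. every 10 minutes.
        scheduler.scheduleWithFixedDelay(store::processPending, 0, 100, TimeUnit.MILLISECONDS);

        long id = store.submit("portfolio-rebalance");
        Thread.sleep(500); // the user carries on with other work in the meantime
        System.out.println(store.statusOf(id));
        scheduler.shutdown();
    }
}
```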

Apache Camel can be used to create an asynchronous route. This framework is written to address the Enterprise Integration Patterns (i.e. EIP). The following is a high-level diagram of a possible solution using Apache Camel.

[Figure: Apache Camel Routes]

Q03. Scenario: You need to find and change the text “Client” to “Customer” in 300+ html files.

A03. Solution: Harness the power of Unix & Regex.

sed and awk are very powerful Unix commands for file manipulations.
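With GNU sed, the command would be along the lines of sed -i 's/Client/Customer/g' *.html (the -i syntax differs on BSD/macOS). Where a Unix shell is not available, the same regex idea can be sketched in Java. This is a swapped-in alternative, not the sed/awk approach itself; the class name HtmlTextReplacer is hypothetical, and the files are assumed to be UTF-8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HtmlTextReplacer {
    // \b restricts the match to the whole word, so "Clients" is left alone.
    private static final Pattern CLIENT = Pattern.compile("\\bClient\\b");

    /** Replaces every whole-word "Client" with "Customer" in one document. */
    static String replaceClient(String html) {
        return CLIENT.matcher(html).replaceAll("Customer");
    }

    /** Rewrites every .html file under root; returns the number of files changed. */
    static int replaceInTree(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> htmlFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".html"))
                    .collect(Collectors.toList());
            int changed = 0;
            for (Path file : htmlFiles) {
                String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                String after = replaceClient(before);
                if (!after.equals(before)) {
                    Files.write(file, after.getBytes(StandardCharsets.UTF_8));
                    changed++;
                }
            }
            return changed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Either way, the regex does not understand HTML, so it will also change “Client” inside attribute values or script blocks. Review the diff in version control before committing.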

Q04. Scenario: You have a requirement to maintain a history of insertion, modification, and deletion to the “Customer” table. How will you go about accomplishing this?

A04. Solution

1) Create an ETL (i.e. Extract, Transform & Load) job that extracts each change to batch files and sends those files to a data warehouse system, which loads the file data into a history table of SCD Type 2 (i.e. Slowly Changing Dimension). SCD Type 2 means maintaining a record of each change. This is discussed in detail at 13 Data Warehouse interview Q&As – Fact Vs Dimension, CDC, SCD, etc.

2) Asynchronously via the publish & subscribe paradigm. Publish each change as an event to a message oriented middleware like Apache Kafka, Websphere MQ, etc, and a separate subscriber application will save each event to a SQL or NoSQL history table.

3) Create database table triggers to insert superseded records to a separate history table. A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. Care must be taken in using or writing triggers as incorrectly written or used triggers can significantly impact performance of your application.

Q05. Scenario: You are asked to design an application that validates data against 100+ rules to comply with government compliance requirements and tax laws. These compliance requirements can change, and the application needs to adapt quickly and easily to the changing requirements.

A05. Solution: Harness the power of rules engines like Drools. Drools is a popular open source business rules and workflow engine. It helps you externalize the rules in database tables or Excel spreadsheets as opposed to embedding them within the Java code. Each rule takes the form: when a given ($condition) holds, then execute the ($consequence). The business will be the custodian of these rules, which can be easily viewed in an Excel spreadsheet or by querying the database tables. A GUI could be built to maintain the rules that reside in a database.
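The when/then shape of a rule can be illustrated with a hand-rolled sketch in plain Java. This is not the Drools API; the classes ComplianceRule and RuleRunner, the rule ids, and the thresholds are all hypothetical. In a real system the rule definitions would be loaded from a table or spreadsheet rather than coded inline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** One "when condition then consequence" rule. */
class ComplianceRule<T> {
    final String name;
    final Predicate<T> condition; // the "when" part
    final String consequence;     // the "then" part, here just a message

    ComplianceRule(String name, Predicate<T> condition, String consequence) {
        this.name = name;
        this.condition = condition;
        this.consequence = consequence;
    }
}

class RuleRunner {
    /** Applies every rule to the fact and collects the consequences that fired. */
    static <T> List<String> evaluate(T fact, List<ComplianceRule<T>> rules) {
        List<String> fired = new ArrayList<>();
        for (ComplianceRule<T> rule : rules) {
            if (rule.condition.test(fact)) {
                fired.add(rule.name + ": " + rule.consequence);
            }
        }
        return fired;
    }
}

class RuleRunnerDemo {
    public static void main(String[] args) {
        // Hypothetical rules over a declared income amount.
        List<ComplianceRule<Double>> rules = new ArrayList<>();
        rules.add(new ComplianceRule<>("R001", income -> income < 0, "income must not be negative"));
        rules.add(new ComplianceRule<>("R002", income -> income > 1_000_000, "flag for manual review"));
        System.out.println(RuleRunner.evaluate(-50.0, rules));
    }
}
```

Because the rules are data rather than if/else branches, adding rule number 101 means adding a row, not changing the evaluation code. That is the property a rules engine gives you at scale.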

Q06. Scenario: Reference counting where a shared resource is incremented or decremented. The increment/decrement operations must be thread safe. For example, a counter that keeps track of the number of active logged in users by incrementing the count when users log in and decrementing the count when the users log out. Sometimes you want to allow a finite number of concurrent accesses say 3 users at a time.

A06. Solution:

Mutex: is a single key to an object (e.g. a toilet). Only one person at a time can have the key and occupy the toilet. When finished, the person gives (or releases) the key to the next person in the queue. In Java, every object has an intrinsic lock (i.e. a mutex), and only a single thread can hold it at a time.

Semaphore: is a number of free identical toilet keys. For example, having 3 toilets with identical locks and keys. The semaphore count is set to 3 at the beginning, and then the count is decremented as people acquire the keys to the toilets. If all toilets are full, i.e. there are no free keys left, the semaphore count is 0. Now, when one person leaves the toilet, the semaphore count is increased to 1 (one free key), and the key is given to the next person in the queue.
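A minimal sketch of both ideas for the logged-in users example, assuming a limit of 3 concurrent users. The class name LoginTracker and its methods are hypothetical. An AtomicInteger keeps the count thread safe, and java.util.concurrent.Semaphore hands out the "keys".

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical tracker: counts active users and caps concurrent logins. */
class LoginTracker {
    private final AtomicInteger activeUsers = new AtomicInteger();
    private final Semaphore slots;

    LoginTracker(int maxConcurrentUsers) {
        this.slots = new Semaphore(maxConcurrentUsers);
    }

    /** Returns true if a slot was free and the user is now counted as logged in. */
    boolean tryLogin() {
        if (!slots.tryAcquire()) {
            return false; // no free key left
        }
        activeUsers.incrementAndGet();
        return true;
    }

    /** Must be called exactly once for every successful tryLogin(). */
    void logout() {
        activeUsers.decrementAndGet();
        slots.release(); // hand the key back
    }

    int activeUsers() {
        return activeUsers.get();
    }
}

class LoginTrackerDemo {
    public static void main(String[] args) {
        LoginTracker tracker = new LoginTracker(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println("user " + i + " logged in: " + tracker.tryLogin());
        }
        tracker.logout();
        System.out.println("active users: " + tracker.activeUsers());
    }
}
```

tryAcquire( ) turns a user away when no slot is free. Use acquire( ) instead if the caller should wait in the queue until a key is returned, as in the toilet analogy.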


Q07. Scenario: If you are working with an online trading application, you may want the functionality to queue trades placed after hours and process them when the stock market opens. You also need to asynchronously handle the order statuses sent from the stock exchange like partially-filled, rejected, fully filled, etc, and update the online order information. How will you go about solving this?

A07. Solution: Message Oriented Middleware (MOM) products like Apache Kafka, Rabbit MQ, Websphere MQ, webMethods Broker, etc provide features like guaranteed delivery with a store-and-forward mechanism, no duplicates, and transaction management for enterprise level program-to-program communications by sending and receiving messages asynchronously (or synchronously). The diagram below gives the big picture.

[Diagram: screenshot of the messaging architecture]

When using Message Oriented Middleware (MOM) to facilitate asynchronous processing:

1) The producer (i.e. the Trading Engine), which submits user requests, and the consumer (i.e. the FIX Router), which converts the messages to the FIX protocol and sends FIX messages to the Stock Exchange system, both retain processing control and do not block. In other words, they continue processing regardless of the state of the other. Queue depths need to be properly set, and the messages need to be durable. Message correlation ids are used to pair each request with its response.

2) MOM creates looser coupling among systems, provides delivery guarantees, prevents message losses, scales well by decoupling performance characteristics of each system, has high availability and does not require same time availability of all sub-systems. So, MOM is ideal for geographically dispersed systems requiring flexibility, scalability, and reliability.

3) You may also need to perform logging, auditing, and performance metrics gathering asynchronously and non-intrusively. For example, you could send the log messages from log4j to a queue to be processed later asynchronously by a separate process running on the same machine or a separate machine. The performance metrics can be processed asynchronously as well.

For example, a trading application may have a number of synchronous and asynchronous moving parts, and metrics need to be recorded for various operations like placing a trade on to a queue, receiving asynchronous responses from the stock market, correlating order ids, linking similar order ids, etc. A custom metrics gathering solution can be built either by logging the relevant metrics to a database and then running relevant aggregate queries, or by writing to a file system and then running Perl based text searches to aggregate the results into a “csv” file to be opened and analyzed in a spreadsheet with graphs. In my view, writing to a database provides greater flexibility. For example, in Java, the following approach can be used.

[Figure: Asynchronous logging]

— Use log4j JMS appender or a custom JMS appender to send log messages to a queue.

— Use this appender in your application via Aspect Oriented Programming (AOP – e.g Spring AOP, AspectJ, etc) or dynamic proxy classes to non-intrusively log relevant metrics to a queue. It is worth looking at Perf4j and context based logging with MDC (Mapped Diagnostic Contexts) or NDC (Nested Diagnostic Contexts) to log on a per thread basis to correlate or link relevant operations.

— A stand-alone listener application needs to be developed to dequeue the performance metrics messages from the queue and write to a database or a file system for further analysis and reporting purpose. This listener could be written in Java as a JMX service using JMS or via broker service like webMethods, TIBCO, etc.

— Finally, relevant SQL or regular expression based queries can be written to aggregate and report relevant metrics in a customized way.
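The hand-off between the application and the listener can be sketched in plain Java. This is only an illustration: an in-process BlockingQueue stands in for the JMS queue, and the class MetricsQueue, its methods, and the sample operation names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** In-process stand-in for the metrics queue described above. */
class MetricsQueue {
    // Bounded, so a slow listener cannot exhaust memory.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);

    /** Called from application code or an AOP advice; never blocks the caller. */
    boolean record(String operation, String correlationId, long elapsedMillis) {
        // offer() returns false instead of waiting when the queue is full.
        return queue.offer(operation + "," + correlationId + "," + elapsedMillis);
    }

    /** Called by the listener: removes and returns everything queued so far. */
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }
}

class MetricsQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        MetricsQueue metrics = new MetricsQueue();
        metrics.record("placeTrade", "ORD-1", 12);
        metrics.record("receiveFill", "ORD-1", 48);

        // Stand-in for the stand-alone listener; a real one would write to a database.
        Thread listener = new Thread(() -> metrics.drain().forEach(System.out::println));
        listener.start();
        listener.join();
    }
}
```

The correlation id in each record is what later allows the SQL or regex based queries to link the operations belonging to one order.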

Q08. Scenario: You are required to change the logic of a module that many other modules depend on. How would you go about making the changes without impacting dependent systems?

A08. Solution: You first need to perform an impact analysis. Impact analysis is about being able to tell which pieces of code, packages, modules, and projects use a given piece of code, package, module, or project, and vice versa. This is a very difficult thing to do.

Performing an impact analysis is not a trivial task, and there is no single tool that can cater for every scenario. You can make use of static analysis tools like IDEs (e.g. Eclipse), JRipples, X-Ray, etc. Unfortunately, static analysis alone is not enough, especially in Java and other modern languages where lots of things can happen at run time via reflection, dynamic class loading & configuration, polymorphism, byte code injection, proxies, etc.

a) In Eclipse, Ctrl+Shift+G can be used to search for references.

b) You can perform a general “File Search” for keywords on all projects in the work-space.

c) You can use Notepad++ editor and select Search –> Find in files. You can search for a URL or any keyword across a number of files within a folder.

There are instances where you need to perform impact analysis across stored procedures, various services, URLs, environment properties, batch processes, etc. This will require a wider analysis across projects and repositories.

Search within your code repository, like Git:

Tools like FishEye can be used to search across various code repositories. FishEye is not targeted at any particular programming language. It just supports various version control systems and the concept of text files being changed over time by various people. It is handy for text searches, for example across environment based properties files when changing a URL or host name from A to B.

Grep the Unix/Linux environment where your application is deployed. You can perform a search on the file system where your application(s) are deployed.

Analyze across various log files. It is also not easy to monitor service oriented architectures. You can use tools like Splunk to trace transactions across the IT stack while being tested by the testers to proactively identify any issues related to change. Splunk goes across multiple log files.

Conduct impact analysis sessions with cross functional and system teams and communicate the changes. Brainstorm the major areas affected and document them. Have a manual test plan that covers the impacted systems to be tested. Collaborate with cross functional teams and identify any gaps in your analysis. Have a proper review and sign-off process. Get more developers to do peer code reviews.

Have proper documentation with high level architecture diagrams and dependency graphs where possible. As the number of systems grows, so does the complexity. A typical enterprise Java application makes use of different database servers, messaging servers, ERP systems, BPM systems, workflow systems, SOA architectures, etc. Use online document management systems like Confluence or a wiki, which enable searching across various documents.

Q09 to Q10: 18 Java scenarios based interview Q&As for the experienced – Part 2

