One of the most frequently asked open-ended interview questions is: can you describe, from 100 feet, the high-level architecture of a recent application you have worked on?
You will be asked to 1) draw a diagram on a whiteboard and 2) answer follow-up questions based on your answer. Q1 to Q7 will prepare you thoroughly for this.
Q1. What are the different application integration styles?
A1. Enterprise systems don’t stand alone. They integrate with many other corporate systems to provide mission-critical services to clients. There are a number of different integration styles, such as:
#1 Shared database
where multiple applications share the same database. This approach is simple, but has disadvantages such as
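1) More likely to hit performance bottlenecks and scalability issues, because every application competes for the same database. A single relational database is typically scaled vertically and is hard to scale out.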
2) Any adjustments (e.g. version upgrade) to the database for one application will have side effects on other applications. Complicates indexing and table partitioning as different applications have different needs.
3) Concurrency & timing issues as there could be chronological dependencies among processes that share the database. What about different applications working on the same tables concurrently? What if one application modifies the data that should have been altered by another application first?
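One common way to guard against the "modified by another application first" problem is optimistic locking with a version column. The sketch below is a minimal in-memory illustration, not a real database client: `VersionedRow`, `SharedTableSketch`, and the `orders` table named in the comment are hypothetical names, and an `AtomicReference` stands in for the shared row.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for one row of a table shared by several applications.
final class VersionedRow {
    final long version;
    final String status;

    VersionedRow(long version, String status) {
        this.version = version;
        this.status = status;
    }
}

class SharedTableSketch {
    private final AtomicReference<VersionedRow> row =
            new AtomicReference<>(new VersionedRow(1, "NEW"));

    VersionedRow read() {
        return row.get();
    }

    // Mimics the idea of (assumed schema):
    //   UPDATE orders SET status = ?, version = version + 1
    //   WHERE id = ? AND version = ?
    // The update succeeds only if nobody changed the row since it was read.
    boolean update(VersionedRow seen, String newStatus) {
        return row.compareAndSet(seen, new VersionedRow(seen.version + 1, newStatus));
    }

    public static void main(String[] args) {
        SharedTableSketch table = new SharedTableSketch();
        VersionedRow seenByApp1 = table.read();
        VersionedRow seenByApp2 = table.read();

        System.out.println("App1 update applied: " + table.update(seenByApp1, "APPROVED"));
        // App2 still holds the stale row, so its update is rejected.
        System.out.println("App2 update applied: " + table.update(seenByApp2, "CANCELLED"));
    }
}
```

The rejected application would normally re-read the row and decide whether its change still makes sense.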
#2 Batch file transfers
Applications have their own databases, and data from one application is copied into another application’s database by overnight or regular-interval batch jobs using an ETL (i.e. Extract, Transform & Load) process. Data is extracted from one database and transformed into feed files (e.g. CSV or TAB delimited). These feed files are then loaded, or ingested, into the other application’s database.
Why make copies of the data? An order placement application will require customer or product information. Only one application will be the source of truth and can create or modify the data, and the other applications will use the data as read only.
A typical industry example is a Java & Spring based online ordering application that requires data from a legacy mainframe system. You can also bring in data from external organizations.
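The extract, transform, and load steps can be sketched in plain Java. This is a simplified illustration under stated assumptions: `CsvFeedEtlSketch` and its methods are hypothetical, in-memory lists and maps stand in for the source and target databases, and the CSV handling is naive (no quoting or escaping).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class CsvFeedEtlSketch {

    // Extract + transform: turn source rows (id, name) into CSV feed lines.
    static List<String> toFeed(List<String[]> sourceRows) {
        List<String> lines = new ArrayList<>();
        for (String[] row : sourceRows) {
            String id = row[0].trim();
            String name = row[1].trim().toUpperCase(Locale.ROOT);
            lines.add(id + "," + name);
        }
        return lines;
    }

    // Load: parse the feed into the target store (a map standing in for a table).
    static Map<String, String> load(List<String> feedLines) {
        Map<String, String> target = new LinkedHashMap<>();
        for (String line : feedLines) {
            String[] cols = line.split(",", -1);
            if (cols.length != 2) {
                continue; // skip malformed lines; a real job would log or reject them
            }
            target.put(cols[0], cols[1]);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String[]> customers = new ArrayList<>();
        customers.add(new String[] {" C1 ", "alice"});
        customers.add(new String[] {"C2", "bob"});

        List<String> feed = toFeed(customers);
        System.out.println(feed);
        System.out.println(load(feed));
    }
}
```

In a real batch job the feed would be written to a file and picked up by a scheduled loader on the receiving side.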
Disadvantages of ETL are:
1) Not ideal for near real-time (i.e. NRT) or on-demand data access that requires fast response.
2) Takes a lot of consideration & time to develop, and is difficult to keep up to date with changing requirements.
#3 Invoking remote procedures (RPC calls)
RPC is a form of inter-process communication that allows one program to directly call procedures in another program, either on the same machine or on another machine on the network. Spring supports inter-process communication (aka remoting) via web services, JAX-WS (i.e. SOAP) and JAX-RS (i.e. RESTful), as well as the older RMI, Spring’s HTTP Invoker, Hessian, and Burlap. JAX-WS is the successor of JAX-RPC. RESTful web services are the most popular and ubiquitous.
For example: JSON or XML data can be exchanged between App1 & App2. App1 and App2 could be implemented in different languages, because RESTful web services are language neutral.
App1 (REST client aka consumer) –> JSON –> App2 (REST service provider)
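The App1 to App2 exchange above can be sketched with classes that ship with the JDK (Java 11 or later is assumed). This is only an illustration: the `/customers/42` path, the JSON payload, and the class name `RestExchangeSketch` are made up for the example, and a real provider would typically be built with JAX-RS or Spring MVC rather than the JDK's built-in `HttpServer`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class RestExchangeSketch {

    // App2: the REST service provider, listening on a free local port.
    static HttpServer startProvider() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            server.createContext("/customers/42", exchange -> {
                byte[] body = "{\"id\":42,\"name\":\"Jane\"}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().add("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // App1: the REST client (consumer) requesting JSON over HTTP.
    static String fetchCustomer(int port) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:" + port + "/customers/42"))
                .header("Accept", "application/json")
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startProvider();
        try {
            System.out.println(fetchCustomer(server.getAddress().getPort()));
        } finally {
            server.stop(0);
        }
    }
}
```

Because the contract is just HTTP plus JSON, either side could be replaced by an application written in another language.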
#4 Messaging via MOMs (i.e. Message Oriented Middlewares)
This is also inter-process communication (aka remoting), in which messages are exchanged asynchronously over a message oriented middleware (MOM). A MOM is useful for asynchronous request-reply or publish-subscribe messaging, where a request may take a long time to complete or several parties may be interested in the same message.
Spring supports both JMS and AMQP. [ Asynchronous processing in Java real life examples.]
A real-world example is a web application used to place an order or a trade for a particular customer. Your application taking orders online saves the order to a database & publishes a message onto a JMS queue. A second application listening to the queue responds to the event by taking the order and placing it with a third-party system (e.g. a trading system). A third application may be responsible for sending emails on order statuses such as confirmation and completion. All these events take place asynchronously.
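The order flow above can be imitated without a broker to show the asynchronous hand-off. In this sketch a `BlockingQueue` stands in for the JMS queue and a thread stands in for the listening application; it is not the JMS API, and `OrderMessagingSketch`, the `STOP` marker, and the `FORWARDED:` prefix are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

class OrderMessagingSketch {
    private static final String STOP = "__STOP__"; // marker that ends the demo consumer

    static List<String> placeAndForward(List<String> orderIds) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // stand-in for a JMS queue
        List<String> forwarded = new CopyOnWriteArrayList<>();

        // Second application: listens to the queue and forwards each order.
        Thread listener = new Thread(() -> {
            try {
                for (String msg = queue.take(); !msg.equals(STOP); msg = queue.take()) {
                    forwarded.add("FORWARDED:" + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.start();

        // First application: saves the order (omitted here) and publishes a message.
        // Publishing returns immediately; it does not wait for the listener.
        for (String orderId : orderIds) {
            queue.add(orderId);
        }
        queue.add(STOP);

        try {
            listener.join(); // only so the demo can return a result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(forwarded);
    }

    public static void main(String[] args) {
        System.out.println(placeAndForward(List.of("A1", "B2")));
    }
}
```

With a real MOM the producer and the listener would be separate processes, and the broker would keep the messages if the listener were down.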
Q2. How does a Java EE application integrate with other systems?
A2. Using various protocols like HTTP(S), SOAP, RMI, FTP, TCP, FIX, proprietary protocols, etc, and message exchange formats such as JSON, XML, text (e.g. CSV, TAB delimited), etc.
1) XML or JSON over HTTP(S). RESTful web services use JAX-RS. RESTful web services are the most prevalent integration style as they are easy to implement, built for the web, and make use of HTTP caching for performance. JSON is the most popular message exchange format due to its conciseness.
2) SOAP over HTTP(S), where SOAP is an XML-based messaging protocol. SOAP uses JAX-WS. SOAP is best when transaction security and integrity are of the highest priority. It also suits integrating with external businesses or with older systems that may only support SOAP.
3) Traditionally, message-oriented middleware products have used proprietary protocols for communication between client applications and brokers. This means that once you’ve selected a particular vendor’s messaging broker, you must use that vendor’s libraries to connect your client applications to that broker. This tightly couples your code with a particular vendor. Advanced Message Queuing Protocol (AMQP) is an efficient, reliable, wire-level messaging protocol that you can use to build robust and cross-platform messaging applications.
JMS is an API that reduces the coupling of your Java code to a particular MOM provider. JMS allows you to switch from one JMS compliant message broker (e.g. webMethods) to another (e.g. WebSphere MQ) with little or no change to your source code. It is like JDBC, which allows you to switch the underlying database.
4) JavaMail for sending emails and Simplewire Java SMS library to send SMSs.
5) Overnight batch job runs to load data feeds with Spring Batch or the newer Java EE batch (JSR 352) jobs. These are ETL (Extract, Transform & Load) tasks. You can use Hadoop to ingest big data via ELT (Extract, Load & Transform).
6) Using open source integration frameworks like Spring Integration, Mule ESB, Apache Camel, etc. These help you integrate systems in a standardised way, adhering to the enterprise integration patterns (EIP). Apache Camel is a lightweight integration framework that allows you to use HTTP, FTP, JMS, EJB, JPA, RMI, JMX, LDAP, and Netty, to name a few.
7) Using an ESB (Enterprise Service Bus) to integrate your applications. For example, Oracle Service Bus, TIBCO ESB, webMethods, Mule ESB, etc. Under the hood, an ESB also uses an integration framework, and provides more services and management functionality like monitoring, high availability, clustering, a graphical user interface for routing and configuring, etc. Usually, an ESB is a complex and powerful product with a steeper learning curve. It is suited to very large integration projects, and to projects requiring BPM (Business Process Management) integration and other integrated services like monitoring, clustering, etc. Mule does provide proprietary connector support for systems like SAP, TIBCO Rendezvous, PayPal, Siebel CRM, IBM’s CICS, etc.
8) TCP based socket level integration. MINA is a popular framework for TCP based non-blocking socket level communication. MINA is based on EDA (Event Driven Architecture). In EDA, both the “Event” producers and listeners are loosely coupled via an “EventHub” and “Event”. An “EventHub” is used to register and unregister listeners.
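The “EventHub” idea can be sketched in a few lines of plain Java. This is a generic illustration of loosely coupled producers and listeners, not MINA's API: `Event`, `EventHub`, and their methods are hypothetical names chosen to match the terms used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event type: a name plus a payload.
final class Event {
    final String name;
    final String payload;

    Event(String name, String payload) {
        this.name = name;
        this.payload = payload;
    }
}

// Producers and listeners know only the hub and the Event type, not each other.
class EventHub {
    private final List<Consumer<Event>> listeners = new CopyOnWriteArrayList<>();

    void register(Consumer<Event> listener) {
        listeners.add(listener);
    }

    void unregister(Consumer<Event> listener) {
        listeners.remove(listener);
    }

    void publish(Event event) {
        for (Consumer<Event> listener : listeners) {
            listener.accept(event);
        }
    }

    public static void main(String[] args) {
        EventHub hub = new EventHub();
        List<String> received = new ArrayList<>();
        Consumer<Event> auditListener = e -> received.add(e.name + ":" + e.payload);

        hub.register(auditListener);
        hub.publish(new Event("ORDER_PLACED", "A1"));   // delivered
        hub.unregister(auditListener);
        hub.publish(new Event("ORDER_PLACED", "B2"));   // nobody is listening any more

        System.out.println(received);
    }
}
```

`CopyOnWriteArrayList` is used so that listeners can be registered or unregistered while an event is being published.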
Akka is a higher level framework for building event-driven, scalable, fault-tolerant applications. Akka uses reactive programming with its Actor model. [ Reactive Programming or Reactor pattern (RP) in Java Interview Q&As | Simple Akka tutorial in Java step by step ]
RxJava is a reactive programming library for composing asynchronous and event-based programs by using observable sequences. It is a library with rich set of Functional Programming operations that let you transform, combine, split and compose data sources.
What is reactive programming?
Reactive programming is all about asynchronous data flows, with principles such as being 1) responsive, reacting to user requests even under load, 2) resilient & scalable, and 3) message driven, which is the foundation for writing scalable, resilient, and responsive systems.
Reactive Programs are of 2 types:
1) Event-driven concurrency: E.g. RxJava. This is based on events, which are monitored by zero or more observers. The big difference between event-driven style and imperative style is that the caller does not block and hold onto a thread while waiting for a response.
2) Message-driven concurrency: E.g. Akka. Actor based, where messages are sent to an Actor via a mailbox. Actors can pass messages back and forth, or even pass messages to themselves. Apache Spark, a fast and general execution engine for large-scale data processing, used Akka’s actor model for its internal communication in its early versions. [ Scala Async and Actor System Interview Q&As ]
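The mailbox idea can be illustrated with JDK classes only. The sketch below is not Akka: `CounterActor` is a hypothetical toy actor whose state is touched by a single thread, whose mailbox is a `BlockingQueue`, and which treats a negative number as a stop message (an assumption made for the demo). The result comes back through a `CompletableFuture`, so the sender never blocks while sending.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

class CounterActor {
    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
    private final CompletableFuture<Integer> finalTotal = new CompletableFuture<>();
    private int total; // only ever touched by the actor's own thread

    private CounterActor() {
    }

    static CounterActor start() {
        CounterActor actor = new CounterActor();
        Thread worker = new Thread(actor::processMailbox);
        worker.setDaemon(true);
        worker.start();
        return actor;
    }

    // Fire-and-forget send: the caller returns as soon as the message is queued.
    void tell(int amount) {
        mailbox.add(amount);
    }

    // Completes once the actor has processed a stop message.
    CompletableFuture<Integer> finalTotal() {
        return finalTotal;
    }

    private void processMailbox() {
        try {
            while (true) {
                int message = mailbox.take();
                if (message < 0) {            // demo convention: negative means stop
                    finalTotal.complete(total);
                    return;
                }
                total += message;             // messages are handled one at a time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CounterActor actor = CounterActor.start();
        actor.tell(1);
        actor.tell(2);
        actor.tell(-1);
        // React to the result with a callback instead of blocking on it.
        actor.finalTotal().thenAccept(t -> System.out.println("total = " + t)).join();
    }
}
```

Because only the actor's thread reads and writes `total`, no locks are needed around the state.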
Spray is an open-source asynchronous & actor based toolkit for building REST/HTTP-based integration layers on top of Scala and Akka. It’s a great way to integrate your Scala applications.
Netty has NIO at its core, and works at a lower level than Akka, closer to the networking level, supporting TCP, UDP, HTTP, FTP, SSL, etc. Akka abstracts away the networking level so that you can focus on the problem domain.
9) Invoking remote procedures via RMI, Burlap, and Hessian. Burlap/Hessian remote objects are just ordinary Java objects that implement some interfaces. They don’t require special proxy, home, or remote classes. One of the inherent benefits of this object-and-interface model is that it promotes the good object-oriented design practice of design by interface.
10) FIX protocol to exchange financial information. FIX stands for Financial Information eXchange, which is an open protocol intended to streamline electronic communications in the financial securities industry. Most exchanges use this standard for messages such as orders, executions, and market data. QuickFIX/J and CameronFIX are popular FIX frameworks for Java.
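To give a feel for the wire format, here is a rough sketch of FIX's classic tag=value encoding, where fields are separated by the SOH character (0x01), tag 9 carries the body length, and tag 10 carries a three-digit checksum (the byte sum modulo 256). `FixMessageSketch` is a hypothetical helper, not QuickFIX/J, and the message is deliberately incomplete: a real session also needs header fields such as sender, target, sequence number, and sending time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class FixMessageSketch {
    static final char SOH = '\u0001'; // field delimiter

    // Builds 8=BeginString | 9=BodyLength | 35=MsgType | body fields | 10=CheckSum.
    static String build(String beginString, String msgType, String... bodyFields) {
        StringBuilder body = new StringBuilder();
        body.append("35=").append(msgType).append(SOH);
        for (String field : bodyFields) {
            body.append(field).append(SOH);
        }
        int bodyLength = body.toString().getBytes(StandardCharsets.US_ASCII).length;
        String withoutTrailer = "8=" + beginString + SOH + "9=" + bodyLength + SOH + body;
        return withoutTrailer + "10=" + checksum(withoutTrailer) + SOH;
    }

    // Sum of all bytes modulo 256, rendered as three digits.
    static String checksum(String text) {
        int sum = 0;
        for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
            sum += b & 0xFF;
        }
        return String.format(Locale.ROOT, "%03d", sum % 256);
    }

    // Recomputes the checksum over everything before the 10= field.
    static boolean hasValidChecksum(String message) {
        int trailerStart = message.lastIndexOf(SOH + "10=");
        if (trailerStart < 0 || message.length() != trailerStart + 8) {
            return false;
        }
        String covered = message.substring(0, trailerStart + 1);
        String declared = message.substring(trailerStart + 4, trailerStart + 7);
        return declared.equals(checksum(covered));
    }

    public static void main(String[] args) {
        // 35=D is a new single order; 55 = symbol, 54 = side (1 = buy), 38 = quantity.
        String order = build("FIX.4.4", "D", "55=IBM", "54=1", "38=100");
        System.out.println(order.replace(SOH, '|')); // '|' only for readability
        System.out.println("checksum ok: " + hasValidChecksum(order));
    }
}
```

In practice a FIX engine such as QuickFIX/J builds, validates, and sequences these messages for you.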
11) Integration with data-warehouse systems for multi-dimensional reporting. OLTP (OnLine Transaction Processing) data is summarised and sent to OLAP (OnLine Analytical Processing) systems for business intelligence, data mining, and complex reporting. IBM Cognos, JasperSoft, Oracle Enterprise BI server, etc. are OLAP systems. [ Scalable Straight Through Processing System (OLTP) vs OLAP in Java ]
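As a tiny illustration of "summarised", the sketch below rolls individual transactions up into one total per region, the kind of aggregate that would be shipped to a reporting system. `OrderTxn` and `OlapSummarySketch` are hypothetical names, and amounts are kept in cents to avoid floating point rounding.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical OLTP record: one row per order.
final class OrderTxn {
    final String region;
    final String product;
    final long amountCents;

    OrderTxn(String region, String product, long amountCents) {
        this.region = region;
        this.product = product;
        this.amountCents = amountCents;
    }
}

class OlapSummarySketch {

    // Rolls transactions up along one dimension (region) into a sorted summary.
    static Map<String, Long> totalByRegion(List<OrderTxn> txns) {
        return txns.stream().collect(Collectors.groupingBy(
                t -> t.region,
                TreeMap::new,
                Collectors.summingLong(t -> t.amountCents)));
    }

    public static void main(String[] args) {
        List<OrderTxn> txns = List.of(
                new OrderTxn("APAC", "widget", 1000),
                new OrderTxn("EMEA", "gadget", 500),
                new OrderTxn("APAC", "gadget", 250));
        System.out.println(totalByRegion(txns));
    }
}
```

A real warehouse load would aggregate along several dimensions (time, product, region) and usually runs as a batch job.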
12) Server side and client side mashups. Merging of services and content from multiple web sites in an integrated and coherent way is called a mashup.
A server-side mash-up integrates content on the server and passes it to the client. Hence this style of mash-up is also called a proxy-style mash-up because the server acts as a proxy.
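A server-side mash-up can be sketched as a server method that calls two sources in parallel and merges the results before replying. Everything here is illustrative: `ServerSideMashupSketch` is a hypothetical name, the two `Supplier`s stand in for calls to external web sites, and the JSON is assembled by hand without escaping.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class ServerSideMashupSketch {

    // The server acts as a proxy: it fetches from both sources and combines them.
    static String mashup(Supplier<String> weatherSource, Supplier<String> newsSource) {
        CompletableFuture<String> weather = CompletableFuture.supplyAsync(weatherSource);
        CompletableFuture<String> news = CompletableFuture.supplyAsync(newsSource);
        return weather
                .thenCombine(news, (w, n) ->
                        "{\"weather\":\"" + w + "\",\"news\":\"" + n + "\"}")
                .join();
    }

    public static void main(String[] args) {
        // Stand-ins for two remote services.
        System.out.println(mashup(() -> "sunny", () -> "markets up"));
    }
}
```

The client receives a single merged response and never talks to the underlying sites directly.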
Q3. Can you discuss some of the high level architectures you are experienced with?
Q4. In your experience, what are some of the common architectural and development mistakes?
Q5. What causes performance issues in Java?
Q6. In your experience, what are some of the key security considerations in an enterprise Java application?
Q7. Can you list some key software design principles?