15+ SQL scenarios based interview questions answered

Q1 How will you go about identifying duplicate records in a table?
A1 The following SQL query will do the trick.

Note: Interviewer is testing if you understand that the aggregate queries need to have a “GROUP BY” clause on columns that are NOT aggregated. E.g. code and user_name.

Q2 How would you go about deleting the duplicate records?
A2 You could do it in a number of steps as shown below.

  • Create a temporary table.
  • Insert the unique records into the temporary table.
  • Drop the original table.
  • Rename the temp table to original table name.

DISTINCT keyword

Example based on Teradata.

Window function

Example based on Teradata. ROW_NUMBER() is a window function that assigns a sequential integer to each row within the PARTITION BY of a result set. In the below example, the rows are partitioned by name, designation, and dob. QUALIFY is like a WHERE clause for windowed records, and will filter out any duplicated records. Learn more in detail at SQL window function for analytics & qualify interview Q&As.

Window function & CTE

With databases that support CTE (i.e. Common Table Expression). Learn more in detail at Common Table Expressions (i.e. CTE) in SQL using the “WITH” clause.

GROUP BY & max(..)

Using the GROUP BY & MAX() without using CTE and ROW_NUMBER().

Q3 When you have a table that maintains history of record updates, how will you go about retrieving the latest records?

A3 The CDC (i.e. Change Data Capture) & SCD (i.e. Slowly Changing Dimension) concepts are must know for Data Engineers & Data Analysts. These are discussed in detail at 10 Data warehouse interview Q&As.

You can use a

#1 Subquery

#2 Inner Join

#3 Window function

This is discussed in detail at SQL retrieving the latest records interview Q&As

Q4 How will you go about finding…”Which customers didn’t place an order in August”?
A4 Use an ANTI-JOIN, which is a join between two tables that returns rows from the first table where no matches are found in the second table.

1)LEFT JOIN” – Lists all customers with non-null cust_ids for those who placed orders in “Aug 2020”, with null cust_ids for those who did not place orders.

2) So “where o.cust_id is NULL” will give all the customers who did not place orders in Aug 2020.

Similar approach can be used for scenarios like:

a) Which customers haven’t visited my website this year?
b) Which real-estate agents didn’t close a deal last week?
c) Which products are not sold in the last quarter?
d) and so on …….

Learn more about SQL joins at 14+ SQL beginner interview Q&As.

Q5 When will you use a self-join?
A5 You use a SELF-JOIN when a table references data in itself.

Example 1: an Employee table may have a manager_id column that points to the employee who is the boss of the current employee.

Example 2: Match customers that are from the same city and country.

Example 3: An order line item has one main or parent component & many child components.

It’s basically used where there is any relationship among rows stored in the same table.

a) Hierarchical relationships
b) Sequential relationships
c) Graph data

Q6 In What order the SELECT SQL statements are executed?
A6 Here is the the order in which the SQL statements are evaluated. Teradata is a popular data warehouse supporting additional statements like QUALIFY, OLAP, SAMPLE & EXPAND.

1) FROM & JOINS
2) WHERE
3) GROUP BY
4) HAVING ( filter by aggregates like sum(), count(), avg() etc)
5) OLAP – ( Teradata function provides individual rows in addition to the aggregates)
6) QUALIFY – ( Teradata analytical window function )
7) SELECT
8) DISTINCT
9) SAMPLE or EXPAND ON – Teradata specific.
10) ORDER BY
11) LIMIT / OFFSET

NOTE: “EXAPAND ON” is used for time series expansion on a PERIOD column value of an input row, and “SAMPLE” is used for generating samples as in 10 rows or 25% of the rows, or multiple samples as shown below:

In Teradata:

Q7 What is wrong with the below query?

A7 It is an incorrect use of the window function. You can fix the above query with a derived table as shown below.

Where the aliask” is a derived table.

Alternatively, if your server supports CTE (i.e. Common Table Expression).

If you are using Teradata then

QUALIFY filters on OLAP (i.e. analytical) functions similar to HAVING clause filters on aggregate functions.

Q8 When will you use a full-outer-join? Is there another alternative to get the same results without using a full-outer-join?
A8 The full-outer-join returns a result set that includes rows from both left and right tables. When no matching rows exist for the row in the left table, the columns of the right table will have nulls. Similarly, when no matching rows exist for the row in the right table, the column of the left table will have nulls.

Yes, the alternative is to do a UNION of a) an INNER JOIN, b) a LEFT JOIN (with right side IS NULL) and c) a RIGHT JOIN (with left side IS NULL). This approach is better as you can control each individual joins without having to use complex coalesce statements required with the full-outer-join.

Q9 How will you go about writing an SQL query for the following scenario?

database table structure

database table structure

Calculation table with the following columns PortfolioId, AccountId, Balance, InActiveFlag, CalculationDate, and CalcTypeCd. The Portfolio table has columns PortfolioId, and PortfolioCd.  The Account table has columns AccountId and AccountCd.

Write an SQL query to extract out the Accountcd and the corresponding Balance for a given Portfoliocd and CalculationDate. Please note that there will be multiple balance records for each account, and your query must only extract out a single balance record per account based on the rule ‘extract the record with minimum value for CalcTypeCd‘.

A9 As you can see in the sample answer below, inner joins are used to join with the relevant tables. A sub query is used to calculate the min(CalcTypeCd) to extract the record with minimum value for CalcTypeCd.

Q10 If you need to map actual values retrieved from the database to some other value and then sort by these translated values as well, how will you go about accomplishing this in your SQL code?

For example, StatusCd is the column in the Portfolio table,  and it can have the values of New, and Processed. But the SQL query should return a status of ‘Excluded’ if the ExcludedFlag column is set yes, and ‘Sent’ if the SentDateTime is not null. iIf none of the above conditions are met, then return the StatusCd as in the database. The sorting needs to be carried out in the order of ‘New‘, ‘Processed‘, ‘Sent‘, and then ‘Excluded‘.

A10 This can be achieved with case statement. The syntax of switch/case statement can vary among databases. Here is a sample SQL based on Sybase database server.

case-when|else-end statements

Q11 How would you retrieve a date time column converted to string and formatted as dd/mm/yy hh:mm:ss
A11 You can use specific functions provided by your database server. These functions are specific to the database server you are using, hence your code cannot be ported to other database servers. Here is an example in Sybase.


In the above example, the convert function is used to convert the date time field to char. The 103 in Sybase means dd/mm/yy format and and 108 to convert to the time format hh:mm:ss.

Q12 How will you go about tuning your SQL and stored procedures?
A12 You can use tools like DB Artisan, TOAD, etc to analyse the query plan. The code (in Sybase) below gives you the elapsed time.

Q13 How will you go about tuning your SQL and stored procedures?
A13 You can use tools like DB Artisan, TOAD, etc to analyse the query plan. The code below gives you the elapsed time.

Proper indexing is key to get good performancee out of your SQL queries.

Q14 What are all the different types of indexes?
A14 There are three types of indexes

Unique Index: does not allow the field to have duplicate values if the column is unique indexed. Unique index can be applied automatically when primary key is defined.

Clustered Index: reorders the physical order of the table and search based on the key values. Each table can have only one clustered index.

NonClustered Index: does not alter the physical order of the table and maintains logical order of data. Each table can have 999 non-clustered indexes.

Q15 How will you go about searching for table and column names that you don’t know where they really are? For example, search for a column name to find out in which tables they do exist.
A15 You need to query the database system tables.

For example, in Sybase, you can query it as shown below.

in Teradata,

Where TableKind, “T” is for Table, “V” is for View, ‘M’ is for Macro, ‘P’ is for Stored Procedure, and ‘G’ is for Trigger.

Data warehouse, [Big] Data Engineer & Data Analyst FAQs

50+ Big Data Interview FAQs.

SQL Tutorials & Q&As.

SQL Scenarios based interview Q&As for the analysts & developers.

250+ Data warehouse & Big Data Interview FAQs


Why & What are the benefits

🎯 Why java-success.com?

🎯 What are the benefits of Q&As approach?

Learn by categories such as FAQs – Core Java, Key Area – Low Latency, Core Java – Java 8, JEE – Microservices, Big Data – NoSQL, Architecture – Distributed, Big Data – Spark, etc. Some posts belong to multiple categories.

800+ Java & Big Data Q&As Menu

Top