15+ SQL scenarios based interview Q&As – part 1

Q01 How will you go about identifying duplicate records in a table?
A01 This is a very popular interview question because there are many approaches, hence when you provide a solution the interviewer can drill further to assess your understanding on uses of DISTINCT, GROUP BY, WINDOW function, Common Table Expression (i.e. CTEs), etc.

Setup the data on db-fiddle to practice.

Show me the data:

OUTPUT:

GROUP BY

OUTPUT: Shows records that have more than 1 occurrence.

Note: Interviewer is testing if you understand that the aggregate queries need to have a “GROUP BY” clause on columns that are NOT aggregated. E.g. NAME, DESIGNATION and DOB.

Q02 How would you go about deleting the duplicate records?
A02 You could do it in a number of steps as shown below.

  • Create a temporary table.
  • Insert the unique (i.e.DISTINCT) records into the temporary table.
  • Drop the original table.
  • Rename the temporary table to original table name.
  • Alter the table to add an identity column.

DISTINCT keyword

Example based on MySQL 8.0.

OUTPUT:

Window function

Example based on Teradata. ROW_NUMBER() is a window function that assigns a sequential integer to each row within the PARTITION BY of a result set. In the below example, the rows are partitioned by name, designation, and dob. QUALIFY is like a WHERE clause for windowed records, and will filter out any duplicated records. Learn more in detail at SQL window function for analytics & qualify interview Q&As.

OUTPUT:

Example based on Teradata, which has the QUALIFY

Window function & CTE

With databases that support CTE (i.e. Common Table Expression). Learn more in detail at Common Table Expressions (i.e. CTE) in SQL using the “WITH” clause.

OUTPUT:

GROUP BY & Subquery with max(..)

Using the GROUP BY & MAX() without using CTE and ROW_NUMBER().

DT is an alias for a Derived Table. The alias can be anything other than DT. It could be x. Use derived tables when you are required to perform some complex query and result of inner query work as table for outer query. Derived tables are created on the fly with the help of a Select statement. The scope of derived table is limited to the outer Select query who created it. Derived table can’t be used further outside the scope of outer select query.

So, you have CTEs, derived tables, subqueries & temp tables to be used in different scenarios at your disposal.

OUTPUT:

SQL practice on DB-FIDDLE

SQL practice on DB-FIDDLE

Q03 Often you will have a requirement to report on a certain dimension, which requires you to convert rows into columns or columns into rows. How will you go about accomplishing this using SQL?
A03 The process of converting rows into columns is know as “PIVOTing”. The reverse process of converting the columns into rows is known as “UNPIVOTing”.

Some databases provide PIVOT & UNPIVOT functions, but let’s use SQL to achieve this. Let’s use DB Fiddle to practice SQL using MySQL V8.0 as depicted below.

PIVOT

PIVOTing is also known as transposing rows to columns.

DB Fiddle - Practice SQL

DB Fiddle – Practice SQL

Create a Table:

INSERT DATA:

SELECT DATA:

OUTPUT:

DB Fiddle - SELECT * FROM EMPLOYEES

DB Fiddle – SELECT * FROM EMPLOYEES

PIVOT

The CASE statements & aggregate function SUM with GROUP BY to the rescue as shown below.

OUTPUT:

DB Fiddle - SQL PIVOT

DB Fiddle – SQL PIVOT

CTAS – Create Table AS

Let’s create a PIVOTED table so that we can UNPIVOT in the next step. CTAS is a parallel operation that creates a new table based on the output of a SELECT statement.

CTAS is a more customizable version of the SELECT…INTO statement. For example:

CTAS is more powerful as you can specify both the distribution of the table data as well as the table structure type.

Amazon Athena federated query example:

UNPIVOT

The UNION ALL statementsto the rescue.

Output:

DB FIDDLE - UNPIVOT SQL

DB FIDDLE – UNPIVOT SQL

Use of PIVOT function

A typical example would be to display the sales data for Quarter 1 to 4. The below is a Teradata PIVOT function.

Sales by quarter

Sales by quarter

CREATE a VOLATILE TABLE

The above table can be created for demo purpose using a VOLATILE TABLE, which is an in memory table in teradata:

INSERT sample rows
Use the PIVOT function

Output:

PIVOTED SALES BY QUARTER

PIVOTED SALES BY QUARTER

Power of CASE statements

Below question is very popular in SQL interviews:

Q: Given the below table with win versus loss between 2 teams, how will you create a points table with played, won & lost?

A: Let’s create the data first.

CASE statement, union & group by

CASE statement helps you evaluate the win in numeric term as in 1 for win & 0 for a loss. Union all is required to get both team_1 & team_2 names. Aggregation as in sum(..) & count() with group by is required to display the final results.

Output:

Q: What if you have a draw shown as a null in the “who_won” column?

A: We can add another CASE statement to represent a draw.

Output:

Bonus SQL question

Q: What is a null safe join?
A: A join displays only the rows that have a match in both joined tables. Columns containing NULL do not match any values when you are creating an inner join and are therefore excluded from the result set. Null values do not match other null values.

One option is to make your join column as NOT NULL and set some default value so that it won’t have null values. Another option is to create a new column that will act as a surrogate key to join on instead.

What if the tables are already created & you can only make changes in the queries?

One common approach is to convert the NULLs to some other non-NULL value using a function like COALESCE or ISNULL

The above scalar functions can impact performance. Another approach is to have a null check as shown below:

Finally, some database servers provide a NULL safe equality operator like <=>. For example, in Apache Spark

OR

More SQL scenarios interview Q&As

Questions Q4 to Q16 15+ SQL scenarios based interview questions answered – Part 2 covering scenarios for joins, SQL recursion and more.

Leran more SQL

Pivoting a Spark Dataframe

80+ SQL interview Q&As.


300+ Java & Big Data Interview FAQs

16+ Java Key Areas Interview Q&As

800+ Java Interview Q&As

300+ Java & Big Data Tutorials