15+ SQL interview questions and answers to solve real business scenarios. SQL is widely used in building microservices & Big Data projects. Learning SQL syntax is easy, but being able to convert a given business requirement into a query takes lots of practice. These scenario-based interview questions can assess your experience.
Q01 How will you go about identifying duplicate records in a table?
A01 This is a very popular interview question because there are many approaches, hence when you provide a solution the interviewer can drill further to assess your understanding of DISTINCT, GROUP BY, window functions, Common Table Expressions (i.e. CTEs), etc.
Set up the data on db-fiddle to practice.
CREATE TABLE IF NOT EXISTS EMPLOYEES (
  ID INT NOT NULL AUTO_INCREMENT,
  NAME VARCHAR(256),
  DESIGNATION VARCHAR(256),
  DOB DATE,
  PRIMARY KEY (id)
);

INSERT INTO EMPLOYEES(NAME, DESIGNATION, DOB) VALUES ('JOHN', 'Engineer', '2009-01-18');
INSERT INTO EMPLOYEES(NAME, DESIGNATION, DOB) VALUES ('PETER', 'VP', '1989-05-28');
INSERT INTO EMPLOYEES(NAME, DESIGNATION, DOB) VALUES ('JESSICA', 'CIO', '1979-05-28');
INSERT INTO EMPLOYEES(NAME, DESIGNATION, DOB) VALUES ('JOHN', 'Engineer', '2009-01-18');
INSERT INTO EMPLOYEES(NAME, DESIGNATION, DOB) VALUES ('PETER', 'VP', '1989-05-28');
Show me the data:
SELECT * FROM EMPLOYEES;
OUTPUT:
| ID  | NAME    | DESIGNATION | DOB        |
| --- | ------- | ----------- | ---------- |
| 1   | JOHN    | Engineer    | 2009-01-18 |
| 2   | PETER   | VP          | 1989-05-28 |
| 3   | JESSICA | CIO         | 1979-05-28 |
| 4   | JOHN    | Engineer    | 2009-01-18 |
| 5   | PETER   | VP          | 1989-05-28 |
SQL – GROUP BY
SELECT NAME, DESIGNATION, DOB, COUNT(*) AS NumOccurrences
FROM EMPLOYEES
GROUP BY NAME, DESIGNATION, DOB
HAVING ( COUNT(*) > 1 );
OUTPUT: Shows records that have more than 1 occurrence.
| NAME  | DESIGNATION | DOB        | NumOccurrences |
| ----- | ----------- | ---------- | -------------- |
| JOHN  | Engineer    | 2009-01-18 | 2              |
| PETER | VP          | 1989-05-28 | 2              |
Note: The interviewer is testing whether you understand that aggregate queries need a “GROUP BY” clause on the columns that are NOT aggregated, e.g. NAME, DESIGNATION and DOB.
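As an alternative sketch, duplicates can also be flagged with a window function, assuming a database that supports COUNT(*) OVER (e.g. MySQL 8.0). Any row with NumOccurrences greater than 1 belongs to a duplicate group:

-- Count occurrences per (NAME, DESIGNATION, DOB) group without collapsing rows
SELECT ID, NAME, DESIGNATION, DOB,
       COUNT(*) OVER (PARTITION BY NAME, DESIGNATION, DOB) AS NumOccurrences
FROM EMPLOYEES;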
Q02 How would you go about deleting the duplicate records?
A02 You could do it in a number of steps as shown below.
- Create a temporary table.
- Insert the unique (i.e. DISTINCT) records into the temporary table.
- Drop the original table.
- Rename the temporary table to the original table name.
- Alter the table to add an identity column.
SQL – DISTINCT keyword
Example based on MySQL 8.0.
CREATE TABLE EMPLOYEES_TEMP AS
  SELECT DISTINCT NAME, DESIGNATION, DOB FROM EMPLOYEES;

DROP TABLE EMPLOYEES;

RENAME TABLE EMPLOYEES_TEMP TO EMPLOYEES;

ALTER TABLE `EMPLOYEES` ADD COLUMN `id` INT AUTO_INCREMENT UNIQUE FIRST;

SELECT * FROM EMPLOYEES;
OUTPUT:
| id  | NAME    | DESIGNATION | DOB        |
| --- | ------- | ----------- | ---------- |
| 1   | JOHN    | Engineer    | 2009-01-18 |
| 2   | PETER   | VP          | 1989-05-28 |
| 3   | JESSICA | CIO         | 1979-05-28 |
SQL – Window function
Example based on Teradata. ROW_NUMBER() is a window function that assigns a sequential integer to each row within each partition of a result set. In the below example, the rows are partitioned by name, designation and dob. QUALIFY is like a WHERE clause for window function results, and filters out the duplicate records. Learn more in detail at SQL window function for analytics & qualify interview Q&As.
CREATE TABLE EMPLOYEES_TEMP AS
(
  SELECT ID, NAME, DESIGNATION, DOB
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY NAME, DESIGNATION, DOB ORDER BY ID) AS RNK
    FROM EMPLOYEES
  ) DT
  WHERE DT.RNK = 1
);

DELETE FROM EMPLOYEES;

INSERT INTO EMPLOYEES SELECT * FROM EMPLOYEES_TEMP;

DROP TABLE EMPLOYEES_TEMP;

SELECT * FROM EMPLOYEES;
OUTPUT:
| ID  | NAME    | DESIGNATION | DOB        |
| --- | ------- | ----------- | ---------- |
| 1   | JOHN    | Engineer    | 2009-01-18 |
| 2   | PETER   | VP          | 1989-05-28 |
| 3   | JESSICA | CIO         | 1979-05-28 |
Example based on Teradata, which has the QUALIFY clause:
INSERT INTO employee_temp
SELECT *
FROM employee
QUALIFY ROW_NUMBER() OVER (PARTITION BY name, designation, dob ORDER BY id DESC) = 1;

DELETE FROM employee;

INSERT INTO employee SELECT * FROM employee_temp;

DROP TABLE employee_temp;
SQL Window function & CTE (i.e. Common Table Expression)
The below approach works with databases that support CTEs (i.e. Common Table Expressions). Learn more in detail at Common Table Expressions (i.e. CTE) in SQL using the “WITH” clause.
WITH CTE_EMPLOYEES AS
(
  SELECT ID, NAME, DESIGNATION, DOB,
         ROW_NUMBER() OVER (PARTITION BY name, designation, dob ORDER BY id ASC) AS RNK
  FROM EMPLOYEES
)
DELETE FROM EMPLOYEES
WHERE ID IN (SELECT ID FROM CTE_EMPLOYEES WHERE RNK <> 1);

SELECT * FROM EMPLOYEES;
OUTPUT:
| ID  | NAME    | DESIGNATION | DOB        |
| --- | ------- | ----------- | ---------- |
| 1   | JOHN    | Engineer    | 2009-01-18 |
| 2   | PETER   | VP          | 1989-05-28 |
| 3   | JESSICA | CIO         | 1979-05-28 |
SQL – GROUP BY & Subquery with max(..)
Using GROUP BY & MAX() without a CTE or ROW_NUMBER().
DELETE FROM EMPLOYEES
WHERE ID NOT IN
(
  SELECT ID
  FROM (
    SELECT MAX(ID) AS ID
    FROM EMPLOYEES
    GROUP BY NAME, DESIGNATION, DOB
  ) DT
);

SELECT * FROM EMPLOYEES;
DT is an alias for the Derived Table. The alias can be anything other than DT, e.g. x. Use derived tables when you need to perform a complex query where the result of the inner query acts as a table for the outer query. Derived tables are created on the fly with the help of a SELECT statement. The scope of a derived table is limited to the outer SELECT query that created it, and it can’t be referenced outside that scope.
So, you have CTEs, derived tables, subqueries & temp tables at your disposal to be used in different scenarios.
OUTPUT:
| ID  | NAME    | DESIGNATION | DOB        |
| --- | ------- | ----------- | ---------- |
| 3   | JESSICA | CIO         | 1979-05-28 |
| 4   | JOHN    | Engineer    | 2009-01-18 |
| 5   | PETER   | VP          | 1989-05-28 |
Q03 Often you will have a requirement to report on a certain dimension, which requires you to convert rows into columns or columns into rows. How will you go about accomplishing this using SQL?
A03 The process of converting rows into columns is known as “PIVOTing”. The reverse process of converting the columns into rows is known as “UNPIVOTing”.
Some databases provide PIVOT & UNPIVOT functions, but let’s use plain SQL to achieve this. Let’s use DB Fiddle with MySQL V8.0 to practice.
SQL – PIVOT
PIVOTing is also known as transposing rows to columns.
Create a Table:
CREATE TABLE IF NOT EXISTS EMPLOYEES (
  CODE INT NOT NULL,
  NAME VARCHAR(256),
  SALARY_TYPE VARCHAR(256),
  VALUE DECIMAL(18,2)
);
INSERT DATA:
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (10, 'JOHN', 'BASE', 30000);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (10, 'JOHN', 'BONUS', 5000);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (10, 'JOHN', 'HIKE_PERCENTAGE', 8);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (20, 'Sam', 'BASE', 40000);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (20, 'Sam', 'BONUS', 5000);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (20, 'Sam', 'HIKE_PERCENTAGE', 6);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (30, 'Peter', 'BASE', 50000);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (30, 'Peter', 'BONUS', 3000);
INSERT INTO EMPLOYEES(CODE, NAME, SALARY_TYPE, VALUE) VALUES (30, 'Peter', 'HIKE_PERCENTAGE', 4);
SELECT DATA:
SELECT * FROM EMPLOYEES;
OUTPUT:
| CODE | NAME  | SALARY_TYPE     | VALUE    |
| ---- | ----- | --------------- | -------- |
| 10   | JOHN  | BASE            | 30000.00 |
| 10   | JOHN  | BONUS           | 5000.00  |
| 10   | JOHN  | HIKE_PERCENTAGE | 8.00     |
| 20   | Sam   | BASE            | 40000.00 |
| 20   | Sam   | BONUS           | 5000.00  |
| 20   | Sam   | HIKE_PERCENTAGE | 6.00     |
| 30   | Peter | BASE            | 50000.00 |
| 30   | Peter | BONUS           | 3000.00  |
| 30   | Peter | HIKE_PERCENTAGE | 4.00     |
PIVOT
CASE statements & the aggregate function SUM with GROUP BY come to the rescue, as shown below.
SELECT CODE, NAME,
       SUM(CASE WHEN SALARY_TYPE = 'BASE' THEN VALUE END) AS 'BASE',
       SUM(CASE WHEN SALARY_TYPE = 'BONUS' THEN VALUE END) AS 'BONUS',
       SUM(CASE WHEN SALARY_TYPE = 'HIKE_PERCENTAGE' THEN VALUE END) AS 'HIKE_PERCENTAGE'
FROM EMPLOYEES
GROUP BY CODE, NAME;
OUTPUT:
| CODE | NAME  | BASE     | BONUS   | HIKE_PERCENTAGE |
| ---- | ----- | -------- | ------- | --------------- |
| 10   | JOHN  | 30000.00 | 5000.00 | 8.00            |
| 20   | Sam   | 40000.00 | 5000.00 | 6.00            |
| 30   | Peter | 50000.00 | 3000.00 | 4.00            |
SQL CTAS (Create Table AS)
Let’s create a PIVOTED table so that we can UNPIVOT it in the next step. CTAS creates a new table based on the output of a SELECT statement, and in MPP databases it runs as a parallel operation.
CREATE TABLE EMPLOYEES_PIVOT AS
(
  SELECT CODE, NAME,
         SUM(CASE WHEN SALARY_TYPE = 'BASE' THEN VALUE END) AS 'BASE',
         SUM(CASE WHEN SALARY_TYPE = 'BONUS' THEN VALUE END) AS 'BONUS',
         SUM(CASE WHEN SALARY_TYPE = 'HIKE_PERCENTAGE' THEN VALUE END) AS 'HIKE_PERCENTAGE'
  FROM EMPLOYEES
  GROUP BY CODE, NAME
);
CTAS is a more customizable version of the SELECT…INTO statement. For example:
SELECT *
INTO newtable [IN externaldb]
FROM oldtable
WHERE condition;
CTAS is more powerful because you can specify both the distribution of the table data and the table structure type.
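As a minimal sketch of that, assuming an Azure Synapse Analytics style dialect (the table and column names below are hypothetical):

CREATE TABLE dbo.FactSales_New
WITH (
  DISTRIBUTION = HASH(customer_id),  -- how the data is distributed across compute nodes
  CLUSTERED COLUMNSTORE INDEX        -- the table structure type
)
AS
SELECT * FROM dbo.FactSales_Old;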
Amazon Athena CTAS example, writing the results as Parquet to S3:
CREATE TABLE [MyDB].[newtable]
WITH (
  format='PARQUET',
  parquet_compression='SNAPPY',
  partitioned_by=array['year'],
  external_location = 's3://your-bucket/optimized-data/'
)
AS SELECT * FROM [MYDB].[oldtable];
SQL – UNPIVOT
The UNION ALL statements to the rescue.
SELECT *
FROM (
  SELECT CODE, NAME, 'BASE' AS SALARY_TYPE, BASE FROM EMPLOYEES_PIVOT
  UNION ALL
  SELECT CODE, NAME, 'BONUS' AS SALARY_TYPE, BONUS FROM EMPLOYEES_PIVOT
  UNION ALL
  SELECT CODE, NAME, 'HIKE_PERCENTAGE' AS SALARY_TYPE, HIKE_PERCENTAGE FROM EMPLOYEES_PIVOT
) DT
ORDER BY CODE;
Output:
| CODE | NAME  | SALARY_TYPE     | BASE     |
| ---- | ----- | --------------- | -------- |
| 10   | JOHN  | BASE            | 30000.00 |
| 10   | JOHN  | BONUS           | 5000.00  |
| 10   | JOHN  | HIKE_PERCENTAGE | 8.00     |
| 20   | Sam   | BASE            | 40000.00 |
| 20   | Sam   | BONUS           | 5000.00  |
| 20   | Sam   | HIKE_PERCENTAGE | 6.00     |
| 30   | Peter | HIKE_PERCENTAGE | 4.00     |
| 30   | Peter | BASE            | 50000.00 |
| 30   | Peter | BONUS           | 3000.00  |
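For databases that provide a built-in UNPIVOT operator (e.g. SQL Server or Oracle, not MySQL 8.0), a SQL Server style sketch of the same reshaping would look like this:

SELECT CODE, NAME, SALARY_TYPE, SALARY_VALUE
FROM EMPLOYEES_PIVOT
UNPIVOT (
  SALARY_VALUE FOR SALARY_TYPE IN (BASE, BONUS, HIKE_PERCENTAGE)
) AS unpvt;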
SQL – Use of PIVOT function
A typical example would be to display the sales data for Quarters 1 to 4. The below is based on the Teradata PIVOT function.
CREATE a VOLATILE TABLE
The sample table can be created for demo purposes using a VOLATILE TABLE, which is an in-memory table in Teradata:
CREATE VOLATILE TABLE tbl_sales (
  product varchar(100),
  sale_year int,
  quarter varchar(20),
  sale_amount decimal(18,2),
  cost_of_goods_sold decimal(18,2)
) ON COMMIT PRESERVE ROWS;
INSERT sample rows
INSERT INTO tbl_sales VALUES ('Toshiba Laptop', 2019, 'Q1', 35000.00, 12000.00);
INSERT INTO tbl_sales VALUES ('Toshiba Laptop', 2018, 'Q4', 32000.00, 14000.00);
INSERT INTO tbl_sales VALUES ('Toshiba Laptop', 2019, 'Q1', 38000.00, 12000.00);
INSERT INTO tbl_sales VALUES ('Toshiba Laptop', 2019, 'Q2', 31000.00, 12000.00);
INSERT INTO tbl_sales VALUES ('Toshiba Laptop', 2019, 'Q3', 45000.00, 23000.00);
INSERT INTO tbl_sales VALUES ('Toshiba Laptop', 2019, 'Q4', 14000.00, 6000.00);
INSERT INTO tbl_sales VALUES ('iPhone', 2018, 'Q4', 180000.00, 75000.00);
INSERT INTO tbl_sales VALUES ('iPhone', 2019, 'Q1', 170000.00, 67000.00);
INSERT INTO tbl_sales VALUES ('iPhone', 2019, 'Q2', 360000.00, 40000.00);
INSERT INTO tbl_sales VALUES ('iPhone', 2019, 'Q3', 456000.00, 123000.00);
INSERT INTO tbl_sales VALUES ('iPhone', 2019, 'Q4', 770000.00, 23500.00);
Use the PIVOT function
SELECT *
FROM tbl_sales
PIVOT (
  SUM(sale_amount) AS gross_add,
  SUM(cost_of_goods_sold) AS cogs
  FOR quarter IN (
    'Q1' AS Quarter1,
    'Q2' AS Quarter2,
    'Q3' AS Quarter3,
    'Q4' AS Quarter4
  )
) tmp;
Output: one row per product & sale_year, with gross_add & cogs columns for each of Quarter1 to Quarter4.
SQL – Power of CASE statements
The below question is very popular in SQL interviews:
Q: Given the below table recording wins and losses between two teams, how will you create a points table with played, won & lost columns?
| team_1 | team_2 | who_won |
| ------ | ------ | ------- |
| A      | B      | B       |
| B      | C      | B       |
| A      | D      | D       |
| A      | C      | A       |
A: Let’s create the data first.
CREATE TABLE IF NOT EXISTS TEAM_TALLY (
  team_1 VARCHAR(256),
  team_2 VARCHAR(256),
  who_won VARCHAR(256)
);

INSERT INTO TEAM_TALLY(team_1, team_2, who_won) VALUES ('A', 'B', 'B');
INSERT INTO TEAM_TALLY(team_1, team_2, who_won) VALUES ('B', 'C', 'B');
INSERT INTO TEAM_TALLY(team_1, team_2, who_won) VALUES ('A', 'D', 'D');
INSERT INTO TEAM_TALLY(team_1, team_2, who_won) VALUES ('A', 'C', 'A');
CASE statement, union & group by
A CASE statement helps you evaluate the result in numeric terms, i.e. 1 for a win & 0 for a loss. UNION ALL is required to get both team_1 & team_2 names into a single team column. Aggregation with sum(..) & count(..) and GROUP BY is required to display the final results.
SELECT team,
       count(1) AS played,
       sum(result) AS won,
       count(1) - sum(result) AS lost
FROM (
  SELECT team_1 AS team,
         CASE WHEN team_1 = who_won THEN 1 ELSE 0 END AS result
  FROM TEAM_TALLY
  UNION ALL
  SELECT team_2 AS team,
         CASE WHEN team_2 = who_won THEN 1 ELSE 0 END AS result
  FROM TEAM_TALLY
) DT
GROUP BY team;
Output:
| team | played | won | lost |
| ---- | ------ | --- | ---- |
| A    | 3      | 1   | 2    |
| B    | 2      | 2   | 0    |
| C    | 2      | 0   | 2    |
| D    | 1      | 1   | 0    |
Q: What if you have a draw shown as a null in the “who_won” column?
| team_1 | team_2 | who_won |
| ------ | ------ | ------- |
| A      | B      |         |
| B      | C      | B       |
| A      | D      | D       |
| A      | C      | A       |
A: We can add another CASE statement to represent a draw.
SELECT team,
       count(1) AS played,
       sum(result) AS won,
       count(1) - sum(result) - sum(drawn) AS lost,
       sum(drawn) AS draw
FROM (
  SELECT team_1 AS team,
         CASE WHEN (who_won IS NOT NULL AND team_1 = who_won) THEN 1 ELSE 0 END AS result,
         CASE WHEN who_won IS NULL THEN 1 ELSE 0 END AS drawn
  FROM TEAM_TALLY
  UNION ALL
  SELECT team_2 AS team,
         CASE WHEN (who_won IS NOT NULL AND team_2 = who_won) THEN 1 ELSE 0 END AS result,
         CASE WHEN who_won IS NULL THEN 1 ELSE 0 END AS drawn
  FROM TEAM_TALLY
) DT
GROUP BY team;
Output:
| team | played | won | lost | draw |
| ---- | ------ | --- | ---- | ---- |
| A    | 3      | 1   | 1    | 1    |
| B    | 2      | 1   | 0    | 1    |
| C    | 2      | 0   | 2    | 0    |
| D    | 1      | 1   | 0    | 0    |
Bonus SQL question
Q: What is a null safe join?
A: An inner join displays only the rows that have a match in both joined tables. Columns containing NULL do not match any values in an inner join and are therefore excluded from the result set. NULL values do not even match other NULL values.
One option is to make your join column NOT NULL with a default value so that it never contains NULLs. Another option is to create a new column that acts as a surrogate key to join on instead.
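A minimal MySQL-style sketch of the first option, assuming a hypothetical ACCOUNTS table with an AccountType join column:

-- Back-fill existing NULLs with a default value
UPDATE ACCOUNTS SET AccountType = 'UNKNOWN' WHERE AccountType IS NULL;

-- Disallow NULLs going forward so the join column always has a matchable value
ALTER TABLE ACCOUNTS MODIFY AccountType VARCHAR(50) NOT NULL DEFAULT 'UNKNOWN';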
What if the tables are already created & you can only make changes in the queries?
One common approach is to convert the NULLs to some other non-NULL value using a function like COALESCE or ISNULL:
ISNULL(a.AccountType,'X') = ISNULL(at.AccountType,'X')
COALESCE(a.AccountType,'X') = COALESCE(at.AccountType,'X')
The above scalar functions can impact performance. Another approach is to have a null check as shown below:
(a.AccountType = b.AccountType OR (a.AccountType IS NULL AND b.AccountType IS NULL))
Finally, some database servers provide a NULL-safe equality operator like <=>. For example, in Apache Spark:
orderDf.join(customerDf, orderDf("cust_type") <=> customerDf("cust_type"))
OR
orderDf.join(customerDf, orderDf.cust_type.eqNullSafe(customerDf.cust_type))
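In MySQL, the <=> operator itself is the NULL-safe equal, so it can be used directly in a join condition. A minimal sketch, assuming hypothetical ACCOUNTS & ACCOUNT_TYPES tables:

SELECT *
FROM ACCOUNTS a
JOIN ACCOUNT_TYPES b
  ON a.AccountType <=> b.AccountType;  -- NULL <=> NULL evaluates to true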
More SQL scenarios interview Q&As
Questions Q4 to Q16: 15+ SQL scenarios based interview questions answered – Part 2, covering scenarios for joins, SQL recursion and more.
Learn more SQL
2) 80+ SQL interview questions & answers.
3) 115+ Beginner SQL interview questions & answers.