50+ SQL interview questions and answers to solve real business scenarios. SQL is widely used in building microservices & Big Data projects. Learning SQL syntax is easy, but being able to convert a given business requirement into a query takes lots of practice. These scenarios based interview questions can assesses your experience.
Considerations & Tips
1) Break down a complex business requirement into sub sections.
2) Think about the grain of the data. For example, account level, user level, department level, etc. What to GROUP BY, and what columns to include in the GROUP BY for the resulting query.
3) What joins (e.g. inner, left outer, right outer, self, full outer, anti, etc) are required & at what granularity. Do these joins produce one-to-one, one-to-many or many-to-many results.
4) Is ordering of the rows required? Ranking functions may need the row ordering.
5) Do I need to aggregate any values?
6) Should I display aggregated values at a group level or for each row using WINDOWing.
7) Should I transpose rows to columns (i.e pivot)? Should I convert columns to rows(i.e. unpivot)?
8) Any subqueries or CTE required?
9) Any ranking required over a partition (i.e Window) for each record?
10)Do I need to compare any columns between records over a partition (i.e Window) with lead() or lag() functions? E.g. Compare current record with previous or next record.
Let’s cover lots of scenarios based SQL in this series to get a good handle on SQL by practicing & experiencing.
Q01 How will you go about identifying duplicate records in a table?
A01 This is a very popular interview question because there are many approaches, hence when you provide a solution the interviewer can drill further to assess your understanding of DISTINCT, GROUP BY, WINDOW function, Common Table Expression (i.e. CTEs), etc.
Q. What does duplicate mean?
A. Either one or more columns that are supposed to be unique to identify a row are repeated in multiple rows or all columns are repeated in multiple rows. Multiple means more than 1 row. Often tables will have an id column as a primary key uniquely generated for each row, but the remaining columns may have duplicate values in 2 or more rows as shown in the below sample data “employees” table, where the “id” column is unique, but the the combination of the remaining columns are repeated for rows with “name” as “PETER” and “JOHN”.
Setup the data on db-fiddle MySQL V8.0 to practice.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE TABLE IF NOT EXISTS employees (id INT NOT NULL AUTO_INCREMENT, name VARCHAR(256), designation VARCHAR(256), dob DATE, PRIMARY KEY (id) ); INSERT INTO employees(name, designation, dob ) VALUES ('JOHN', 'Engineer', '1996-01-18'); INSERT INTO employees(name, designation, dob ) VALUES ('PETER', 'VP', '1989-05-28'); INSERT INTO employees(name, designation, dob ) VALUES ('JESSICA', 'CIO', '1979-05-28'); INSERT INTO employees(name, designation, dob ) VALUES ('JOHN', 'Engineer', '1996-01-18'); INSERT INTO employees(name, designation, dob ) VALUES ('PETER', 'VP', '1989-05-28'); |
Show me the data:
1 |
SELECT * FROM EMPLOYEES; |
OUTPUT:
1 2 3 4 5 6 7 8 9 |
id name designation dob ---------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 4 JOHN Engineer 1996-01-18 5 PETER VP 1989-05-28 |
SQL – GROUP BY
The HAVING clause filters on the GROUP BY results.
1 2 3 4 5 6 7 8 9 |
SELECT name , designation , dob, COUNT(*) AS number_of_occurrences FROM employees GROUP BY name, designation, dob HAVING ( COUNT(*) > 1 ); |
OUTPUT:
Shows records that have more than 1 occurrence.
1 2 3 4 5 6 |
name designation dob number_of_occurrences ---------------------------------------------------------------- JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 2 |
Note: Interviewer is testing if you understand that the aggregate queries need to have a “GROUP BY” clause on columns that are NOT aggregated. E.g. name, designation and dob.
Q02 How would you go about deleting the duplicate records?
A02 You could do it in a number of steps as shown below.
- Create a temporary table.
- Insert the unique (i.e.DISTINCT) records into the temporary table.
- Drop the original table.
- Rename the temporary table to original table name.
- Alter the table to add an identity column.
SQL – DISTINCT keyword
Example based on MySQL 8.0.
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE TABLE employees_temp AS SELECT DISTINCT name, designation, dob FROM employees; DROP TABLE employees; RENAME TABLE employees_temp TO employees; ALTER TABLE `employees` ADD COLUMN `id` INT AUTO_INCREMENT UNIQUE FIRST; SELECT * FROM employees; |
OUTPUT:
1 2 3 4 5 6 7 |
id name designation dob ---------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 |
SQL – Window function
ROW_NUMBER() is a window function that assigns a sequential integer to each row within the PARTITION BY of a result set. In the below example, the rows are partitioned by name, designation, and dob. Learn more in detail at SQL window function for analytics & qualify interview Q&As.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
CREATE TABLE employees_temp AS ( SELECT id, name, designation, dob FROM ( SELECT * , ROW_NUMBER() OVER (PARTITION BY name, designation, dob order by id) AS RNK FROM employees ) dt WHERE dt.RNK = 1 ); DELETE FROM employees; INSERT INTO employees SELECT * FROM employees_temp; DROP TABLE employees_temp; SELECT * FROM employees; |
OUTPUT:
1 2 3 4 5 6 7 |
id name designation dob ---------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 |
Example based on Teradata, which has the QUALIFY
1 2 3 4 5 6 7 8 9 10 |
INSERT INTO employees_temp SELECT * FROM employees QUALIFY ROW_NUMBER() over (PARTITION BY name, designation, dob order by id desc) = 1; DELETE FROM employees; INSERT INTO employees SELECT * FROM employees_temp; DROP TABLE employees_temp; |
SQL Window function & CTE (i.e. Common Table Expression)
With databases that support CTE (i.e. Common Table Expression). Learn more in detail at Common Table Expressions (i.e. CTE) in SQL using the “WITH” clause.
1 2 3 4 5 6 7 8 9 10 11 12 |
WITH cte_employees AS ( SELECT id, name, designation, dob, ROW_NUMBER() OVER (PARTITION BY name, designation, dob ORDER BY id ASC) AS rnk FROM employees ) DELETE FROM employees WHERE id IN ( SELECT id FROM cte_employees WHERE rnk <> 1); SELECT * FROM employees; |
OUTPUT:
1 2 3 4 5 6 7 |
id name designation dob ------------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 |
SQL – GROUP BY & Subquery with max(..)
Using the GROUP BY & MAX() without using CTE and ROW_NUMBER().
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
DELETE FROM employees WHERE id NOT IN ( SELECT max_id FROM ( SELECT MAX(id) AS max_id FROM employees GROUP BY name, designation, dob ) dt ) ; SELECT * FROM employees; |
“dt” is an alias for a Derived Table. The alias can be anything other than DT. It could be x. Use derived tables when you are required to perform some complex query and result of inner query work as table for outer query. Derived tables are created on the fly with the help of a Select statement. The scope of derived table is limited to the outer Select query who created it. Derived table can’t be used further outside the scope of outer select query.
So, you have CTEs, derived tables, subqueries & temp tables to be used in different scenarios at your disposal.
OUTPUT:
1 2 3 4 5 6 7 |
id name designation dob --------------------------------------------- 3 JESSICA CIO 1979-05-28 4 JOHN Engineer 1996-01-18 5 PETER VP 1989-05-28 |
Use MySQL V8.0 so that CTE (i.e. Common Table Expression) is supported.
SQL – SELF JOIN
One of the SQL join types is to join a table to itself. You will see more examples of this in the other scenarios based Q&As. This approach is not viable if you have too many columns as you have to include all columns in the join conditions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
WITH cte_duplicate_ids AS ( SELECT e1.id AS id FROM employees e1 INNER JOIN employees e2 WHERE e1.id < e2.id AND e1.name = e2.name AND e1.designation = e2.designation AND e1.dob = e2.dob ) DELETE FROM employees WHERE id IN (SELECT id FROM cte_duplicate_ids) ; SELECT * FROM employees; |
OUTPUTS:
1 2 3 4 5 6 7 |
id name designation dob --------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 |
Note that NON EQUI JOIN e1.id < e2.id is a must NOT to join on rows with same id or the ones already joined again. Try without the non equi join & see what happens. Instead of “<” try “<>” and see what happens.
What if all columns are repeated?
It is important to note that if all the columns in a row gets repeated including the “id” column then some of the above approaches will NOT work.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE TABLE IF NOT EXISTS employees (id INT NOT NULL, name VARCHAR(256), designation VARCHAR(256), dob DATE ); INSERT INTO employees(id, name, designation, dob ) VALUES (1, 'JOHN', 'Engineer', '1996-01-18'); INSERT INTO employees(id, name, designation, dob ) VALUES (2, 'PETER', 'VP', '1989-05-28'); INSERT INTO employees(id, name, designation, dob ) VALUES (3, 'JESSICA', 'CIO', '1979-05-28'); INSERT INTO employees(id, name, designation, dob ) VALUES (1, 'JOHN', 'Engineer', '1996-01-18'); INSERT INTO employees(id, name, designation, dob ) VALUES (2, 'PETER', 'VP', '1989-05-28'); |
1 2 3 |
SELECT * FROM employees |
Outputs:
1 2 3 4 5 6 7 8 9 |
id name designation dob --------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 |
Delete duplicate records:
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE employees_temp AS SELECT DISTINCT * FROM employees; DROP TABLE employees; RENAME TABLE employees_temp TO employees; SELECT * FROM employees; |
Outputs:
1 2 3 4 5 6 7 |
id name designation dob --------------------------------------------- 1 JOHN Engineer 1996-01-18 2 PETER VP 1989-05-28 3 JESSICA CIO 1979-05-28 |
More SQL scenarios interview Q&As
100+ SQL interview questions & answers with scenarios & examples.