01: 50+ SQL scenarios based interview Q&As on identifying & deleting duplicate records

50+ SQL interview questions and answers to solve real business scenarios. SQL is widely used in building microservices & Big Data projects. Learning SQL syntax is easy, but being able to convert a given business requirement into a query takes lots of practice. These scenarios based interview questions can assesses your experience.

Considerations & Tips

It is important to first understand the problem statement, and then ask the right questions to solve the problem.

1) Break down a complex business requirement into sub sections.

2) Think about the grain of the data. For example, account level, user level, department level, etc. What to GROUP BY, and what columns to include in the GROUP BY for the resulting query.

3) What joins (e.g. inner, left outer, right outer, self, full outer, anti, etc) are required & at what granularity. Do these joins produce one-to-one, one-to-many or many-to-many results.

4) Is ordering of the rows required? Ranking functions may need the row ordering.

5) Do I need to aggregate any values. E.g sum(price), max(date_time), min(date_time), etc.

6) Should I display aggregated values at a group level or for each row using WINDOWing.

7) Should I transpose rows to columns (i.e pivot)? Should I convert columns to rows(i.e. unpivot)?

8) Any subqueries or CTE required?

9) Any ranking required over a partition (i.e Window) for each record?

10) Do I need to compare any columns between records over a partition (i.e Window) with lead() or lag() functions? E.g. Compare current record with previous or next record.

Let’s cover lots of scenarios based SQL in this series to get a good handle on SQL by practicing & experiencing.

Q01 How will you go about identifying duplicate records in a table?
A01 This is a very popular interview question because there are many approaches, hence when you provide a solution the interviewer can drill further to assess your understanding of DISTINCT, GROUP BY, WINDOW function, Common Table Expression (i.e. CTEs), etc.

Q. What does duplicate mean?
A. Either one or more columns that are supposed to be unique to identify a row are repeated in multiple rows or all columns are repeated in multiple rows. Multiple means more than 1 row. Often tables will have an id column as a primary key uniquely generated for each row, but the remaining columns may have duplicate values in 2 or more rows as shown in the below sample data “employees” table, where the “id” column is unique, but the the combination of the remaining columns are repeated for rows with “name” as “PETER” and “JOHN”.

Setup the data on db-fiddle MySQL V8.0 to practice.

DB-FIDDLE to practice SQL scenarios using MySQL V8.0

DB-FIDDLE to practice SQL scenarios using MySQL V8.0

Show me the data:

OUTPUT:

SQL – GROUP BY

The HAVING clause filters on the GROUP BY results.

OUTPUT:

Shows records that have more than 1 occurrence.

Note: Interviewer is testing if you understand that the aggregate queries need to have a “GROUP BY” clause on columns that are NOT aggregated. E.g. name, designation and dob.

Q02 How would you go about deleting the duplicate records?
A02 You could do it in a number of steps as shown below.

  • Create a temporary table.
  • Insert the unique (i.e.DISTINCT) records into the temporary table.
  • Drop the original table.
  • Rename the temporary table to original table name.
  • Alter the table to add an identity column.

SQL – DISTINCT keyword

Example based on MySQL 8.0.

OUTPUT:

SQL – Window function

ROW_NUMBER() is a window function that assigns a sequential integer to each row within the PARTITION BY of a result set. In the below example, the rows are partitioned by name, designation, and dob. Learn more in detail at SQL window function for analytics & qualify interview Q&As.

OUTPUT:

Example based on Teradata, which has the QUALIFY

SQL Window function & CTE (i.e. Common Table Expression)

With databases that support CTE (i.e. Common Table Expression). Learn more in detail at Common Table Expressions (i.e. CTE) in SQL using the “WITH” clause.

OUTPUT:

SQL – GROUP BY & Subquery with max(..)

Using the GROUP BY & MAX() without using CTE and ROW_NUMBER().

“dt” is an alias for a Derived Table. The alias can be anything other than DT. It could be x. Use derived tables when you are required to perform some complex query and result of inner query work as table for outer query. Derived tables are created on the fly with the help of a Select statement. The scope of derived table is limited to the outer Select query who created it. Derived table can’t be used further outside the scope of outer select query.

So, you have CTEs, derived tables, subqueries & temp tables to be used in different scenarios at your disposal.

OUTPUT:

SQL practice on DB-FIDDLE

SQL practice on DB-FIDDLE

Use MySQL V8.0 so that CTE (i.e. Common Table Expression) is supported.

SQL – SELF JOIN

One of the SQL join types is to join a table to itself. You will see more examples of this in the other scenarios based Q&As. This approach is not viable if you have too many columns as you have to include all columns in the join conditions.

OUTPUTS:

Note that NON EQUI JOIN e1.id < e2.id is a must NOT to join on rows with same id or the ones already joined again. Try without the non equi join & see what happens. Instead of “<” try “<>” and see what happens.

What if all columns are repeated?

It is important to note that if all the columns in a row gets repeated including the “id” column then some of the above approaches will NOT work.

Outputs:

Delete duplicate records:

Outputs:

More SQL scenarios interview Q&As

100+ SQL interview questions & answers with scenarios & examples.

Tags: ,

300+ Java & Big Data Interview FAQs

Java & Big Data Tutorials

Top