18 SQL best practices interview Q&As

#1. Use uppercase for the keywords like SELECT, FROM, JOIN, GROUP BY, WHERE, etc. It’s also a good practice to use uppercase for the SQL functions like UPPER(col_name), COUNT(o.id), etc. Another key rule is that each clause such as SELECT, FROM, WHERE, GROUP BY, HAVING etc. should be in a new line. Proper structure improves readability of SQL queries.

Avoid: lower cases for the key words.

Favor: upper case as shown below.

#2. Use column & table aliases where it makes sense. For example, in the above example the column alias orders_count makes it more readable as in “COUNT(ord.id) as orders_count”. Use table aliases like cust, ord, etc shown above when you are joining multiple tables.

#3. Avoid select * as it hides the intentions behind your query. More importantly, the table can grow later with new columns added, which could impact the existing queries written with “SELECT * FROM table…..”.

Avoid:

Favor: naming the columns explicitly.

The selection of unnecessary columns in table that has 100+ columns or columns storing CLOB or BLOB object types can adversely impact the query performance.

#4. Favor joins over subqueries where possible for better performance.

#5. Favor GROUP BY or WHERE EXISTS (….) over DISTINCT as the GROUP BY takes place higher up in the order of execution in the logical plan, and it is easy to misuse DISTINCT. The order of operations in SQL are:

1) FROM (including JOIN, etc.)
2) WHERE
3) GROUP BY (can remove duplicates)
4) Aggregations (like SUM(), COUNT(), etc)
5) HAVING
6) Window functions
7) SELECT
8) DISTINCT (can remove duplicates)
9) UNION, INTERSECT, EXCEPT (can remove duplicates)
10) ORDER BY
11) OFFSET
12) LIMIT

The GROUP BY happens before the SELECT, which is a projection (i.e. choosing the columns & expressions) task, whereas the DISTINCT happens after the SELECT.

DISTINCT incorrect use #1

The incorrect use of DISTINCT can lead to erroneous results as shown below. The DISTINCT operation “happens after” the SELECT, hence you can no longer remove DISTINCT ratings because the window function was already calculated and projected.

Wrong:

Correct:

dt is a Derived Table.

DISTINCT incorrect use #2

Unnecessary JOINs are performed and when the data doubles the DISTINCT keyword is used to incorrectly fix it. It is correct to do SELECT … FROM employer WHERE EXISTS (SELECT… FROM employee) as opposed to joining and selecting the DISTINCT rows..

#6. Use Common Table Expression (i.e CTE) if your database supports it. CTEs are available on most modern databases. CTEs work like a Derived Table, but has 2 advantages.

1) CTEs make your query more readable.
2) CTEs can be defined once & reused multiple times.

Use CTEs instead of multiple sub or nested queries.

Learn more about CTEs.

#7. Beware with NULLs in equality or comparison operators. Internally a value of NULL is an unknown value and therefore SQL engines don’t equate an unknown value to another unknown value. You can detect the presence of null values being used in the tables being joined with an outer join.

#8. Write useful comments where the logic is complex, but don’t over do it. Use meaningful names & aliases to improve readability.

#9. Starting the WHERE clause with 1=1 allows you to comment certain conditions for debugging purpose.

#10. Table primary keys, indexes or data partitions help retrieve information faster and more efficiently. Full table & data scans are expensive, and can be avoided with proper keys, indexes & partitions.

#11. Favor using wildcards at the end (like ‘p%’) as opposed to the beginning (like ‘%d’) as it is faster & more efficient. At times you want to search by the last numbers as in last 4 digits of a credit card number or a phone number. In this scenario have an additional column with the reversed card number or phone number.

#12. Limit the number of records to be returned from the database, when you are testing the correctness of your query. You can use the SELECT TOP 10 or LIMIT 10 to restrict the number of rows returned as a sample.

#13. Favor UNION ALL over UNION , when you know that the two tables being combined don’t have any duplicates. The UNION ALL is more efficient as it does not have to perform a deduplication operation.

#14. Create joins with INNER JOIN but not with WHERE as joins with WHERE can create a cartesian product or a CROSS JOIN. In a CROSS JOIN, all possible combinations of the variables are created.

Avoid:

Correct:

#15. Use WHERE instead of HAVING to filter records as WHERE is executed before HAVING as per the order of execution shown below. The WHERE is more efficient as you restrict the number of records read from the database. HAVING should only be used when filtering on an aggregated fields.

1) FROM (including JOIN, etc.)
2) WHERE
3) GROUP BY (can remove duplicates)
4) Aggregations (like SUM(), COUNT(), etc)
5) HAVING
6) Window functions
7) SELECT
8) DISTINCT (can remove duplicates)
9) UNION, INTERSECT, EXCEPT (can remove duplicates)
10) ORDER BY
11) OFFSET
12) LIMIT

#16. At times it is better to split a complex query . In some cases joins will improve your query performance, but too many JOINs can result in memory or performance issues. You can split your complex queries with an initial SELECT followed by a sequence of UPDATE/INSERT statements. You need an orchestrator to execute these statements in the correct order. At times this approach may come with a readability/performances trade-off. Assess the scenarios & weigh the pros vs cons.

Instead Of:

You could try:

“dt” is a shorthand for the Derived Table.

#17. Use meaningful concise names for the table & column names. Most organisations maintain a data dictionary with an abbreviated naming convention to be adhered to. They also have a review & approval process in place where the Data assets are reviewed by the Data modellers, Data stewards, Leads, Product owners, etc.

#18. Remove unnecessary ORDER BY clauses. If you are using CTEs or subqueries you don’t need to have ORDER BY in them. Often it is fine to have the ORDER BY in the final query. This not only improves performance, but also the readability.


Java & Big Data Interview FAQs

Java Key Areas Interview Q&As

800+ Java Interview Q&As

Java & Big Data Tutorials

Top