SQL window function for analytics interview questions answered

Data Engineers & analysts must know the Window functions as it is a powerful tool to analyse the data. This post compliments “Apache Spark window functions” focus on using SQL on MySQL server covered. You can also apply this on Big Data SQL engines like Hive & impala, but the syntax can be slightly different, but the idea is the same.

What is a window function?

Similar to GROUP BY aggregated functions, window functions perform aggregations over a set of rows. Window function will do this for each row in the result set. The keyword OVER signals that it is a window function.

Given Data – Employee salary by location

Let’s start analysing highest salary by location, salary difference by location, running total, etc using the window functions.

Let’s do it on https://www.db-fiddle.com/ using “MySql 8.0”

SQL Window Function

Rank by location

There are different ranking functions as in rank() & dense_rank().

Output

As you can see, when the ranks are same, the rank 2 is skipped for “AU” and it goes to 3. If you don’t want the ranks to skip use dense_rank().

Output

What if you have ties, and still want to have sequential numbering so that you pick a single ranked 1. This is where row_number

OUTPUT

MAX salary by location

What if you want print the max salary next to each salary?

OUTPUT

Difference in salary

How about printing the difference in salaries by location?

OUTPUT

Running total of salaries by location

OUTPUT

Finally, getting the running total for all the employees just remove the “partitionBy”

OUTPUT

Examples with preceding, following, & unbounded

SQL Window function, source: https://www.sqltutorial.org/sql-window-functions/

The below example was created in Teradata with CTE. 1 PRECEDING AND 1 FOLLOWING to get the average of 1 preceding row, current row and 1 following row.

Output:

101 is (5 + 3) / 2 = 4
102 is (5 + 3 + 6) / 3 = 14 / 3 = 4.67
103 is (3 + 6 + 13) / 3 = 22 / 3 = 7.33
104 is (6 + 13 + 10) / 3 = 29 / 3 = 9.67
105 is (13 + 10) / 2 = 23 / 2 = 11.5

Q. How will you do a cumulative total or running total?
A. Use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Output:

Q. How will you print the group total in each row?
A. Use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED following.

Output:

Let’s add “PARTITION BY” to group by “product_id”

Output:

What is a QUALIFY function?

QUALIFY is a proprietary extension to filter the result of a Windowed Aggregate Function. It is used to filter out the result set same way as WHERE & HAVING clauses.

  1. FROM: create the basic result set
  2. WHERE: remove rows from the previous result set
  3. GROUP BY: apply aggregate functions on the previous result set
  4. HAVING: remove rows from the previous result set
  5. OVER: apply windowed aggregate functions on the previous result set
  6. QUALIFY: remove rows from the previous result set

Output:


800+ Java & Big Data Interview Q&As

Top