SQL analytic functions interview questions – Part 1

Data Engineers & analysts must know the analytic functions applied over a window of rows is a powerful tool to analyse the data. This post compliments “Apache Spark window functions” by focusing on MySQL server. You can also apply this on Big Data SQL engines like Hive & impala, but the syntax can be slightly different, but the idea is the same.

What is an analytic function?

Similar to GROUP BY aggregated functions, analytic functions perform aggregations over a set of rows. Analytic function will do this for each row in the result set. The keyword OVER signals that it is an analytic function.

Analytic function construct

The analytic function names can be RANK( ), DENSE_RANK( ), ROW_NUMBER( ), MIN(column), MAX(column), SUM(column), COUNT( ), AVG(column), FIRST_VALUE(column), LAST_VALUE(column), NTH_VALUE(column, n), LEAD(return_value ,offset [,default]), LAG(return_value ,offset [,default]), NTILE(n), CUME_DIST( ), PERCENT_RANK( ), etc.

These functions can be applied OVER a window of rows that are PARTITIONed (i.e. grouped) by a column(s)/expressions & ordered by columns/expressions. There is also a windowing clause specified as ROWS and ROWS BETWEEN.

Use analytic functions to compute:

1) Running totals with sum(..). Running totals are used with time series data to see how certain performance indicators like sales, revenues, costs, profits & budgets are (or will be) developing over time.

2) Moving averages with avg(…) & ROWS BETWEEN. The moving average is a time series technique for determining trends in data. It is also known as rolling means, rolling averages, or running averages as they are calculated as the mean of the current and a specified number of immediately preceding values for each point in time. For example, calculating the moving average of stock prices.

3) Nth highest/lowest record from the table with rank()/dense_rank(): Management likes to see rankings for absolutely anything: products sold, employee salaries, revenues by location, etc.

4) Cumulative distribution of values within a given partition with cume_dist(). At times you want to make a report that contains the top or bottom x% values from a data set e.g. top 5% sales staffs by net sales.

5) First & last value of a column from a group with first_value()/last_value().

6) Calculating delta values by reading values before & after specified number of rows with lead()/lag(). One of data analysts’ most common tasks is calculating delta values. This is usually required when calculating day-to-day, month-to-month, quarter-to-quarter, or year-to-year changes.

Let’s look at examples as there is no better way to learn. You can build the sample data via DB Fiddle as explained in 14+ SQL beginner interview Q&As.

Given Data – Employee salary by location

Let’s start analysing highest salary by location, salary difference by location, running total, etc using the window functions.

Let’s do it on https://www.db-fiddle.com/ using “MySql 8.0”

SQL Window Function

Rank by location

There are different ranking functions as in rank() & dense_rank().

Output

As you can see, when the ranks are same, the rank 2 is skipped for “AU” and it goes to 3. If you don’t want the ranks to skip use dense_rank().

Output

What if you have ties, and still want to have sequential numbering so that you pick a single ranked 1. This is where row_number

OUTPUT

MAX salary by location

What if you want print the max salary next to each salary?

OUTPUT

Difference in salary

How about printing the difference in salaries by location?

OUTPUT

Running total of salaries by location

OUTPUT

Finally, getting the running total for all the employees just remove the “partition by”

OUTPUT

Days between first purchase

The FIRST_VALUE(order_date) function returns the first order_date for each customer as it is partitioned by cust_id.

More SQL analytic functions Q&As

Part 2 – SQL analytic functions – ROWS BETWEEN with preceding, following, & unbounded.

Part 3 – SQL analytic functions – cume_dist(), percent_rank() and ntile(n).


Java developer & architect Q&As

Java developers Q&As

Top