Prerequisite: This extends Databricks – Spark Window functions.
Problem: Convert the below table
+------+--------+---------+-------------+
|emp_id|emp_name| emp_city|emp_expertise|
+------+--------+---------+-------------+
|     1|    John|   Sydney|         Java|
|     2|   Peter|Melbourne|        Scala|
|     3|     Sam| Brisbane|       Python|
|     4|   David|Melbourne|       Python|
|     5|  Elliot|   Sydney|         Java|
+------+--------+---------+-------------+
to
+-------------+----------------------------------------------------------+
|Column       |Content                                                   |
+-------------+----------------------------------------------------------+
|emp_id       |[[1, 1], [3, 1], [5, 1], [4, 1], [2, 1]]                  |
|emp_name     |[[Elliot, 1], [John, 1], [Sam, 1], [Peter, 1], [David, 1]]|
|emp_city     |[[Sydney, 2], [Melbourne, 2], [Brisbane, 1]]              |
|emp_expertise|[[Python, 2], [Java, 2], [Scala, 1]]                      |
+-------------+----------------------------------------------------------+
where each distinct value in every column is paired with the number of times it occurs.
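Below is a minimal sketch of one way to produce this shape, using a window count partitioned by each column's value (tying back to the window-functions prerequisite) and collecting the (value, count) pairs per column. The object name and local Spark session are placeholders, and the ordering of pairs inside each array is not guaranteed to match the listing above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object ColumnValueCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ColumnValueCounts")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the problem statement
    val emp = Seq(
      (1, "John",   "Sydney",    "Java"),
      (2, "Peter",  "Melbourne", "Scala"),
      (3, "Sam",    "Brisbane",  "Python"),
      (4, "David",  "Melbourne", "Python"),
      (5, "Elliot", "Sydney",    "Java")
    ).toDF("emp_id", "emp_name", "emp_city", "emp_expertise")

    // For each column: attach an occurrence count via a window
    // partitioned by that column's value, keep one row per distinct
    // value, then collect the (value, count) pairs into a single array.
    val summaries = emp.columns.map { colName =>
      val w = Window.partitionBy(col(colName))
      emp.select(
          col(colName).cast("string").as("value"),
          count(lit(1)).over(w).as("cnt"))
        .distinct()
        .agg(collect_list(struct($"value", $"cnt")).as("Content"))
        .select(lit(colName).as("Column"), $"Content")
    }

    // Stack the per-column summaries into the final two-column result
    summaries.reduce(_ union _).show(truncate = false)

    spark.stop()
  }
}
```

Casting every value to string before collecting keeps the per-column DataFrames union-compatible, since `emp_id` is numeric while the other columns are strings. A plain `groupBy(colName).count()` per column would work just as well; the window form is shown here only to stay in the spirit of the prerequisite article.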