10 Spark SQL Interview Q&As

Q1. What is Spark SQL?
A1. Apache Spark SQL is a module for structured data processing in Spark. Spark SQL integrates relational processing (i.e. SQL) with Spark's functional programming APIs in Scala, Java, etc., letting you weave SQL queries with DataFrame/Dataset based transformations. It provides support for various data sources, as shown below:

[Figure: Spark SQL in the Spark ecosystem]

Q2. What libraries does Spark SQL have?
A2.

1. Data Source API

The Data Source API is used for loading and storing structured data from the various data sources shown in the figure above. It has built-in support for Hive, Avro, JSON, JDBC, Parquet, Elasticsearch, MySQL, etc.

2. Dataframe API

A DataFrame is a distributed collection of data organised into named columns. It is conceptually equivalent to a table in a relational database (see the sketch at the end of this answer).

3. SQL Interpreter and Optimizer

The SQL Interpreter and Optimizer (i.e. the Catalyst optimizer) is built with functional programming constructs in Scala. It supports cost-based and rule-based optimizations so that Spark SQL queries can run much faster than equivalent hand-written RDD code.

4. SQL Service

SQL Service is the entry point for working with structured data in Spark. It enables you to create DataFrame objects as well as execute SQL queries.
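A minimal sketch tying items 2 and 4 together: the SparkSession entry point is used to build a DataFrame with named columns. The app name, column names and values below are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DataFrameApiSketch")   // illustrative app name
  .master("local[*]")              // assumption: local mode for the sketch
  .getOrCreate()

import spark.implicits._

// A DataFrame behaves like a relational table with named columns
val employees = Seq(("John", 35), ("Jane", 28)).toDF("name", "age")
employees.printSchema()
employees.select($"name").where($"age" > 30).show()
```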

Q3. How will you go about enabling Hive support in Spark 2.0?
A3.
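A minimal sketch in Scala, assuming Hive is available and its configuration (e.g. hive-site.xml) is on the classpath; the app name and warehouse directory are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("HiveEnabledApp")                                  // illustrative app name
  .config("spark.sql.warehouse.dir", "/user/hive/warehouse")  // illustrative warehouse location
  .enableHiveSupport()                                        // enables Hive support in Spark 2.0+
  .getOrCreate()
```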

Q4. How will you go about using Spark SQL with Spark 2.0 SparkSession?
A4.
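A minimal sketch, assuming a JSON file of employee records at an illustrative path:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSqlWithSparkSession")   // illustrative app name
  .master("local[*]")                    // assumption: local mode for the sketch
  .getOrCreate()

// Load structured data into a DataFrame (path is illustrative)
val employeesDF = spark.read.json("/tmp/employees.json")

// Register it as a temporary view and query it with SQL
employeesDF.createOrReplaceTempView("employees")
spark.sql("SELECT name, age FROM employees WHERE age > 30").show()
```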

You can also call “printSchema()” on the resulting DataFrame and perform further transformations, for example:
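Continuing with the illustrative employeesDF from the sketch above:

```scala
// Inspect the schema inferred for the DataFrame
employeesDF.printSchema()

// DataFrame transformations can be mixed freely with the SQL above
import spark.implicits._
employeesDF
  .filter($"age" > 30)
  .groupBy($"age")
  .count()
  .show()
```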

Q5. How will you go about saving & reading from Hive table with SparkSession?
A5.
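A minimal sketch, assuming a Hive-enabled SparkSession as in Q3; the data and the table name are illustrative:

```scala
import spark.implicits._

// Build a small DataFrame to persist (data and table name are illustrative)
val empDF = Seq(("John", 35), ("Jane", 28)).toDF("name", "age")

// Save it as a Hive table
empDF.write.mode("overwrite").saveAsTable("employees_hive")

// Read it back with Spark SQL
spark.sql("SELECT name, age FROM employees_hive").show()

// Or read the table directly via the catalog
spark.table("employees_hive").show()
```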

If you import the session’s sql method in Scala (i.e. import spark.sql), you do not need to prefix queries with “spark” as in “spark.sql(“………….”)”, for example:
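A small illustration, reusing the illustrative employees_hive table from the sketch above:

```scala
// Import the sql method from the SparkSession instance named "spark"
import spark.sql

sql("SELECT COUNT(*) FROM employees_hive").show()
```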

Q6. How will you display the number of employees at different age groups?
A6.
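A minimal sketch, assuming the employees DataFrame/temp view from Q4 with an age column; the age-group boundaries are made up for illustration:

```scala
import org.apache.spark.sql.functions.{col, when}

// DataFrame API: derive an age-group column and count employees per group
employeesDF
  .withColumn("ageGroup",
    when(col("age") < 30, "20-29")
      .when(col("age") < 40, "30-39")
      .otherwise("40+"))
  .groupBy("ageGroup")
  .count()
  .show()

// Equivalent Spark SQL against the "employees" temp view
spark.sql(
  """SELECT CASE WHEN age < 30 THEN '20-29'
    |            WHEN age < 40 THEN '30-39'
    |            ELSE '40+' END AS ageGroup,
    |       COUNT(*) AS numEmployees
    |  FROM employees
    | GROUP BY CASE WHEN age < 30 THEN '20-29'
    |               WHEN age < 40 THEN '30-39'
    |               ELSE '40+' END
    |""".stripMargin).show()
```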

Q7. How will you create a temporary view of a DataFrame?
A7.
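A minimal sketch, continuing with the illustrative employeesDF; the view names are made up:

```scala
// Session-scoped view: visible only within this SparkSession
employeesDF.createOrReplaceTempView("employees_view")
spark.sql("SELECT * FROM employees_view WHERE age > 30").show()

// Global temp view: shared across SparkSessions, qualified with the global_temp database
employeesDF.createGlobalTempView("employees_global")
spark.sql("SELECT * FROM global_temp.employees_global").show()
```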

Q8. How will you use a DataSet API with Spark SQL?
A8.
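A minimal sketch, assuming a case class that matches the illustrative employee records used above:

```scala
import org.apache.spark.sql.Dataset

// The case class gives the Dataset its compile-time type
case class Employee(name: String, age: Long)

import spark.implicits._   // brings in the Encoders for case classes

// Convert the untyped DataFrame to a typed Dataset
val employeesDS: Dataset[Employee] = employeesDF.as[Employee]

// Typed, compile-time checked transformations...
employeesDS.filter(_.age > 30).map(_.name).show()

// ...can still be mixed with SQL via a temp view
employeesDS.createOrReplaceTempView("employees_ds")
spark.sql("SELECT AVG(age) FROM employees_ds").show()
```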

Q9. How will you be reading json & parquet files?
A9.

Both JSON and Parquet files are read (and written) via the DataFrameReader/DataFrameWriter, as sketched below.
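A minimal sketch; the file paths are illustrative:

```scala
// JSON: the schema is inferred from the documents (path is illustrative)
val jsonDF = spark.read.json("/tmp/employees.json")
jsonDF.printSchema()
jsonDF.show()

// Parquet: columnar format, the schema is stored in the files themselves
val parquetDF = spark.read.parquet("/tmp/employees.parquet")
parquetDF.show()

// Writing back out (illustrative output paths)
jsonDF.write.mode("overwrite").parquet("/tmp/employees_out.parquet")
parquetDF.write.mode("overwrite").json("/tmp/employees_out.json")
```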

Q10. What is a Spark SQL UDF?
A10. Spark SQL’s User-Defined Functions (UDFs) define new Column-based functions that extend the vocabulary of Spark SQL’s DSL for transforming Datasets.
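A minimal sketch of defining, registering and using a UDF; the UDF name, data and logic below are made up for illustration:

```scala
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._

val df = Seq(("John", 35), ("Jane", 28)).toDF("name", "age")   // illustrative data

// A UDF that derives an age-group label (illustrative logic)
val ageGroupUdf = udf((age: Int) => if (age < 30) "20-29" else "30+")

// Use it as a Column-based function in the DataFrame DSL
df.withColumn("ageGroup", ageGroupUdf(col("age"))).show()

// Or register the same function for use inside SQL statements
spark.udf.register("age_group", (age: Int) => if (age < 30) "20-29" else "30+")
df.createOrReplaceTempView("emp")
spark.sql("SELECT name, age_group(age) AS ageGroup FROM emp").show()
```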

