Blog Archives

10: Solving AlreadyBeingCreatedException & LeaseExpiredException thrown from your Spark jobs

What is wrong with the following Spark code snippet? You are likely to get AlreadyBeingCreatedException & LeaseExpiredException thrown as multiple executors try to either create or append to the same...



11: What are part- files in Hadoop & 6 ways to merge them

What are the part-xxxx files generated by Hadoop? When you invoke rdd.saveAsTextFile(…) or rdd.saveAsNewAPIHadoopFile(…) from Spark, you get part- files. When you do “INSERT INTO” …
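One straightforward way to merge part- files is to concatenate them in name order. For a directory on HDFS, `hadoop fs -getmerge <src> <localdst>` concatenates the directory's files into a single local file. Below is a minimal sketch of the same idea on the local filesystem, assuming the part- files have already been copied to a local directory; `PartFileMerger` and `merge` are hypothetical names used only for illustration:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartFileMerger {

    /**
     * Concatenates every "part-*" file in inputDir, in lexicographic name order,
     * into outputFile. Marker files such as "_SUCCESS" are skipped because they
     * do not match the "part-*" glob.
     */
    public static void merge(Path inputDir, Path outputFile) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "part-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        // Zero-padded names (part-00000, part-00001, ...) sort correctly as strings
        Collections.sort(parts);
        try (OutputStream out = Files.newOutputStream(outputFile)) {
            for (Path part : parts) {
                Files.copy(part, out); // append the raw bytes of each part file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: PartFileMerger <inputDir> <outputFile>");
            return;
        }
        merge(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Keep the output file outside the input directory, otherwise a name like "part-merged" would be picked up by the glob on a later run.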



12: XML Processing in Spark with XmlInputFormat

Step 1: Read the XML snippet between the “<Record>” tags. Upload this file to HDFS as “/user/cloudera/xml/orders.xml”. Step 2: You need the XmlInputFormat class as shown below. …
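An XmlInputFormat typically scans the input for a start tag and an end tag and hands each complete snippet, tags included, to the mapper as one record. Below is a minimal, Hadoop-free sketch of that splitting logic over an in-memory string, assuming records are not nested and the start tag carries no attributes. It is not the XmlInputFormat class itself, and `XmlRecordSplitter` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

public class XmlRecordSplitter {

    /** Returns every snippet from startTag through endTag (inclusive), in document order. */
    public static List<String> split(String xml, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(startTag, from);
            if (start < 0) {
                break; // no more records
            }
            int end = xml.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: ignore it
            }
            int stop = end + endTag.length();
            records.add(xml.substring(start, stop)); // keep the surrounding tags
            from = stop; // resume scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<Orders><Record><id>1</id></Record><Record><id>2</id></Record></Orders>";
        for (String record : split(xml, "<Record>", "</Record>")) {
            System.out.println(record);
        }
    }
}
```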



13: Spark inner & outer joins in Java with JavaPairRDDs

RDD inner join via JavaPairRDD: Here is an inner join displaying all the orders with line items. Outputs: … RDD left outer join with filtering via JavaPairRDD: Here is a left...
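`JavaPairRDD.join` keeps only keys present on both sides, while `leftOuterJoin` keeps every left-hand pair and wraps the right-hand value in an Optional (Guava's `Optional` in Spark 1.x, `org.apache.spark.api.java.Optional` in Spark 2.x). Below is a plain-Java sketch of those semantics on in-memory lists of key/value pairs. It uses `java.util.Optional`, does not run on Spark, and the order and line-item data and the `PairJoins` helper are hypothetical:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PairJoins {

    /** Builds a key/value pair, standing in for a Scala Tuple2. */
    public static <A, B> Map.Entry<A, B> pair(A a, B b) {
        return new SimpleEntry<>(a, b);
    }

    /** Inner join: one output pair per matching (left, right) combination of the same key. */
    public static <K, V, W> List<Map.Entry<K, Map.Entry<V, W>>> join(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
        List<Map.Entry<K, Map.Entry<V, W>>> out = new ArrayList<>();
        for (Map.Entry<K, V> l : left) {
            for (Map.Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    out.add(pair(l.getKey(), pair(l.getValue(), r.getValue())));
                }
            }
        }
        return out;
    }

    /** Left outer join: every left pair survives; unmatched ones get Optional.empty(). */
    public static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
            List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
        List<Map.Entry<K, Map.Entry<V, Optional<W>>>> out = new ArrayList<>();
        for (Map.Entry<K, V> l : left) {
            boolean matched = false;
            for (Map.Entry<K, W> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    matched = true;
                    out.add(pair(l.getKey(), pair(l.getValue(), Optional.of(r.getValue()))));
                }
            }
            if (!matched) {
                out.add(pair(l.getKey(), pair(l.getValue(), Optional.<W>empty())));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: order id -> order name, order id -> line item
        List<Map.Entry<Integer, String>> orders = new ArrayList<>();
        orders.add(pair(1, "order-1"));
        orders.add(pair(2, "order-2"));
        List<Map.Entry<Integer, String>> lineItems = new ArrayList<>();
        lineItems.add(pair(1, "item-a"));
        lineItems.add(pair(1, "item-b"));

        join(orders, lineItems).forEach(System.out::println);
        // Filtering a left outer join: keep only orders that have no line items
        leftOuterJoin(orders, lineItems).stream()
                .filter(e -> !e.getValue().getValue().isPresent())
                .forEach(System.out::println);
    }
}
```

The nested loops keep the sketch short; Spark instead shuffles both RDDs by key so matching pairs meet on the same partition.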



14: Spark joins with SQLContext & JavaPairRDD

This extends the previous tutorial, Spark inner & outer joins in Java with JavaPairRDDs. In this tutorial, let’s read the orders from a Hive table using SQLContext & …



15: Spark joins with Dataframes & SQLContext

Create LineItems Hive table. Step 1: Create a file “line-item1.txt” on HDFS under “/user/cloudera/learn-hdfs/lineitems” as … Step 2: Create a Hive table “lineitems” …




Java Developer Interview Q&As

