We know that the following Spark snippet writes the elements of a JavaRDD as text under a single output path. Spark creates one `part-*` file per partition there, not one file per element:
```java
employeesRdd.saveAsTextFile(pathToHdfs);
```
What if you want to write each employee's history to a separate file instead?

Step 1: Create a JavaPairRDD from the JavaRDD, keyed by employee ID
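```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Key each Employee by its ID; the value is that employee's history dump.
JavaPairRDD<String, String> empPairRdd = employeesRdd.mapToPair(
    new PairFunction<Employee, String, String>() {
        @Override
        public Tuple2<String, String> call(Employee t) throws Exception {
            // getEmpId is a method call, so it needs parentheses
            return new Tuple2<String, String>(t.getEmpId(), t.getEmpHistoryDump());
        }
    });
```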
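To make the target layout concrete, here is a plain-Java sketch (no Spark involved) of the end result: given (empId, history) pairs like the ones Step 1 produces, each value goes to its own file named after its key. The class `PerKeyFileWriter`, the method `writePerKey`, the `.txt` naming scheme and the sample data are all hypothetical illustrations, not the post's Spark approach; it assumes keys are safe to use as file names and that there is one history per employee.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, local (non-Spark) illustration of "one file per key".
public class PerKeyFileWriter {

    // Writes each value to <outputDir>/<key>.txt and returns how many files were written.
    // Assumption: keys are valid file names and each key appears once (a Map cannot hold duplicates).
    public static int writePerKey(Map<String, String> pairs, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        int written = 0;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            Path file = outputDir.resolve(e.getKey() + ".txt");
            Files.write(file, e.getValue().getBytes(StandardCharsets.UTF_8));
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample pairs standing in for (empId, empHistoryDump)
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("E001", "2019: joined; 2021: promoted");
        pairs.put("E002", "2020: joined");

        Path dir = Files.createTempDirectory("emp-history");
        int n = writePerKey(pairs, dir);
        System.out.println(n + " files written to " + dir);
    }
}
```

In a real Spark job the pairs stay distributed and the per-key split has to happen in the output stage rather than in a local loop like this one.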
…