05: 7 Java FP (lambda expressions) real life examples in wrangling normal & big data

This post extends "Transforming your thinking from OOP to FP". In big data, functional programming is prevalent when working with data sets, for example when writing a Spark job that works with RDDs (Resilient Distributed Datasets).

In imperative programming (e.g. OOP, procedural programming, etc.) you can say "x = x + 5", where you are assigning x + 5 to x. If x were 2, then after the assignment it becomes 7 (i.e. 2 + 5).

In functional programming (FP), you can't say "x = x + 5". Why? If x were 2, "2 = 2 + 5" is wrong. Pure FP does not have assignment statements. FP is all about computation as the evaluation of "mathematical functions", and it avoids changing state and mutability. In FP, you need to express the computation in terms of functions such as f(x) and f(x, y), which return a new value instead of overwriting an existing one.

Similarly, in the examples below, (el1, el2) -> el1 + "," + el2 is a lambda expression in FP. When it is passed to "reduce", "el1" is the result accumulated so far (initially the first element) and "el2" is the next element in the given list.
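The idea of f(x) and f(x, y) can be sketched in Java with the standard functional interfaces. This is a minimal illustration only: the class name, the constant names and the body of the two-argument function (x + y) are assumptions made for the example.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

class PureFunctions {

    // f(x): returns a new value instead of overwriting x.
    static final Function<Integer, Integer> F = x -> x + 5;

    // f(x, y): a two-argument function; the body x + y is an assumption for illustration.
    static final BinaryOperator<Integer> G = (x, y) -> x + y;

    public static void main(String[] args) {
        final int x = 2;                          // never reassigned
        int result = F.apply(x);                  // a new value, x is still 2
        System.out.println(x + " -> " + result);  // prints 2 -> 7
        System.out.println(G.apply(2, 5));        // prints 7
    }
}
```

Note that x keeps its value; the function produces a second value rather than changing the first.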

Example 1: Reducing a list of strings to CSV

stream -> reduce -> get
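A minimal sketch of this pipeline, assuming a small hypothetical list of strings (the class name, method name and sample data are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

class ReduceStringsToCsv {

    // stream -> reduce -> get: reduce(...) without an identity returns an Optional,
    // so get() unwraps it. get() throws NoSuchElementException for an empty list.
    static String toCsv(List<String> values) {
        return values.stream()
                     .reduce((el1, el2) -> el1 + "," + el2)
                     .get();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Java", "Scala", "Spark"); // hypothetical data
        System.out.println(toCsv(names)); // prints Java,Scala,Spark
    }
}
```

For production code, Collectors.joining(",") is usually a simpler choice and also handles an empty list.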


Example 2: Reducing a list of Integers to string (e.g. CSV)

stream -> map -> reduce -> get
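A minimal sketch of this pipeline, assuming a hypothetical list of integers. The extra "map" step converts each Integer to a String so that the same reduce lambda as in Example 1 can be reused:

```java
import java.util.Arrays;
import java.util.List;

class ReduceIntegersToCsv {

    // stream -> map -> reduce -> get
    static String toCsv(List<Integer> numbers) {
        return numbers.stream()
                      .map(n -> String.valueOf(n))            // Integer -> String
                      .reduce((el1, el2) -> el1 + "," + el2)  // join with commas
                      .get();                                 // throws if the list is empty
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3); // hypothetical data
        System.out.println(toCsv(numbers)); // prints 1,2,3
    }
}
```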

Example 3: Converting a List of unique objects to a map

Key=name -> value=Employee
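One way to sketch this is with Collectors.toMap. The Employee class below is hypothetical (the original definition is not shown here), as are the method name and sample data:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class UniqueEmployeesToMap {

    // Hypothetical domain class used only for this example.
    static class Employee {
        private final String name;
        private final int age;

        Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }

        @Override
        public String toString() {
            return "Employee{name=" + name + ", age=" + age + "}";
        }
    }

    // Key = name, value = the Employee itself.
    // toMap throws IllegalStateException if two employees share a name.
    static Map<String, Employee> byName(List<Employee> employees) {
        return employees.stream()
                        .collect(Collectors.toMap(Employee::getName, Function.identity()));
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("John", 30), new Employee("Peter", 25)); // hypothetical data
        System.out.println(byName(employees).get("John")); // prints Employee{name=John, age=30}
    }
}
```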


Example 4: Converting a List of non unique objects to a map

Key=name -> value=List<Employee>
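When names can repeat, Collectors.groupingBy collects all employees with the same name into a list. A minimal sketch, again with a hypothetical Employee class and sample data:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NonUniqueEmployeesToMap {

    // Hypothetical domain class used only for this example.
    static class Employee {
        private final String name;
        private final int age;

        Employee(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    // Key = name, value = every Employee with that name.
    static Map<String, List<Employee>> groupByName(List<Employee> employees) {
        return employees.stream()
                        .collect(Collectors.groupingBy(Employee::getName));
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee("John", 30),
                new Employee("John", 45),
                new Employee("Peter", 25)); // hypothetical data
        Map<String, List<Employee>> grouped = groupByName(employees);
        System.out.println(grouped.get("John").size());  // prints 2
        System.out.println(grouped.get("Peter").size()); // prints 1
    }
}
```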


Example 5: Converting a Map keys to a List, sorted by values

entrySet -> stream -> sorted -> map -> collect
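A minimal sketch of this pipeline, assuming a hypothetical map of faculty name to student count (the map contents, class name and method name are made up for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapKeysSortedByValue {

    // entrySet -> stream -> sorted -> map -> collect
    static List<String> keysSortedByValue(Map<String, Integer> source) {
        return source.entrySet().stream()
                     .sorted(Map.Entry.comparingByValue()) // ascending by value
                     .map(Map.Entry::getKey)               // keep only the key
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> studentsPerFaculty = new HashMap<>(); // hypothetical data
        studentsPerFaculty.put("Science", 300);
        studentsPerFaculty.put("Arts", 200);
        studentsPerFaculty.put("Engineering", 100);

        System.out.println(keysSortedByValue(studentsPerFaculty));
        // prints [Engineering, Arts, Science]
    }
}
```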


Now, to sort by the "length" of the faculty name instead of by value
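Sorting by the length of the key only needs a different comparator in the "sorted" step. A minimal sketch, assuming the same kind of hypothetical faculty map as before:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MapKeysSortedByNameLength {

    // Same pipeline as before, but the comparator looks at the key length.
    static List<String> keysSortedByNameLength(Map<String, Integer> source) {
        return source.entrySet().stream()
                     .sorted(Comparator.comparingInt(
                             (Map.Entry<String, Integer> e) -> e.getKey().length()))
                     .map(Map.Entry::getKey)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> studentsPerFaculty = new HashMap<>(); // hypothetical data
        studentsPerFaculty.put("Science", 300);
        studentsPerFaculty.put("Arts", 200);
        studentsPerFaculty.put("Engineering", 100);

        System.out.println(keysSortedByNameLength(studentsPerFaculty));
        // prints [Arts, Science, Engineering]
    }
}
```

Keys of equal length come out in whatever order the map happens to iterate them, so add a secondary comparator if a stable order matters.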


FP tends to be more memory intensive than imperative programming because in FP data is not overwritten; instead, new versions are created to represent each modification. Nowadays, both memory and disk are cheap, so this is usually an acceptable trade-off. FP gives the programmer a lot of control when wrangling data, and it is very useful in big data for functions like map, flatMap, reduce, combine, sort, etc.

The "map" applies a given function to every data record, on different machines in a cluster, so it can run in parallel. The "reduce" combines the individual results from the different machines by repeatedly applying a given function to pairs of results until a single final result is reached.
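On a single JVM, the same map-then-reduce shape can be sketched with a parallel stream. This is only a local analogy (threads instead of machines in a cluster), and the class name, method name and sample data are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

class LocalMapReduce {

    // "map" squares every element independently, so it can run in parallel;
    // "reduce" then combines the partial results pairwise into one value.
    static int sumOfSquares(List<Integer> numbers) {
        return numbers.parallelStream()
                      .map(n -> n * n)
                      .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4); // hypothetical data
        System.out.println(sumOfSquares(numbers)); // prints 30
        System.out.println(numbers);               // prints [1, 2, 3, 4], input is unchanged
    }
}
```

The reduce function must be associative (as addition is) for the parallel result to be the same as the sequential one.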

Example 6: Sum the list of numbers across the Hadoop cluster with Apache Spark
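A minimal sketch using Spark's Java API, assuming spark-core is on the classpath. The class name, application name, sample data and the "local[*]" master (which runs Spark inside the current JVM rather than on a real cluster) are assumptions for illustration; on a cluster the master is normally supplied by spark-submit.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

class SparkSumNumbers {

    public static void main(String[] args) {
        // "local[*]" is an assumption so the sketch can run without a cluster.
        SparkConf conf = new SparkConf().setAppName("SparkSumNumbers").setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> data = Arrays.asList(1, 2, 3, 4, 5); // hypothetical data

            // Distribute the list as an RDD; its partitions are spread across the executors.
            JavaRDD<Integer> numbers = sc.parallelize(data);

            // Each partition is summed where it lives, then the partial sums are combined.
            int sum = numbers.reduce((a, b) -> a + b);

            System.out.println("Sum = " + sum); // 15 for the sample data
        }
    }
}
```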

Example 7: Counting the number of blank lines in a given text input with Apache Spark.
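One possible approach is to filter the blank lines and count them; other approaches, such as accumulators, are also common. As before, this is a sketch that assumes spark-core is on the classpath, and the class name, the "local[*]" master and the in-memory sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

class SparkCountBlankLines {

    public static void main(String[] args) {
        // "local[*]" is an assumption so the sketch can run without a cluster.
        SparkConf conf = new SparkConf().setAppName("SparkCountBlankLines").setMaster("local[*]");

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical in-memory input; for a real file use sc.textFile(path) instead.
            List<String> text = Arrays.asList("first line", "", "second line", "   ", "third line");
            JavaRDD<String> lines = sc.parallelize(text);

            // Keep only the lines that are empty after trimming whitespace, then count them.
            long blankLines = lines.filter(line -> line.trim().isEmpty()).count();

            System.out.println("Blank lines = " + blankLines); // 2 for the sample data
        }
    }
}
```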

More detailed explanation at: Apache Spark interview questions & answers

