During the past few years R has become an important language for data analysis, data representation and visualization. R is a very expressive language which combines functional and dynamic aspects, with laziness and object oriented programming. However, the default Rimplementation is neither fast nor distributed, both features crucial for “big data” processing.
The talk is divided in two parts. In the first part I will present FastR-Flink, a compiler based on Oracle‘s R implementation FastR with support for some constructs of Apache Flink, a Java/Scala framework for distributed data processing. The Apache Flink constructs such as map, reduce or filter are integrated at the compiler level to allow the execution of distributed stream and batch data processing applications directly from the R programming language.
In the second part, I will show a demo of some R workloads working on a distributed cluster using VirtualBox. I will show a litle bit of R code and how R intertacts with Flink.
This work has been initiated at Oracle Labs during my recent internship with them in Linz, Austria.