During the past few years R has become an important language for data analysis, data representation and visualization. R is a very expressive language which combines functional and dynamic aspects, with laziness and object oriented programming. However, the default Rimplementation is neither fast nor distributed, both features crucial for “big data” processing.
In this talk I will present FastR-Flink, a compiler based on Oracle‘s R implementation FastR with support for some constructs of Apache Flink, a Java/Scala framework for distributed data processing. The Apache Flink constructs such as map, reduce or filter are integrated at the compiler level to allow the execution of distributed stream and batch data processing applications directly from the R programming language.
This work has been initiated at Oracle Labs during my recent internship with them in Linz, Austria.