This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Using oneAPI Construction Kit and TornadoVM to accelerate Java Programs on x86, ARM and RISC-V CPUs

26 minute read

Published: September 10, 2024

Running TornadoVM via the oneAPI Construction Kit for Intel, ARM and RISC-V CPUs.

Book Review: JVM Performance Engineering

5 minute read

Published: June 21, 2024

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

The TornadoVM Programming Model Explained

16 minute read

Published: February 23, 2024

Are you a Java developer who wants to access GPUs? In this post, I explain how to do so using TornadoVM.

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled

6 minute read

Published: December 10, 2023

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled.

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this, and how can it be used?

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

Running TornadoVM on NVIDIA Jetson Nano

7 minute read

Published: April 25, 2023

Did you know that TornadoVM can also run on ARM-based systems with NVIDIA GPUs? In this post, we will show how TornadoVM can be used on an NVIDIA Jetson Nano, a small, powerful computer designed for embedded artificial intelligence (AI) and machine learning (ML) applications.

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

21 minute read

Published: November 25, 2022

Exploiting heterogeneous hardware for Big Data workloads is usually done by introducing new APIs, resulting in more complex programs to develop, understand, and maintain. But, what if we do not change/extend the original programming model? Is it possible? This post discusses a new approach to do so.

Exploring Level Zero resources: repositories and purpose

2 minute read

Published: September 16, 2022

Sometimes, it is not clear which Level Zero repository is the right one for our needs. In this post, we will explain each of the Level Zero public resources and what they are intended to be.

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

TornadoVM Internals: Java APIs for Compiling Java methods to SPIR-V and running on GPUs via Level Zero

11 minute read

Published: September 01, 2022

This post shows how to use the internal APIs to interact directly with the TornadoVM JIT compiler interface and runtime system.

Running Java Programs on XPUs with TornadoVM via Docker

9 minute read

Published: July 08, 2022

In this post, we will show how to launch and accelerate Java programs on heterogeneous hardware via TornadoVM with minimal configuration using pre-built Docker images.

Running TornadoVM on Intel GPUs using Windows Subsystem for Linux (WSL) for Windows 11

4 minute read

Published: June 29, 2022

In this post, I will show you how to enable TornadoVM to run on Intel HD Graphics via the OpenCL and SPIR-V backends within WSL on Windows 11.

Overall Performance of Unified Shared Memory Types with Level Zero on Intel Integrated GPUs

8 minute read

Published: May 25, 2022

Does shared memory really impact performance if we measure end-to-end applications on GPUs? In this post, we try to answer this question.

Understanding Memory Allocation Size Limitations with Level Zero

6 minute read

Published: April 07, 2022

In this post, we explore the memory capabilities of the Level Zero API and examine its constraints with respect to memory allocation.

Profiling OpenCL and SPIRV code from TornadoVM using VTune

7 minute read

Published: February 14, 2022

Profiling OpenCL and SPIRV code from TornadoVM using VTune: https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

Measuring Kernel Time and Data Transfers with Level Zero

13 minute read

Published: September 14, 2021

Measuring Kernel Time and Data Transfers with Level Zero: https://jjfumero.github.io/posts/2021/09/timers-with-level-zero/

Introduction to Level Zero API for Heterogeneous Programming

20 minute read

Published: June 09, 2021

Overview of the Intel Level-Zero API and a practical example to dispatch a SPIR-V kernel on the Intel HD Graphics: https://jjfumero.github.io/posts/2021/09/introduction-to-level-zero/

podcasts

portfolio

projects

publications

Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

Published in VEE 2017, Xi'an, China, 2017

Recommended citation: Juan Fumero, Michel Steuwer, Lukas Stadler, and Christophe Dubach. 2017. Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation. SIGPLAN Not. 52, 7 (April 2017), 60-73. DOI: https://doi.org/10.1145/3140607.3050761 https://dl.acm.org/citation.cfm?id=3050761

PhD Thesis: Accelerating Interpreted Programming Languages on GPUs with Just-In-Time and Runtime Optimisations

Published in The University of Edinburgh, UK, 2017

Recommended citation: Juan Fumero. PhD Thesis: Accelerating Interpreted Programming Languages on GPUs with Just-In-Time and Runtime Optimisations. The University of Edinburgh, UK. August 2017. https://www.era.lib.ed.ac.uk/handle/1842/28718

Towards Practical Heterogeneous Virtual Machines

Published in MoreVMs 2018, 2018

Heterogeneous VMs, JVM, Graal, GPUs

Recommended citation: James Clarkson, Juan Fumero, Michalis Papadimitriou, Maria Xekalaki, Christos Kotselidis. Towards Practical Heterogeneous Virtual Machines. MoreVMs 2018 https://2018.programming-conference.org/track/MoreVMs-2018#program

Enabling RISC-V support on the MaxineVM

Published in RISC-V Barcelona 2018, 2018

RISC-V, JVMs, MaxineVM

Recommended citation: Foivos S Zakkak, Juan Fumero and Christos Kotselidis. Enabling RISC-V support on the MaxineVM. RISC-V Workshop, Barcelona 2018 https://www.researchgate.net/publication/325113454_Enabling_RISC-V_support_on_MaxineVM

Exploiting High-Performance Heterogeneous Hardware for Java Programs using Graal

Published in ManLang 2018, Linz, Austria, 2018

Recommended citation: James Clarkson, Juan Fumero, Michalis Papadimitriou, Foivos S. Zakkak, Maria Xekalaki, Christos Kotselidis, Mikel Lujan (The University of Manchester) Exploiting High-Performance Heterogeneous Hardware for Java Programs using Graal. ManLang 2018. https://www.researchgate.net/publication/327097904_Exploiting_High-Performance_Heterogeneous_Hardware_for_Java_Programs_using_Graal

ScootR: Scaling R Dataframes on Dataflow Systems

Published in SOCC 2018, California, 2018

Recommended citation: Andreas Kunft, Lukas Stadler, Daniele Bonetta, Cosmin Basca, Jens Meiners, Sebastian Breß, Tilmann Rabl, Juan Fumero, Volker Markl. ScootR: Scaling R Dataframes on Dataflow Systems. SOCC 2018 http://www.user.tu-berlin.de/akunft/paper/socc18-paper35.pdf

Using Compiler Snippets to Exploit Parallelism on Heterogeneous Hardware: A Java Reduction Case Study

Published in VMIL 2018, Boston, Massachusetts, United States, 2018

Recommended citation: Juan Fumero, Christos Kotselidis. Using Compiler Snippets to Exploit Parallelism on Heterogeneous Hardware: A Java Reduction Case Study. VMIL 2018 https://bit.ly/2Rh5t5c

Dynamic Application Reconfiguration on Heterogeneous Hardware

Published in VEE 2019, Providence, Rhode Island, United States, 2019

Recommended citation: Juan Fumero, Michail Papadimitriou, Foivos S. Zakkak, Maria Xekalaki, James Clarkson, and Christos Kotselidis. 2019. Dynamic application reconfiguration on heterogeneous hardware. In Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE 2019). ACM, New York, NY, USA, 165-178. DOI: https://doi.org/10.1145/3313808.3313819 https://github.com/jjfumero/jjfumero.github.io/blob/master/files/VEE2019_Fumero_Preprint.pdf

Towards Prototyping and Acceleration of Java Programs onto Intel FPGAs

Published in 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2019

Recommended citation: M. Papadimitriou, J. Fumero, A. Stratikopoulos and C. Kotselidis. Towards Prototyping and Acceleration of Java Programs onto Intel FPGAs. In 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA, 2019 pp. 310-310. doi: 10.1109/FCCM.2019.00051 url: https://doi.ieeecomputersociety.org/10.1109/FCCM.2019.00051

Running Parallel Bytecode Interpreters on Heterogeneous Hardware

Published in MoreVMs 2020, Workshop collocated with Programming 2020. Porto, Portugal, 2020

Recommended citation: J. Fumero, A. Stratikopoulos and C. Kotselidis. Running Parallel Bytecode Interpreters on Heterogeneous Hardware. MoreVMs 2020.

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

Published in The Art, Science, and Engineering of Programming (Programming) 2021, 2021

Recommended citation: Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Foivos Zakkak and Christos Kotselidis. Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs. Programming 2021.

Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation

Published in VEE 2021, Virtual Execution Environments, 2021

Recommended citation: Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos and Christos Kotselidis. Automatically Exploiting the Memory Hierarchy of GPUs through Just-in-Time Compilation. VEE 2021.

Multiple-Tasks on Multiple-Devices (MTMD): Exploiting Concurrency in Heterogeneous Managed Runtimes

Published in VEE 2021, Virtual Execution Environments, 2021

Recommended citation: Michail Papadimitriou, Eleni Markou, Juan Fumero, Athanasios Stratikopoulos, Florin Blanaru and Christos Kotselidis. Multiple-Tasks on Multiple-Devices (MTMD): Exploiting Concurrency in Heterogeneous Managed Runtimes. VEE 2021.

Enabling Pipeline Parallelism in Heterogeneous Managed Runtime Environments via Batch Processing

Published in VEE 2022, Virtual Execution Environments, 2022

Recommended citation: Florin Blanaru, Athanasios Stratikopoulos, Juan Fumero, Christos Kotselidis. VEE 2022.

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

Published in VLDB 2023, 49th International Conference on Very Large Data Bases, 2022

Recommended citation: Maria Xekalaki, Juan Fumero, Athanasios Stratikopoulos, Katerina Doka, Christos Katsakioris, Constantinos Bitsakos, Nectarios Koziris, Christos Kotselidis. VLDB 2023.

Cross-Language Interoperability of Heterogeneous Code

Published in MoreVMs 2023 - Collocated with Programming, 2023

Recommended citation: Athanasios Stratikopoulos, Florin Blanaru, Juan Fumero, Maria Xekalaki, Orion Papadakis, Christos Kotselidis. MoreVMs 2023

Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation

Published in Preprint, 2023

Recommended citation: Fumero, J., Rethy, G., Stratikopoulos, A., Foutris, N., & Kotselidis, C. (2023). Experiences in Building a Composable and Functional API for Runtime SPIR-V Code Generation. ArXiv. /abs/2305.09493

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps

Published in MPLR 2023, 2023

Recommended citation: Juan Fumero, Florin Blanaru, Athanasios Stratikopoulos, Steve Dohrmann, Sandhya Viswanathan, Christos Kotselidis. Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps. MPLR 2023.

Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation

Published in VMIL 2023, 2023

Recommended citation: Fumero, J., Rethy, G., Stratikopoulos, A., Foutris, N., & Kotselidis, C. Beehive SPIR-V Toolkit: A Composable and Functional API for Runtime SPIR-V Code Generation. (VMIL 2023).

[BOOK] Programming Heterogeneous Hardware via Managed Runtime Systems

Published in SpringerBriefs in Computer Science, 2024

SpringerBriefs in Computer Science

Recommended citation: Juan Fumero, Athanasios Stratikopoulos, Christos Kotselidis. Programming Heterogeneous Hardware via Managed Runtime Systems. SpringerBriefs in Computer Science 2024. https://link.springer.com/book/10.1007/978-3-031-49559-5

Leveraging RISC-V Vectorization: Accelerating Java Programs with TornadoVM and OCK

Published in RISCV EU Summit 2025, 2025

Recommended citation: Juan Fumero Alfonso*, Athanasios Stratikopoulos, Colin Davidson, Harald van Dijk, Uwe Dolinsky, Michail Papadimitriou, Maria Xekalaki, Christos-Efthymios Kotselidis. Leveraging RISC-V Vectorization: Accelerating Java Programs with TornadoVM and OCK. RISCV EU Summit 2025.

service

supervision

talks

Invited Talk - FastR-Flink: A compiler based approach for distributed computing in R

Published: January 18, 2016

During the past few years, R has become an important language for data analysis, data representation and visualization. R is a very expressive language that combines functional and dynamic aspects with laziness and object-oriented programming. However, the default R implementation is neither fast nor distributed, both of which are crucial for “big data” processing.

Invited Talk at Edinburgh University - FastR-Flink

Published: May 18, 2016

Invited Talk - OpenCL Just-In-Time Compilation for Dynamic Programming Languages

Published: May 03, 2017

In this talk, we present a technique to automatically offload parts of an input program written in a dynamic language to OpenCL without any changes to the original source code. Our preliminary results show speedups of up to 150x when using the GPU.

Towards Practical Heterogeneous Virtual Machines

Published: April 09, 2018

Heterogeneous computing emerged as a means to achieve higher performance and energy efficiency. However, this trend has been accompanied by changes in software development norms that do not necessarily favour programmers. A prime example is the two most popular heterogeneous programming languages, CUDA and OpenCL, which expose several low-level features to the API making them difficult to use by non-expert users.

Invited talk at MSR - Tornado VM: A Virtual Machine for Exploiting High-Performance Heterogeneous Hardware of Java Programs

Published: January 28, 2019

The proliferation of heterogeneous hardware in recent years means that every system we program is likely to include a mix of computing elements; each of these with different characteristics. This trend has been accompanied by changes in software development norms that do not necessarily favor programmers. A prime example is the two most popular heterogeneous programming languages, CUDA and OpenCL, which expose several low-level features to the API making them difficult to use by non-expert users.

Invited talk at ARM - Exploiting Heterogeneous Hardware from Managed Runtime Languages

Published: January 29, 2019

TornadoVM: A Virtual Machine for Exploiting High-Performance Heterogeneous Hardware of Java Programs

Published: July 30, 2019

Talk about TornadoVM given at the JVMLS 2019 workshop.

TornadoVM: A virtual machine for exploiting high performance heterogeneous hardware

Published: October 26, 2019

Talk about TornadoVM given at Joker<?> 2019 Conference.

TornadoVM: Java for GPUs and FPGAs @QCon-London

Published: March 03, 2020

Slides available here

Running Parallel Bytecode Interpreters on Heterogeneous Hardware

Published: April 08, 2020

Video

Rethinking Parallel Programming APIs: Towards Searching for the Gold API

Published: January 25, 2021

Link to the event

Transparent Heterogeneous Computing for Java via TornadoVM @ NYJavaSIG

Published: February 10, 2021

Slides available here

Level Up Your Java Performance with TornadoVM

Published: November 01, 2021

Summary

TornadoVM: Transparent Hardware Acceleration for Java…and Beyond!

Published: November 20, 2021

Video

TornadoVM: Transparent Hardware Acceleration for Java…and Beyond!

Published: December 08, 2021

Video

Boosting Performance of Java programs by Running on GPUs and FPGA via TornadoVM

Published: May 05, 2022

Summary

TornadoVM: Multi-Backend Hardware Acceleration Framework for Java

Published: March 14, 2023

Summary

From CPU to GPU and FPGAs: Supercharging Java Applications with TornadoVM

Published: August 07, 2023

Summary

Designing Parallel Programming APIs for Heterogeneous Hardware on top of Managed Runtime Systems

Published: March 13, 2024

Summary

teaching

Object Oriented Programming (Java course)

Tutoring, The University of Edinburgh, 2015

I tutored the Introduction to Object Oriented Programming through Java course for groups of 8-10 undergraduate students during Spring 2015 and 2016. We covered an introduction to topics such as object-oriented programming, inheritance, Java collections, threads, concurrency and parallelism. We also covered some material on how the Java Virtual Machine works.

Juan Fumero, PhD
