Posts by Tags

Using oneAPI Construction Kit and TornadoVM to accelerate Java Programs on x86, ARM and RISC-V CPUs

26 minute read

Published: September 10, 2024

Running TornadoVM via the oneAPI Construction Kit for Intel, ARM and RISC-V CPUs.

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this and how can it be used?

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

21 minute read

Published: November 25, 2022

Exploiting heterogeneous hardware for Big Data workloads is usually done by introducing new APIs, resulting in programs that are more complex to develop, understand, and maintain. But what if we do not change or extend the original programming model? Is it possible? This post discusses a new approach to do so.

Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM

27 minute read

Published: February 07, 2025

Babylon and Programming for GPUs: introductions and comparisons with TornadoVM

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Book Review: JVM Performance Engineering

5 minute read

Published: June 21, 2024

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Running Java Programs on XPUs with TornadoVM via Docker

9 minute read

Published: July 08, 2022

In this post, we will show how to launch and accelerate Java programs on heterogeneous hardware via TornadoVM with minimal configuration using pre-built Docker images.

How to Fix CUDA GCC Unsupported Versions on Linux

1 minute read

Published: January 16, 2025

How to Fix CUDA GCC Unsupported Versions on Linux

Setting Up WSL for GPU Compute

5 minute read

Published: January 14, 2025

Learn how to set up WSL for GPU compute and unlock the potential of your machine for tasks like AI and scientific computing!

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled

6 minute read

Published: December 10, 2023

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled.

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

21 minute read

Published: November 25, 2022

Exploiting heterogeneous hardware for Big Data workloads is usually done by introducing new APIs, resulting in programs that are more complex to develop, understand, and maintain. But what if we do not change or extend the original programming model? Is it possible? This post discusses a new approach to do so.

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this and how can it be used?

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Running TornadoVM on NVIDIA Jetson Nano

7 minute read

Published: April 25, 2023

Did you know that TornadoVM can also run on ARM-based systems with NVIDIA GPUs? In this post, we will show how TornadoVM can be used on an NVIDIA Jetson Nano, a small, powerful computer designed for embedded artificial intelligence (AI) and machine learning (ML) applications.

Running Java Programs on XPUs with TornadoVM via Docker

9 minute read

Published: July 08, 2022

In this post, we will show how to launch and accelerate Java programs on heterogeneous hardware via TornadoVM with minimal configuration using pre-built Docker images.

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled

6 minute read

Published: December 10, 2023

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled.

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Running Java Programs on XPUs with TornadoVM via Docker

9 minute read

Published: July 08, 2022

In this post, we will show how to launch and accelerate Java programs on heterogeneous hardware via TornadoVM with minimal configuration using pre-built Docker images.

How to enable NVIDIA Nsight Compute CLI in Fedora

1 minute read

Published: July 04, 2025

How to enable NVIDIA Nsight Compute CLI in Fedora

How to disable auto-update in Fedora

less than 1 minute read

Published: June 25, 2025

Linux command to disable Fedora’s automatic updates at restart

Fixing libcurl conflicts in Fedora 41

2 minute read

Published: January 20, 2025

Fixing libcurl conflicts in Fedora 41

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled

6 minute read

Published: December 10, 2023

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled.

How to Fix CUDA GCC Unsupported Versions on Linux

1 minute read

Published: January 16, 2025

How to Fix CUDA GCC Unsupported Versions on Linux

Measuring Kernel Time and Data Transfers with Level Zero

13 minute read

Published: September 14, 2021

Measuring Kernel Time and Data Transfers with Level Zero: https://jjfumero.github.io/posts/2021/09/timers-with-level-zero/

Introduction to Level Zero API for Heterogeneous Programming

20 minute read

Published: June 09, 2021

Overview of the Intel Level-Zero API and a practical example to dispatch a SPIR-V kernel on the Intel HD Graphics: https://jjfumero.github.io/posts/2021/09/introduction-to-level-zero/

Setting Up WSL for GPU Compute

5 minute read

Published: January 14, 2025

Learn how to set up WSL for GPU compute and unlock the potential of your machine for tasks like AI and scientific computing!

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

Profiling OpenCL and SPIRV code from TornadoVM using VTune

7 minute read

Published: February 14, 2022

Profiling OpenCL and SPIRV code from TornadoVM using VTune: https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM

27 minute read

Published: February 07, 2025

Babylon and Programming for GPUs: introductions and comparisons with TornadoVM

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

The TornadoVM Programming Model Explained

16 minute read

Published: February 23, 2024

Are you a Java developer who wants to access GPUs? In this post, I explain how to do so with TornadoVM.

Running Java Programs on XPUs with TornadoVM via Docker

9 minute read

Published: July 08, 2022

In this post, we will show how to launch and accelerate Java programs on heterogeneous hardware via TornadoVM with minimal configuration using pre-built Docker images.

Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM

27 minute read

Published: February 07, 2025

Babylon and Programming for GPUs: introductions and comparisons with TornadoVM

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

21 minute read

Published: November 25, 2022

Exploiting heterogeneous hardware for Big Data workloads is usually done by introducing new APIs, resulting in programs that are more complex to develop, understand, and maintain. But what if we do not change or extend the original programming model? Is it possible? This post discusses a new approach to do so.

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

The TornadoVM Programming Model Explained

16 minute read

Published: February 23, 2024

Are you a Java developer who wants to access GPUs? In this post, I explain how to do so with TornadoVM.

Measuring Kernel Time and Data Transfers with Level Zero

13 minute read

Published: September 14, 2021

Measuring Kernel Time and Data Transfers with Level Zero: https://jjfumero.github.io/posts/2021/09/timers-with-level-zero/

Introduction to Level Zero API for Heterogeneous Programming

20 minute read

Published: June 09, 2021

Overview of the Intel Level-Zero API and a practical example to dispatch a SPIR-V kernel on the Intel HD Graphics: https://jjfumero.github.io/posts/2021/09/introduction-to-level-zero/

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled

6 minute read

Published: December 10, 2023

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled.

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

Setting Up WSL for GPU Compute

5 minute read

Published: January 14, 2025

Learn how to set up WSL for GPU compute and unlock the potential of your machine for tasks like AI and scientific computing!

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

TornadoVM Internals: Java APIs for Compiling Java methods to SPIR-V and running on GPUs via Level Zero

11 minute read

Published: September 01, 2022

This post shows how to use the internal APIs to interact directly with the TornadoVM JIT compiler interface and runtime system.

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

Profiling OpenCL and SPIRV code from TornadoVM using VTune

7 minute read

Published: February 14, 2022

Profiling OpenCL and SPIRV code from TornadoVM using VTune: https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Building JDK with HSDIS on Linux

5 minute read

Published: February 14, 2025

Learn how to build a JDK with the HotSpot Disassembler (HSDIS) plugin enabled on Linux to inspect the JVM’s JIT-compiled assembly code.

TornadoVM Internals: Java APIs for Compiling Java methods to SPIR-V and running on GPUs via Level Zero

11 minute read

Published: September 01, 2022

This post shows how to use the internal APIs to interact directly with the TornadoVM JIT compiler interface and runtime system.

Book Review: JVM Performance Engineering

5 minute read

Published: June 21, 2024

Accelerating Java programs on RISC-V CPUs with Vector Instructions

23 minute read

Published: April 04, 2025

Learn how to accelerate performance on RISC-V CPUs using TornadoVM & vector instructions

Using oneAPI Construction Kit and TornadoVM to accelerate Java Programs on x86, ARM and RISC-V CPUs

26 minute read

Published: September 10, 2024

Running TornadoVM via the oneAPI Construction Kit for Intel, ARM and RISC-V CPUs.

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

The TornadoVM Programming Model Explained

16 minute read

Published: February 23, 2024

Are you a Java developer who wants to access GPUs? In this post, I explain how to do so with TornadoVM.

Configuring Unsloth on Linux for LLM Fine Tuning

1 minute read

Published: April 17, 2025

This guide details the configuration of Unsloth to build fine-tuned LLM models on NVIDIA GPUs on Linux systems.

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this and how can it be used?

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed

Exploring Level Zero resources: repositories and purpose

2 minute read

Published: September 16, 2022

Sometimes, it is not clear which Level Zero repository is the right one for our needs. In this post, we will explain each of the Level Zero public resources and what each one is intended for.

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

Running TornadoVM on Intel GPUs using Windows Subsystem for Linux (WSL) for Windows 11

4 minute read

Published: June 29, 2022

In this post, I will show you how we can enable TornadoVM to run on Intel HD Graphics via the OpenCL and SPIR-V Backends within WSL using Windows 11.

Overall Performance of Unified Shared Memory Types with Level Zero on Intel Integrated GPUs

8 minute read

Published: May 25, 2022

Does shared memory really impact performance if we measure end-to-end applications on GPUs? In this post, we try to answer this question.

Understanding Memory Allocation Size Limitations with Level Zero

6 minute read

Published: April 07, 2022

In this post, we explore the memory capabilities of the Level Zero API and examine its constraints with respect to memory allocation.

How to enable NVIDIA Nsight Compute CLI in Fedora

1 minute read

Published: July 04, 2025

How to enable NVIDIA Nsight Compute CLI in Fedora

How to disable auto-update in Fedora

less than 1 minute read

Published: June 25, 2025

Linux command to disable Fedora’s automatic updates at restart

How to Fix CUDA GCC Unsupported Versions on Linux

1 minute read

Published: January 16, 2025

How to Fix CUDA GCC Unsupported Versions on Linux

Running TornadoVM on NVIDIA Jetson Nano

7 minute read

Published: April 25, 2023

Did you know that TornadoVM can also run on ARM-based systems with NVIDIA GPUs? In this post, we will show how TornadoVM can be used on an NVIDIA Jetson Nano, a small, powerful computer designed for embedded artificial intelligence (AI) and machine learning (ML) applications.

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

21 minute read

Published: November 25, 2022

Exploiting heterogeneous hardware for Big Data workloads is usually done by introducing new APIs, resulting in programs that are more complex to develop, understand, and maintain. But what if we do not change or extend the original programming model? Is it possible? This post discusses a new approach to do so.

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed

Overall Performance of Unified Shared Memory Types with Level Zero on Intel Integrated GPUs

8 minute read

Published: May 25, 2022

Does shared memory really impact performance if we measure end-to-end applications on GPUs? In this post, we try to answer this question.

Understanding Memory Allocation Size Limitations with Level Zero

6 minute read

Published: April 07, 2022

In this post, we explore the memory capabilities of the Level Zero API and examine its constraints with respect to memory allocation.

Overall Performance of Unified Shared Memory Types with Level Zero on Intel Integrated GPUs

8 minute read

Published: May 25, 2022

Does shared memory really impact performance if we measure end-to-end applications on GPUs? In this post, we try to answer this question.

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled

6 minute read

Published: December 10, 2023

Installing the NVIDIA Drivers and CUDA 12.3 on Fedora 39 with Secure Boot Enabled.

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

Running TornadoVM on NVIDIA Jetson Nano

7 minute read

Published: April 25, 2023

Did you know that TornadoVM can also run on ARM-based systems with NVIDIA GPUs? In this post, we will show how TornadoVM can be used on an NVIDIA Jetson Nano, a small, powerful computer designed for embedded artificial intelligence (AI) and machine learning (ML) applications.

How to enable NVIDIA Nsight Compute CLI in Fedora

1 minute read

Published: July 04, 2025

How to enable NVIDIA Nsight Compute CLI in Fedora

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Accelerating Java programs on RISC-V CPUs with Vector Instructions

23 minute read

Published: April 04, 2025

Learn how to accelerate performance on RISC-V CPUs using TornadoVM & vector instructions

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

Running TornadoVM on Intel GPUs using Windows Subsystem for Linux (WSL) for Windows 11

4 minute read

Published: June 29, 2022

In this post, I will show you how we can enable TornadoVM to run on Intel HD Graphics via the OpenCL and SPIR-V Backends within WSL using Windows 11.

Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM

27 minute read

Published: February 07, 2025

Babylon and Programming for GPUs: introductions and comparisons with TornadoVM

Installing CUDA, OpenCL and Level Zero in OpenSUSE Leap 15

6 minute read

Published: September 09, 2022

In this post, we show how to install the NVIDIA drivers to get access to CUDA and OpenCL parallel programming frameworks and utilities for NVIDIA GPUs. We also show how to install the Intel compute-runtime drivers for accessing, via OpenCL and Level Zero, Intel Integrated Graphics.

Accelerating Java programs on RISC-V CPUs with Vector Instructions

23 minute read

Published: April 04, 2025

Learn how to accelerate performance on RISC-V CPUs using TornadoVM & vector instructions

Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM

27 minute read

Published: February 07, 2025

Babylon and Programming for GPUs: introductions and comparisons with TornadoVM

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Book Review: JVM Performance Engineering

5 minute read

Published: June 21, 2024

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Book Review: JVM Performance Engineering

5 minute read

Published: June 21, 2024

Measuring Kernel Time and Data Transfers with Level Zero

13 minute read

Published: September 14, 2021

Measuring Kernel Time and Data Transfers with Level Zero: https://jjfumero.github.io/posts/2021/09/timers-with-level-zero/

The TornadoVM Programming Model Explained

16 minute read

Published: February 23, 2024

Are you a Java developer who wants to access GPUs? In this post, I explain how to do so with TornadoVM.

Configuration of the NVIDIA and Intel GPU drivers for RHEL9

5 minute read

Published: May 19, 2023

This post shows the installation steps to obtain NVIDIA CUDA and Intel OpenCL and Level Zero runtimes to run applications on GPUs with RHEL.

Using oneAPI Construction Kit and TornadoVM to accelerate Java Programs on x86, ARM and RISC-V CPUs

26 minute read

Published: September 10, 2024

Running TornadoVM via the oneAPI Construction Kit for Intel, ARM and RISC-V CPUs.

Accelerating Java programs on RISC-V CPUs with Vector Instructions

23 minute read

Published: April 04, 2025

Learn how to accelerate performance on RISC-V CPUs using TornadoVM & vector instructions

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this and how can it be used?

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed

Exploring Level Zero resources: repositories and purpose

2 minute read

Published: September 16, 2022

Sometimes, it is not clear which Level Zero repository is the right one for our needs. In this post, we will explain each of the Level Zero public resources and what each one is intended for.

Book Review: JVM Performance Engineering

5 minute read

Published: June 21, 2024

TornadoVM Internals: Java APIs for Compiling Java methods to SPIR-V and running on GPUs via Level Zero

11 minute read

Published: September 01, 2022

This post shows how to use the internal APIs to interact directly with the TornadoVM JIT compiler interface and runtime system.

Exploring Level Zero resources: repositories and purpose

2 minute read

Published: September 16, 2022

Sometimes, it is not clear which Level Zero repository is the right one for our needs. In this post, we will explain each of the Level Zero public resources and what each one is intended for.

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this and how can it be used?

Measuring Kernel Time and Data Transfers with Level Zero

13 minute read

Published: September 14, 2021

Measuring Kernel Time and Data Transfers with Level Zero: https://jjfumero.github.io/posts/2021/09/timers-with-level-zero/

Accessible Dynamic SPIR-V Code Generation from Java

8 minute read

Published: November 24, 2023

Dynamic SPIR-V Code Generation from Java. Why do we need this and how can it be used?

Accelerating Java programs on RISC-V CPUs with Vector Instructions

23 minute read

Published: April 04, 2025

Learn how to accelerate performance on RISC-V CPUs using TornadoVM & vector instructions

Babylon OpenJDK: A Guide for Beginners and Comparison with TornadoVM

27 minute read

Published: February 07, 2025

Babylon and Programming for GPUs: introductions and comparisons with TornadoVM

Setting Up WSL for GPU Compute

5 minute read

Published: January 14, 2025

Learn how to set up WSL for GPU compute and unlock the potential of your machine for tasks like AI and scientific computing!

Can TornadoVM run Matrix Multiply faster than OpenCL Native?

26 minute read

Published: December 17, 2024

This article explores how TornadoVM, a Java parallel programming framework, can outperform OpenCL code on GPUs using the Matrix Multiplication application as an example.

Using oneAPI Construction Kit and TornadoVM to accelerate Java Programs on x86, ARM and RISC-V CPUs

26 minute read

Published: September 10, 2024

Running TornadoVM via the oneAPI Construction Kit for Intel, ARM and RISC-V CPUs.

Running TornadoVM on CPUs and FPGAs via oneAPI

12 minute read

Published: May 09, 2024

This post shows the main steps to install and run TornadoVM on CPUs and FPGAs using the Intel oneAPI runtime for OpenCL.

Multi-device & Multi-backend TornadoVM

20 minute read

Published: March 22, 2024

This post shows, via examples, how developers can benefit from TornadoVM's multi-device and multi-backend features, and how to reason about performance using the TornadoVM profiler to tune applications.

The TornadoVM Programming Model Explained

16 minute read

Published: February 23, 2024

Are you a Java developer who wants to access GPUs? In this post, I explain how to do so with TornadoVM.

Running TornadoVM within IntelliJ

4 minute read

Published: February 05, 2024

Running Java applications from existing IDEs can be a cumbersome process, especially if we need to specify shared libraries. In this post, I will explain how to get access to NVIDIA and Intel-integrated GPUs from IntelliJ using TornadoVM.

Running TornadoVM on NVIDIA Jetson Nano

7 minute read

Published: April 25, 2023

Did you know that TornadoVM can also run on ARM-based systems with NVIDIA GPUs? In this post, we will show how TornadoVM can be used on an NVIDIA Jetson Nano, a small, powerful computer designed for embedded artificial intelligence (AI) and machine learning (ML) applications.

Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous Hardware

21 minute read

Published: November 25, 2022

Exploiting heterogeneous hardware for Big Data workloads is usually done by introducing new APIs, resulting in programs that are more complex to develop, understand, and maintain. But what if we do not change or extend the original programming model? Is it possible? This post discusses a new approach to do so.

TornadoVM Internals: Java APIs for Compiling Java methods to SPIR-V and running on GPUs via Level Zero

11 minute read

Published: September 01, 2022

This post shows how to use the internal APIs to interact directly with the TornadoVM JIT compiler interface and runtime system.

Running Java Programs on XPUs with TornadoVM via Docker

9 minute read

Published: July 08, 2022

In this post, we will show how to launch and accelerate Java programs on heterogeneous hardware via TornadoVM with minimal configuration using pre-built Docker images.

Running TornadoVM on Intel GPUs using Windows Subsystem for Linux (WSL) for Windows 11

4 minute read

Published: June 29, 2022

In this post, I will show you how we can enable TornadoVM to run on Intel HD Graphics via the OpenCL and SPIR-V Backends within WSL using Windows 11.

Profiling OpenCL and SPIRV code from TornadoVM using VTune

7 minute read

Published: February 14, 2022

Profiling OpenCL and SPIRV code from TornadoVM using VTune: https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

Unified Shared Memory: Friend or Foe?

16 minute read

Published: October 19, 2023

Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed

Accelerating Java programs on RISC-V CPUs with Vector Instructions

23 minute read

Published: April 04, 2025

Learn how to accelerate performance on RISC-V CPUs using TornadoVM & vector instructions

How to Fix CUDA GCC Unsupported Versions on Linux

1 minute read

Published: January 16, 2025

How to Fix CUDA GCC Unsupported Versions on Linux

Setting Up WSL for GPU Compute

5 minute read

Published: January 14, 2025

Learn how to set up WSL for GPU compute and unlock the potential of your machine for tasks like AI and scientific computing!

Running TornadoVM on Intel GPUs using Windows Subsystem for Linux (WSL) for Windows 11

4 minute read

Published: June 29, 2022

In this post, I will show you how to enable TornadoVM to run on Intel HD Graphics via the OpenCL and SPIR-V backends within WSL on Windows 11.

Fixing libcurl conflicts in Fedora 41

2 minute read

Published: January 20, 2025

Fixing libcurl conflicts in Fedora 41

Configuring Unsloth on Linux for LLM Fine Tuning

1 minute read

Published: April 17, 2025

This guide details the configuration of Unsloth to build fine-tuned LLM models on NVIDIA GPUs on Linux systems.

Building JDK with HSDIS on Linux

5 minute read

Published: February 14, 2025

Learn how to build a JDK with the HotSpot Disassembler (HSDIS) plugin enabled on Linux to inspect the JVM’s JIT-compiled assembly code.

Fixing libcurl conflicts in Fedora 41

2 minute read

Published: January 20, 2025

Fixing libcurl conflicts in Fedora 41

Measuring Kernel Time and Data Transfers with Level Zero

13 minute read

Published: September 14, 2021

Measuring Kernel Time and Data Transfers with Level Zero: https://jjfumero.github.io/posts/2021/09/timers-with-level-zero/

Introduction to Level Zero API for Heterogeneous Programming

20 minute read

Published: June 09, 2021

Overview of the Intel Level-Zero API and a practical example to dispatch a SPIR-V kernel on the Intel HD Graphics: https://jjfumero.github.io/posts/2021/09/introduction-to-level-zero/

Exploring Level Zero resources: repositories and purpose

2 minute read

Published: September 16, 2022

Sometimes, it is not clear which Level Zero repository is the right one for our needs. In this post, we will explain each of the Level Zero public resources and what they are intended to be.

Overall Performance of Unified Shared Memory Types with Level Zero on Intel Integrated GPUs

8 minute read

Published: May 25, 2022

Does shared memory really impact performance when we measure end-to-end applications on GPUs? In this post, we try to answer this question.

Understanding Memory Allocation Size Limitations with Level Zero

6 minute read

Published: April 07, 2022

In this post, we explore the memory capabilities of the Level Zero API and examine its constraints with respect to memory allocation.

Profiling OpenCL and SPIRV code from TornadoVM using VTune

7 minute read

Published: February 14, 2022

Profiling OpenCL and SPIRV code from TornadoVM using VTune: https://jjfumero.github.io/posts/2022/02/profiling-tornadovm-with-intel-vtune/

Using oneAPI Construction Kit and TornadoVM to accelerate Java Programs on x86, ARM and RISC-V CPUs

26 minute read

Published: September 10, 2024

Running TornadoVM via the oneAPI Construction Kit for Intel, ARM and RISC-V CPUs.

Configuring Unsloth on Linux for LLM Fine Tuning

1 minute read

Published: April 17, 2025

This guide details the configuration of Unsloth to build fine-tuned LLM models on NVIDIA GPUs on Linux systems.

Juan Fumero, PhD

Posts by Tags

ARM

Academic Paper

Apache Flink

Babylon

Benchmarking

Book

CPUs

CUDA

Co-design Approach

Code Generation

Comparison

Compiler Optimizations

Concurrency

Configuration

Deep Learning

Docker

Drivers

FPGAs

Fedora

Fedora39

GCC

GPGPU

GPU Compute

GPU Drivers

GPU Profiling

GPUs

HAT

Hardware Acceleration

Hardware Accelerators

Heterogeneous Programming

Installation

Intel Compute Runtime

Intel GPUs

Intel HD Graphics

Intel Integrated GPUs

Intel Level Zero

Intel OpenCL

Intel VTune

Intel oneAPI

IntelliJ

JDK

JIT Compilation

JVM

Java

LLM

Level Zero

Linux

Machine Learning

Managed Runtime Systems

MaxineVM

Memory Allocation

Memory Types

Multi-Backend

NVIDIA

NVIDIA CUDA

NVIDIA Jetson Nano

NVIDIA Nsight Compute CLI

Native