Memory Handling in Pentaho Data Integration

[Update 2023]: Please check the JAVA/JVM OPTIONS if you are using Java version 11 or higher. Pentaho 9 and above support Java version 11. An updated post is now available.

  1. Introduction
  2. Solution
    1. JAVA_OPTIONS for standard on-premise installation
    2. JAVA_OPTIONS when running Pentaho on Java-11 or higher
    3. JAVA_OPTIONS when running Pentaho on Kubernetes

Introduction

Sometimes, while loading a large volume of data with Kettle, Pentaho throws an Out of Memory error. This mainly happens because there is no memory left on the server to execute the job, and Pentaho's error message says exactly that.

Suppose you are loading a 20GB text file into a database table (using the Table Output step) and the server has 15GB of memory. If the running job needs 15GB or more, the Pentaho process will either be killed or throw an error. So how do we tackle it? Further down the post, I also provide updated content on handling Pentaho memory on Java 11 or higher and on containerised platforms.

Solution

JAVA_OPTIONS for standard on-premise installation

One possible way is to edit the memory parameters defined in “spoon.bat” or “spoon.sh” and raise the limit.

Step-1: Open “spoon.bat” or “spoon.sh” in Notepad or any text editor.

Step-2: Increase the “-Xmx” value in the PENTAHO_DI_JAVA_OPTIONS parameter.

For Windows systems (spoon.bat)

Kettle - Spoon BAT file memory configuration

For Linux/Unix systems (spoon.sh)

Kettle - Spoon SH file memory configuration

Change -Xmx512m to a larger value, for example -Xmx1024m. Define the memory size in MB (the “m” suffix) and keep it a multiple of 2. The JVM also accepts a “g” suffix for gigabytes, for example -Xmx2g.
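As a rough illustration of the change described above, the relevant part of spoon.sh might look like the sketch below. The surrounding `if` guard and the exact default options are assumptions and differ between PDI versions, so treat this as a pattern to look for rather than the literal contents of your file.

```shell
# Sketch of the memory line in spoon.sh (assumed shape; check your own copy).
# The guard lets a value exported in the environment take precedence.
if [ -z "$PENTAHO_DI_JAVA_OPTIONS" ]; then
    # Maximum heap raised from -Xmx512m to -Xmx1024m.
    PENTAHO_DI_JAVA_OPTIONS="-Xmx1024m"
fi
export PENTAHO_DI_JAVA_OPTIONS

# Show which options the JVM will be started with.
echo "PENTAHO_DI_JAVA_OPTIONS=$PENTAHO_DI_JAVA_OPTIONS"
```

If your copy of the script has such a guard, exporting PENTAHO_DI_JAVA_OPTIONS in the shell before starting Spoon should have the same effect without editing the file.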

There are many cases where the memory setup looks perfectly fine but an out of memory error still occurs. It might be due to some other process taking up memory on the server. In such cases, increasing the memory configuration can still help.
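To see whether other processes have already used up the server's memory, it can help to compare the heap size you plan to set against what is currently available. The sketch below is one possible check for Linux only. The function names `available_mb` and `heap_fits` are made up for this example and are not part of Pentaho. It reads the `MemAvailable` field from /proc/meminfo, which older kernels may not provide.

```shell
#!/bin/sh
# available_mb [FILE]: print the MemAvailable value (reported in kB) as whole MB.
# FILE defaults to /proc/meminfo; passing a file makes the function easy to try out.
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# heap_fits XMX_MB [FILE]: succeed only if the requested heap (in MB)
# is smaller than the memory currently available.
heap_fits() {
    avail=$(available_mb "$2")
    [ -n "$avail" ] && [ "$1" -lt "$avail" ]
}

# Example (hypothetical usage before starting a job with -Xmx1024m):
#   heap_fits 1024 || echo "Less than 1024 MB available; check other processes"
```

This only compares the heap setting with free memory at that moment. The JVM also needs memory beyond the heap, so leave some headroom.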

JAVA_OPTIONS when running Pentaho on Java-11 or higher
