Using “Copy rows to result” in Pentaho Data Integration

[Update 2023]: This post now includes an alternative solution that achieves a similar result in fewer steps.

  1. Introduction
  2. Problem Statement / Scenario
  3. Solution-A: Using “Copy rows to result”
    1. Step-1: Create a Job with two Transformations
    2. Step-2: Read contents to memory
    3. Step-3: Configuring the data read from previous steps
    4. Step-4: Using the parameter in the final output step
    5. Codebase
  4. Solution-B: Using “Get rows from Result”
    1. Step-1: Create a transformation to copy the rows to memory
    2. Step-2: Create a transformation to read the data from memory and generate the files
    3. Step-3: Create a Job and execute
    4. Summary

Introduction

In Pentaho Data Integration, there are many situations where the same piece of logic has to run once for every row coming from the input stream, with each row producing its own set of output. To accomplish this, Pentaho provides a step named “Copy rows to result”.

This step allows you to transfer rows of data (in memory) to the next transformation (or job entry) in a job via an internal result row set. It can be used by the Get rows from result step and some job entries that allow to process the internal result row set. [ref: Pentaho | Copy rows to result]

Overall view of the PDI data looping process
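
The handoff described above can be pictured with a small Python model. This is only a conceptual sketch, not PDI code: the `Result` class, the two functions, and the sample names are hypothetical stand-ins for the internal result row set and the two steps.

```python
from typing import Dict, List

Row = Dict[str, str]


class Result:
    """Stand-in for the job-level result that carries rows between entries."""

    def __init__(self) -> None:
        self.rows: List[Row] = []


def copy_rows_to_result(stream: List[Row], result: Result) -> None:
    # Analogue of the "Copy rows to result" step: keep the rows in memory.
    result.rows = [dict(row) for row in stream]


def get_rows_from_result(result: Result) -> List[Row]:
    # Analogue of the "Get rows from result" step: read the rows back.
    return [dict(row) for row in result.rows]


result = Result()
# First transformation: push its rows into the result (sample names are made up).
copy_rows_to_result([{"name": "Alice"}, {"name": "Bob"}], result)
# Next transformation (or job entry): pick the same rows up again.
received = get_rows_from_result(result)
print(received)
```

The point of the model is that the rows never touch disk between the two transformations; they travel through the job's in-memory result.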

Problem Statement / Scenario

Suppose you have an Excel file whose rows contain employee names along with their details. The image below shows sample employee details.

[Image: sample employee details]

The requirement is to create a separate Excel file for each employee, containing that employee's details.

Solution-A: Using “Copy rows to result”

This solution uses Pentaho's built-in “Copy rows to result” step.

Step-1: Create a Job with two Transformations

Step-1: Sample overall view of the final job.
  • Transformation 1: Load Employees List into Memory
  • Transformation 2: Generate Output for every Employee
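
As a rough mental model of this job, the Python sketch below plays both roles: one function loads the employee list into memory, and a second function runs once per employee to write that employee's file. Everything in it is assumed for illustration: the sample rows, the column names, the function names, and the use of CSV instead of Excel (chosen only to stay within the standard library). In PDI itself the same flow is assembled from job entries and steps rather than written as code.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical sample data; the real input is the Excel file from the scenario.
EMPLOYEES: List[Row] = [
    {"name": "Alice", "department": "Finance"},
    {"name": "Bob", "department": "IT"},
]


def load_employees_into_memory() -> List[Row]:
    # Transformation 1: read the employee list and hand the rows to the job.
    return [dict(row) for row in EMPLOYEES]


def generate_output_for_employee(row: Row, out_dir: Path) -> Path:
    # Transformation 2: runs once per row and writes one file for that employee.
    target = out_dir / f"{row['name']}.csv"
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    return target


def run_job(out_dir: Path) -> List[Path]:
    # The job chains both transformations and repeats the second one per row.
    rows = load_employees_into_memory()
    return [generate_output_for_employee(row, out_dir) for row in rows]


out_dir = Path(tempfile.mkdtemp())
written = run_job(out_dir)
print([path.name for path in written])
```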

Step-2: Read contents to memory
