In Pentaho Data Integration (PDI, also known as Kettle), hops denote the direction in which data flows. Check the image below:

“Table input” is the source and “Table output” is the target. The direction of data movement is shown by the black line with an arrow in the middle; this line is the hop.
In Pentaho, data can be moved in two ways: “Copy Data” and “Distribute Data”. To choose one, right-click any step in PDI and select the “Data movement” option. Check the image here.
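The difference is: with “Copy Data”, every row is sent to every target step, so each output receives the full data set; with “Distribute Data”, rows are handed to the target steps in turn (round-robin), so each row goes to exactly one target.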
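To make the two modes concrete, here is a minimal Python model of how rows from one step could be dispatched to several target steps: “Copy Data” appends every row to every target, while “Distribute Data” deals rows out round-robin. This is a simplified sketch, not Kettle's implementation (Kettle passes rows between step threads through buffered row sets); the row values and step names are placeholders.

```python
from itertools import cycle


def copy_rows(rows, targets):
    """Copy Data model: every row is sent to every target step."""
    out = {t: [] for t in targets}
    for row in rows:
        for t in targets:
            out[t].append(row)
    return out


def distribute_rows(rows, targets):
    """Distribute Data model: rows are dealt to targets in turn (round-robin)."""
    out = {t: [] for t in targets}
    # cycle() repeats the target list forever; zip() stops when rows run out.
    for row, t in zip(rows, cycle(targets)):
        out[t].append(row)
    return out


rows = ["row 1", "row 2", "row 3"]        # placeholder values
targets = ["Output 1", "Output 2"]
print(copy_rows(rows, targets))
# {'Output 1': ['row 1', 'row 2', 'row 3'], 'Output 2': ['row 1', 'row 2', 'row 3']}
print(distribute_rows(rows, targets))
# {'Output 1': ['row 1', 'row 3'], 'Output 2': ['row 2']}
```

In this model, copying multiplies the row count by the number of targets, while distributing keeps the total row count unchanged and only splits it.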

The best way to explain this is with an example. Consider a sample .ktr transformation with one input and two outputs. Check the image below:

Here the input step is a “Data Grid” and the output steps are two Text file output steps, “Output 1” and “Output 2”. Note that I have disabled the hop for demonstration purposes. To understand data movement in Kettle, we will select each movement type in turn and analyze the Step Metrics and the output.
The input data set (in the Data Grid) is a single column named “name” with 3 records.

Demo-1: Copy Data
We use the two output files (as shown in the 3rd image). Select “Copy data to next step” from the “Data movement” option of the input step.
The hops now show a copy symbol, as in the image here.
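The movement type is a property of the source step and is saved with the transformation. Assuming the .ktr XML stores it as a `<distribute>` element inside each top-level `<step>` (`Y` for Distribute Data, `N` for Copy Data; check this against your own file), a short Python sketch can report the mode of each step. The XML fragment below is a hypothetical, heavily trimmed example, and treating a missing element as Distribute is also an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed .ktr fragment; real files contain many more elements.
KTR = """<transformation>
  <step><name>Data Grid</name><type>DataGrid</type><distribute>N</distribute></step>
  <step><name>Output 1</name><type>TextFileOutput</type><distribute>Y</distribute></step>
  <step><name>Output 2</name><type>TextFileOutput</type><distribute>Y</distribute></step>
</transformation>"""


def movement_modes(ktr_xml):
    """Map each step name to its data movement mode (assumed <distribute> Y/N flag)."""
    root = ET.fromstring(ktr_xml)
    modes = {}
    for step in root.findall("step"):          # steps assumed to be direct children
        name = step.findtext("name")
        flag = step.findtext("distribute", default="Y")  # assumed default: distribute
        modes[name] = "Distribute Data" if flag.strip().upper() == "Y" else "Copy Data"
    return modes


print(movement_modes(KTR))
```

To inspect a saved transformation, read the file's text (for example with `Path("my_transformation.ktr").read_text()`, a hypothetical file name) and pass it to `movement_modes`.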
Now let us run the transformation and then analyze the Step Metrics.

Here we see that Copy Data sends all 3 input rows from the Data Grid to both Output 1 and Output 2. Both outputs get all the data present in the input step.
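Besides the Step Metrics, the result can be cross-checked on disk by counting the data rows in each output file: with Copy Data, every file should hold all 3 input rows. The sketch below assumes each file has one header line (the Text file output step can write one) and that blank lines are not data; the helper names and file paths are hypothetical.

```python
from pathlib import Path


def count_data_rows(text, has_header=True):
    """Count non-empty lines, minus one header line if present."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return max(len(lines) - (1 if has_header else 0), 0)


def check_copy_result(expected_rows, output_paths, has_header=True):
    """With Copy Data, every output file should contain every input row."""
    return {
        path: count_data_rows(Path(path).read_text(encoding="utf-8"), has_header) == expected_rows
        for path in output_paths
    }
```

For this demo, a call such as `check_copy_result(3, ["output1.txt", "output2.txt"])` (hypothetical paths) should report `True` for both files if the outputs match the Step Metrics.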
Demo-2: Distribute Data

