Suppose we have an XML data source like the one below:
<Rootnode>
<Node></Node>
<Node></Node>
<Node></Node>
</Rootnode>
Now suppose we want to insert a new XML node inside each <Node></Node> tag, like this:
<Rootnode>
<Node><newField/></Node>
<Node><newField/></Node>
<Node><newField/></Node>
</Rootnode>
Here <newField/> is the new XML node that I would like to insert inside each <Node>.
Pentaho DI (Kettle) provides several steps and sample examples for dealing with XML data sources. Steps such as Get Data from XML, Add XML, and XML Join will be used to achieve the above result. Let's start by showing the entire transformation I built to achieve this:
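To make the target concrete before walking through the PDI steps, here is a minimal sketch of the same before/after change in plain Python. It is not part of the Pentaho transformation; it only illustrates the result we are after, using the standard library's ElementTree. The function name insert_child and its parameters are my own, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Same structure as the sample source above.
SOURCE = "<Rootnode><Node></Node><Node></Node><Node></Node></Rootnode>"


def insert_child(xml_text, parent_tag="Node", child_tag="newField"):
    """Append an empty child_tag element to every parent_tag element."""
    root = ET.fromstring(xml_text)
    # Materialise the matches first so we never modify the tree mid-iteration.
    for parent in list(root.iter(parent_tag)):
        ET.SubElement(parent, child_tag)
    return ET.tostring(root, encoding="unicode")


result = insert_child(SOURCE)
print(result)
```

Every <Node> in the printed document now carries one empty <newField> child, which is the shape the transformation below is meant to produce.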

Follow the steps below:
Step-1: Get Data from XML
Take two “Get Data from XML” steps that read the same source data. In the first step, simply fetch the <Rootnode> structure using the XPath: //*
In the second step, we need to read all the nodes inside the Rootnode. You can read all the nodes by using the recursive XPath, which is simply “.” (dot). Check the image below:

This ensures that all the nodes are read recursively, which is required since we want to insert the new node into each <Node>.
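If the path expressions are unfamiliar, the sketch below shows what "." and a descendant wildcard select in Python's ElementTree, which supports a subset of XPath. This is only an illustration of general path semantics; how the Get Data from XML step evaluates its Loop XPath setting internally is not covered here, and ElementTree needs the relative form ".//*" rather than "//*".

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Rootnode><Node></Node><Node></Node><Node></Node></Rootnode>"
)

# "." selects the context node itself (here, the root element).
context = [el.tag for el in root.findall(".")]

# ".//*" selects every element below the context node, at any depth.
descendants = [el.tag for el in root.findall(".//*")]

print(context)
print(descendants)
```

The first list contains only the root element, while the second contains the three <Node> elements beneath it.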
Step-2: Add a constant
To define the new node, I used the Add Constant step. Just define a field name and set its value to newField, or whatever name you want to give the node.
Step-3: Add XML
As per the Pentaho Wiki: “The XML column step allows you to encode the content of a number of fields in a row in XML. This XML is added to the row in the form of a String field.”

In the Fields section of this step, add the newField to the XML node, with the Root XML element set to “Node”. This is because we want to add the new node inside the <Node> tag.
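As a rough mental model of what the quoted description means, the sketch below encodes selected fields of a row as child elements under a root element and adds the resulting XML to the row as a string field. It is plain Python, not the PDI step itself. The function add_xml_field, the output field name xml_out, and the sample row are all hypothetical, and the exact output of the real step depends on settings that are not shown here.

```python
import xml.etree.ElementTree as ET


def add_xml_field(row, fields, root_element, output_field):
    """Return a copy of row with an extra string field holding the XML."""
    root = ET.Element(root_element)
    for name in fields:
        child = ET.SubElement(root, name)
        value = row.get(name)
        # Only set text for non-empty values so empty fields stay empty elements.
        if value not in (None, ""):
            child.text = str(value)
    new_row = dict(row)
    new_row[output_field] = ET.tostring(root, encoding="unicode")
    return new_row


# Hypothetical row: one field named newField with no content.
row = {"newField": ""}
encoded = add_xml_field(row, ["newField"], "Node", "xml_out")
print(encoded["xml_out"])
```

The point to take away is that the row keeps its original fields and gains one more field containing a <Node> fragment as text, which a later step can merge back into the source document.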

