Inserting an XML Node into XML Source Data using Pentaho Data Integration | Handling Complex XML Structures


Inserting a new XML node into a complex XML data source fails with the approach described in my previous blog post. That approach breaks down when the source structure contains multiple parent-child relationships. Using "." (dot) does not work either, since it recurses through all the child nodes and misses the multiple parent and sub-parent nodes.

First, let us take a sample complex XML data source with multiple root nodes and subnodes, like the one below:

<Root>
	<Rootnode1>
		<Subnode1>
			<Node></Node>
			<Node></Node>
			<Node></Node>
		</Subnode1>
		<Subnode2>
			<Node></Node>
			<Node></Node>
			<Node></Node>
		</Subnode2>
	</Rootnode1>
	<Rootnode2>
		<Subnode>
			<Node></Node>
			<Node></Node>
			<Node></Node>
		</Subnode>
	</Rootnode2>
</Root>

We want to turn it into the following:

<Root>
	<Rootnode1>
		<Subnode1>
			<Node><newField/></Node>
			<Node><newField/></Node>
			<Node><newField/></Node>
		</Subnode1>
		<Subnode2>
			<Node><newField/></Node>
			<Node><newField/></Node>
			<Node><newField/></Node>
		</Subnode2>
	</Rootnode1>
	<Rootnode2>
		<Subnode>
			<Node><newField/></Node>
			<Node><newField/></Node>
			<Node><newField/></Node>
		</Subnode>
	</Rootnode2>
</Root>
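
To make the target concrete, here is a minimal sketch of the same transformation done outside Pentaho, using Python's standard `xml.etree.ElementTree`. It is not the PDI approach this post builds; it only illustrates the goal, which is that every `<Node>` gets a `<newField/>` child no matter which parent it sits under. The function name `add_field_to_nodes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Same shape as the sample source above, compacted for brevity.
SOURCE = """<Root>
  <Rootnode1>
    <Subnode1><Node></Node><Node></Node><Node></Node></Subnode1>
    <Subnode2><Node></Node><Node></Node><Node></Node></Subnode2>
  </Rootnode1>
  <Rootnode2>
    <Subnode><Node></Node><Node></Node><Node></Node></Subnode>
  </Rootnode2>
</Root>"""


def add_field_to_nodes(xml_text, target="Node", new_tag="newField"):
    """Append an empty <new_tag/> child to every <target> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so <Node> elements are found under any parent.
    # list() snapshots the matches before the tree is modified.
    for node in list(root.iter(target)):
        ET.SubElement(node, new_tag)
    return root


result = add_field_to_nodes(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

The rest of this post achieves the same result inside a PDI transformation, where the tree is not available as a single object and has to be handled as a stream of rows.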

We will use a new step here: XML Input Stream (StAX). This step specializes in reading complex XML structures using the StAX parser. However, using it also adds to the development effort needed to create the desired target XML file.

Figure: Master transformation for inserting a new node into a complex XML

So let us break the code into multiple steps in order to tackle this issue.

Idea

The idea is to read all the XML nodes with the XML Input Stream step, filter out the <Node> </Node> elements, insert a new node into each of them, and finally join the result back with the master structure of the XML source data.
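
The idea can be sketched in plain Python on a simplified row stream. This is only an illustration under assumptions, not the actual PDI transformation: each row is a hypothetical `(event, name)` tuple, one per opening or closing tag, and the helper names `insert_child_rows` and `rows_to_xml` are made up for this sketch.

```python
def insert_child_rows(rows, target="Node", new_tag="newField"):
    """Emit a start/end pair for new_tag just before each closing <target> row."""
    out = []
    for kind, name in rows:
        if kind == "END_ELEMENT" and name == target:
            # The new node goes inside the target, so it precedes the closing tag.
            out.append(("START_ELEMENT", new_tag))
            out.append(("END_ELEMENT", new_tag))
        out.append((kind, name))
    return out


def rows_to_xml(rows):
    """Rebuild XML text from the ordered rows (the 'join back' part of the idea)."""
    parts = []
    for kind, name in rows:
        parts.append(f"<{name}>" if kind == "START_ELEMENT" else f"</{name}>")
    return "".join(parts)


# A tiny master structure with a single <Node>, as ordered event rows.
rows = [
    ("START_ELEMENT", "Root"),
    ("START_ELEMENT", "Subnode"),
    ("START_ELEMENT", "Node"),
    ("END_ELEMENT", "Node"),
    ("END_ELEMENT", "Subnode"),
    ("END_ELEMENT", "Root"),
]

merged = insert_child_rows(rows)
xml_out = rows_to_xml(merged)
print(xml_out)
```

Because the rows stay in document order, the untouched parts of the master structure pass through unchanged and only the `<Node>` elements gain a child.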

How to do it?

Follow the steps below:

Step 1: Source Input (StAX)

Use this step to read the complex source XML. The ‘XML Input Stream’ step reads the XML and exposes the information as multiple output fields. Select only the following:

  • xml_data_type_description
  • xml_element_id
  • xml_parent_element_id
  • xml_data_name
  • xml_data_value

The rest of the options are not required for this demo. For more details, see the Pentaho Wiki.
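
To get a feel for what these fields carry, the sketch below emulates a comparable row stream with Python's `xml.etree.ElementTree.iterparse`. It is an approximation, not the step's real output: the ids are assigned here in document order, the type labels borrow the StAX event names `START_ELEMENT` and `END_ELEMENT`, and text (character) rows are left out. The function name `stax_like_rows` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = "<Root><Rootnode1><Node/></Rootnode1><Rootnode2><Node/></Rootnode2></Root>"


def stax_like_rows(xml_text):
    """Return one dict per start/end event, keyed like the fields selected above."""
    rows = []
    open_ids = []  # stack of ids for elements that are currently open
    next_id = 0    # assumption: ids are simply counted in document order
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            element_id = next_id
            next_id += 1
            parent_id = open_ids[-1] if open_ids else None
            open_ids.append(element_id)
            kind = "START_ELEMENT"
        else:
            element_id = open_ids.pop()
            parent_id = open_ids[-1] if open_ids else None
            kind = "END_ELEMENT"
        rows.append({
            "xml_data_type_description": kind,
            "xml_element_id": element_id,
            "xml_parent_element_id": parent_id,
            "xml_data_name": elem.tag,
            "xml_data_value": None,  # text rows are omitted in this sketch
        })
    return rows


rows = stax_like_rows(SAMPLE)
for row in rows:
    print(row)
```

The parent id is what lets two `<Node>` elements with the same name be told apart: each one points back to a different parent row, which is the information needed later to put the modified nodes back in the right place.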
