Hive Custom UDFs – XML Parser (Plugin Introduction)

Hive Custom UDF Project Introduction

This post introduces a new plugin release, XMLParser, as part of the Hive Custom UDF Project. The details of the plugin are given below.

Description

XMLParser is a custom user-defined function (UDF) that parses XML data. XML is a markup language and one of the most popular data-sharing formats.

To read data from an XML file or string, one option is XPath. XPath is a query language for XML that extracts information from an XML document.

XMLParser takes XML data and an XPath query as input, evaluates the query against the data, and returns the matches as output.

Example of XML and XPath:

<?xml version = "1.0"?>
<epl>
  <player id="1">
     <name>Harry Kane</name>
     <Age>27</Age>
     <club>TOT</club>
  </player>
  <player id="2">
     <name>Bruno Fernandes</name>
     <Age>28</Age>
     <club>MUN</club>
  </player>
</epl>
  • To get the names of all players under the epl element, use the XPath query /epl/player/name/text(). This returns the list: ["Harry Kane","Bruno Fernandes"]
  • To get the names of the players in club MUN, use the XPath query /epl/player[club='MUN']/name/text(). This returns: ["Bruno Fernandes"]
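
The queries can be tried outside Hive with the XPath support in the Java standard library. The following is a minimal sketch, not the plugin's own code: the class name XPathDemo and the helper select are hypothetical, and the second query includes a name step so that it selects the text of the player's name rather than the whitespace directly inside the player element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it only illustrates the XPath queries on the sample data.
public class XPathDemo {
    // The sample document from the example, written on fewer lines.
    static final String EPL_XML =
        "<epl>"
      + "<player id=\"1\"><name>Harry Kane</name><Age>27</Age><club>TOT</club></player>"
      + "<player id=\"2\"><name>Bruno Fernandes</name><Age>28</Age><club>MUN</club></player>"
      + "</epl>";

    // Parses the XML, evaluates the XPath query, and returns the text of each matched node.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            // Wrap parser and XPath errors so callers need no checked-exception handling.
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(EPL_XML, "/epl/player/name/text()"));
        System.out.println(select(EPL_XML, "/epl/player[club='MUN']/name/text()"));
    }
}
```

Running main prints [Harry Kane, Bruno Fernandes] followed by [Bruno Fernandes].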

Usage

_FUNC_(xml_data:String, xpath:String)
  • @Input Params:
    • xml_data of type String
    • xpath of type String
  • @Output: An array list of the results of the XPath query.

e.g.:

  • xmlparser('<epl><player id="1"><name>Harry Kane</name></player><player id="2"><name>Bruno Fernandes</name></player></epl>', '/epl/player/name/text()')
  • xmlparser(data, '/epl/player/name/text()'), where data is the column holding the XML data.
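
As a rough illustration of the contract described in this section (two strings in, a list of strings out), here is a sketch in plain Java. It is an assumption about how such a function might behave, not the plugin's actual source: the class XmlParserSketch is hypothetical, it uses no Hive classes, and returning null for a null argument is an assumed convention.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical stand-in with the same (String, String) -> list shape as the usage above.
public class XmlParserSketch {

    // Returns the string value of every node matched by the XPath query.
    // Assumption: a null argument yields null instead of an error.
    public static List<String> evaluate(String xmlData, String xpath) {
        if (xmlData == null || xpath == null) {
            return null;
        }
        try {
            // XPath.evaluate can parse the XML itself when handed an InputSource.
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xmlData)), XPathConstants.NODESET);
            List<String> results = new ArrayList<>(nodes.getLength());
            for (int i = 0; i < nodes.getLength(); i++) {
                // Works for text, element, and attribute nodes alike.
                results.add(nodes.item(i).getTextContent());
            }
            return results;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XML or XPath: " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<epl><player id=\"1\"><name>Harry Kane</name></player>"
                   + "<player id=\"2\"><name>Bruno Fernandes</name></player></epl>";
        System.out.println(evaluate(xml, "/epl/player/@id"));
        System.out.println(evaluate(xml, "/epl/player/name/text()"));
    }
}
```

The attribute query /epl/player/@id shows that the same loop handles attribute nodes, since getTextContent returns the attribute value for them.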

Download
