Hive Custom UDFs – Project Introduction

Published by Rishu Shrivastava on August 29, 2020

Apache Hive is a big data warehouse system that facilitates reading, writing, and managing large datasets that reside in distributed storage and are queried using SQL syntax. Built on top of Apache Hadoop, Hive provides easy access to data via SQL, enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.

Project concept

Apache Hive supports many built-in functions to manipulate and process data. Although a lot of options are available, the built-in functions sometimes do not cover a particular business use case. For such cases, Hive allows you to create user-defined functions (UDFs) by extending the org.apache.hadoop.hive.ql.exec.UDF class.

The idea is to enhance the built-in functions available in Apache Hive and to build new ones that can be added alongside them. In this project, we will take some of the workaround solutions commonly used in Hive for business use cases and replace them with custom Hive UDFs.

Custom UDF List

In the first version of this project, we are releasing two custom UDFs.

UDF-1.0: Find total occurrences of a word/character in a sentence

This custom UDF counts the total number of occurrences of a given word or character in a sentence. It is particularly useful when you want to quickly count how often a term appears in a sentence or in a database column.

For example, if you want to find the number of # (hashtags) in a Hive column of tweets, you can use this function to get the total hashtag count.
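The counting logic behind such a function can be sketched in plain Java. This is a minimal illustration and not the project's actual implementation: the class name OccurrenceCounter and the method countOccurrences are hypothetical, the match is a case-sensitive, non-overlapping substring match, and null or empty input is assumed to yield 0. In a real Hive UDF, this logic would typically be called from an evaluate method on a class extending org.apache.hadoop.hive.ql.exec.UDF; the Hive dependency is left out here so the sketch compiles with the JDK alone.

```java
// Hypothetical helper, not the project's code. In a Hive UDF this logic would
// sit behind an evaluate(...) method on a class extending
// org.apache.hadoop.hive.ql.exec.UDF.
final class OccurrenceCounter {

    private OccurrenceCounter() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Counts non-overlapping, case-sensitive occurrences of term in text.
     * Returns 0 when either argument is null or term is empty
     * (an assumption of this sketch).
     */
    static int countOccurrences(String text, String term) {
        if (text == null || term == null || term.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // Jump from one match to the next, skipping past each match found.
        while ((from = text.indexOf(term, from)) != -1) {
            count++;
            from += term.length();
        }
        return count;
    }
}
```

For the hashtag case, countOccurrences("Loving #hive and #bigdata #udf", "#") returns 3. Because this is a substring match, searching for "the" would also count the "the" inside "other"; a whole-word variant would need to tokenize the sentence first.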

Documentation and usage link for UDF-1.0

Tech Spaghetti

UDF-2.0: Find total days minus the weekends between two dates
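The post does not describe how this UDF works, so the following is only a rough Java sketch of the idea named in the heading: counting the days between two dates while skipping Saturdays and Sundays. The class name WeekdayCounter, the method weekdaysBetween, and the choice of an inclusive start date and exclusive end date are all assumptions made for illustration; the project's own function may treat the boundaries differently.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

// Hypothetical helper, not the project's code.
final class WeekdayCounter {

    private WeekdayCounter() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Counts the days from start (inclusive) to endExclusive (exclusive)
     * that are not Saturday or Sunday. Returns 0 when start is not
     * before endExclusive.
     */
    static long weekdaysBetween(LocalDate start, LocalDate endExclusive) {
        long count = 0;
        // Walk day by day; simple and clear, adequate for typical date ranges.
        for (LocalDate day = start; day.isBefore(endExclusive); day = day.plusDays(1)) {
            DayOfWeek dow = day.getDayOfWeek();
            if (dow != DayOfWeek.SATURDAY && dow != DayOfWeek.SUNDAY) {
                count++;
            }
        }
        return count;
    }
}
```

For example, weekdaysBetween(LocalDate.of(2020, 8, 24), LocalDate.of(2020, 8, 31)) covers one full week starting on a Monday and returns 5.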
