Tech Spaghetti

Category: Big Data

Apache Hive Apache Impala Big Data Data & Analytics Database Java Plugins Programming

Hive Custom UDFs – XML Parser (Plugin Introduction)

This blog introduces a new plugin release, XMLParser, as part of the Hive Custom UDF Project. The details…

October 4, 2020
Apache Hive Apache Impala Big Data Data & Analytics Database Java Plugins Programming

Hive Custom UDFs – Project Introduction

Apache Hive is a data warehouse system that facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built…

August 29, 2020
Big Data Data & Analytics Programming Python

Web Scraping: The Challenges

Web scraping is a technique for gathering information and data from a website. Even though copying something manually from a website is considered web scraping,…

May 9, 2020
Big Data Data & Analytics Java Programming

Extracting Tables from PDF

July 6, 2019
Apache Impala Big Data Data & Analytics Database Java Programming

JDBC Connection to Impala

Impala, a fast open-source MPP database for Apache Hadoop supported by Cloudera, offers JDBC connectivity for building applications through its JDBC library. This blog…

January 27, 2019
Big Data Data & Analytics Database Mark Logic NoSQL Pentaho Pentaho Data Integration & Analytics

Connect to the MarkLogic database using Pentaho DI

MarkLogic is a NoSQL database that allows third-party tools to connect using a REST API. This blog aims to explain connecting to…

July 15, 2017
Big Data Data & Analytics Database Mark Logic NoSQL

Creating a REST API in the MarkLogic database

Today I will try to explain how to create a REST API in the MarkLogic database. Creating a REST API will allow client applications like Pentaho,…

July 2, 2017
AWS Big Data Cloud Data & Analytics Pentaho Pentaho Data Integration & Analytics

Loading Data from S3 to Redshift | Pentaho Data Integration

This blog details the steps for loading data from Amazon S3 to Redshift using PDI. The process includes creating a table in the Redshift cluster, executing…

December 11, 2015
AWS Big Data Cloud Data & Analytics Pentaho Pentaho Data Integration & Analytics

Loading Data to AWS S3 Bucket | Pentaho Data Integration

Loading large volumes of data into Amazon Redshift using Pentaho may initially present performance issues due to Redshift treating each data row as a separate…

November 30, 2015
Big Data Data & Analytics Pentaho Pentaho Data Integration & Analytics

Open Source BI Stack

The use of data by people and businesses around the world is on the rise. Almost everyone at work nowadays is looking for a…

November 28, 2015
AWS Big Data Cloud Data & Analytics Pentaho Pentaho Data Integration & Analytics

Setting up an Amazon Redshift cluster and accessing it using Pentaho Kettle

Amazon Redshift is a fully managed and highly scalable data warehouse service in the cloud. You can start with a few hundred GB of data and scale…

September 10, 2015
Big Data Data & Analytics Pentaho Pentaho Data Integration & Analytics

Stream Data from Twitter API with OAuth using Kettle

Streaming data from the Twitter API is important from a data analytics perspective. Getting the pulse of your user community on the web and across…

September 6, 2015