Category: Big Data
-

Web Scraping: The Challenges
Web scraping is a technique for gathering information and data from a website. Even though copying something manually from a website is considered web scraping,…
-

JDBC Connection to Impala
Impala, a fast, open-source MPP database for Apache Hadoop supported by Cloudera, offers JDBC connectivity for building applications through its JDBC library. This blog…
-

Connect to a MarkLogic Database Using Pentaho DI
MarkLogic is a NoSQL database that allows third-party tools to connect through a REST API. This blog aims to explain connecting to…
-

Creating a REST API in a MarkLogic Database
Today I will explain how to create a REST API in a MarkLogic database. Creating a REST API allows client applications such as Pentaho,…
-

Loading Data from S3 to Redshift | Pentaho Data Integration
This blog details the steps for loading data from Amazon S3 into Redshift using PDI. The process includes creating a table in the Redshift cluster, executing…
-

Loading Data to an AWS S3 Bucket | Pentaho Data Integration
Loading large volumes of data into Amazon Redshift using Pentaho may initially present performance issues due to Redshift treating each data row as a separate…
-

Open Source BI Stack
The use of data by people and businesses around the world is on the rise. Almost everyone involved in work nowadays is looking for a…
-

Setting Up an Amazon Redshift Cluster and Accessing It Using Pentaho Kettle
Amazon Redshift is a fully managed, highly scalable data warehouse service in the cloud. You can start with a few hundred GB of data and scale…
-

Stream Data from Twitter API with OAuth using Kettle
Streaming data from the Twitter API is important from a data analytics perspective. Getting the pulse of your user community on the web and across…


