Showing posts from October, 2013

Apache Oozie 3.3.1 installation on Apache Hadoop 0.23.0

I have been trying to install Apache Oozie 3.3.1 on Hadoop 0.23.0 for the last few days.
The documentation on the Apache website is not very clear, and very little of it covers Hadoop with the new (MRv2 / YARN) architecture. So I hope this post will help to some extent in configuring Oozie 3.3.1 on Hadoop 0.23.0.

Here we go,

Links: Apache Oozie quick start and Apache Hadoop.
My testing environment
4-node cluster (1 master, 3 slaves)
Apache Hadoop 0.23.0
Apache Oozie 3.3.1
Java 1.6.0_26
Maven 3.0.4
Oozie server installation
Download oozie-3.3.1.tar.gz from the nearest mirror site under apache/oozie/3.3.1 (I downloaded it from a mirror).
Unpack the oozie-3.3.1 tar.gz file under a directory such as /home/srikanth.
The following two properties are required in Hadoop core-site.xml:
<!-- OOZIE --> <property> <name>hadoop.proxyuser.[OOZIE_SERVER_USER].hosts</name> &l…
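Once core-site.xml has been edited, a quick check that the proxyuser entries are really there can save a failed Oozie start. Here is a minimal sketch in Python, not part of the Oozie or Hadoop tooling. The function name `missing_proxyuser_properties` and the user name `oozie` are made up for illustration, and the sketch assumes the second property is the usual `.groups` companion of `.hosts` in Hadoop's proxyuser settings.

```python
import xml.etree.ElementTree as ET


def missing_proxyuser_properties(core_site_xml, oozie_user):
    """Return the proxyuser property names absent from the given core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    # Collect the <name> of every <property> element in the configuration.
    present = {
        (prop.findtext("name") or "").strip()
        for prop in root.iter("property")
    }
    # Assumption: the two required entries are the .hosts / .groups pair.
    required = [
        "hadoop.proxyuser.%s.hosts" % oozie_user,
        "hadoop.proxyuser.%s.groups" % oozie_user,
    ]
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Hypothetical sample: only the .hosts property has been added so far.
    sample = """<configuration>
      <property>
        <name>hadoop.proxyuser.oozie.hosts</name>
        <value>master</value>
      </property>
    </configuration>"""
    print(missing_proxyuser_properties(sample, "oozie"))
```

To check a real installation, read the file contents from your own Hadoop configuration directory and pass in the user that runs the Oozie server. An empty list means both entries were found.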

50 Top Open Source Tools for Big Data

Big Data Analysis Platforms and Tools

1. Hadoop

You simply can't talk about big data without mentioning Hadoop. The Apache distributed data processing software is so pervasive that often the terms "Hadoop" and "big data" are used synonymously. The Apache Foundation also sponsors a number of related projects that extend the capabilities of Hadoop, and many of them are mentioned below. In addition, numerous vendors offer supported versions of Hadoop and related technologies. Operating System: Windows, Linux, OS X.

2. MapReduce

MapReduce was originally developed by Google. The MapReduce website describes it as "a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes." It's used by Hadoop, as well as many other data processing applications. Operating System: OS Independent.
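To make the "programming model" part concrete, here is a toy word count written in the MapReduce style. It is a single-process sketch in plain Python and does not use Hadoop or any MapReduce API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. In a real framework the map and reduce calls would run in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

The work splits cleanly because each line can be mapped on its own and each key can be reduced on its own.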

3. GridGain

GridGain offers an alternative to Hadoop's MapReduce that is compatible with the…