Showing posts from August, 2013

MySQL Applier for Hadoop

The MySQL Applier for Hadoop enables real-time replication of events from MySQL to Hive / HDFS. This video tutorial demonstrates how to install, configure, and use the Hadoop Applier.

Video Tutorial:

CentOS 6 - Xen Installation

Install Xen 4 with Libvirt / XL on CentOS 6 (2013)
Update: Xen is now part of CentOS 6, as part of the Xen4CentOS6 project.
It can be installed on your CentOS 6 machine by running the following commands:

yum install centos-release-xen && yum install xen libvirt python-virtinst libvirt-daemon-xen
sh /usr/bin/
reboot

The above commands will install the official Xen 4 packages along with the libvirt toolstack, load the correct kernel into your GRUB boot loader, and reboot into your Xen kernel.
Once your system boots, ensure that you are running the Xen 4 kernel via:

uname -r

Now that Xen 4 has been installed, you can skip to section 6 at the bottom of this guide for installing your first Virtual Machine (VM) on CentOS Xen.
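A quick sanity check after the reboot might look like this (a sketch; it assumes the xl toolstack that ships with the Xen 4 packages installed above, and must be run as root):

```shell
# The running kernel string should contain "xen" for a Xen dom0 kernel
uname -r | grep -q xen && echo "Xen kernel active"

# Ask the xl toolstack for hypervisor details
xl info | grep -E 'xen_version|xen_caps'
```

If `xl info` fails, the machine most likely booted the stock CentOS kernel rather than the Xen entry in GRUB.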

This article will guide you through the installation of the latest Xen on CentOS 6.x.
First things first, update your CentOS install via the following command:

yum -y update

1. Disable SELinux
SELinux can really interfere with X…
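Disabling SELinux is typically done as follows (a sketch, assuming the stock /etc/selinux/config layout; run as root):

```shell
# Turn SELinux off for the current boot
setenforce 0

# Make the change persistent across reboots
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```

The sed line only has an effect if the file currently reads SELINUX=enforcing; systems already set to permissive are left unchanged.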

How Scaling Really Works in Apache HBase

This post was originally published elsewhere; we republish it here in a slightly modified form for your convenience:
At first glance, the Apache HBase architecture appears to follow a master/slave model where the master receives all the requests but the real work is done by the slaves. This is not actually the case, and in this article I will describe what tasks are in fact handled by the master and the slaves.
Regions and Region Servers
HBase is the Hadoop storage manager that provides low-latency random reads and writes on top of HDFS, and it can handle petabytes of data. One of the interesting capabilities in HBase is auto-sharding, which simply means that tables are dynamically distributed by the system when they become too large.
The basic unit of horizontal scalability in HBase is called a Region. Regions are a subset of the table’s data and they are essentially a contiguous, sorted range of rows that are stored together.
Initially, there is only one region…
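Besides letting auto-sharding split a growing table on its own, a table can be pre-split into several regions at creation time. A hypothetical HBase shell session (the table name, column family, and split keys here are made up for illustration):

```shell
# Create a table that starts with four regions instead of one,
# using the SPLITS option of the HBase shell's create command.
hbase shell <<'EOF'
create 'mytable', 'cf', SPLITS => ['g', 'n', 't']
EOF
```

Each listed key becomes a region boundary, so rows are spread across region servers from the start rather than after the first splits.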

Oracle Big Data Connectors 2.1

Oracle Big Data Connectors 2.1 is now available.  
Oracle Loader for Hadoop and Oracle SQL Connector for HDFS add certification with CDH 4.2 and Apache Hadoop 1.1.1 in this release.
Enhancements to Oracle Loader for Hadoop: 
 - Ability to load from Hive partitioned tables
 - Improved usability and error handling
 - Sort by user-specified key before load

Hadoop Default Ports Quick Reference

Define your choice of ports by setting the property dfs.http.address for the Namenode and mapred.job.tracker.http.address for the Jobtracker in conf/core-site.xml:
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>50070</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>50030</value>
  </property>
</configuration>
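A quick way to confirm the web UIs answer on the configured ports (a sketch, assuming the daemons are running on the local host):

```shell
# Probe the Namenode and Jobtracker web UIs; an HTTP 200 means the UI is up.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/
```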
Web UIs for the Common User
The default Hadoop ports are as follows:

      Daemon                   Default Port   Configuration Parameter
HDFS  Namenode                 50070          dfs.http.address
      Datanodes                50075          dfs.datanode.http.address
      Secondarynamenode        50090          dfs.secondary.http.address
      Backup/Checkpoint node*  50105          dfs.backup.http.address
MR    Jobtracker               50030          mapred.job.tracker.http.address
      Tasktrackers             50060          mapred.task.tracker.http.address

* Replaces the secondarynamenode in 0.21.

Hadoop daemons expose some information over HTTP. All Hadoop daemons expose the following: