Posts

Showing posts from April, 2017

Choosing Kerberos approach for Hadoop cluster in an enterprise environment

Factors to consider before choosing an approach for Kerberos implementation within an enterprise.
Choosing an approach for implementing Kerberos on a Hadoop cluster is critical for long-term maintenance. Enterprises have their own security policies and guidelines, and a successful Kerberos implementation needs to adhere to the enterprise security architecture. Multiple guides explain how to implement Kerberos, but I couldn't find information on which approach to choose or on the pros and cons of each approach.
In a Hortonworks Hadoop cluster, there are three different ways of generating and managing keytabs and principals.
a. Use an MIT KDC specific to the Hadoop cluster - automated keytab management using Ambari
A KDC specific to the Hadoop cluster can be installed and maintained on one of the Hadoop nodes. All users and keytabs required for the Kerberos implementation are managed automatically by Ambari.
Pros…

Default mapred.tasktracker.map.tasks.maximum and Increasing io.sort.mb

The defaults are mapred.tasktracker.map.tasks.maximum = 2 and mapred.tasktracker.reduce.tasks.maximum = 2. To change them, edit the file ${HADOOP_HOME}/conf/mapred-site.xml, where ${HADOOP_HOME} is the Hadoop installation path. For example, if you decide that you want 8 reducers (set with conf.setNumReduceTasks(8); in your code) and you keep these default values on a cluster of 2 nodes, each node will run 2 reduce tasks at the beginning, so overall 2 x 2 = 4 reduce tasks will be running in your cluster. When any of these tasks finishes, its node runs the next reduce task in the queue. At any point, at most 4 reduce tasks will be running in your cluster.
EDIT: I found the mistake. In the first link it says: The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). It should be: The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum).
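The slot arithmetic and the corrected rule of thumb above can be checked with a few lines of plain Java. This is only a sketch: the class and method names are hypothetical, nothing here calls the Hadoop API, and rounding the rule-of-thumb result down to a whole number is my assumption, since the text does not say how to round.

```java
public class ReduceSlotMath {

    // Upper bound on tasks of one kind running at once:
    // number of nodes times the per-node limit
    // (e.g. mapred.tasktracker.reduce.tasks.maximum).
    static int maxConcurrentTasks(int nodes, int slotsPerNode) {
        return nodes * slotsPerNode;
    }

    // Number of "waves" needed to run totalTasks when only
    // concurrentSlots of them can run at the same time (ceiling division).
    static int waves(int totalTasks, int concurrentSlots) {
        return (totalTasks + concurrentSlots - 1) / concurrentSlots;
    }

    // Rule of thumb from the text: factor * (nodes * reduce slots per node),
    // with factor 0.95 or 1.75. Rounding down is an assumption of this sketch.
    static int suggestedReducers(int nodes, int reduceSlotsPerNode, double factor) {
        return (int) Math.floor(factor * maxConcurrentTasks(nodes, reduceSlotsPerNode));
    }

    public static void main(String[] args) {
        int nodes = 2;          // cluster size used in the example above
        int slotsPerNode = 2;   // default per-node limit
        int requested = 8;      // value passed to conf.setNumReduceTasks(8)

        int slots = maxConcurrentTasks(nodes, slotsPerNode);
        System.out.println("Tasks running at once: " + slots);
        System.out.println("Waves for " + requested + " tasks: " + waves(requested, slots));
        System.out.println("Suggested reducers (0.95): " + suggestedReducers(nodes, slotsPerNode, 0.95));
        System.out.println("Suggested reducers (1.75): " + suggestedReducers(nodes, slotsPerNode, 1.75));
    }
}
```

With 2 nodes and the default limit of 2, at most 4 tasks run at once, so 8 requested tasks complete in two waves.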
Increasing…