Default mapred.tasktracker.map.tasks.maximum and Increasing io.sort.mb

The default values of these per-node TaskTracker settings are:

mapred.tasktracker.map.tasks.maximum         2  
mapred.tasktracker.reduce.tasks.maximum      2
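
If you want to change them, edit ${HADOOP_HOME}/conf/mapred-site.xml, where ${HADOOP_HOME} is the Hadoop installation directory. These settings are read by each TaskTracker, so the TaskTrackers must be restarted for a change to take effect.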
For example, suppose you want 8 reducers (set with conf.setNumReduceTasks(8); in your code), you keep these default values, and your cluster has 2 nodes. Each node starts by running 2 reduce tasks, so 2 x 2 = 4 reduce tasks run in the cluster at first. Whenever one of them finishes, that node starts the next reduce task in the queue. At any point, at most 4 reduce tasks are running in your cluster.
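To make the arithmetic concrete, here is a small, self-contained Java sketch of the slot model described above (plain Java, no Hadoop dependency). It assumes every node has the same number of slots and that all tasks take roughly the same time; the class and method names are hypothetical. The same arithmetic applies to map tasks with mapred.tasktracker.map.tasks.maximum.

```java
// Hypothetical illustration of the Hadoop 1.x (MRv1) slot model: each
// TaskTracker runs at most `slotsPerNode` tasks of a given type at once.
public class SlotModel {

    // Maximum number of tasks of one type running concurrently in the cluster.
    public static int concurrentSlots(int nodes, int slotsPerNode) {
        return nodes * slotsPerNode;
    }

    // Number of "waves" needed to run all tasks, assuming equal task durations.
    public static int waves(int totalTasks, int nodes, int slotsPerNode) {
        int slots = concurrentSlots(nodes, slotsPerNode);
        return (totalTasks + slots - 1) / slots; // ceiling division
    }

    public static void main(String[] args) {
        int nodes = 2;        // nodes in the cluster (from the example)
        int reduceSlots = 2;  // default mapred.tasktracker.reduce.tasks.maximum
        int reducers = 8;     // conf.setNumReduceTasks(8)

        System.out.println("Concurrent reduce tasks: " + concurrentSlots(nodes, reduceSlots));
        System.out.println("Waves: " + waves(reducers, nodes, reduceSlots));
    }
}
```

With the example's numbers this prints 4 concurrent reduce tasks and 2 waves: the 8 reducers run as two rounds of 4.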
EDIT: I found the mistake. In the first link it says:
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum).
It should be:
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum).
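A minimal Java sketch of the corrected rule of thumb follows. The formula does not say how to round the result, so this sketch assumes rounding down (with a floor of 1 reducer), which keeps the 0.95 variant within the available reduce slots. The class, method, and constant names are hypothetical.

```java
// Hypothetical helper applying the rule of thumb:
// reduces = 0.95 or 1.75 * (nodes * mapred.tasktracker.reduce.tasks.maximum)
public class ReducerCount {

    // 0.95: all reduces can launch at once and start fetching map outputs
    // as the maps finish.
    public static final double SINGLE_WAVE = 0.95;
    // 1.75: faster nodes finish a first round of reduces and launch a
    // second wave, which balances the load better.
    public static final double TWO_WAVES = 1.75;

    // Assumption: round down, and never return fewer than 1 reducer.
    public static int recommendedReducers(int nodes, int reduceSlotsPerNode, double factor) {
        int totalSlots = nodes * reduceSlotsPerNode;
        return Math.max(1, (int) Math.floor(factor * totalSlots));
    }

    public static void main(String[] args) {
        int nodes = 2;
        int reduceSlots = 2; // default mapred.tasktracker.reduce.tasks.maximum
        System.out.println("0.95 rule: " + recommendedReducers(nodes, reduceSlots, SINGLE_WAVE));
        System.out.println("1.75 rule: " + recommendedReducers(nodes, reduceSlots, TWO_WAVES));
    }
}
```

For 2 nodes with 2 reduce slots each this gives 3 and 7 reducers; the chosen value is what you would pass to conf.setNumReduceTasks(...) in the job driver.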

Increasing io.sort.mb

According to the article here, io.sort.mb should be set to 10 * io.sort.factor, provided the tasks have enough RAM. In core-site.xml:
<property>
<name>io.sort.factor</name>
<value>100</value>
<description>More streams merged at once while sorting files.</description>
</property>  

<property>
<name>io.sort.mb</name>
<value>200</value>
<description>Higher memory-limit while sorting data.</description>
</property>
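The 10 x io.sort.factor rule is simple arithmetic; the Java sketch below applies it and adds a sanity check against the map task's heap, since the sort buffer lives inside the task JVM (whose heap is set through mapred.child.java.opts). The 50% headroom threshold and the 1024 MB heap are hypothetical choices for illustration, not Hadoop rules. Note that the sample values above (io.sort.factor 100, io.sort.mb 200) are more conservative than the rule, which would suggest 1000 MB.

```java
// Hypothetical helper for the io.sort.mb rule of thumb:
//   io.sort.mb = 10 * io.sort.factor   (only if the task has enough RAM)
public class SortBufferSizing {

    // Rule of thumb from the article: 10 MB of sort buffer per merged stream.
    public static int suggestedSortMb(int sortFactor) {
        return 10 * sortFactor;
    }

    // Assumption (illustrative threshold): keep the sort buffer to at most
    // half of the task heap, leaving room for the map function itself.
    public static boolean fitsInHeap(int sortMb, int taskHeapMb) {
        return sortMb <= taskHeapMb / 2;
    }

    public static void main(String[] args) {
        int sortFactor = 100;  // io.sort.factor from the example above
        int taskHeapMb = 1024; // hypothetical heap, e.g. -Xmx1024m in mapred.child.java.opts

        int suggested = suggestedSortMb(sortFactor);
        System.out.println("Suggested io.sort.mb: " + suggested);
        System.out.println("Fits in " + taskHeapMb + " MB heap: " + fitsInHeap(suggested, taskHeapMb));
        System.out.println("200 MB fits: " + fitsInHeap(200, taskHeapMb));
    }
}
```

With Hadoop 1.x defaults (io.sort.factor 10, io.sort.mb 100) the rule holds exactly; raising io.sort.factor to 100 only pays off with the larger buffer if the task heap can accommodate it.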
