Namenode editlog (dfs.name.dir) multiple storage directories
http://hadoop.apache.org/common/docs/current/hdfs-default.html
There were many comments on the post. The complete thread is at the link below:
http://lucene.472066.n3.nabble.com/what-will-happen-if-a-backup-name-node-folder-becomes-unaccessible-td1253293.html#a1253293
In this post I summarize my tests to confirm that multiple storage directories work in the Cloudera distribution. The observed behavior is that the namenode ignores any directories that are inaccessible and only bails out when it can't access any of the specified directories. The series of tests below is largely self-explanatory.
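The rule described above can be sketched as a small shell function. This is only an illustration of the behavior, not Hadoop's implementation: the function name `usable_name_dirs` is hypothetical, and "accessible" is approximated here as "exists as a writable directory", which is an assumption.

```shell
# Hypothetical helper (not part of Hadoop): given a comma-separated
# dfs.name.dir value, print each directory that is usable and return
# non-zero only when none of them is.
# Assumption: "usable" means an existing, writable directory.
usable_name_dirs() {
    rest=$1
    status=1
    while [ -n "$rest" ]; do
        # Take the text before the first comma, then drop it from the list.
        dir=${rest%%,*}
        if [ "$dir" = "$rest" ]; then
            rest=
        else
            rest=${rest#*,}
        fi
        # Trim whitespace that may surround an entry in the config value.
        dir=$(printf '%s' "$dir" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            status=0
        fi
    done
    return $status
}
```

Called with the `dfs.name.dir` value from hdfs-site.xml, it prints the directories that are still usable and returns non-zero only when none is left, which mirrors the "bail out only when every directory is gone" rule.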
hadoop@training-vm:~$ hadoop version
Hadoop 0.20.1+152
Subversion -r c15291d10caa19c2355f437936c7678d537adf94
Compiled by root on Mon Nov 2 05:15:37 UTC 2009
hadoop@training-vm:~$ jps
8923 Jps
8548 JobTracker
8467 SecondaryNameNode
8250 NameNode
8357 DataNode
8642 TaskTracker
hadoop@training-vm:~$ /usr/lib/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hadoop@training-vm:~$ mkdir edit_log_dir1
hadoop@training-vm:~$ mkdir edit_log_dir2
hadoop@training-vm:~$ ls
edit_log_dir1 edit_log_dir2
hadoop@training-vm:~$ ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name
total 8
drwxr-xr-x 2 hadoop hadoop 4096 2009-10-15 16:17 image
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 15:56 current
hadoop@training-vm:~$ cp -r /var/lib/hadoop-0.20/cache/hadoop/dfs/name edit_log_dir1
hadoop@training-vm:~$ cp -r /var/lib/hadoop-0.20/cache/hadoop/dfs/name edit_log_dir2

------ hdfs-site.xml added new dirs
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name,/home/hadoop/edit_log_dir1,/home/hadoop/edit_log_dir2</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>600</value>
  </property>
  <property>
    <name>dfs.namenode.plugins</name>
    <value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
  </property>
  <property>
    <name>dfs.datanode.plugins</name>
    <value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
  </property>
  <property>
    <name>dfs.thrift.address</name>
    <value>0.0.0.0:9090</value>
  </property>
</configuration>

---- start all daemons
hadoop@training-vm:~$ /usr/lib/hadoop/bin/start-all.sh
starting namenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-namenode-training-vm.out
localhost: starting datanode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-datanode-training-vm.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-training-vm.out
starting jobtracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-jobtracker-training-vm.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-tasktracker-training-vm.out

-------- namenode log confirms all dirs taken
2010-08-24 16:20:48,718 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = training-vm/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.1+152
STARTUP_MSG: build = -r c15291d10caa19c2355f437936c7678d537adf94; compiled by 'root' on Mon Nov 2 05:15:37 UTC 2009
************************************************************/
2010-08-24 16:20:48,815 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=8022
2010-08-24 16:20:48,819 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:8022
2010-08-24 16:20:48,821 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-08-24 16:20:48,822 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext
2010-08-24 16:20:48,894 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2010-08-24 16:20:48,894 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-08-24 16:20:48,894 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2010-08-24 16:20:48,903 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext
2010-08-24 16:20:48,905 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/hadoop/edit_log_dir1 is not formatted.
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/hadoop/edit_log_dir2 is not formatted.
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2010-08-24 16:20:48,938 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 41
2010-08-24 16:20:48,947 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2010-08-24 16:20:48,947 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 4357 loaded in 0 seconds.

---- directories confirm in use
hadoop@training-vm:~$ ls -ltr edit_log_dir1
total 12
drwxr-xr-x 4 hadoop hadoop 4096 2010-08-24 16:01 name
-rw-r--r-- 1 hadoop hadoop 0 2010-08-24 16:20 in_use.lock
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 image
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 current
hadoop@training-vm:~$ ls -ltr edit_log_dir2
total 12
drwxr-xr-x 4 hadoop hadoop 4096 2010-08-24 16:01 name
-rw-r--r-- 1 hadoop hadoop 0 2010-08-24 16:20 in_use.lock
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 image
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 current

----- secondary name node checkpoint worked fine
...
2010-08-24 16:27:10,756 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL localhost:50070putimage=1&port=50090&machine=127.0.0.1&token=-18:1431678956:1255648991179:1282692430000:1282692049090
2010-08-24 16:27:11,008 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4461
....

--- directory put works fine
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 3 items
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output
hadoop@training-vm:~$ hadoop fs -put /etc/hadoop/conf.with-desktop/hdfs-site.xml /user/training
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 4 items
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
-rw-r--r-- 1 hadoop supergroup 987 2010-08-24 16:25 /user/training/hdfs-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output

------ delete one of the directories
hadoop@training-vm:~$ rm -rf edit_log_dir2
hadoop@training-vm:~$ ls -ltr
total 4
drwxr-xr-x 5 hadoop hadoop 4096 2010-08-24 16:20 edit_log_dir1

-- namenode logs
No errors/warns in logs

-------- namenode still running
hadoop@training-vm:~$ jps
12426 NameNode
12647 SecondaryNameNode
12730 JobTracker
14090 Jps
12535 DataNode
12826 TaskTracker

---- puts and ls work fine
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 4 items
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
-rw-r--r-- 1 hadoop supergroup 987 2010-08-24 16:25 /user/training/hdfs-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output
hadoop@training-vm:~$ hadoop fs -put /etc/hadoop/conf.with-desktop/core-site.xml /user/training
hadoop@training-vm:~$ hadoop fs -put /etc/hadoop/conf.with-desktop/mapred-site.xml /user/training
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 6 items
-rw-r--r-- 1 hadoop supergroup 338 2010-08-24 16:28 /user/training/core-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
-rw-r--r-- 1 hadoop supergroup 987 2010-08-24 16:25 /user/training/hdfs-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
-rw-r--r-- 1 hadoop supergroup 454 2010-08-24 16:29 /user/training/mapred-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output

------- secondary namenode checkpoint is successful
2010-08-24 16:37:11,455 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4671
....
2010-08-24 16:47:11,884 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4671
...
2010-08-24 16:57:12,264 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 4671

------- after 30 mins
hadoop@training-vm:~$ jps
12426 NameNode
12647 SecondaryNameNode
12730 JobTracker
16256 Jps
12535 DataNode
12826 TaskTracker
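
Since the namenode logged no error or warning when a directory disappeared, the manual `ls -ltr` checks from the transcript could be scripted. The following is a minimal sketch, not a Hadoop tool: the function names `name_dir_in_use` and `report_name_dirs` are hypothetical, and treating the presence of `in_use.lock`, `current` and `image` as "in use" is an assumption based only on the listings above.

```shell
# Hypothetical check (not a Hadoop tool): a storage directory is treated
# as "in use" when it contains the in_use.lock file plus the current/ and
# image/ subdirectories seen in the ls -ltr listings.
name_dir_in_use() {
    [ -f "$1/in_use.lock" ] && [ -d "$1/current" ] && [ -d "$1/image" ]
}

# Print one status line per directory passed as an argument.
report_name_dirs() {
    for dir in "$@"; do
        if name_dir_in_use "$dir"; then
            echo "$dir: in use"
        else
            echo "$dir: missing or not in use"
        fi
    done
}
```

In the test above, running `report_name_dirs /home/hadoop/edit_log_dir1 /home/hadoop/edit_log_dir2` after the `rm -rf` step would flag the second directory. Note that a leftover `in_use.lock` does not by itself prove a running namenode is still writing there.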