Hadoop High Availability - Daemons overview

Discussed few concept which I came across setting up Hadoop Cluster High Availability

Role of StandBy Node

Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.

In order provide fast fail-over Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called "JournalNodes" (JNs). When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs.

The Standby node is capable of reading the edits from the JNs, and is constantly watching them for changes to the edit log. As the Standby Node sees the edits, it applies them to its own namespace.

In the event of a failover, the Standby will ensure that it has read all of the edits from the JounalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

DataNode configuration in Hadoop HA

In order to provide a fast fail-over, it is also necessary that the Standby node has up-to-date information regarding the location of blocks in the cluster and status of each DataNode.

In order to achive this, all the DataNodes are configured with the location of both NameNodes(Active & StandBy), and they send block location information and heartbeats to both NameNodes.

umm.. What is Secondary NameNode
will explain from NameNode for more clarity

Namenode

Namenode holds the meta data for the HDFS like Namespace information, block information etc. When in use, all this information is stored in main memory. But these information also stored in disk for persistence storage.

fsimage -> Its the snapshot of the filesystem when namenode started

edit logs -> Its the sequence of changes made to the filesystem after NameNode started Only in the restart of NameNode, edit logs are applied to fsimage to get the latest snapshot of the file system. But NameNode restart are rare in production clusters which means edit logs can grow very large for the clusters where NameNode runs for a long period of time. The following issues we will encounter in this situation

Editlog become very large , which will be challenging to manage it
Namenode restart takes long time because lot of changes has to be merged
In the case of crash, we will lost huge amount of metadata since fsimage is very old So to overcome this issues we need a mechanism which will help us reduce the edit log size which is manageable and have up to date fsimage ,so that load on namenode reduces

Secondary NameNode
Secondary NameNode helps to overcome the above issues by taking over responsibility of merging editlogs with fsimage.

It gets the edit logs from the namenode in regular intervals and applies to fsimage (i.e build new image)
Once it has new fsimage, it copies back to namenode
Namenode will use this fsimage for the next restart,which will reduce the startup time

Why Secondary NameNode not needed in HA Hadoop cluster

In an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error.

What is split-brain scenario in Hadoop HA

In HA cluster that only one of the NameNodes be active at a time and the Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. In order to ensure this property and prevent the so-called "split-brain scenario," the JournalNodes will only ever allow a single NameNode to be a writer at a time.
During a failover, the NameNode which is to become active will simply take over the role of writing to the JournalNodes. which will effectively prevent the other NameNode from continuing in the Active state.

Tech Insights

About Us

نموذج الاتصال