What is Hadoop?

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware; it incorporates features similar to those of the Google File System (GFS) and of Google's MapReduce. HDFS, its distributed file system, is highly fault-tolerant and, like Hadoop as a whole, is designed to run on low-cost hardware. It provides high-throughput access to application data and is well suited to applications with large data sets.

Hadoop - Overview

•  Hadoop includes:
    –  Distributed File System - distributes the data across the cluster
    –  Map/Reduce - distributes the application's computation to where the data lives (see the sketch after this list)
•  Open source from Apache
•  Written in Java
•  Runs on
    –  Linux, Mac OS X, Windows, and Solaris
    –  Commodity hardware
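
To make the Map/Reduce half of the picture concrete, below is a minimal sketch of a Hadoop MapReduce job using the org.apache.hadoop.mapreduce API, in the style of the classic WordCount example: the mapper emits a count of 1 for each word in its input split, and the reducer sums the counts per word. The class names and the input/output paths passed on the command line are illustrative assumptions, not something prescribed by the text above.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs on the node holding each input split and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] = input directory in HDFS, args[1] = output directory (assumed).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this would typically be submitted with something along the lines of `hadoop jar wordcount.jar WordCount /input /output`; the framework then ships the code to the nodes holding the data rather than pulling the data to the code.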

Hadoop Distributed File System

•  Designed to store large files
•  Stores files as large blocks (typically 64 or 128 MB)
•  Each block is stored on multiple servers
•  Data is automatically re-replicated as needed (a sketch of the Java client API follows this list)
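
As a rough illustration of how an application touches HDFS, here is a minimal sketch using the org.apache.hadoop.fs.FileSystem client API: it writes a small file, reads it back, and prints the block size and replication factor HDFS reports for it. The file path, its contents, and the assumption that a NameNode address is available via core-site.xml are all illustrative, not taken from the text above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS (e.g. an hdfs://namenode URI) from core-site.xml
    // on the classpath; this cluster address is an assumption.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical path used only for this sketch.
    Path path = new Path("/user/example/hello.txt");

    // Write a small file. Larger files are split into blocks, and each block
    // is replicated according to dfs.replication (commonly 3).
    try (FSDataOutputStream out = fs.create(path, true)) {
      out.writeUTF("hello from HDFS");
    }

    // Read the file back.
    try (FSDataInputStream in = fs.open(path)) {
      System.out.println(in.readUTF());
    }

    // Inspect the block size and replication factor reported for the file.
    FileStatus status = fs.getFileStatus(path);
    System.out.println("block size: " + status.getBlockSize()
        + " bytes, replication: " + status.getReplication());
  }
}
```

When a server holding one of a block's replicas fails, the NameNode notices the shortfall and schedules new copies on other nodes, which is what the "automatically re-replicated" bullet above refers to.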
