The following are some of the open source solutions for processing big data.
Hadoop: The Hadoop project develops open-source software for reliable, scalable, distributed computing. The Hadoop ecosystem consists of the following sub-projects:
HDFS - Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
MapReduce – MapReduce is a software framework introduced by Google to support distributed computing on large data sets across clusters of computers. Hadoop provides an open-source implementation of this model.
Pig – Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
Hive – Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called HiveQL, which is based on SQL and lets users familiar with SQL query the data. The language also allows traditional map/reduce programmers to plug in their custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.
HBase – HBase is the Hadoop database. Use it when you need random, realtime read/write access to your big data. This project’s goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware.
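To make the MapReduce entry above more concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce steps, using word counting as the example. It is not Hadoop's API and runs on one machine only; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over an iterable of text documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data big clusters", "data on clusters"]
    print(word_count(docs))
```

In a real cluster the map and reduce calls would run in parallel on different nodes and the shuffle would move data over the network. The sketch only shows how the work is divided between the two user-supplied functions.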
Voldemort - Voldemort is a distributed key-value storage system.
Cassandra - The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
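Dynamo-style stores such as the two above commonly spread keys across nodes with consistent hashing. The text above does not name that technique, so treat the following as an illustrative assumption, not a description of either project's code. The HashRing class, its vnodes parameter, and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping keys to nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets several positions ("virtual nodes") on the ring so
        # that keys spread more evenly. Assumes at least one node is given.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Any stable hash works; SHA-256 is used here for determinism.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring position after the key's hash."""
        idx = bisect.bisect_right(self._tokens, self._hash(key))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.node_for("user:42"))
```

The point of this layout is that removing a node only reassigns the keys that node owned. Keys held by the remaining nodes stay where they are.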