What Hive Does
Hadoop was built to organize and store massive amounts of data. A Hadoop cluster is a reservoir of heterogeneous data, from multiple sources and in different formats. Hive allows the user to explore and structure that data, analyze it, and then turn it into business insight.
How Hive Works
The tables in Hive are similar to tables in a relational database, and data units are organized in a taxonomy from larger to more granular units: databases contain tables, and tables can be divided into partitions. Data can be accessed via a simple query language called HiveQL, which is similar to SQL. In the release covered here, Hive supports overwriting or appending data, but not row-level updates and deletes.
Within a particular database, data in the tables is serialized and each table has a corresponding Hadoop Distributed File System (HDFS) directory. Each table can be sub-divided into partitions that determine how data is distributed within sub-directories of the table directory. Data within partitions can be further broken down into buckets.
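To make the layout concrete, here is a small bash sketch of how a table, its partitions and its buckets map onto HDFS paths. It only builds path strings and does not talk to HDFS. It assumes the default warehouse directory /user/hive/warehouse (created in the installation steps), that tables in the `default` database sit directly under it while other databases get a `<name>.db` sub-directory, and that partition directories are named `column=value`. The helper names `partition_dir` and `bucket_for_int` and the table names are hypothetical. The bucket rule (hash of the value with the sign bit cleared, modulo the bucket count, where an INT hashes to its own value) reflects our understanding of Hive's default bucketing and is worth checking against your version.

```shell
# Sketch only: mirrors Hive's default directory layout, it does not touch HDFS.

# Hypothetical helper: print the HDFS directory for a table partition.
# Usage: partition_dir <database> <table> [col=value ...]
partition_dir() {
  local warehouse="/user/hive/warehouse"   # assumed default warehouse dir
  local db="$1" table="$2"
  shift 2
  local path
  if [ "$db" = "default" ]; then
    path="$warehouse/$table"               # default database: no <db>.db level
  else
    path="$warehouse/$db.db/$table"
  fi
  local kv
  for kv in "$@"; do                       # one sub-directory per partition column
    path="$path/$kv"
  done
  printf '%s\n' "$path"
}

# Hypothetical helper: bucket index for an INT bucketing column.
# Assumption: the INT hashes to itself, the sign bit is cleared,
# and the result is taken modulo the number of buckets.
bucket_for_int() {
  local value="$1" num_buckets="$2"
  echo $(( (value & 0x7fffffff) % num_buckets ))
}

partition_dir default page_views dt=2012-06-01 country=US
# prints /user/hive/warehouse/page_views/dt=2012-06-01/country=US
bucket_for_int 10 4
# prints 2
```

Each bucket is typically written as a separate file inside the partition directory, so a table clustered into 4 buckets ends up with four data files per partition.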
Hive supports primitive data types such as TIMESTAMP, STRING, FLOAT, BOOLEAN, DECIMAL, BINARY, DOUBLE, INT, TINYINT, SMALLINT and BIGINT (DECIMAL was only added in Hive 0.11, so it is not available in the 0.9.0 release installed below). In addition, primitive data types can be combined to form complex data types, such as structs, maps and arrays.
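As an illustration of primitive and complex types side by side, the bash snippet below writes a HiveQL `CREATE TABLE` statement to a file and runs it with `hive -f` only when the `hive` command is on the PATH. The table name `employees`, its columns and the file location are hypothetical. The statement also declares a partition column and a bucket count, which correspond to the sub-directories and bucket files described earlier. DECIMAL is deliberately not used, since Hive 0.9.0 does not support it.

```shell
# Hypothetical DDL mixing primitive and complex column types.
hql_file="${TMPDIR:-/tmp}/create_employees.hql"
cat > "$hql_file" <<'EOF'
CREATE TABLE IF NOT EXISTS employees (
  name         STRING,
  salary       FLOAT,
  active       BOOLEAN,
  hired        TIMESTAMP,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  address      STRUCT<street:STRING, city:STRING, zip:INT>
)
PARTITIONED BY (country STRING)
CLUSTERED BY (name) INTO 4 BUCKETS;
EOF

# Only execute the statement if the hive CLI is available.
if command -v hive >/dev/null 2>&1; then
  hive -f "$hql_file"
else
  echo "hive not found on PATH; DDL written to $hql_file"
fi
```

Note that the partition column (`country`) is declared only in `PARTITIONED BY`, not in the regular column list; Hive stores its value in the directory name rather than in the data files.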
Here are some advantageous characteristics of Hive:
- Familiar: Hundreds of users can query the data simultaneously, using a language familiar to SQL users.
- Fast: Queries are compiled into MapReduce jobs that run in parallel across the cluster, so scans over huge datasets finish far faster than they could on a single machine.
- Scalable and extensible: As data variety and volume grow, more commodity machines can be added to the cluster without a corresponding reduction in performance.
- Informative: Familiar JDBC and ODBC drivers allow many applications to pull Hive data for seamless reporting. Hive allows users to read data in arbitrary formats, using SerDes and Input/Output formats.
Hive Installation
Before installing Hive, ensure that Hadoop is installed in one of its modes (standalone, pseudo-distributed or fully distributed).
Download hive-0.9.0 from http://www.apache.org/dyn/closer.cgi/hive/
Extract the tar file and move the extracted Hive directory to a location of your choice; in my example I am using /data/Hive.
Once you have unpacked Hive, set the HIVE_HOME environment variable to that directory.
Run:
$ export HIVE_HOME=/data/Hive
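The `hive` launcher script lives in `$HIVE_HOME/bin`, so it is convenient to put that directory on the PATH as well. The snippet below assumes the /data/Hive location used in this walkthrough; to make the settings permanent, the same two `export` lines can be added to a shell startup file such as ~/.bashrc (assuming bash is your shell).

```shell
# Location of the unpacked Hive release (adjust to your own directory).
export HIVE_HOME=/data/Hive
# Prepend Hive's bin directory so `hive` can be run from anywhere.
export PATH="$HIVE_HOME/bin:$PATH"
# Optional sanity check: warn if the launcher script is not where we expect.
if [ ! -x "$HIVE_HOME/bin/hive" ]; then
  echo "warning: $HIVE_HOME/bin/hive not found; check HIVE_HOME" >&2
fi
```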
Now that Hadoop is running and Hive is installed, you need to create the HDFS directories Hive uses for temporary files (/tmp) and for table data (/user/hive/warehouse, the default warehouse directory), and make them group-writable.
Run:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
If you have already added Hadoop's bin directory to your PATH, you can run the commands directly:
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse
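If you repeat the setup (for example on a second machine or after a reinstall), the four commands can be wrapped in a small bash function that skips directories which already exist. This is a sketch: the function name `setup_hive_dirs` is hypothetical, and it relies on `hadoop fs -test -d`, which exits with status 0 when the path exists and is a directory. It keeps the plain `-mkdir` used above, which creates missing parent directories on Hadoop 1.x; on Hadoop 2 and later, `-mkdir` needs the `-p` flag to do the same.

```shell
# Hypothetical helper: create Hive's scratch and warehouse directories in
# HDFS and make them group-writable. Existing directories are left in place.
setup_hive_dirs() {
  # Prefer the binary under HADOOP_HOME; otherwise use `hadoop` from the PATH.
  local hadoop_bin="hadoop"
  if [ -n "${HADOOP_HOME:-}" ]; then
    hadoop_bin="$HADOOP_HOME/bin/hadoop"
  fi
  local dir
  for dir in /tmp /user/hive/warehouse; do
    # Only create the directory if it is not already there.
    if ! "$hadoop_bin" fs -test -d "$dir"; then
      "$hadoop_bin" fs -mkdir "$dir" || return 1
    fi
    "$hadoop_bin" fs -chmod g+w "$dir" || return 1
  done
}
```

After defining it, run `setup_hive_dirs` once; running it again only reapplies the `g+w` permission.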
Test your Hive install.
Run:
$ hive
hive> show tables;
OK
Time taken: 6.374 seconds
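For scripted checks, the same test can be run without opening the interactive prompt. This sketch relies on the Hive CLI's `-e` option (run a quoted statement and exit) and `-S` (silent mode, which suppresses the "OK" and "Time taken" lines); the function name `check_hive` is hypothetical.

```shell
# Hypothetical helper: run one statement non-interactively and report status.
check_hive() {
  if hive -S -e 'SHOW TABLES;' >/dev/null; then
    echo "Hive is working"
  else
    echo "Hive check failed; is \$HIVE_HOME/bin on the PATH?" >&2
    return 1
  fi
}
```

Because the function returns a non-zero status on failure, it can be used directly in a provisioning script, for example `check_hive || exit 1`.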