A tool used to transfer data between Hadoop and relational databases. Sqoop reads the column information from the database and generates Java classes that represent that data for you to use in Map/Reduce jobs.

Prerequisites
  • Gnu/Linux
  • Java 1.6.x (preferred)
  • SSH
Download the latest version of Sqoop for the version of Hadoop you downloaded.

http://www.apache.org/dyn/closer.cgi/sqoop/

Also, download the JDBC drivers for your database. You will need these later.


Download the mysql j connector jar and store in lib directory present in sqoop home folder.

Just test your installation by typing

$ sqoop help

List the MySQL Databases in SQOOP,

> bin/sqoop list-databases --connect http://localhost/ --username dbusername --password "" 
 

 List the MySQL Tables in SQOOP,

> bin/sqoop list-tables --connect http://localhost/databasename --username dbusername --password ""  
 SQOOP Import Examples

Import the data (MySQL table) to HDFS 

Import MySQL table into HDFS
$ bin/sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /user/ankit/tableName

Import the data (MySQL table) to Hive

Import MySQL table into Hive
bin/sqoop-import  --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName  --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home

Import the data (MySQL table) to HBase:

$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName  --column-family hbase_table_col1 --hbase-create-table

 There are many useful options for configuring how Sqoop imports data. Sqoop can import
data as Avro or Sequence files using the --as-avrodatafile and --as-sequencefile
arguments respectively. The data can be compressed while being imported as well using
the -z or --compress arguments. The default codec is GZIP, but any Hadoop compression
codec can be used by supplying the --compression-codec <CODEC> argument. See the
Compressing data using LZO recipe in Chapter 2, HDFS. Another useful option is --direct.
This argument instructs Sqoop to use native import/export tools if they are supported by the
configured database. In the preceding example, if --direct was added as an argument,
Sqoop would use mysqldump for fast exporting of the weblogs table. The --direct
argument is so important that in the preceding example, a warning message was logged
as follows:
WARN manager.MySQLManager: It looks like you are importing from mysql.
WARN manager.MySQLManager: This transfer can be faster! Use the --direct
WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
Previous Post Next Post