Sqoop is a tool for transferring data between Hadoop and relational databases. Sqoop reads the column information from the database and generates Java classes that represent that data for you to use in MapReduce jobs.
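As a rough illustration (assuming a MySQL database named db1 and a table named tableName, as in the examples below), the same class generation can be run on its own with Sqoop's codegen tool, and the generated tableName.java reused in custom MapReduce jobs:
$ bin/sqoop codegen --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName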
Prerequisites
- GNU/Linux
- Java 1.6.x (preferred)
- SSH
Download Sqoop from http://www.apache.org/dyn/closer.cgi/sqoop/
Also download the JDBC driver for your database; for MySQL, place the Connector/J JAR in the lib directory under the Sqoop home folder.
Test your installation by typing:
$ sqoop help
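If the help text prints, the basic installation is working; the installed release can also be confirmed with:
$ sqoop version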
SQOOP Import Examples
List the MySQL databases in Sqoop:
$ bin/sqoop list-databases --connect jdbc:mysql://localhost/ --username dbusername --password ""
List the MySQL tables in Sqoop:
$ bin/sqoop list-tables --connect jdbc:mysql://localhost/databasename --username dbusername --password ""
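Before importing, a quick peek at the source data can be taken with Sqoop's eval tool (a sketch; databasename, dbusername, and tableName are the placeholders used above):
$ bin/sqoop eval --connect jdbc:mysql://localhost/databasename --username dbusername --password "" --query "SELECT * FROM tableName LIMIT 5"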
Import the data (MySQL table) to HDFS:
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName
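Once the import finishes, the rows are written as part files under the target directory; a quick sanity check (the part file name may differ):
$ hadoop fs -ls /user/ankit/tableName
$ hadoop fs -cat /user/ankit/tableName/part-m-00000 | head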
Import the data (MySQL table) to Hive:
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
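To confirm the Hive table was created and populated (assuming the hive CLI is on the PATH and the default database is used):
$ hive -e "SELECT * FROM tableName LIMIT 5"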
Import the data (MySQL table) to HBase:
$ bin/sqoop import --connect jdbc:mysql://localhost/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
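To verify the rows landed in HBase (assuming the hbase shell is on the PATH), scan the new table:
$ echo "scan 'hbase_tableName'" | hbase shell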
There are many useful options for configuring how Sqoop imports data. Sqoop can import data as Avro or Sequence files using the --as-avrodatafile and --as-sequencefile arguments, respectively. The data can also be compressed while being imported using the -z or --compress arguments. The default codec is gzip, but any Hadoop compression codec can be used by supplying the --compression-codec <CODEC> argument. See the Compressing data using LZO recipe in Chapter 2, HDFS. Another useful option is --direct. This argument instructs Sqoop to use native import/export tools if they are supported by the configured database. If --direct were added to the preceding example, Sqoop would use mysqldump to export the table quickly. The --direct argument is important enough that, when the preceding example is run without it, Sqoop logs a warning such as the following:
WARN manager.MySQLManager: It looks like you are importing from mysql.
WARN manager.MySQLManager: This transfer can be faster! Use the --direct
WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
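As a sketch of how these options combine (the target directory names here are illustrative, not from the original text), the HDFS import above could be rerun as a compressed SequenceFile, or with the MySQL fast path enabled:
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName_seq --as-sequencefile --compress --compression-codec org.apache.hadoop.io.compress.GzipCodec
$ bin/sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password password --table tableName --target-dir /user/ankit/tableName_direct --direct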