Showing posts from June, 2013

Nutch Tutorial

Follow these 10 steps to set up Nutch and crawl your site to create your own Web DB.

If you have any questions, drop me an email at mail.swapnilk@gmail.com

Have fun!!

Step 1:
Download the latest binaries from here:
http://www.apache.org/dyn/closer.cgi/nutch/

Step 2:
Make required directories
sudo mkdir /usr/local/nutch
sudo mkdir /usr/local/nutch/framework
sudo mkdir /usr/local/nutch/dist


Step 3:
Copy to dist
sudo cp apache-nutch-1.4-bin.tar.gz /usr/local/nutch/dist/

Step 4:
Unpack
sudo tar -xvzf apache-nutch-1.4-bin.tar.gz -C /usr/local/nutch/framework/

Step 5:
Make executable
sudo chmod +x /usr/local/nutch/framework/apache-nutch-1.4-bin/runtime/local/bin/nutch
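
Steps 2 to 5 can also be wrapped in one small shell function, which avoids retyping the long paths. This is only a sketch: the function name install_nutch is made up for illustration, and it assumes the tarball unpacks into a directory named apache-nutch-1.4-bin, as the paths in this tutorial suggest.

```shell
#!/bin/sh
# Sketch of Steps 2-5 as one function (install_nutch is a hypothetical name).
# $1 = path to the downloaded tarball
# $2 = install prefix (this tutorial uses /usr/local/nutch)
install_nutch() {
  tarball=$1
  prefix=$2
  # Step 2: make the required directories
  mkdir -p "$prefix/framework" "$prefix/dist" || return 1
  # Step 3: keep a copy of the tarball in dist
  cp "$tarball" "$prefix/dist/" || return 1
  # Step 4: unpack into framework
  tar -xzf "$tarball" -C "$prefix/framework/" || return 1
  # Step 5: make the nutch launcher executable
  # (assumes the archive's top-level directory is apache-nutch-1.4-bin)
  chmod +x "$prefix/framework/apache-nutch-1.4-bin/runtime/local/bin/nutch"
}
```

Because /usr/local is normally writable only by root, call it from a root shell (for example after sudo -s): install_nutch apache-nutch-1.4-bin.tar.gz /usr/local/nutch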

Step 6:
Make seed URL file
sudo mkdir -p /usr/local/nutch/framework/apache-nutch-1.4-bin/runtime/local/bin/urls
sudo gedit /usr/local/nutch/framework/apache-nutch-1.4-bin/runtime/local/bin/urls/nutch.txt

Add the following to nutch.txt:
http://www.usc.edu/
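
If you would rather not open an editor, the seed file can be written straight from the shell. The sketch below assumes the same urls directory and nutch.txt file name used in this step; the function name write_seed_file is made up for illustration.

```shell
#!/bin/sh
# Sketch of Step 6 without an editor (write_seed_file is a hypothetical name).
# $1 = Nutch runtime/local directory, remaining arguments = seed URLs
write_seed_file() {
  # Need the directory plus at least one URL
  [ "$#" -ge 2 ] || return 1
  nutch_local=$1
  shift
  mkdir -p "$nutch_local/bin/urls" || return 1
  # printf reuses the format for every remaining argument: one URL per line
  printf '%s\n' "$@" > "$nutch_local/bin/urls/nutch.txt"
}
```

From a root shell: write_seed_file /usr/local/nutch/framework/apache-nutch-1.4-bin/runtime/local http://www.usc.edu/
More seed URLs can be passed as extra arguments.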

Step 7:
Add Agent
sudo gedit /usr/local/nutch/framework/apache-nutch-1.4-bin/runtime/local/conf/nutch-site.xml

Add th…

Hive: Import and Export data from HDFS and Local Directory

Import Data from Local Directory:
hive> LOAD DATA LOCAL INPATH '/local/path' OVERWRITE INTO TABLE tablename;
OVERWRITE is optional in recent versions of Hive. Leave it out if you want to append the data to the table instead of replacing its contents.

Export Data to Local Directory:
hive> INSERT OVERWRITE LOCAL DIRECTORY '/local dir/path' SELECT * FROM tablename;

Import Data from HDFS:
hive> LOAD DATA INPATH '/hdfs/path/to/file' OVERWRITE INTO TABLE tablename;
OVERWRITE is optional in recent versions of Hive. Leave it out if you want to append the data to the table instead of replacing its contents.

Export Data to HDFS:
hive> INSERT OVERWRITE DIRECTORY '/path/to/hdfs' SELECT * FROM tablename;
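
The four statements above differ only in the LOCAL and OVERWRITE keywords, so they are easy to generate from a script. Here is a rough shell sketch; the function names hive_load_stmt and hive_export_stmt are made up for illustration, and the sketch does no escaping, so it assumes paths and table names contain no single quotes.

```shell
#!/bin/sh
# Print a LOAD DATA statement (hive_load_stmt is a hypothetical helper).
# $1 = source path, $2 = table name,
# $3 = "local" or "hdfs", $4 = "overwrite" or "append"
hive_load_stmt() {
  local_kw=""
  overwrite_kw=""
  if [ "$3" = "local" ]; then local_kw="LOCAL "; fi
  if [ "$4" = "overwrite" ]; then overwrite_kw="OVERWRITE "; fi
  printf "LOAD DATA %sINPATH '%s' %sINTO TABLE %s;\n" \
    "$local_kw" "$1" "$overwrite_kw" "$2"
}

# Print an INSERT OVERWRITE [LOCAL] DIRECTORY statement (hypothetical helper).
# $1 = target directory, $2 = table name, $3 = "local" or "hdfs"
hive_export_stmt() {
  local_kw=""
  if [ "$3" = "local" ]; then local_kw="LOCAL "; fi
  printf "INSERT OVERWRITE %sDIRECTORY '%s' SELECT * FROM %s;\n" \
    "$local_kw" "$1" "$2"
}
```

The generated text can be passed to the Hive command line with its -e option, for example: hive -e "$(hive_load_stmt /local/path tablename local append)"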

String Functions in Hive

The string functions in Hive are listed below:

ASCII( string str )

The ASCII function returns the numeric ASCII value of the first character of the string.
Example1: ASCII('hadoop') returns 104
Example2: ASCII('A') returns 65
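
To double-check these two values without opening a Hive session, the same lookup can be imitated in a POSIX shell. This is only an analogy, not Hive code: the helper name ascii_first is made up, and it mirrors the behaviour for plain single-byte ASCII input only.

```shell
#!/bin/sh
# Print the numeric code of the first character of $1,
# imitating what the text says ASCII() does (ascii_first is a hypothetical name).
ascii_first() {
  rest=${1#?}           # everything after the first character
  first=${1%"$rest"}    # strip that suffix, leaving only the first character
  # A leading single quote makes printf %d print the character's numeric code
  printf '%d\n' "'$first"
}

ascii_first hadoop   # 104
ascii_first A        # 65
```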

CONCAT( string str1, string str2... )

The CONCAT function concatenates all the strings.
Example: CONCAT('hadoop','-','hive') returns 'hadoop-hive'

CONCAT_WS( string delimiter, string str1, string str2... )

The CONCAT_WS function is similar to the CONCAT function, except that its first argument is a delimiter that is placed between the strings being concatenated.
Example: CONCAT_WS('-','hadoop','hive') returns 'hadoop-hive'
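
To try the ASCII, CONCAT and CONCAT_WS examples yourself, you can put them in a small HiveQL file and run it from the shell. The sketch below assumes Hive 0.13 or later, where SELECT works without a FROM clause; on older versions add a FROM clause against any one-row table. The function name write_string_checks and the file name are made up for illustration.

```shell
#!/bin/sh
# Write the example calls from this post into a HiveQL file.
# $1 = output file (write_string_checks is a hypothetical name)
write_string_checks() {
  cat > "$1" <<'EOF'
SELECT ASCII('hadoop');
SELECT ASCII('A');
SELECT CONCAT('hadoop', '-', 'hive');
SELECT CONCAT_WS('-', 'hadoop', 'hive');
EOF
}
```

Usage: write_string_checks string_checks.hql, then run the file with hive -f string_checks.hql and compare the output with the values listed above.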

FIND_IN_SET( string search_string, string source_string_list )

The FIND_IN_SET function searches for the search string in the source_string_list and returns the position of the first occurrence in the source string list. Here the source string l…