Hadoop for Beginner: Create Your First Apache Pig Script

As with scripts in other languages such as SQL or Unix shell, a Pig script lets you execute a set of Apache Pig commands together. This saves the time and effort of writing and executing each command manually. This blog post (Pig Programming: Create Your First Apache Pig Script) is a step-by-step guide to creating your first Apache Pig script.

Pig Programming: Create Your First Apache Pig Script

An Apache Pig script can run in two modes:

Local Mode: In local mode, the Pig script runs against the local file system. You don't need to store the data in HDFS; you can work with data stored on the local file system itself.
HDFS Mode: In HDFS mode (Pig's MapReduce mode, which is the default), the data must be stored in HDFS, and the Pig script processes it there.

Pig Script in HDFS Mode:

Step 1: Write the Script

Open a text editor (e.g., gedit) in your Cloudera Demo VM environment:
Command: gedit sample.pig

This opens a new file named ‘sample.pig’. When you save it, the file is created in your current directory (the cloudera user's home directory, if you have not changed directories).

Let’s write a few Pig commands in the sample script.
Suppose our task is to read data from a data file and display the required fields as output.
The sample data file contains the following data:
Shabbir	Khan	9314573259	Bangalore	Engineer
Manish	Sharma	8882148796	Gurgaon	Lecturer
Mahesh	Kumar	8521548932	Noida	Business
Sampath	Reddy	8547987412	Hyderabad	Engineer
Mohan	Reddy	9256458798	Hyderabad	Professor
Save the text file with the name ‘information.txt’

The sample data file contains five tab-separated columns: FirstName, LastName, MobileNo, City, and Profession. Our task is to load this file into Pig from HDFS and display the first name, mobile number, and profession of each contact.
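Because the script reads the file with PigStorage('\t'), every field must be separated by a real tab character; a row typed with spaces will not be split into the five columns. As a minimal sketch, assuming Python 3 is available in the VM, the hypothetical helpers write_information_file and find_bad_lines below (illustrative names, not part of Pig or Hadoop) create information.txt with tab separators and flag any line that does not split into exactly five fields.

```python
# Hypothetical helper script (not part of Pig/Hadoop): writes information.txt
# with real tab separators and reports lines that are not five tab-separated fields.
from pathlib import Path

CONTACTS = [
    ("Shabbir", "Khan", "9314573259", "Bangalore", "Engineer"),
    ("Manish", "Sharma", "8882148796", "Gurgaon", "Lecturer"),
    ("Mahesh", "Kumar", "8521548932", "Noida", "Business"),
    ("Sampath", "Reddy", "8547987412", "Hyderabad", "Engineer"),
    ("Mohan", "Reddy", "9256458798", "Hyderabad", "Professor"),
]


def write_information_file(path="information.txt"):
    """Write one contact per line, with fields joined by a single tab character."""
    out = Path(path)
    out.write_text("".join("\t".join(row) + "\n" for row in CONTACTS), encoding="utf-8")
    return out


def find_bad_lines(path, expected_fields=5):
    """Return the 1-based numbers of lines that do not have exactly
    `expected_fields` fields when split on a tab (e.g. lines typed with spaces)."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if len(line.rstrip("\n").split("\t")) != expected_fields:
                bad.append(lineno)
    return bad


if __name__ == "__main__":
    p = write_information_file()
    print(find_bad_lines(p))  # prints [] because every row was written with tabs
```

If you type the file by hand in gedit instead, running find_bad_lines on it before copying it to HDFS is a quick way to catch rows that use spaces.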
To process this data with Pig in HDFS mode, the file must first be copied into Apache Hadoop HDFS.
Use the following command:
Command: hadoop fs -copyFromLocal information.txt /

Edit the Pig script (sample.pig) to include the following commands:
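-- Load the tab-separated file from HDFS and apply a schema
A = LOAD '/information.txt' USING PigStorage('\t') AS (FName:chararray, LName:chararray, MobileNo:chararray, City:chararray, Profession:chararray);
-- Keep only the first name, mobile number and profession
B = FOREACH A GENERATE FName, MobileNo, Profession;
-- Print the result to the console
DUMP B;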

Save and close the file.
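The first command loads the file ‘information.txt’ from HDFS into relation A, using PigStorage('\t') to split each line on tabs and applying the declared schema (FName, LName, MobileNo, City, Profession), with every field typed as chararray.
The second command uses FOREACH ... GENERATE to project only the FName, MobileNo, and Profession fields from relation A into a new relation B.
The third command (DUMP) displays the contents of relation B on the terminal/console.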
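To make the data flow of these three statements concrete, here is a plain-Python sketch (assuming Python 3; it does not call Pig or Hadoop) that mimics them on a list of tab-separated lines. The functions load, foreach_generate and dump are hypothetical stand-ins named after the Pig keywords, and turning missing trailing fields into None is a simplification made by this sketch, not a description of Pig internals.

```python
# Hypothetical plain-Python sketch of sample.pig; it does NOT run Pig or touch HDFS.
SCHEMA = ("FName", "LName", "MobileNo", "City", "Profession")

SAMPLE_LINES = [
    "Shabbir\tKhan\t9314573259\tBangalore\tEngineer",
    "Manish\tSharma\t8882148796\tGurgaon\tLecturer",
    "Mahesh\tKumar\t8521548932\tNoida\tBusiness",
    "Sampath\tReddy\t8547987412\tHyderabad\tEngineer",
    "Mohan\tReddy\t9256458798\tHyderabad\tProfessor",
]


def load(lines, delimiter="\t"):
    r"""Mimics LOAD ... USING PigStorage('\t') AS (...): split each line on the
    delimiter and name the fields after SCHEMA (all kept as strings, like chararray)."""
    relation = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Simplification for this sketch: missing trailing fields become None.
        fields += [None] * (len(SCHEMA) - len(fields))
        relation.append(dict(zip(SCHEMA, fields)))
    return relation


def foreach_generate(relation, *names):
    """Mimics FOREACH A GENERATE f1, f2, ...: keep only the named fields, in order."""
    return [tuple(row[name] for name in names) for row in relation]


def dump(relation):
    """Mimics the shape of DUMP output: one (f1,f2,...) tuple per line.
    Returns the printed lines so they can be checked."""
    lines = ["(" + ",".join("" if v is None else v for v in t) + ")" for t in relation]
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    A = load(SAMPLE_LINES)
    B = foreach_generate(A, "FName", "MobileNo", "Profession")
    dump(B)
    # (Shabbir,9314573259,Engineer)
    # (Manish,8882148796,Lecturer)
    # (Mahesh,8521548932,Business)
    # (Sampath,8547987412,Engineer)
    # (Mohan,9256458798,Professor)
```

The sketch keeps MobileNo as a string, matching the chararray type in the schema, so leading digits and the full number are preserved exactly as stored in the file.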
Step 2: Execute the Pig Script
To execute the Pig script in HDFS mode, run the following command:
Command: pig sample.pig

Review the result.

Congratulations on executing your first Pig script successfully!
