Developers Planet

Rinku Buragohain February 2016

JOIN two data sets on a string-matching condition in Pig

I am new to Pig and I have two data sets, "highspender" and "feedback".

Highspender:

Price,fname,lname
$50,Jack,Brown
$30,Rovin,Pall

Feedback:

date,Name,rate
2015-01-02,Jack B Brown,5
2015-01-02,Pall,4

Now I have to join these two data sets by name. The condition is that either the fname or the lname of Highspender should match the Name of Feedback. How can I join these two data sets? Any ideas?

Answers


Vikas Hardia February 2016

You can try the script below. All you need to do is replace the names to match your data.

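-- Load both data sets; replace the paths and field names to match your data.
highs = LOAD 'highs' USING PigStorage(',') AS (Price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, Name:chararray, rate:chararray);
-- Each JOIN keeps only the rows whose key is exactly equal to Name.
out = JOIN highs BY fname, feedback BY Name;
out1 = JOIN highs BY lname, feedback BY Name;
-- Combine the first-name matches with the last-name matches.
final_out = UNION out, out1;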

For further help, you can refer to the Pig reference manual.

EDIT

As per the comment, the script for joining the data with a string function is as below:

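highs = LOAD 'highs' USING PigStorage(',') AS (Price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, Name:chararray, rate:chararray);
-- Pair every highs row with every feedback row.
crossout = CROSS highs, feedback;
-- REPLACE changes Name only when the pattern occurs in it, so "!=" means "Name contains the pattern".
final_lname = FILTER crossout BY (REPLACE(feedback::Name, highs::lname, '') != feedback::Name);
final_fname = FILTER crossout BY (REPLACE(feedback::Name, highs::fname, '') != feedback::Name);
-- A pair that matches on both fname and lname passes both filters, so drop the duplicates after the UNION.
unioned = UNION final_lname, final_fname;
final = DISTINCT unioned;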
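
One thing to keep in mind with the REPLACE comparison above is that REPLACE treats its second argument as a regular expression, so a name containing characters such as "." or "(" may not behave as a plain substring. A possible alternative, shown here only as a sketch, is to test for the substring with the built-in INDEXOF function and to combine both conditions in a single FILTER. The aliases `pairs` and `matched` are names made up for this example, and the load paths are the same placeholders as above.

```pig
-- Sketch only: 'highs' and 'feeds' are placeholder paths, as in the scripts above.
highs = LOAD 'highs' USING PigStorage(',') AS (Price:chararray, fname:chararray, lname:chararray);
feedback = LOAD 'feeds' USING PigStorage(',') AS (date:chararray, Name:chararray, rate:chararray);

-- Pair every highs row with every feedback row (this can be expensive on large inputs).
pairs = CROSS highs, feedback;

-- INDEXOF returns -1 when the search string is absent, so ">= 0" means "Name contains it".
-- One FILTER with OR keeps each matching pair once, even when both fname and lname match.
matched = FILTER pairs BY (INDEXOF(feedback::Name, highs::fname, 0) >= 0)
                       OR (INDEXOF(feedback::Name, highs::lname, 0) >= 0);

DUMP matched;
```

With the sample data from the question, this should pair Jack Brown with "Jack B Brown" and Rovin Pall with "Pall". Because it is a plain substring test, a short name would also match inside a longer, unrelated one, so it may need tightening for real data.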
