Ankur Singh February 2016

Load each file in a folder into a separate bag

I am using Pig Latin to analyze previous T20 World Cup matches. Each match is in a separate CSV file, and I want to find the total number of 100s made by all players.

My approach: I load the files using this script:

    t20 = LOAD '/home/ankur/Desktop/Pig_Scripts/t20_csv' USING PigStorage(',') as (inning,overs,team,stk,nstk,bowler,run,extra,type,name);

But with this approach the data from every file ends up in the same bag, so I can't find the number of 100s.

  • If each file somehow went into a different bag, I could use FOREACH to calculate it.

Is my way of thinking correct? Please suggest another approach if you have one.

Answers


inquisitive_mind February 2016

  1. Load all files
  2. Filter records where 'run' > 99
  3. Count Filtered records

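    t20 = LOAD '/home/ankur/Desktop/Pig_Scripts/*' USING PigStorage(',') as (inning,overs,team,stk,nstk,bowler,run,extra,type,name);
    -- Keep only the records with 100 or more runs
    hundred_records = FILTER t20 BY (run > 99);
    -- COUNT works on a bag, so group all filtered records into one bag first
    all_hundreds = GROUP hundred_records ALL;
    total_hundreds = FOREACH all_hundreds GENERATE COUNT(hundred_records);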
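The filter on `run` only finds centuries if each row already holds a batsman's total for a match. If instead each row is a single delivery (an assumption based on the `overs`, `stk` and `bowler` columns), the runs have to be summed per batsman per match first. The files still do not need separate bags: the `-tagFile` option of PigStorage (available from Pig 0.12) prepends the source file name to every row, so the match can be used as a grouping key. A minimal sketch, assuming `stk` is the striker, `run` is the runs credited to the striker on that ball, and the files have no header row; `match_file` is a name chosen here for illustration:

```pig
-- '-tagFile' adds the source file name as the first field of each row
t20 = LOAD '/home/ankur/Desktop/Pig_Scripts/t20_csv'
      USING PigStorage(',', '-tagFile')
      AS (match_file:chararray, inning, overs, team, stk:chararray,
          nstk, bowler, run:int, extra, type, name);

-- One group per batsman per match (one file = one match)
by_batsman = GROUP t20 BY (match_file, stk);

-- Total runs scored by each batsman in each match
scores = FOREACH by_batsman GENERATE
             FLATTEN(group) AS (match_file, stk),
             SUM(t20.run) AS total_runs;

-- Keep only the innings of 100 or more
centuries = FILTER scores BY total_runs >= 100;

-- Count them across all matches
all_centuries = GROUP centuries ALL;
total_hundreds = FOREACH all_centuries GENERATE COUNT(centuries);

DUMP total_hundreds;
```

Dumping `centuries` instead of `total_hundreds` would list which batsman scored a hundred in which file. If no batsman reaches 100, `GROUP ... ALL` produces no rows, so the final DUMP prints nothing rather than 0.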
