Dhritiman Das February 2016

How to filter out alphanumeric values using a regular expression in Pig

Pig Code

relation2 =  filter relation1 by column1 not matches '.*[a-z0-9.*].*'

Hive logic

column1 not like '%[a-z0-9]%'

I want to implement the same logic in Pig.

Answers


Surender Raja February 2016

I think you don't want the alphanumeric records.

You are looking for records that contain either only letters or only digits.

Could you try this:

Input (shown as Pig tuples):

(123AET)
(123)
(AET)
(236MET)

So your expected output is:

(123)
(AET)

Pig script: this is a generic script that tags each record as ALPHABETS, NUMBERS, or ALPHANUMERICS, so you can keep any one type, or any combination of them, for further processing.

-- Load the first comma-separated field of each line as a chararray.
records = LOAD '/home/dir/alphanumeric.txt' USING PigStorage(',') AS (c1:chararray);

-- Tag each value: letters only, digits only, or anything else.
records_each = FOREACH records GENERATE c1,
    (REGEX_EXTRACT(c1, '(^[a-zA-Z]+$)', 1) IS NOT NULL ? 'ALPHABETS' :
    (REGEX_EXTRACT(c1, '(^[0-9]+$)', 1) IS NOT NULL ? 'NUMBERS' : 'ALPHANUMERICS')) AS c1_type;

-- Keep only the types you need; here the mixed values are dropped.
records_filter = FILTER records_each BY c1_type IN ('ALPHABETS', 'NUMBERS');

records_output = FOREACH records_filter GENERATE c1;

DUMP records_output;
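
If you only need the letters-only and digits-only records and not the type tag, a single FILTER should give the same result. Pig's MATCHES operator succeeds only when the regular expression matches the whole value, so no leading or trailing '.*' is needed. This is a minimal sketch, assuming the same input file and schema as in the script above; the relation name records_plain is made up for illustration.

```pig
-- Same input as above: the first comma-separated field as a chararray.
records = LOAD '/home/dir/alphanumeric.txt' USING PigStorage(',') AS (c1:chararray);

-- MATCHES must match the entire value, so '[a-zA-Z]+' means "letters only"
-- and '[0-9]+' means "digits only". Mixed values such as 123AET match neither.
-- records_plain is a hypothetical name, not part of the original script.
records_plain = FILTER records BY (c1 MATCHES '[a-zA-Z]+') OR (c1 MATCHES '[0-9]+');

DUMP records_plain;
```

To keep only the mixed values instead, wrap the whole condition in NOT ( ... ). Null values are dropped either way, because the comparison evaluates to null for them.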

Post Status

Asked in February 2016
Viewed 2,612 times
Voted 6
Answered 1 time
