# Developers Planet

amchang87 February 2016

### Pig: How do I add a new value for a column for a subset of data?

I have some data as follows:

```
patient_id, lab_value
1, 10
1, 3
2, 1
2, 4
3, 5
3, 10
3, 2
```

I'd like to find the max lab_value for each patient_id, then compute the difference between that max and each row's lab_value, as follows:

```
patient_id, lab_value, lab_diff
1, 10, 0
1, 3, 7 (10 - 3)
2, 1, 3
2, 4, 0
3, 5, 5 (10 - 5)
3, 10, 0
3, 2, 8 (10 - 2)
```

How would I do this?

### Answers

inquisitive_mind February 2016

Steps

• Load the data
• Group by id
• Get the max lab value for each id
• Apply DISTINCT to the per-id max values (GROUP already yields one row per id, so this step is redundant but harmless)
• Join the data with the max lab values on id
• Generate the diff value as max value - lab value

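Pig script

```
-- Load the comma-separated data
A = LOAD 'test1.txt' USING PigStorage(',') AS (id:int, lab_value:int);
-- Group rows by id and compute each group's max lab_value
B = GROUP A BY id;
C = FOREACH B GENERATE group AS id, MAX(A.lab_value) AS max_value;
-- C already has one row per id, so DISTINCT is a no-op here
C1 = DISTINCT C;
-- Join each row with its group's max, then compute the difference
D = JOIN A BY id, C1 BY id;
E = FOREACH D GENERATE A::id, A::lab_value, (C1::max_value - A::lab_value) AS diff_value;
DUMP E;
```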
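
As a possible alternative, the join can be avoided by flattening each group back into rows alongside its max. This is only a minimal sketch, not part of the original answer: it assumes the same `test1.txt` input and schema as the script above, and the output aliases `patient_id` and `lab_diff` are chosen here to match the question's column names.

```pig
-- Sketch: same input file and schema as the script above (assumed)
A = LOAD 'test1.txt' USING PigStorage(',') AS (id:int, lab_value:int);
B = GROUP A BY id;
-- FLATTEN(A) re-emits every original row; MAX is computed once per group
-- and repeated on each of that group's rows
C = FOREACH B GENERATE FLATTEN(A), MAX(A.lab_value) AS max_value;
-- After FLATTEN, the original fields are addressed as A::id and A::lab_value
D = FOREACH C GENERATE A::id AS patient_id,
                       A::lab_value AS lab_value,
                       (max_value - A::lab_value) AS lab_diff;
DUMP D;
```

Since the max is attached to each row inside the group, no second pass over the data for a join is needed. The order of the dumped rows is not guaranteed in either version.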
