simplycoding February 2016

SQL first_value and last_value

This is Netezza, which runs a flavor of Postgres

I'm using the first_value and last_value window functions, which are giving different values and confusing me.

Here's the code I'm running:

select
    a.emp_id
    , a.sequence
    , FIRST_VALUE(a.DEP_ID) OVER(PARTITION BY a.emp_id ORDER BY a.sequence) AS first_dept
    , LAST_VALUE(a.DEP_ID) OVER(PARTITION BY a.emp_id ORDER BY a.sequence) AS last_dept
    , FIRST_VALUE(a.sequence) OVER(PARTITION BY a.emp_id ORDER BY a.sequence) AS first_seq
    , LAST_VALUE(a.sequence) OVER(PARTITION BY a.emp_id ORDER BY a.sequence) AS last_seq
FROM
    tname a
WHERE
    1=1
    AND a.event_id IN (1, 2)
    AND a.emp_id=11111

For emp_id = 1111 this returns 2 records.

emp_id   SEQuence   FIRST_DEPT  LAST_DEPT   FIRST_SEQ   LAST_SEQ
90565520    1       200800      200800      1           1
90565520    10      200800      200932      1           10

I'm confused as to why last_seq is returning 1 instead of 10

Answers


Abelisto February 2016

In the window functions order by used for "running" calculations, so for first row FIRST_VALUE and LAST_VALUE returns values for first row only and for second row it returns values for two rows and so on.

You can ensure in it by adding something like sum(a.sequence) over (PARTITION BY a.emp_id ORDER BY a.sequence). In your example it will be 1 and 11 (= 1+10).

So you should to remove order by from over and add order by a.DEP_ID, a.sequence at the end of the whole query.


klin February 2016

Example data:

create table sample (id int, val int);
insert into sample select i, i from generate_series(1,5) i;

When you use order by in a window function it calculates cumulative values:

select 
    id, val,
    first_value(val) over (order by val),
    last_value(val) over (order by val),
    sum(val) over (order by val)
from sample;

 id | val | first_value | last_value | sum 
----+-----+-------------+------------+-----
  1 |   1 |           1 |          1 |   1
  2 |   2 |           1 |          2 |   3
  3 |   3 |           1 |          3 |   6
  4 |   4 |           1 |          4 |  10
  5 |   5 |           1 |          5 |  15
(5 rows)

Just do not use order by:

select 
    id, val,
    first_value(val) over (),
    last_value(val) over (),
    sum(val) over ()
from sample;

 id | val | first_value | last_value | sum 
----+-----+-------------+------------+-----
  1 |   1 |           1 |          5 |  15
  2 |   2 |           1 |          5 |  15
  3 |   3 |           1 |          5 |  15
  4 |   4 |           1 |          5 |  15
  5 |   5 |           1 |          5 |  15
(5 rows)    

To get exactly one row you can use DISTINCT ON. Add ORDER BY sequence desc so the last seuqence will be shown in the row:

SELECT DISTINCT ON (a.emp_id)
    a.emp_id
    , a.sequence
    , FIRST_VALUE(a.DEP_ID) OVER(PARTITION BY a.emp_id) AS first_dept
    , LAST_VALUE(a.DEP_ID) OVER(PARTITION BY a.emp_id) AS last_dept
    , FIRST_VALUE(a.sequence) OVER(PARTITION BY a.emp_id) AS first_seq
    , LAST_VALUE(a.sequence) OVER(PARTITION BY a.emp_id) AS last_seq
FROM
    tname a
WHERE
    1=1
    AND a.event_id IN (1, 2)
    AND a.emp_id=11111
ORDER BY
    a.sequence desc;

Post Status

Asked in February 2016
Viewed 3,108 times
Voted 6
Answered 2 times

Search




Leave an answer