dStudios February 2016

PostgreSQL MAX() inner query really slow

I've been currently tasked with optimising some of the SQL queries that are ran on very large datasets. I have the query below which I'm trying to optimise but at the moment struggling with it

The query takes roughly around 40-45 minutes to comeback with an answer which is obviously not acceptable. It will even take longer as the dataset grows within DB.

I believe its the inner MAX() function that dramatically slows down the query?

I appreciate any help or advice in making the query better and faster.

SELECT 
  positionview.equipmenttimestamp, 
  positionview.name, 
  positionview.iNumber, 
  positionview.datauserprovidername, 
  positionview.latitude, 
  positionview.longitude
FROM 
  public.positionview
WHERE positionview.outbound = false
AND positionview.referenceIdentifier is null
AND positionview.responseType = 'XYZ'
AND positionview.equipmentTimestamp >='2016-02-02 14:44:08.213'
AND positionview.equipmentTimestamp <='2016-02-05 14:44:08.213'
AND positionview.equipmentTimestamp =(
  SELECT MAX(positionview2.equipmentTimestamp)
    FROM PositionView positionview2
    WHERE positionview.iNumber = positionview2.iNumber
    AND positionview2.outbound = false
    AND positionview2.referenceIdentifier is null
    AND positionview2.responseType = 'XYZ'
    AND positionview2.equipmentTimestamp >='2016-02-02 14:44:08.213'
    AND positionview2.equipmentTimestamp <='2016-02-05 14:44:08.213');

Many thanks in advance!

Answers


Gordon Linoff February 2016

I would suggest window functions:

SELECT pv.*
FROM (SELECT pv.*, MAX(pv.equipmenttimestamp) OVER (PARTITION BY pv.inumber) as maxet
      FROM public.positionview pv
      WHERE pv.outbound = false AND
            pv.referenceIdentifier is null AND
            pv.responseType = 'XYZ' AND
            pv.equipmentTimestamp >='2016-02-02 14:44:08.213' AND
            pv.equipmentTimestamp <='2016-02-05 14:44:08.213'
     ) pv
WHERE equipmenttimestamp = maxet;


a_horse_with_no_name February 2016

This kind of statement can be done using Postgres' distinct on() operator (which is usually faster than the equivalent solution using a standard window function):

SELECT distinct on (inumber)
  equipmenttimestamp, 
  name, 
  iNumber, 
  datauserprovidername, 
  latitude, 
  longitude
FROM 
  public.positionview
WHERE outbound = false
AND referenceIdentifier is null
AND responseType = 'XYZ'
AND equipmentTimestamp >='2016-02-02 14:44:08.213'
AND equipmentTimestamp <='2016-02-05 14:44:08.213'
order by inumber, equipmentTimestamp desc;

An index on (inumber,equipmenttimestamp) will probably help

Post Status

Asked in February 2016
Viewed 2,491 times
Voted 13
Answered 2 times

Search




Leave an answer