user3637224 February 2016

Is there any way to simplify union query while using multiple tables, with reference to main table ( Performance Optimization )

I wonder is there any way to improve query, speed up / simplify while using multiple tables UNION clause with reference to only one main table.

following is the scenario, here table reference is my main table and geo_loc1,geo_loc2,geo_loc2 are other tables, which will be joined based on key1 and key2, these two keys exists in all table.

SELECT 
    reference.*,
    geo_loc1.bathy,
    geo_loc1.gravity,
    geo_loc1.magnet,
    'data1' as type
FROM 
    reference,geo_loc1 
WHERE
    reference.latitude between -30 and -10 AND 
    reference.longitude between 10 and 50 AND 
    DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10'  AND
    reference.key1 = geo_loc1.key1 AND
    reference.key2 = geo_loc1.key2
UNION
SELECT 
    reference.*,
    geo_loc2.bathy,
    Null as gravity,
    geo_loc2.magnet,
    'data2' as type
FROM 
    reference,geo_loc2 
WHERE
    reference.latitude between -30 and -10 AND 
    reference.longitude between 10 and 50 AND 
    DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10'  AND
    reference.key1 = geo_loc2.key1 AND
    reference.key2 = geo_loc2.key2
UNION
SELECT 
    reference.*,
    Null as bathy,
    Null as gravity,
    geo_loc3.magnet,
    'data3' as type
FROM 
    reference,geo_loc3 
WHERE
    reference.latitude between -30 and -10 AND 
    reference.longitude between 10 and 50 AND 
    DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10'  AND
    reference.key1 = geo_loc3.key1 AND
    reference.key2 = geo_loc3.key2

Whether this query can be simplified or improved ?

Answers


Tab Alleman February 2016

You can safely use UNION ALL instead of UNION, since your queries cannot contain any duplicate rows.

But you will mainly get performance from good indexes, rather than from different ways of writing the query.


Ben Thurley February 2016

For performance I can see two things you might change.

1) Change UNION to UNION ALL. UNION will check the results for duplicates which adds overhead you don't need.

2) Consider changing DATE( reference.st_date ) BETWEEN '2000-07-05' AND '2011-11-10' to not use the date function.

If reference.st_date were a proper date column and you didn't need to do the type conversion then the DB could use an index based on date to do a quick lookup. By doing the type conversion the DB will now have to do a scan to do the conversion for each row.

You say the index is on key1,key2,st_date,latitude,longitude. By using the date function you're probably only using the first two columns of that index. I would either change st_date to a date or change the values you're comparing it to so the conversion isn't required.

Also consider using a proper INNER JOIN rather than adding join conditions as WHERE clauses. It's much more readable using the proper syntax and less error prone.

Example using UNION ALL and INNER JOIN should look something like this. Note this is untested obviously but you should get the idea.
Edit Removed use of date() function which should speed it up further

SELECT 
    reference.*,
    geo_loc1.bathy,
    geo_loc1.gravity,
    geo_loc1.magnet,
    'data1' as type
FROM 
    reference
INNER JOIN geo_loc1 ON
    reference.key1 = geo_loc1.key1 AND
    reference.key2 = geo_loc1.key2
WHERE
    reference.latitude between -30 and -10 AND 
    reference.longitude between 10 and 50 AND 
    reference.st_date BETWEEN '2000-07-05 00:00:00' AND '2011-11-10 23:59:59'
UNION ALL
SELECT 
    reference.*,
    geo_loc2.bathy,
    Null as gravity,
    geo_loc2.magnet,
    'data2' as type
FROM 
    reference
INNER JOIN geo_loc2 ON
    reference.key1 = geo_loc2.key1 AND
    reference 

Post Status

Asked in February 2016
Viewed 3,220 times
Voted 12
Answered 2 times

Search




Leave an answer