Nikolay Lysenko February 2016

SQL, joining tables

Here is a problem. We have two tables A and B. Both have a field of Date type. Denote them Date_A and Date_B, correspondingly. The problem is to join these two tables to each other by the following method - each row X from the table A we need to join to some row Y from B such that among all rows Z from B row Y provides the minimum value of expression abs(Date_A(X)-Date_B(Z)), i.e. it's value in Date_B is the closest to value of Date_A in X. Assumingly, this minimum is always unique, but if it's not then it's good to choose one of them randomly (but only one). Example. Table A:

"A", "B", "2015-10-01"

Table B: "C", "2015-10-07" "D", "2015-12-02"

Expected result:

"A", "B", "2015-10-01", "C", "2015-10-07"

p.s. the platform is Teradata, if it matters

Answers


dnoeth February 2016

Of course you could write a join-condition based on your logic, but this worst_case in a parallel DBMS like Teradata. It will always result in a product join (probably followed by another step to return only one matching row).

For a closest-match-join I usually try to find the actual value using LAG/LEAD logic:

SELECT
  dt as date_A,
  -- find the previous date_B // LAG
  MAX(CASE WHEN x = 2 THEN dt end) OVER (ORDER  BY dt, x ROWS UNBOUNDED PRECEDING ) AS prev_date,
  -- find the next date_B // LEAD
  MIN(CASE WHEN x = 2 THEN dt end) OVER (ORDER  BY dt, x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS next_date,
  -- now find the nearest date
  CASE WHEN prev_date IS NULL THEN next_date
       WHEN next_date IS NULL THEN prev_date
       WHEN dt - prev_date < next_date - dt THEN prev_date 
       ELSE next_date 
  END AS date_B
FROM
 ( -- get all dates, maybe UNION INSTEAD OF UNION ALL
   SELECT date_A as dt, 1 AS x
   FROM table_A
   UNION ALL
   SELECT date_B, 2
   FROM table_B
 ) AS dt
QUALIFY x = 1 -- only rows from table_A

This will need two STAT-steps, but now you got the correct date_B to equi-join. Put this in a Derived Table and join back to both table_A and tabke_B, should be significantly faster unless both tables are very small.

Of course you're probably joining on other columns as well, so add them to UNION and PARTITON BY.

Post Status

Asked in February 2016
Viewed 1,771 times
Voted 12
Answered 1 times

Search




Leave an answer