Here is a problem. We have two tables A and B. Both have a field of Date type. Denote them Date_A and Date_B, correspondingly. The problem is to join these two tables to each other by the following method - each row X from the table A we need to join to some row Y from B such that among all rows Z from B row Y provides the minimum value of expression abs(Date_A(X)-Date_B(Z)), i.e. it's value in Date_B is the closest to value of Date_A in X. Assumingly, this minimum is always unique, but if it's not then it's good to choose one of them randomly (but only one).
Of course you could write a join-condition based on your logic, but this worst_case in a parallel DBMS like Teradata. It will always result in a product join (probably followed by another step to return only one matching row).
For a closest-match-join I usually try to find the actual value using LAG/LEAD logic:
dt as date_A,
-- find the previous date_B // LAG
MAX(CASE WHEN x = 2 THEN dt end) OVER (ORDER BY dt, x ROWS UNBOUNDED PRECEDING ) AS prev_date,
-- find the next date_B // LEAD
MIN(CASE WHEN x = 2 THEN dt end) OVER (ORDER BY dt, x ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS next_date,
-- now find the nearest date
CASE WHEN prev_date IS NULL THEN next_date
WHEN next_date IS NULL THEN prev_date
WHEN dt - prev_date < next_date - dt THEN prev_date
END AS date_B
( -- get all dates, maybe UNION INSTEAD OF UNION ALL
SELECT date_A as dt, 1 AS x
SELECT date_B, 2
) AS dt
QUALIFY x = 1 -- only rows from table_A
This will need two STAT-steps, but now you got the correct date_B to equi-join. Put this in a Derived Table and join back to both table_A and tabke_B, should be significantly faster unless both tables are very small.
Of course you're probably joining on other columns as well, so add them to UNION and PARTITON BY.
Asked in February 2016Viewed 1,771 timesVoted 12Answered 1 times