Lookup transformation against very large target table
I am creating an SSIS package that essentially attempts to find all rows in Table A that are not in Table B. The join column is an Identity column in Table A that is the clustered index, and a column in Table B that is not an Identity but is indexed. I am doing this in batches of 10,000 rows at a time. Both Table A and Table B have approximately 350M rows.
I initially thought a Lookup transformation would be appropriate but I cannot use Full Cache because it attempts to load 350M rows in the cache! If I use No Cache, the process of looking up just 10,000 rows is horrendously slow (even though the lookup column in Table B is indexed).
Also, Table A and Table B are in two different databases on two different servers.
Is there another transformation that would be more appropriate for what I want to do?
You could try the Merge Join component in the data flow task.
Drop two data source components onto the design surface;
Assuming you are using a SQL command, make sure each query's result is ordered by the join column (ORDER BY);
Open each data source component in advanced mode; on the Input and Output Properties tab, set the output as sorted (IsSorted = True), and then set SortKeyPosition to 1 for the join column;
Then drop a Merge Join component and link the two data source components to it.
Open the Merge Join component, change the join type to Left outer join, and tick the columns you want to keep;
Finally, drop a Conditional Split component to split the output rows on the join column. Because the Merge Join uses a left join (with Table A as the left input), the rows where Table B's join column came back NULL, i.e. ISNULL(Table B join column) == TRUE, are the ones you are looking for.
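To make the logic of the steps above concrete, here is a minimal Python sketch of what the sorted Merge Join followed by the Conditional Split computes. It is not SSIS code: the function name rows_missing_from_b and the sample keys are made up for illustration, and it assumes both inputs arrive already sorted ascending on the join column, as the ORDER BY and IsSorted settings promise.

```python
from typing import Iterable, Iterator


def rows_missing_from_b(a_keys: Iterable[int], b_keys: Iterable[int]) -> Iterator[int]:
    """Yield every key from a_keys that has no match in b_keys.

    Both inputs must be sorted ascending on the join key. Each stream is
    read once, front to back, so neither side is held in memory.
    """
    b_iter = iter(b_keys)
    b = next(b_iter, None)
    for a in a_keys:
        # Advance the B stream until it catches up with the current A key.
        while b is not None and b < a:
            b = next(b_iter, None)
        # No equal key on the B side: this is the row a left join would
        # return with a NULL B column, which the Conditional Split keeps.
        if b is None or b != a:
            yield a


# Hypothetical sample data: Table A keys and Table B keys, both sorted.
missing = list(rows_missing_from_b([1, 2, 3, 5, 8], [2, 3, 3, 4, 8]))
print(missing)  # [1, 5]
```

The point of the sketch is that a merge over two sorted streams needs only one pass over each table, which is why it avoids both the 350M-row cache and the per-row round trips of a No Cache lookup.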
Actually, when dealing with a large number of rows, you could try other ways to improve performance, such as importing both tables into a staging database and doing the join in SQL, since a set-based operation is usually faster than row-by-row processing.
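As a rough illustration of the staging approach, the sketch below runs a set-based anti-join with NOT EXISTS. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for a SQL Server staging database; the table names staging_a and staging_b, the column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Set-based anti-join: rows of staging_a with no matching row in staging_b.
ANTI_JOIN_SQL = """
SELECT a.id
FROM staging_a AS a
WHERE NOT EXISTS (SELECT 1 FROM staging_b AS b WHERE b.a_id = a.id)
ORDER BY a.id
"""


def find_missing(conn: sqlite3.Connection) -> list:
    """Return the ids present in staging_a but absent from staging_b."""
    return [row[0] for row in conn.execute(ANTI_JOIN_SQL)]


# In-memory database standing in for the staging database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_a (id INTEGER PRIMARY KEY);
CREATE TABLE staging_b (a_id INTEGER);
CREATE INDEX ix_staging_b_a_id ON staging_b (a_id);
""")
conn.executemany("INSERT INTO staging_a (id) VALUES (?)", [(i,) for i in range(1, 7)])
conn.executemany("INSERT INTO staging_b (a_id) VALUES (?)", [(2,), (3,), (5,)])

missing = find_missing(conn)
print(missing)  # [1, 4, 6]
```

Once both tables sit in the same database, the engine can resolve the whole comparison as a single join against the index on the Table B column instead of issuing one lookup per row.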
Asked in February 2016 · Viewed 1,084 times · Voted 12 · Answered 1 time