Inside a data flow, is it possible to do a lookup, insert the non-matching rows, re-lookup and then continue with the full (original) data set?
I can't see how you get/re-use the inserted rows after they've gone to an OLE DB Destination, nor can I see that a Multicast could wait until a lookup insert branch completes (and then union all them back-in).
If it can't be done inside the data flow, would I have to repeat all my filtering logic in a previous data flow task simply to do the lookup inserts first?
This might be an obvious/already answered question, but I can't find much after searching.
This isnt possible inside a single Data Flow. There are various "solutions" around if you google enough but they overlook the architectural reality that rows travel down a Data Flow in buffers/batches, processed in parallel.
So image you have multiple "new" rows arriving in 2 adjacent buffers. There is no way to ensure that your downstream handling of "new" rows from buffer 1 has been completed before buffer 2 hits your upstream lookup. This will result in multiple "new" rows being inserted to your lookup target table for the same key.
You need to have an upstream Data Flow Task that does all the required lookup inserts. This will be a more efficient solution overall at runtime, as your lookup inserts can use Fast Load and Table Lock, and your downstream Lookup can be Full Cache.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With