I'm trying to batch some insertion scripts (avoiding duplicates) and I've come across some tables that have no primary key (I know...I didn't create them and I cannot modify them). Basically what I've done is grabbed the rows I need, put them into a temporary table ([TempTable]), and updated some values in them.
Now I need to re-insert DISTINCT TOP values from [TempTable] into [OriginalTable] in batches. To do this, I imagine I would need a column in the temp table (which I've created...let's call it [ValuesInserted]), that specifies which columns were just inserted.
I would do an INSERT statement to put DISTINCT values into the original table, using TOP to batch it.
INSERT INTO [OriginalTable]
SELECT DISTINCT TOP (1000) *
FROM [TempTable]
Then I would UPDATE the temp table to have ValuesInserted set to 1 for the records that were just inserted. This is where I'm stuck:
UPDATE /*TOP (1000) - Doesn't work*/ [TempTable]
SET [ValuesInserted] = 1
???
Then I would DELETE those records from the temp table so that my next INSERT statement (using TOP) will not capture the previous set of records.
DELETE
FROM [TempTable]
WHERE [ValuesInserted] = 1
The main problem I'm having is that just running an UPDATE on just the TOP (1000) rows, doesn't capture all of the records that may have duplicates in [TempTable]. I also cannot perform an INNER JOIN on all columns on two copies of [TempTable] because this is being run on many different tables using dynamic SQL. Basically, the script needs to be generic (not specific to any table), but it should be assumed that there is no primary key.
The following generic sample captures the idea:
Val1 Val2 ValuesInserted
1 1 0
1 2 0
1 3 0
1 4 0
1 5 0
1 6 0
1 7 0
1 8 0
1 9 0
1 1 0 <--Duplicate
2 1 0
2 2 0
2 3 0
2 4 0
2 5 0
2 6 0
2 7 0
2 8 0
2 9 0
2 1 0 <--Duplicate
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
3 6 0
3 7 0
3 8 0
3 9 0
3 1 0 <--Duplicate
1 2 0 <--Duplicate
1 3 0 <--Duplicate
Doing an UPDATE TOP (5) on this above data set will only update the first 5 records:
Val1 Val2 ValuesInserted
1 1 1 <--Updated
1 2 1 <--Updated
1 3 1 <--Updated
1 4 1 <--Updated
1 5 1 <--Updated
1 6 0
1 7 0
1 8 0
1 9 0
1 1 0 <--Duplicate
2 1 0
2 2 0
2 3 0
2 4 0
2 5 0
2 6 0
2 7 0
2 8 0
2 9 0
2 1 0 <--Duplicate
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
3 6 0
3 7 0
3 8 0
3 9 0
3 1 0 <--Duplicate
1 2 0 <--Duplicate
1 3 0 <--Duplicate
I need to update any records that match the top 5 records like so:
Val1 Val2 ValuesInserted
1 1 1 <--Updated
1 2 1 <--Updated
1 3 1 <--Updated
1 4 1 <--Updated
1 5 1 <--Updated
1 6 0
1 7 0
1 8 0
1 9 0
1 1 1 <--Updated
2 1 0
2 2 0
2 3 0
2 4 0
2 5 0
2 6 0
2 7 0
2 8 0
2 9 0
2 1 0 <--Duplicate
3 1 0
3 2 0
3 3 0
3 4 0
3 5 0
3 6 0
3 7 0
3 8 0
3 9 0
3 1 0 <--Duplicate
1 2 1 <--Updated
1 3 1 <--Updated
If you can make your idea work on this sample, I can apply it to my specific case.
Am I approaching this completely wrong, or am I missing something? I'm looking for a solution that doesn't hog resources because the script is batched and is running on very large databases on high-impact servers.
The closest topic I could find on this was:
Using Distinct in SQL Update.
However, the answers given would not work when using TOP.
EDIT: This apparently wasn't clear at the beginning. The first thing I'm doing is grabbing rows from [OriginalTable] and putting them into [TempTable]. These rows are initially unique. However, I perform an update that modifies some of the values, yielding data like the sample above. From there, I need to grab DISTINCT rows and re-insert them into [OriginalTable].
It looks like you're really going out of your way to make this as complicated as possible. I would just remove the duplicates from the temporary table in the first place. Or never INSERT them there, which is even better. Or build an actual ETL solution, perhaps with SSIS.
Those things said, what you're looking for is the OUTPUT clause, which can be added to any INSERT, UPDATE, or DELETE statement:
DECLARE @inserted_ids TABLE (val1, val2)
INSERT INTO dbo.OriginalTable (val1, val2)
OUTPUT INSERTED.val1, INSERTED.val2 INTO @inserted_ids
SELECT DISTINCT TOP 1000 val1, val2
FROM dbo.TempTable
DELETE TT
FROM @inserte_ids II
INNER JOIN dbo.TempTable TT ON
TT.val1 = II.val1 AND
TT.val2 = II.val2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With