If I do:
INSERT INTO dst
SELECT blah
FROM src
CROSS APPLY xmlcolumn.nodes('blah')
where dst has an identity column, can one say for certain that the identity column order matches the order of the nodes from the original XML document?
I think the answer is no, there are no guarantees and that to ensure the ordering is able to be retained, some ordering information needs to also be extracted from the XML at the same time the nodes are enumerated.
There's no way to see it explicitly in an execution plan, but the id column returned by the nodes() method is a varbinary(900) OrdPath, which does encapsulate the original xml document order.
The solution offered by Mikael Eriksson on the related question Does the `nodes()` method keep the document order? relies on the OrdPath to provide an ORDER BY clause necessary to determine how identity values are assigned for the INSERT.
A slightly more compact usage follows:
CREATE TABLE #T
(
ID integer IDENTITY,
Fruit nvarchar(10) NOT NULL
);
DECLARE @xml xml =
N'
<Fruits>
<Apple />
<Banana />
<Orange />
<Pear />
</Fruits>
';
INSERT #T
(Fruit)
SELECT
N.n.value('local-name(.)', 'nvarchar(10)')
FROM @xml.nodes('/Fruits/*') AS N (n)
ORDER BY
ROW_NUMBER() OVER (ORDER BY N.n);
SELECT
T.ID,
T.Fruit
FROM #T AS T
ORDER BY
T.ID;
db<>fiddle
Using the OrdPath this way is presently undocumented, but the technique is sound in principle:
ROW_NUMBER computes sequence values ordered by OrdPath*.ORDER BY clause uses the row number sequence.ORDER BY.To be clear, this holds even if parallelism is employed. As Mikael says, the dubious aspect is using id in the ROW_NUMBER since id is not documented to be the OrdPath.
* The ordering is not shown in plans, but optimizer output using TF 8607 contains:
ScaOp_SeqFunc row_number order[CALC:QCOL: XML Reader with XPath filter.id ASC]
Under the current implementation of .nodes, the XML nodes are generated in document order. The result of that is always joined to the original data using a nested loops, which always runs in order also.
Furthermore, inserts are generally serial (except under very specific circumstances that it goes parallel, usually when you have an empty table, and never with an IDENTITY value being generated).
Therefore there is no reason why the server would ever return rows in a different order than the document order. You can see from this fiddle that that is what happens.
That being said, there is no guarantee that the implementation of .nodes won't change, or that inserts may in future go parallel, as neither of these is documented anywhere as being guaranteed. So I wouldn't rely on it without an explicit ORDER BY, and you do not have a column to order it on.
Using an
ORDER BYwould guarantee it. The docs state: "INSERTqueries that useSELECTwithORDER BYto populate rows guarantees how identity values are computed but not the order in which the rows are inserted."
Even using ROW_NUMBER with ORDER BY (SELECT NULL) as some have recommended is also not guaranteed. You need to get the document order directly from XQuery.
The problem is that SQL Server's version of XQuery does not allow using position(.) as a result, only as a predicate. Instead, you can use a hack involving the << positional operator.
For example:
SELECT T.X.value('text()[1]', 'nvarchar(100)') as RowLabel,
T.X.value('let $i := . return count(../*[. << $i]) + 1', 'int') as RowNumber
FROM src
CROSS APPLY xmlcolumn.nodes('blah') as T(X);
What this does is:
. to the variable $i../* i.e. all children of the parent of this node[. << $i] that are previous to $iIf you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With