I have a legacy product that I have to maintain. One of the tables is similar to the following example:
DECLARE @t TABLE
(
id INT,
DATA NVARCHAR(30)
);
INSERT INTO @t
SELECT 1,
'name: Jim Ey'
UNION ALL
SELECT 2,
'age: 43'
UNION ALL
SELECT 3,
'----------------'
UNION ALL
SELECT 4,
'name: Johnson Dom'
UNION ALL
SELECT 5,
'age: 34'
UNION ALL
SELECT 6,
'----------------'
UNION ALL
SELECT 7,
'name: Jason Thwe'
UNION ALL
SELECT 8,
'age: 22'
SELECT *
FROM @t;
/*
You will get the following result
id DATA
----------- ------------------------------
1 name: Jim Ey
2 age: 43
3 ----------------
4 name: Johnson Dom
5 age: 34
6 ----------------
7 name: Jason Thwe
8 age: 22
*/
Now I want to get the information in the following form:
name age
-------------- --------
Jim Ey 43
Johnson Dom 34
Jason Thwe 22
What's the easiest way to do this? Thanks.
Out of (slightly morbid) curiosity I tried to come up with a means of transforming the exact input data you have provided.
Far better, of course, would be to properly structure the original data. With a legacy system, this may not be possible, but an ETL process could be created to bring this information into an intermediate location so that an ugly query like this would not need to be run in real time.
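As a rough sketch of what that intermediate location might look like (the dbo.Person table and its column names here are hypothetical, not part of the original system), the parsed rows could land in a properly typed table that reports and grids query directly:
-- hypothetical ETL destination; names and types are assumptions
CREATE TABLE dbo.Person
(
    PersonId INT IDENTITY( 1, 1 ) PRIMARY KEY,
    Name     VARCHAR( 100 ) NOT NULL,
    Age      INT            NOT NULL
);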
The first example below assumes that all IDs are consistent and sequential (otherwise, an additional ROW_NUMBER() column or a new identity column would need to be used to guarantee correct remainder operations on ID).
SELECT
Name = REPLACE( Name, 'name: ', '' ),
Age = REPLACE( Age, 'age: ', '' )
FROM
(
SELECT
Name = T2.Data,
Age = T1.Data,
RowNumber = ROW_NUMBER() OVER( ORDER BY T1.Id ASC )
FROM @t T1
INNER JOIN @t T2 ON T1.id = T2.id +1 -- offset by one to combine two rows
WHERE T1.id % 3 != 0 -- skip delimiter records
) Q1
-- skip every other record (minus delimiters, which have already been stripped)
WHERE RowNumber % 2 != 0
This is a more practical example because the actual ID values do not matter, only the row sequence.
DECLARE @NumberedData TABLE( RowNumber INT, Data VARCHAR( 100 ) );
INSERT @NumberedData( RowNumber, Data )
SELECT
RowNumber = ROW_NUMBER() OVER( ORDER BY id ASC ),
Data
FROM @t;
SELECT
Name = REPLACE( N2.Data, 'name: ', '' ),
Age = REPLACE( N1.Data, 'age: ', '' )
FROM @NumberedData N1
INNER JOIN @NumberedData N2 ON N1.RowNumber = N2.RowNumber + 1
WHERE ( N1.RowNumber % 3 ) = 2;
DELETE @NumberedData;
Again, it would be best to avoid running a query like this in real time and use a scheduled, transactional ETL process. In my experience, semi-structured data like this is prone to anomalies.
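As a rough sketch of the kind of sanity check such a process could run first (assuming the data only ever contains name rows, age rows, and delimiter rows as shown above), a query like this flags anything that does not match the expected patterns:
-- rows that are neither a name, an age, nor a delimiter; the patterns are assumptions
SELECT id, DATA
FROM @t
WHERE DATA NOT LIKE 'name: %'
  AND DATA NOT LIKE 'age: %'
  AND DATA <> '----------------';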
While examples #1 and #2 (and the solutions provided by others) demonstrate clever ways of working with the data, a more practical way to transform it is a cursor. Why? It may actually perform better (no nested queries, recursion, pivoting, or row numbering), and even if it is slower, it provides much better opportunities for error handling.
-- this could be a table variable, temp table, or staging table
DECLARE @Results TABLE ( Name VARCHAR( 100 ), Age INT );
DECLARE @Index INT = 0, @Data VARCHAR( 100 ), @Name VARCHAR( 100 ), @Age INT;
DECLARE Person_Cursor CURSOR FOR SELECT Data FROM @t ORDER BY id;
OPEN Person_Cursor;
FETCH NEXT FROM Person_Cursor INTO @Data;
WHILE( 1 = 1 ) BEGIN -- deliberate loop; lets the final name/age pair be inserted after the last fetch
IF( @Index = 2 ) BEGIN
INSERT @Results( Name, Age ) VALUES( @Name, @Age );
SET @Index = 0;
END
ELSE BEGIN
-- optional: examine @Data for integrity
IF( @Index = 0 ) SET @Name = REPLACE( @Data, 'name: ', '' );
IF( @Index = 1 ) SET @Age = CAST( REPLACE( @Data, 'age: ', '' ) AS INT );
SET @Index = @Index + 1;
END
-- optional: examine @Index to see that there are no superfluous trailing
-- rows or rows omitted at the end.
IF( @@FETCH_STATUS != 0 ) BREAK;
FETCH NEXT FROM Person_Cursor INTO @Data;
END
CLOSE Person_Cursor;
DEALLOCATE Person_Cursor;
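As one example of the error handling this approach allows (this check is not part of the original loop, just a sketch), the "examine @Data for integrity" comment above could be expanded into something like:
-- sketch: fail loudly instead of silently producing bad output
IF( @Index = 0 AND @Data NOT LIKE 'name: %' )
    RAISERROR( 'Expected a name row but found: %s', 16, 1, @Data );
IF( @Index = 1 AND @Data NOT LIKE 'age: %' )
    RAISERROR( 'Expected an age row but found: %s', 16, 1, @Data );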
I created sample source data of 100K rows, and the three examples above performed roughly equivalently when transforming the data.
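For reference, test data in the same shape can be generated with a numbers CTE (a sketch; dbo.SourceData is a hypothetical staging table and the names/ages are synthetic):
-- generate 100K rows in the same name/age/delimiter pattern
;WITH Numbers AS
(
    SELECT TOP ( 100000 ) n = ROW_NUMBER() OVER( ORDER BY ( SELECT NULL ) )
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT dbo.SourceData( id, Data )
SELECT
    n,
    CASE n % 3
        WHEN 1 THEN 'name: Person ' + CAST( ( n + 2 ) / 3 AS VARCHAR( 10 ) )
        WHEN 2 THEN 'age: ' + CAST( 18 + ( n % 60 ) AS VARCHAR( 10 ) )
        ELSE '----------------'
    END
FROM Numbers;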
I created a million rows of source data and a query similar to the following gives excellent performance for selecting a subset of rows (such as would be used in a grid on a web page or a report).
-- INT IDENTITY( 1, 1 ) numbers the rows for us
DECLARE @NumberedData TABLE( RowNumber INT IDENTITY( 1, 1 ), Data VARCHAR( 100 ) );
-- subset selection; ordering/filtering can be done here but it will need to preserve
-- the original 3 rows-per-result structure and it will impact performance
INSERT @NumberedData( Data )
SELECT TOP 1000 Data FROM @t ORDER BY id;
SELECT
N1.RowNumber,
Name = REPLACE( N2.Data, 'name: ', '' ),
Age = REPLACE( N1.Data, 'age: ', '' )
FROM @NumberedData N1
INNER JOIN @NumberedData N2 ON N1.RowNumber = N2.RowNumber + 1
WHERE ( N1.RowNumber % 3 ) = 2;
DELETE @NumberedData;
I'm seeing execution times of 4-10ms (i7-3960x) against a set of a million records.
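For anyone wanting to reproduce that kind of measurement, the session statistics options are enough (timings will of course vary with hardware and data):
SET STATISTICS TIME ON;  -- reports CPU and elapsed time per statement
-- run the subset-selection query above
SET STATISTICS TIME OFF;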
Given that table you can do this:
;WITH DATA
AS
(
SELECT
SUBSTRING(t.DATA,CHARINDEX(':',t.DATA)+2,LEN(t.DATA)) AS value,
SUBSTRING(t.DATA,0,CHARINDEX(':',t.DATA)) AS ValueType,
ID,
ROW_NUMBER() OVER(ORDER BY ID) AS RowNbr
FROM
@t AS t
WHERE
NOT t.DATA='----------------'
)
, RecursiveCTE
AS
(
SELECT
Data.RowNbr,
Data.value,
Data.ValueType,
NEWID() AS ID
FROM
Data
WHERE
Data.RowNbr=1
UNION ALL
SELECT
Data.RowNbr,
Data.value,
Data.ValueType,
CASE
WHEN Data.ValueType='age'
THEN RecursiveCTE.ID
ELSE NEWID()
END AS ID
FROM
Data
JOIN RecursiveCTE
ON RecursiveCTE.RowNbr+1=Data.RowNbr
)
SELECT
pvt.name,
pvt.age
FROM
(
SELECT
ID,
value,
ValueType
FROM
RecursiveCTE
) AS SourceTable
PIVOT
(
MAX(Value)
FOR ValueType IN ([name],[age])
) AS pvt
Output
Name           Age
-------------- --------
Jim Ey         43
Jason Thwe     22
Johnson Dom    34