I am loving Azure Data Lake but lack of documentation will probably slow down the adoption. I hope somebody out there have more experinnce on U-SQL than I do.
Trying to derive from what's available under Microsoft.Analytics.Interfaces and via U-SQL interpreter with not much luck. Dynamic sql does not seem to be supported to define the schema of a row set at run time and IUpdatableRow's schema is readonly so Processor approach is not viable. And there is no out of the box PIVOT capability in U-SQL.
I also thought that maybe I can process the rowset all together and write a custom outputter to pivot but couldn't figure it out.
There is probably a really easy way to do this as it is a standard pivot operation. How would you go about reshaping a rowset from I to II for an indeterminate number of ColA and ColB values in a performant way?
|ColA |ColB |ColC|
|1 |A |30 |
|1 |B |70 |
|1 |ZA |12 |
|2 |C |22 |
|2 |A |13 |
|ID |A |B |C |...... |ZA |.....
|1 |30 |70 |0 | |12 |
|2 |13 |0 |22 |...... |0 |.....
Note PIVOT / UNPIVOT syntax has been added to U-SQL as of March 2017.
Using the above sample data:
@t = SELECT *
FROM(
VALUES
( 1, "A", 30 ),
( 1, "B", 70 ),
( 1, "ZA", 12 ),
( 2, "C", 22 ),
( 2, "A", 13 ),
( 2, "ABC", 42)
) AS T(ColA, ColB, ColC);
@p =
SELECT Column_0 AS id, Column_1 AS a
FROM @t
PIVOT (MAX(ColC) FOR ColB IN ("A" AS [A], "B" AS [B], "C" AS [C], "ZA" AS [ZA], "ABC" AS [ABC])
) AS pvt;
OUTPUT @p
TO "/output/pivot3.csv"
USING Outputters.Csv();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With