Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

azure data lake u-sql pivot

I am loving Azure Data Lake but lack of documentation will probably slow down the adoption. I hope somebody out there have more experinnce on U-SQL than I do.

Trying to derive from what's available under Microsoft.Analytics.Interfaces and via U-SQL interpreter with not much luck. Dynamic sql does not seem to be supported to define the schema of a row set at run time and IUpdatableRow's schema is readonly so Processor approach is not viable. And there is no out of the box PIVOT capability in U-SQL.

I also thought that maybe I can process the rowset all together and write a custom outputter to pivot but couldn't figure it out.

There is probably a really easy way to do this as it is a standard pivot operation. How would you go about reshaping a rowset from I to II for an indeterminate number of ColA and ColB values in a performant way?

I

|ColA |ColB |ColC|
|1    |A    |30  |
|1    |B    |70  |
|1    |ZA   |12  |
|2    |C    |22  |
|2    |A    |13  |

II

|ID   |A    |B    |C   |...... |ZA   |.....
|1    |30   |70   |0   |       |12   |
|2    |13   |0    |22  |...... |0    |.....
like image 791
chi Avatar asked Oct 19 '25 21:10

chi


1 Answers

Note PIVOT / UNPIVOT syntax has been added to U-SQL as of March 2017.

Using the above sample data:

@t = SELECT *
     FROM(
        VALUES
        ( 1, "A", 30 ),
        ( 1, "B", 70 ),
        ( 1, "ZA", 12 ),
        ( 2, "C", 22 ),
        ( 2, "A", 13 ),
        ( 2, "ABC", 42)
     ) AS T(ColA, ColB, ColC);


@p =
    SELECT Column_0 AS id, Column_1 AS a
    FROM @t
      PIVOT (MAX(ColC) FOR ColB IN ("A" AS [A], "B" AS [B], "C" AS [C], "ZA" AS [ZA], "ABC" AS [ABC])
           ) AS pvt;


OUTPUT @p
TO "/output/pivot3.csv"
USING Outputters.Csv();
like image 200
wBob Avatar answered Oct 22 '25 05:10

wBob