Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Talend tXMLMap get all ocurrences of element without looping

Tags:

xml

csv

talend

I am learning Talend and I need to a XML to CSV mapping that I cannot make work right.

I have a document like this (really simplified):

<Report>
    <Payment>
        <Reference Type="1" Value="wathever11"/>
        <Reference Type="2" Value="wathever12"/>
        <Amount value="222"/>
    </Payment>
    <Payment>
        <Reference Type="1" Value="wathever22"/>
        <Reference Type="3" Value="wathever23"/>
        <Amount value="123242"/>
    </Payment>
    <Payment>
        <Reference Type="3" Value="wathever33"/>
        <Reference Type="2" Value="wathever32"/>
        <Amount value="12976"/>
    </Payment>
    <Payment>
        <Reference Type="1" Value="wathever41"/>
        <Reference Type="4" Value="wathever44"/>
        <Amount value="1456"/>
    </Payment>
    <Payment>
        <Amount value="12976"/>
    </Payment>
</Report>

So, I can have none to multiple References that can have up to 4 different Types.

I need to convert this to a CSV like this:

ReferenceType1, ReferenceType2, ReferenceType3, ReferenceType4, Amount
whatever11    , whatever12    ,               ,               , 222
              , whatever22    , whatever23    ,               , 123242
              , whatever32    , whatever33    ,               , 12976
whatever41    ,               ,               , whatever44    , 1456
              ,               ,               ,               , 12976

To achieve this, I created this simple structure:

Talend xml Map

In the XMLFileInput this is the configuration: enter image description here

Where Payment is a Document

In the tXMLMap ive set a document structure like this:

Report
└──Payment (loop) First case, see below
      ├── Reference (loop) Second case, see below
      |      ├── @Type
      |      └── @Value
      └── Amount
             └── @Value

In the output Table i create a column for each reference with a expression like this:

[row1.Payment:/Payment/Reference/@Type].equals("1")? [row1.Payment:/Payment/Reference/@Value]: "" 

But, I am unable to map all the references in the same row. If a set the loop at Payment level, i get this:

ReferenceType1, ReferenceType2, ReferenceType3, ReferenceType4, Amount
whatever11    ,               ,               ,               , 222
              ,               ,               ,               , 123242
              ,               ,               ,               , 12976
whatever41    ,               ,               ,               , 1456
              ,               ,               ,               , 12976

If the loop is set at reference level, I get a new line for every Reference, like this:

ReferenceType1, ReferenceType2, ReferenceType3, ReferenceType4, Amount
whatever11    ,               ,               ,               , 222
              , whatever12    ,               ,               , 222
              , whatever22    ,               ,               , 123242
              ,               , whatever23    ,               , 123242
              ,               , whatever33    ,               , 12976
              , whatever32    ,               ,               , 12976
whatever41    ,               ,               ,               , 1456
              ,               ,               , whatever44    , 1456

And no line for the element without Reference.

There must be a way to achieve the expected result. I tried to create an intermediate step, mapping each reference to a different Table, along with a sequence number, to further merge the columns to a single row.

Although I havent it finished, I think it could be a path to go. But the reference is no the only element what i need this for. I got a lot of elements where this happens in the real xml and the number of intermediate tables going like this would be big.

So I prefer to ask the experts opinion to try to find a more simple way to do this.

Any ideas?

like image 763
trompa Avatar asked Jan 23 '26 07:01

trompa


1 Answers

Good news: this is possible with tExtractXMLField. Check out my proposed solution:

job overview

tExtractXMLField

My output was based on your input xml, but it is different than your expected output, please double check carefully:

.-------------------+-------------------+-------------------+-------------------+------.
|                                      tLogRow_1                                       |
|=------------------+-------------------+-------------------+-------------------+-----=|
|referenceTypeValue1|referenceTypeValue2|referenceTypeValue3|referenceTypeValue4|amount|
|=------------------+-------------------+-------------------+-------------------+-----=|
|wathever11         |wathever12         |null               |null               |222   |
|wathever22         |null               |wathever23         |null               |123242|
|null               |wathever32         |wathever33         |null               |12976 |
|wathever41         |null               |null               |wathever44         |1456  |
|null               |null               |null               |null               |12976 |
'-------------------+-------------------+-------------------+-------------------+------'

Bad news: It is not possible with tXMLMap in Talend 5.3.1 for me. I tried to put this into the expressions field in tXMLMap: [row2.doc:/Report/Payment/Reference[@Type=1]/@Value] But then my job wouldn't generate anymore because of the additional brackets

like image 186
rretzbach Avatar answered Jan 25 '26 22:01

rretzbach