
XSL transform on text to XML with unparsed-text: need more depth

Tags: xslt, xslt-2.0

My input is fairly regular plain text (this is only a sample; I don't want to copy all the data):

StartThing
Size Big
Colour Blue
coords 42, 42
foo bar
EndThing
StartThing
Size Small
Colour Red
coords 29, 51
machin bidule
EndThing
<!-- repeat a few thousand times-->

I have the XSL below, which I adapted from "Parse text file with XSLT":

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="text-encoding" as="xs:string" select="'iso-8859-1'"/>
    <xsl:param name="text-uri" as="xs:string" select="'unparsed-text.txt'"/>

    <xsl:template name="text2xml">
        <xsl:variable name="text" select="unparsed-text($text-uri, $text-encoding)"/>
        <xsl:analyze-string select="$text" regex="(Size|Colour|coords) (.+)">    
            <xsl:matching-substring>
                <xsl:element name="{(regex-group(1))}">
                    <xsl:value-of select="(regex-group(2))"/>
                </xsl:element>          
            </xsl:matching-substring>
        </xsl:analyze-string>
    </xsl:template>

    <xsl:template match="/">
        <xsl:call-template name="text2xml"/>    
    </xsl:template>
</xsl:stylesheet>

It works fine for parsing the name/value pairs into elements and values. It gives me this output:

<?xml version="1.0" encoding="UTF-8"?>
<Size>Big</Size>
<Colour>Blue</Colour>
<coords>42, 42</coords>

But I'd also like to wrap the values in the Thing tag so that my output looks like this:

<Thing>
    <Size>Big</Size>
    <Colour>Blue</Colour>
    <coords>42, 42</coords>
</Thing>

One solution might be a regex that first matches each group of lines between StartThing and EndThing, and then matches substrings within each group as I'm already doing. Or is there some other way to parse the tree?

John Cornellier asked Oct 29 '25 22:10


1 Answer

I would use two nested analyze-string levels, an outer one to extract everything between StartThing and EndThing, and then an inner one that operates on the strings matched by the outer one.

<xsl:template name="text2xml">
    <xsl:variable name="text" select="unparsed-text($text-uri, $text-encoding)"/>
    <!-- flags="s" allows .*? to match across newlines -->
    <xsl:analyze-string select="$text" regex="StartThing.*?EndThing" flags="s">
        <xsl:matching-substring>
            <Thing>
                <!-- "." here is the matching substring from the outer regex -->
                <xsl:analyze-string select="." regex="(Size|Colour|coords) (.+)">
                    <xsl:matching-substring>
                        <xsl:element name="{(regex-group(1))}">
                            <xsl:value-of select="(regex-group(2))"/>
                        </xsl:element>          
                    </xsl:matching-substring>
                </xsl:analyze-string>
            </Thing>
        </xsl:matching-substring>
    </xsl:analyze-string>
</xsl:template>
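
One thing to watch with a few thousand blocks: the template above emits a sequence of sibling Thing elements, and a result with several top-level elements is not a well-formed XML document on its own. A minimal sketch of one way around that, assuming a wrapper element named Things (a hypothetical name, not something from the question) and otherwise reusing the parameters and regexes already shown:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
    <xsl:output indent="yes"/>

    <xsl:param name="text-encoding" as="xs:string" select="'iso-8859-1'"/>
    <xsl:param name="text-uri" as="xs:string" select="'unparsed-text.txt'"/>

    <xsl:template name="text2xml">
        <xsl:variable name="text" select="unparsed-text($text-uri, $text-encoding)"/>
        <!-- "Things" is an assumed root element name; pick whatever suits your data -->
        <Things>
            <!-- outer pass: one match per StartThing ... EndThing block -->
            <xsl:analyze-string select="$text" regex="StartThing.*?EndThing" flags="s">
                <xsl:matching-substring>
                    <Thing>
                        <!-- inner pass: name/value lines inside the current block -->
                        <xsl:analyze-string select="." regex="(Size|Colour|coords) (.+)">
                            <xsl:matching-substring>
                                <xsl:element name="{regex-group(1)}">
                                    <xsl:value-of select="regex-group(2)"/>
                                </xsl:element>
                            </xsl:matching-substring>
                        </xsl:analyze-string>
                    </Thing>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </Things>
    </xsl:template>

    <!-- same entry point as in the question: any source document triggers the parse -->
    <xsl:template match="/">
        <xsl:call-template name="text2xml"/>
    </xsl:template>
</xsl:stylesheet>
```

If your processor lets you start from a named template (Saxon does, via its -it option), you can invoke text2xml directly and skip the dummy source document that the match="/" template needs.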
Ian Roberts answered Nov 02 '25 15:11


