
Merging sequences of elements in XSLT

I've got wads of autogenerated HTML doing stupid things like this:

 <p>Hey it's <em>italic</em><em>italic</em>!</p>

And I'd like to mash that down to:

 <p>Hey it's <em>italicitalic</em>!</p>

My first attempt was along these lines...

<xsl:template match="em/preceding::em">
    <xsl:value-of select="$OPEN_EM"/>
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="em/following::em">
    <xsl:apply-templates/>
    <xsl:value-of select="$CLOSE_EM"/>
</xsl:template>

But apparently the XSLT spec in its grandmotherly kindness forbids the use of the standard XPath preceding or following axes in template matchers. (And that would need some tweaking to handle three ems in a row anyway.)

Any solutions better than forgetting about doing this in XSLT and just running a replace('</em><em>', '') in $LANGUAGE_OF_CHOICE on the end result? Rough requirements: should not combine two <em> if they are separated by anything (whitespace, text, tags), and while it doesn't have to merge them, it should at least produce valid XML if there are three or more <em> in a row. Handling tags nested within the ems (including other ems) is not required.

(And oh, I've seen how to merge element using xslt?, which is similar but not quite the same. XSLT 2 is regrettably not an option and the proposed solutions look hideously complex.)

lambshaanxy asked Oct 19 '25 15:10


2 Answers

This is also a case of grouping adjacent elements:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()[1]|@*"/>
        </xsl:copy>
        <xsl:apply-templates select="following-sibling::node()[1]"/>
    </xsl:template>
    <xsl:template match="em">
        <em>
            <xsl:call-template name="merge"/>
        </em>
        <xsl:apply-templates
             select="following-sibling::node()[not(self::em)][1]"/>
    </xsl:template>
    <xsl:template match="node()" mode="merge"/>
    <xsl:template match="em" name="merge" mode="merge" >
        <xsl:apply-templates select="node()[1]"/>
        <xsl:apply-templates select="following-sibling::node()[1]" 
                             mode="merge"/>
    </xsl:template>
</xsl:stylesheet>

Output:

<p>Hey it's <em>italicitalic</em>!</p>

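Note: The identity rule here is the fine-grained traversal kind: it copies one node at a time, then moves on to that node's first child and to its next sibling. The em rule only ever fires on the first em of a run, because processing goes node by node; it writes the wrapping em, calls the merge template, and then applies templates to the next following sibling that is not an em. The em rule in merge mode (which is also the named template merge) applies templates to the first child (in this case it's just a text node, but this allows nested elements) and then to the next sibling in merge mode. The empty "break" rule matches anything that is not an em in merge mode (the em rule wins for em elements because a name test beats a node type test in default priority) and stops the process.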
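As a quick check against the "should not combine two em if they are separated by anything" requirement, here is a hand trace of the stylesheet above on a made-up input (the input is an assumption for illustration, not from the question). The first two em elements are adjacent, so merge mode consumes both; the space is caught by the "break" rule, so the third em starts a new run.

```xml
<!-- Hypothetical input: two adjacent ems, one space, then a third em -->
<p><em>a</em><em>b</em> <em>c</em></p>

<!-- Result, traced by hand through the stylesheet above -->
<p><em>ab</em> <em>c</em></p>
```

Note that the em wrapper is written as a literal result element, so any attributes on the original em elements are not carried over.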

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kFollowing"
  match="em[preceding-sibling::node()[1][self::em]]"
  use="generate-id(preceding-sibling::node()[not(self::em)][1])"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
    "em[following-sibling::node()[1][self::em]
      and
        not(preceding-sibling::node()[1][self::em])
       ]">
   <em>
     <xsl:apply-templates select=
     "node()
     |
      key('kFollowing',
           generate-id(preceding-sibling::node()[1])
          )/node()"/>
   </em>
 </xsl:template>
 <xsl:template match=
 "em[preceding-sibling::node()[1][self::em]]"/>
</xsl:stylesheet>

when applied to the following XML document (based on the provided one, but with three adjacent em elements):

<p>Hey it's <em>italic1</em><em>italic2</em><em>italic3</em>!</p>

produces the wanted, correct result:

<p>Hey it's <em>italic1italic2italic3</em>!</p>

Do note:

  1. The use of the identity rule to copy every node as is.

  2. The use of a key in order to specify conveniently the following adjacent em elements.

  3. The overriding of the identity transform only for em elements that have adjacent em elements.

  4. This transformation merges any number of adjacent em elements.
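One edge case worth checking: when a run of em elements starts at the very first child of its parent, there is no preceding non-em sibling, so generate-id() is called on an empty node-set and returns the empty string. Every such run in the document then shares the same key value, and runs from different parents can get pulled into each other. A possible variant, sketched here as an assumption rather than as part of the original answer, prefixes the key value with the parent's id:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Assumption: qualify the key value with the parent's id, so runs
      in different parents can never share a key value. -->
 <xsl:key name="kFollowing"
  match="em[preceding-sibling::node()[1][self::em]]"
  use="concat(generate-id(..), '|',
              generate-id(preceding-sibling::node()[not(self::em)][1]))"/>

 <!-- Identity rule: copy every node as is. -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <!-- First em of a run: write one em holding the children of the whole run. -->
 <xsl:template match=
    "em[following-sibling::node()[1][self::em]
      and
        not(preceding-sibling::node()[1][self::em])
       ]">
   <em>
     <xsl:apply-templates select=
     "node()
     |
      key('kFollowing',
           concat(generate-id(..), '|',
                  generate-id(preceding-sibling::node()[1]))
          )/node()"/>
   </em>
 </xsl:template>

 <!-- Later ems of a run were already merged above, so drop them. -->
 <xsl:template match=
 "em[preceding-sibling::node()[1][self::em]]"/>
</xsl:stylesheet>
```

With this variant, an input such as <div><p><em>a</em><em>b</em></p><p><em>c</em><em>d</em></p></div> should come out as <div><p><em>ab</em></p><p><em>cd</em></p></div> (ignoring any indentation the processor adds), and the three-em document above still produces the same result as before.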

Dimitre Novatchev answered Oct 22 '25 06:10


