
Get specific data from table with xPath

I have this table, with its source code HERE.

I want to get all rows, which I can do using the expression below.

The expected final output using string-join($doc//*[@id='salaries']/tbody/tr/normalize-space(.), '
') is:

1985-86 Los Angeles Lakers NBA $2,030,000
1987-88 Los Angeles Lakers NBA $2,000,000
1988-89 Los Angeles Lakers NBA $3,000,000

My question is: how can I remove the third column (named NBA in this example) from the final output, to get this:

1985-86 Los Angeles Lakers $2,030,000
1987-88 Los Angeles Lakers $2,000,000
1988-89 Los Angeles Lakers $3,000,000

PS: I'm not sure that column is always in that position, but the href of its anchor contains 'league': a[contains(@href, 'league')]

Enissay asked Jan 28 '26 18:01


1 Answer

This XPath 2.0 expression:

  for $i in 1 to count(/tbody/tr),
      $r in /tbody/tr[$i],
      $s in string-join($r/td[not(position() eq 3)]/normalize-space(.), ' ')
   return
     concat($s, '&#xA;')

when evaluated on the provided XML document:

<tbody>
<tr class="" data-row="0">
   <td align="left">1985-86</td>
   <td align="left"><a href="/teams/LAL/1986.html">Los Angeles Lakers</a></td>
   <td align="left"><a href="/leagues/NBA_1986.html">NBA</a></td>
   <td align="right" csk="2030000">$2,030,000</td>
</tr>
<tr class="" data-row="1">
   <td align="left">1987-88</td>
   <td align="left"><a href="/teams/LAL/1988.html">Los Angeles Lakers</a></td>
   <td align="left"><a href="/leagues/NBA_1988.html">NBA</a></td>
   <td align="right" csk="2000000">$2,000,000</td>
</tr>
<tr class="" data-row="2">
   <td align="left">1988-89</td>
   <td align="left"><a href="/teams/LAL/1989.html">Los Angeles Lakers</a></td>
   <td align="left"><a href="/leagues/NBA_1989.html">NBA</a></td>
   <td align="right" csk="3000000">$3,000,000</td>
</tr>
</tbody>

produces the wanted, correct result:

 1985-86 Los Angeles Lakers $2,030,000
 1987-88 Los Angeles Lakers $2,000,000
 1988-89 Los Angeles Lakers $3,000,000

If the position of the column to be excluded isn't guaranteed to be fixed, use:

  for $i in 1 to count(/tbody/tr),
      $r in /tbody/tr[$i],
      $s in string-join($r/td[not(starts-with(a/@href,'/leagues'))]
                              /normalize-space(.), ' ')
   return
     concat($s, '&#xA;')
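
As a cross-check outside XPath, the same filtering can be sketched in Python with only the standard library. This is an illustration, not part of the original answer: the helper names is_league_cell and rows_without_league are hypothetical, the sample rows are copied from the XML document above, and the whitespace handling only approximates normalize-space(.).

```python
import xml.etree.ElementTree as ET

# Sample rows copied from the XML document shown in the answer.
XML = """<tbody>
<tr class="" data-row="0">
   <td align="left">1985-86</td>
   <td align="left"><a href="/teams/LAL/1986.html">Los Angeles Lakers</a></td>
   <td align="left"><a href="/leagues/NBA_1986.html">NBA</a></td>
   <td align="right" csk="2030000">$2,030,000</td>
</tr>
<tr class="" data-row="1">
   <td align="left">1987-88</td>
   <td align="left"><a href="/teams/LAL/1988.html">Los Angeles Lakers</a></td>
   <td align="left"><a href="/leagues/NBA_1988.html">NBA</a></td>
   <td align="right" csk="2000000">$2,000,000</td>
</tr>
<tr class="" data-row="2">
   <td align="left">1988-89</td>
   <td align="left"><a href="/teams/LAL/1989.html">Los Angeles Lakers</a></td>
   <td align="left"><a href="/leagues/NBA_1989.html">NBA</a></td>
   <td align="right" csk="3000000">$3,000,000</td>
</tr>
</tbody>"""


def is_league_cell(td):
    """Return True if the cell directly contains a link whose href starts with '/leagues'."""
    return any(a.get("href", "").startswith("/leagues") for a in td.findall("a"))


def rows_without_league(xml_text):
    """Join the text of every non-league cell per row, one row per line."""
    root = ET.fromstring(xml_text)  # root element is <tbody>
    lines = []
    for tr in root.findall("tr"):
        cells = [
            # Collapse runs of whitespace and trim, roughly like normalize-space(.)
            " ".join("".join(td.itertext()).split())
            for td in tr.findall("td")
            if not is_league_cell(td)
        ]
        lines.append(" ".join(cells))
    return "\n".join(lines)


result = rows_without_league(XML)
print(result)
```

Because the league cell is identified by its link rather than by its position, the sketch keeps working if the column moves, which mirrors the second XPath expression.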
Dimitre Novatchev answered Jan 30 '26 14:01


