
How to select all links on a page using XPath

Tags:

xpath

xpointer

I want to write a function that identifies all the links on a particular HTML page. My idea was to use XPath, by using a path such as //body//a[x] and incrementing x to go through the first, second, third link on the page.

Whilst trying this out in Chrome, I load up the page http://exoplanet.eu/ and in the Chrome Developer Tools JS console, I call $x("//body//a[1]"). I expect the very first link on the page, but this returns a list of multiple anchor elements. Calling $x("//body//a[2]") returns two anchor elements. Calling $x("//body//a[3]") returns nothing.

I was hoping that incrementing the [x] each time would give me each unique link on the page, one by one, but they seem to be grouped. How can I rewrite this path so that it picks each anchor tag, one by one?

njp asked Oct 19 '25 13:10



2 Answers

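Your //body//a[1] should be (//body//a)[1] if you want to select the first link on the page. The former expression selects every a element that is the first a child of its parent element, so it can match many elements; the parenthesized form first collects all a elements in document order and then takes the first one.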

But it seems a very odd thing to do anyway. Why do you need the links one by one? Just select all of them, as a node-list or node-set, using //body//a, and then iterate over the set.
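The difference between the two forms, and the "select everything, then iterate" approach, can be sketched outside the browser. The example below is only an illustration under assumptions: it uses Python's standard-library xml.etree.ElementTree, which supports a limited XPath subset (including per-parent position predicates, but not the parenthesized (…)[1] form), and the sample markup and the hrefs helper are hypothetical, not taken from exoplanet.eu.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not the real exoplanet.eu markup).
PAGE = """<html>
  <body>
    <div><a href="/one">one</a><a href="/two">two</a></div>
    <p><a href="/three">three</a></p>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element


def hrefs(elements):
    """Return the href attribute of each element, in order."""
    return [element.get("href") for element in elements]


# Position predicate applied per parent: every <a> that is the
# first <a> child of its own parent (one match per parent).
first_in_each_parent = root.findall("body//a[1]")

# Same idea with [2]: only parents with at least two <a> children match.
second_in_each_parent = root.findall("body//a[2]")

# Select all links once, in document order, then iterate or index the list.
# Indexing this list plays the role of (//body//a)[n] in full XPath.
all_links = root.findall("body//a")

print(hrefs(first_in_each_parent))   # ['/one', '/three']
print(hrefs(second_in_each_parent))  # ['/two']
print(hrefs(all_links))              # ['/one', '/two', '/three']
print(all_links[0].get("href"))      # /one
```

Here a[1] returns two elements because two different parents each have a first a child, which mirrors the grouped results seen in the Chrome console.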

Michael Kay answered Oct 22 '25 03:10



If you use the paths //body/descendant::a[1], //body/descendant::a[2] and so on, you can select the descendant a elements of the body element one at a time, in document order. Or, with your attempt, you need parentheses, e.g. (//body//a)[1], (//body//a)[2] and so on.

Note, however, that inside the browser with JavaScript there is a document.links collection in the object model, so no XPath is needed to access the links.
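Outside the browser there is no document.links, but a rough analogue can be built without XPath. The following is a minimal sketch, assuming Python's standard-library html.parser; the LinkCollector class and collect_links function are hypothetical names introduced here for illustration. Like document.links, it looks at a and area elements that carry an href, though it is not a full reimplementation of the browser's behavior.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> and <area> start tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag in ("a", "area"):
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors without an href value
                self.links.append(href)


def collect_links(html_text):
    """Return the list of href values found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample markup; it does not need to be well-formed XML.
sample = '<body><a href="/x">x</a><a name="n">n</a><map><area href="/y"></map></body>'
print(collect_links(sample))  # ['/x', '/y']
```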

Martin Honnen answered Oct 22 '25 05:10



