I have to parse an OAI-PMH XML file that looks like the following. I would like to iterate over all <record> nodes in ListRecords.
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<responseDate>2010-12-30T10:46:39.654+08:00</responseDate>
<request verb="ListRecords" metadataPrefix="oai_dc">http://172.16.1.118/ahd/oai2.do</request>
<ListRecords>
<record>
<header>
<identifier>9010402101001001</identifier>
</header>
<metadata>
<oai_dc:dc xsi:schemaLocationfiltered="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier>9010402101001001</dc:identifier>
</oai_dc:dc>
</metadata>
</record>
<resumptionToken>1509/1509</resumptionToken>
</ListRecords>
</OAI-PMH>
But when I use XOM 1.2.5 to get those nodes, it always returns 0 nodes, no matter which method I use (query or getChildElements).
The following is the code I ran in the Scala interpreter:
scala> import nu.xom.Builder
import nu.xom.Builder
scala> val builder = new Builder
builder: nu.xom.Builder = nu.xom.Builder@6682d439
scala> val document = builder.build(new java.io.File("/home/brianhsu/qqq.xml"))
document: nu.xom.Document = [nu.xom.Document: OAI-PMH]
scala> document.query("//record").size
res0: Int = 0
scala> document.query("//ListRecords").size
res1: Int = 0
scala> document.getRootElement.getChildElements("ListRecords").size
res2: Int = 0
I have no idea why I cannot get the ListRecords and record elements from the XML. Did I miss something?
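I found that this is a duplicate of "XPath Expression returns nothing for //element, but //* returns a count".
The following code works: the element name in the XPath expression has to be bound to the document's default namespace through a prefix. The prefix itself is arbitrary (xsi here, although it does not have to match any prefix used in the document); only the namespace URI has to match.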
scala> import nu.xom.XPathContext
import nu.xom.XPathContext
scala> val context = new XPathContext("xsi", "http://www.openarchives.org/OAI/2.0/")
context: nu.xom.XPathContext = nu.xom.XPathContext@19a3f495
scala> document.query("//xsi:record", context).size
res6: Int = 1
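To make the difference concrete, here is a minimal sketch that runs three queries against a trimmed-down inline copy of the document. The object name NamespaceQueryDemo, the prefix oai, and the inlined XML are illustrative assumptions and not part of the original session; Builder, XPathContext, and query are the XOM calls already used above. The third query, based on local-name(), is an alternative technique that avoids the context object by ignoring namespaces altogether.

```scala
import java.io.StringReader
import nu.xom.{Builder, Document, XPathContext}

object NamespaceQueryDemo {
  // Namespace URI copied from the xmlns="..." attribute on <OAI-PMH>.
  val OaiNs = "http://www.openarchives.org/OAI/2.0/"

  // Trimmed-down inline copy of the question's document (illustrative only).
  val xml: String =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords>
      |    <record>
      |      <header><identifier>9010402101001001</identifier></header>
      |    </record>
      |  </ListRecords>
      |</OAI-PMH>""".stripMargin

  val document: Document = new Builder().build(new StringReader(xml))

  // An unprefixed name in XPath 1.0 only matches elements in no namespace.
  def unqualifiedCount: Int = document.query("//record").size()

  // Bind a prefix of our own choosing ("oai") to the default namespace URI.
  def qualifiedCount: Int =
    document.query("//oai:record", new XPathContext("oai", OaiNs)).size()

  // Alternative without a context: compare local names and ignore namespaces.
  def localNameCount: Int =
    document.query("//*[local-name() = 'record']").size()

  def main(args: Array[String]): Unit = {
    println(s"unqualified: $unqualifiedCount")
    println(s"qualified:   $qualifiedCount")
    println(s"local-name:  $localNameCount")
  }
}
```

The prefixed query is usually the safer choice, because the local-name() form would also match a record element from any other namespace.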
I'll wager that this is an xmlns issue. Have you tried passing the namespace URI as the second argument? Try:
document.getRootElement
  .getChildElements("ListRecords",
                    "http://www.openarchives.org/OAI/2.0/").size
Basically, when a document declares a default namespace (xmlns="..."), its unprefixed elements belong to that namespace, and most XML APIs require you to supply that namespace when looking those elements up, even though the elements carry no prefix in the document itself.
(This can also be done using the XPathContext object, as illustrated by Brian Hsu.)
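Since the original goal was to iterate over every <record>, here is a minimal sketch of that loop using the namespace-aware getChildElements and getFirstChildElement overloads. The object name RecordIterationDemo, the helper identifiers, and the inlined XML are hypothetical names introduced for illustration; the index-based loop is used because it does not rely on the Elements collection being iterable.

```scala
import java.io.StringReader
import nu.xom.{Builder, Document}

object RecordIterationDemo {
  // Default namespace declared on the <OAI-PMH> root element.
  val OaiNs = "http://www.openarchives.org/OAI/2.0/"

  // Trimmed-down inline copy of the question's document (illustrative only).
  val xml: String =
    """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      |  <ListRecords>
      |    <record>
      |      <header><identifier>9010402101001001</identifier></header>
      |    </record>
      |    <resumptionToken>1509/1509</resumptionToken>
      |  </ListRecords>
      |</OAI-PMH>""".stripMargin

  val document: Document = new Builder().build(new StringReader(xml))

  // Collects the header identifier of every <record> under <ListRecords>.
  def identifiers: Seq[String] = {
    val listRecords =
      document.getRootElement.getFirstChildElement("ListRecords", OaiNs)
    // Every lookup passes the namespace URI, not just the local name.
    val records = listRecords.getChildElements("record", OaiNs)
    (0 until records.size()).map { i =>
      records.get(i)
        .getFirstChildElement("header", OaiNs)
        .getFirstChildElement("identifier", OaiNs)
        .getValue
    }
  }

  def main(args: Array[String]): Unit =
    identifiers.foreach(println)
}
```

Note that getFirstChildElement returns null when no matching child exists, so real code reading arbitrary OAI-PMH responses would want a null check (or an Option wrapper) at each step.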