When would I want to use the xmlParse function versus the xmlTreeParse function? Also, when are parameter values useInternalNodes=TRUE or asText=TRUE useful?
For example:
library("XML")
library("RCurl")  # provides getURL, used below
nct_url <- "http://clinicaltrials.gov/ct2/show/NCT00112281?resultsxml=true"
xml_doc <- xmlParse(nct_url, useInternalNodes=TRUE)
vs.
doc <- xmlTreeParse(getURL(nct_url), useInternalNodes=TRUE)
top <- xmlRoot(doc)
top[["keyword"]]
xmlValue(top[["start_date"]])
xmlValue(top[["location"]])
People seem to use the xmlTreeParse function to get a non-repeating node via the $doc$children$... traversal, but I am not sure I understand when each approach is best. Parsing XML is one of the reasons I have almost abandoned R to learn Python: there is a lack of for-dummies examples unless you buy a book.
Here is some feedback after using the XML package.
xmlParse is a version of xmlTreeParse in which the argument useInternalNodes is set to TRUE.
Use xmlTreeParse if you want an R object. This can be inefficient and unnecessary if you only want to extract part of the XML document.
Use xmlParse if you want a pointer to a C-level object. You then need to know some XPath basics to manipulate the result.
Use asText=TRUE if your input is a text string rather than a file or a URL.
Here is an example that shows the difference between the two functions:
txt <- "<doc>
<el> aa </el>
</doc>"
library(XML)
res <- xmlParse(txt,asText=TRUE)
res.tree <- xmlTreeParse(txt,asText=TRUE)
Now inspecting the 2 objects:
> class(res)
[1] "XMLInternalDocument" "XMLAbstractDocument"
> class(res.tree)
[1] "XMLDocument" "XMLAbstractDocument"
You can see that res is an internal document: it is a pointer to a C object. res.tree is an R object. You can access its contents like this:
res.tree$doc$children
$doc
<doc>
<el>aa</el>
</doc>
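The same R-level tree can also be walked with xmlRoot and [[ ]], which is the style used in the question. The following is only a minimal sketch: the two-element document and the variable names are made up for illustration, and it assumes the XML package is installed.

```r
library(XML)

# Hypothetical toy document with a repeated <el> node
txt <- "<doc><el>aa</el><el>bb</el></doc>"

# Default xmlTreeParse returns an R-level tree (class "XMLDocument")
tree <- xmlTreeParse(txt, asText = TRUE)
top <- xmlRoot(tree)

root_name <- xmlName(top)           # name of the root element
n_children <- xmlSize(top)          # number of direct children
first_el <- xmlValue(top[["el"]])   # [[ ]] picks the first child named "el"

# xmlSApply applies a function to every child, so repeated nodes are all covered
all_els <- xmlSApply(top, xmlValue)

print(root_name)
print(n_children)
print(first_el)
print(all_els)
```

Note that top[["el"]] only returns the first matching child, which is why this style is mostly seen for non-repeating nodes.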
For res, you should use a valid XPath expression and one of these functions (xpathApply, xpathSApply, getNodeSet) to inspect it. For example:
xpathApply(res,'//el')
Once you have a valid XML node, you can apply xmlValue, xmlGetAttr, etc. to extract node information. So here these two statements serve the same purpose:
## we have already an R object, just apply xmlValue to the right child
xmlValue(res.tree$doc$children$doc)
## xpathSApply finds the matching nodes and passes each one to xmlValue
xpathSApply(res,'//el',xmlValue)
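To tie this back to repeated nodes and attributes, here is a short sketch of the XPath route on an internal document. The document content, the id attribute and the variable names are assumptions made up for illustration. It also checks that xmlTreeParse with useInternalNodes=TRUE produces the same kind of object as xmlParse.

```r
library(XML)

# Hypothetical toy document: repeated <el> nodes, each with an "id" attribute
txt <- '<doc><el id="1">aa</el><el id="2">bb</el></doc>'

# Internal (C-level) document, queried with XPath
doc <- xmlParse(txt, asText = TRUE)

# One value per matching node, so repeated nodes need no special handling
vals <- xpathSApply(doc, "//el", xmlValue)

# Extra arguments after the function are passed on to it (here: the attribute name)
ids <- xpathSApply(doc, "//el", xmlGetAttr, "id")

# getNodeSet returns a list of nodes; an XPath predicate narrows the match
nodes <- getNodeSet(doc, "//el[@id='2']")
second <- xmlValue(nodes[[1]])

# xmlTreeParse with useInternalNodes = TRUE gives the same class as xmlParse
doc2 <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
same_class <- identical(class(doc), class(doc2))

print(vals)
print(ids)
print(second)
print(same_class)
```

If only a few values are needed from a large document, this route avoids converting the whole tree into R objects first.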