I'm learning how to use R to parse XML, and I'm trying to use Hadley Wickham's xml2 package to parse a TEI XML text document, located here (warning: this is a zipped file; the particular file I'm trying to parse is given in the code below). I'm trying to get my head around how namespaces work in this package, since I can't work out from the documentation how to apply them to the particular text I'm using. With the XML package, I could do the following:
library("XML")
crisis <- xmlParse("data/Crisis130_22.2.tei.xml")
all_divs <- getNodeSet(crisis, "//def:div",
                   namespaces=c(def = "http://www.tei-c.org/ns/1.0"))
I can't figure out how to do this with xml2, however. I get either an "inherits(x, "xml_document") is not TRUE" error or an "In node_find_all(x$node, x$doc, xpath = xpath, nsMap = ns) : Undefined namespace prefix [1219]" error. This is what I tried:
library("xml2")
crisis2 <- read_xml("data/Crisis130_22.2.tei.xml")
# check to see whether TEI URL is present
xml_ns(crisis2) 
all_divs2 <- xml_find_all(crisis2, "//div", xml_ns(crisis2)) # gives empty list
all_divs <- xml_find_all(crisis2, "/def:div", xml_ns(crisis2)) # undefined namespace error
I know that this is a new package, but does anyone know how to use namespaces in it?
Ok, I figured it out myself, but I thought I would post it here instead of deleting the question.
library("xml2")
crisis2 <- read_xml("data/Crisis130_22.2.tei.xml")
all_divs <- xml_find_all(crisis2, "//d1:div", xml_ns(crisis2))
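In retrospect, I guess the answer is obvious: xml_ns(crisis2) reports the document's default (unprefixed) namespace under the generated prefix d1, so the XPath has to use d1: rather than a prefix of my own choosing such as def:. As I said, I thought I would post the solution here in case it helps anyone in the future.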
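For anyone who wants to try this without downloading the TEI file, here is a minimal sketch of the same idea. The inline document is a made-up stand-in (an assumption, not the real Crisis file); the only thing it shares with the real one is the TEI default namespace on the root element. It also shows xml_ns_rename(), which should let you swap the generated d1 prefix for a more readable one; the name "tei" is just my choice for illustration.

```r
library(xml2)

# Hypothetical inline document standing in for the TEI file. What matters is
# the default namespace declared on the root element.
doc <- read_xml(
  '<TEI xmlns="http://www.tei-c.org/ns/1.0">
     <text><body>
       <div><p>one</p></div>
       <div><p>two</p></div>
     </body></text>
   </TEI>'
)

# xml_ns() lists the namespaces found in the document; a default namespace
# is given a generated prefix.
ns <- xml_ns(doc)
print(ns)

# An unprefixed name in XPath refers to elements in no namespace, so this
# does not match the namespaced <div> elements.
plain_divs <- xml_find_all(doc, "//div")

# Using the generated prefix matches them.
d1_divs <- xml_find_all(doc, "//d1:div", ns)

# Optionally rename the generated prefix to something more readable
# ("tei" is an arbitrary name chosen for this example).
tei_ns <- xml_ns_rename(ns, d1 = "tei")
tei_divs <- xml_find_all(doc, "//tei:div", tei_ns)

length(plain_divs)
length(d1_divs)
length(tei_divs)
```

Renaming the prefix does not modify the document itself; it only changes the prefix-to-URI mapping passed to xml_find_all(), so the renamed mapping has to be supplied on each call that uses the new prefix.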