
Simple XML parsing example for Nokogiri

Tags: ruby, nokogiri

I am trying to get a list of keys and values for the Response object so I can turn them into a Hash, but I'm having problems understanding Nokogiri. The XML:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
<Response>
    <Name>Anonymous</Name>
    <ExternalDataReference></ExternalDataReference>
    <EmailAddress>hi guys</EmailAddress>
    <IPAddress>blahblah</IPAddress>
    <Status>0</Status>
..... (approximately 30 more elements within each response tag)
</Response>
(approximately 75 more response tags in the document)

My goal was to get something like this for each Response:

Name: Anonymous
ExternalDataReference:
EmailAddress: hi guys
IPAddress: blahblah

My code so far:

f=File.open("./stufftoparse.xml")
doc = Nokogiri::XML(f)
puts "#{doc.xpath("//Response").keys} \n#{doc.xpath("//Response").values}"

I know the code above doesn't work, but I don't quite get how to get the elements in the Response tag (I don't think they are attributes of the Response, because they are within their own XML tags). Can someone explain how to do this? Please note, I have spent some time reading the Nokogiri docs and couldn't find much relating to XPath examples.

Additional question: How can I split the Responses apart so that I have something like this?

Response1:
Name: Anonymous
ExternalDataReference:
EmailAddress: hi guys
IPAddress: blahblah

Response2:
Name: Anonymous
ExternalDataReference:
EmailAddress: hi guys
IPAddress: blahblah
Rilcon42 asked Oct 18 '25 17:10


1 Answer

The solution can be easier to see if you try it in steps.

Example XML:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <foo>
    <goo>a</goo>
    <hoo>b</hoo>
  </foo>
  <foo>
    <goo>c</goo>
    <hoo>d</hoo>
  </foo>
</xml>

The syntax //foo selects all the foo elements.

> puts doc.xpath("//foo")
<foo>
  <goo>a</goo>
  <hoo>b</hoo>
</foo>
<foo>
  <goo>c</goo>
  <hoo>d</hoo>
</foo>

Nokogiri returns nodes as a NodeSet like this:

> puts doc.xpath("//foo").class
Nokogiri::XML::NodeSet

A NodeSet is enumerable; you can use methods such as each, map, etc.

> puts doc.xpath("//foo").kind_of?(Enumerable)
true

This NodeSet contains two foo elements:

> doc.xpath("//foo").each{|e| puts e.class }
Nokogiri::XML::Element
Nokogiri::XML::Element

The syntax //foo/* selects the foo elements' child elements:

> puts doc.xpath("//foo/*")
<goo>a</goo>
<hoo>b</hoo>
<goo>c</goo>
<hoo>d</hoo>

To print an element's info, see the Nokogiri::XML::Node documentation; the two methods you'll likely want are name and text.

Solution for you:

> doc.xpath("//foo/*").each{|e|
  puts "#{e.name}:#{e.text}" 
}
goo:a
hoo:b
goo:c
hoo:d

For your second question, you're essentially asking:

  1. for each foo element, get its child elements
  2. for each child element, print the name and text

Solution for you:

> doc.xpath("//foo").each_with_index{|parent_elem, parent_count| 
  puts "Parent #{parent_count + 1}"
  parent_elem.elements.each{|child_elem|
    puts "#{child_elem.name}:#{child_elem.text}"
  }
}
Parent 1
goo:a
hoo:b
Parent 2
goo:c
hoo:d
joelparkerhenderson answered Oct 21 '25 20:10


