 

xmllint: problems outputting results one per line

I know my post actually contains two questions...

First, I want to use xmllint to output the content of the "loc" tags. The sitemap I load has a default namespace, declared with xmlns="...".

In the xmllint shell, I have to do this:

setrootns
xpath //defaultns:loc

That works... no problem. But I need to do this in a bash script.

AFAIK, xmllint has no option that says "let's go, setrootns", so I cannot do this:

xmllint --xpath "//loc" sitemaps.xml
# or
xmllint --xpath "//defaultns:loc" sitemaps.xml

This is the first question: how can I tell xmllint to load the default namespace?

If I can't, let's look at my second approach:

I can remove the xmlns attribute so that there is no namespace to deal with:

xmllint --xpath "//loc" <(sed -r 's/xmlns="[^"]*"//' sitemaps.xml)
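Because sed regular expressions have no lazy (non-greedy) quantifier, the attribute value is safest to match with a negated character class, `[^"]*`, which cannot run past the closing quote even when the whole sitemap sits on one line. A minimal sketch of that stripping step, assuming the XML arrives on standard input; `strip_default_ns` is a hypothetical helper name, not part of xmllint or sed:

```shell
#!/usr/bin/env bash
# strip_default_ns is a hypothetical helper name used only for illustration.
# It deletes the first unprefixed default-namespace declaration (xmlns="...")
# on each line, together with the whitespace before it. [^"]* stops at the
# closing quote, so other attributes on the same line are left intact, and
# prefixed declarations such as xmlns:xsi="..." do not match at all.
strip_default_ns() {
  sed -E 's/[[:space:]]+xmlns="[^"]*"//'
}

# Example with a made-up one-line document:
printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" id="x"><url><loc>http://example.com/</loc></url></urlset>' | strip_default_ns
```

In a script it could feed xmllint the same way as the command above, e.g. `xmllint --xpath "//loc" <(strip_default_ns < sitemaps.xml)`. This only removes the namespace; it does not by itself fix the one-line output.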

But now the content of all 500 "loc" elements is concatenated onto a single line!

I tried this too:

xmllint --shell sitemaps.xml <<EOF
setrootns
xpath //defaultns:loc/text()
EOF

Or this:

xmllint --shell sitemaps.xml <<EOF
setrootns
cat //defaultns:loc
EOF

The first gives me, for example:

465  TEXT
    content=http://... 

with the URL truncated.

The second gives me a "------" line every two lines, and a "/>" on the last line...

And I'm starting to get really nervous... :)

Many thanks if you find a solution.

The goal is to have every location, one per line.

Metal3d asked Dec 06 '25 17:12


2 Answers

@BrnVrn is right: I only had to append "\n" after the tags.

Then I found the answer about namespaces: I can use local-name() so that the default namespace is not checked.

So, I did this:

xmllint  --xpath "//*[local-name()='loc']/text()" <(sed 's/<loc>/<loc>\n/g' sitemaps.xml)

And it works!

Thanks to all
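A variant of the approach above that leaves the file untouched, offered as a sketch rather than a tested recipe: select the loc elements themselves (still via local-name()) instead of their text nodes, then let `grep -o` print each serialized element on its own line and `sed` strip the tags. It assumes xmllint serializes each match as a `<loc>` element and that `grep -o` is available; `list_locs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# list_locs is a hypothetical helper name used only for illustration.
# local-name() matches <loc> whatever its namespace, so no setrootns is needed.
# grep -o prints every serialized <loc ...>...</loc> element on its own line,
# whether or not this xmllint version separates results with newlines.
# Note: entities stay escaped in the output (e.g. &amp; is not turned into &).
list_locs() {
  xmllint --xpath "//*[local-name()='loc']" "$1" |
    grep -o '<loc[^>]*>[^<]*</loc>' |
    sed 's/<[^>]*>//g'   # drop the opening and closing tags
}
```

Usage: `list_locs sitemaps.xml` prints one URL per line. Matching `<loc[^>]*>` rather than a bare `<loc>` keeps the pattern working even if the serialized element carries attributes.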

Metal3d answered Dec 08 '25 10:12


I used to do something similar:

clean_xml_message=$(echo "$xml_message" | sed 's/xmlns/ignore/')

Then you could try to put the newlines back:

sed 's/></>\n</g' 

I guess you only want the URLs without the <loc></loc> tags? Then I would select all the loc elements with xmllint, which prints them on one line:

<loc>...</loc><loc>...</loc><loc>...</loc>

Then add the newlines: sed 's/<loc>/<loc>\n/g' | sed 's#</loc>#\n</loc>#g'

<loc>
...
</loc><loc>
...
</loc><loc>
...
</loc>

Finally, remove the tag lines with grep -v "<loc>" | grep -v "</loc>", or a single grep -v "^<" could do it, since every tag line starts with "<" and the URL lines do not. (-v inverts the selection: http://unixhelp.ed.ac.uk/CGI/man-cgi?grep)
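The split-and-strip steps above can also be done with `grep -o`, which prints every match on a separate line, so no newlines have to be inserted by hand (the `\n` in the sed replacements above is a GNU sed extension). A minimal sketch, assuming the concatenated `<loc>...</loc><loc>...</loc>` output arrives on standard input; `split_locs` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# split_locs is a hypothetical helper name used only for illustration.
# Input: loc elements concatenated on one line, as printed by xmllint --xpath.
# grep -o emits each <loc>...</loc> match on its own line; sed then removes
# the surrounding tags, leaving one URL per line.
split_locs() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/<loc>//' -e 's#</loc>##'
}

# Example with made-up URLs:
printf '%s\n' '<loc>http://example.com/a</loc><loc>http://example.com/b</loc>' | split_locs
```

In the original pipeline it would simply replace the sed and grep -v steps, e.g. `xmllint --xpath "//loc" file.xml | split_locs` once the namespace has been dealt with.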

BrnVrn answered Dec 08 '25 10:12

