Maintaining the indentation of an XML file when parsed with Beautifulsoup

I am using BS4 to parse an XML file and trying to write it back to a new XML file.

Input file:

<tag1>
  <tag2 attr1="a1"> example text </tag2>
  <tag3>
    <tag4 attr2="a2"> example text </tag4>
    <tag5>
      <tag6 attr3="a3"> example text </tag6>
    </tag5>
  </tag3>
</tag1>

Script:

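from bs4 import BeautifulSoup

# Parse with the XML parser, then serialize the tree back to disk.
with open("input.xml") as src:
    soup = BeautifulSoup(src, "xml")
# encode() returns bytes, so the output file is opened in binary mode.
with open("output.xml", "wb") as dst:
    dst.write(soup.encode(formatter="minimal"))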

Output:

<tag1>
<tag2 attr1="a1"> example text </tag2>
<tag3>
<tag4 attr2="a2"> example text </tag4>
<tag5>
<tag6 attr3="a3"> example text </tag6>
</tag5>
</tag3>
</tag1>

I want to retain the indentation of the input file. I tried the prettify() option.

Output-Prettify:

<tag1>
  <tag2 attr1="a1"> 
    example text 
  </tag2>
  <tag3>
    <tag4 attr2="a2"> 
      example text 
    </tag4>
    <tag5>
      <tag6 attr3="a3"> 
        example text 
      </tag6>
    </tag5>
  </tag3>
</tag1>

But this is not what I wanted. I want to maintain the exact indentation of the tags as in the input file.

radha shankar asked Sep 16 '25 13:09



1 Answer

Unfortunately, you cannot do it directly. Beautiful Soup parses its input and keeps no trace of the original formatting.

So, if you do not modify the XML, you can first read the whole file into memory as a string, feed that string to Beautiful Soup to parse it and run your checks, and then write the original string, not the parsed tree, to the new file.
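The read-then-copy approach could look roughly like the sketch below. The function name `copy_if_tag_present`, its `required_tag` check, and the `parser` argument are made up for illustration; the check stands in for whatever tests you actually need to run on the parsed tree.

```python
from bs4 import BeautifulSoup


def copy_if_tag_present(src_path, dst_path, required_tag, parser="xml"):
    """Copy src_path to dst_path unchanged if the parsed tree contains required_tag.

    Hypothetical helper: the soup is used for inspection only and is never
    serialized, so the original indentation cannot be lost.
    """
    # newline="" disables newline translation, so the text is kept as-is.
    with open(src_path, encoding="utf-8", newline="") as src:
        original = src.read()

    # Parse the string only to run checks against it.
    soup = BeautifulSoup(original, parser)
    if soup.find(required_tag) is None:
        return False

    # Write back the original string, not the soup.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(original)
    return True
```

For the files in the question this would be called as copy_if_tag_present("input.xml", "output.xml", "tag6"). The default "xml" parser needs lxml installed; "html.parser" can be passed instead if only simple lookups are needed.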

If you want to modify the XML and apply a specific formatting, you will have to navigate the Beautiful Soup tree and format it by hand.
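As a minimal sketch of formatting by hand, the code below walks the tree recursively and emits one line per tag, keeping text-only elements on a single line as in the input file from the question. The helper names `format_tag` and `reindent` are hypothetical, and the sketch makes simplifying assumptions: an element holds either child tags or text (mixed content is dropped), comments and processing instructions are ignored, and empty elements are written as an open/close pair.

```python
from xml.sax.saxutils import escape, quoteattr

from bs4 import BeautifulSoup
from bs4.element import Tag


def format_tag(tag, depth=0, indent="  "):
    """Serialize one Tag and its descendants with a fixed indent per level."""
    pad = indent * depth
    # Re-quote attribute values so special characters stay valid XML.
    attrs = "".join(
        f" {name}={quoteattr(str(value))}" for name, value in tag.attrs.items()
    )
    child_tags = [child for child in tag.children if isinstance(child, Tag)]
    if not child_tags:
        # Text-only element: keep the text inline, whitespace included.
        return f"{pad}<{tag.name}{attrs}>{escape(tag.get_text())}</{tag.name}>"
    # Container element: children go on their own lines, one level deeper.
    lines = [f"{pad}<{tag.name}{attrs}>"]
    lines += [format_tag(child, depth + 1, indent) for child in child_tags]
    lines.append(f"{pad}</{tag.name}>")
    return "\n".join(lines)


def reindent(xml_text, parser="xml"):
    """Parse xml_text and return it re-serialized with format_tag."""
    soup = BeautifulSoup(xml_text, parser)
    roots = [node for node in soup.children if isinstance(node, Tag)]
    return "\n".join(format_tag(root) for root in roots) + "\n"
```

Any changes to the tree would be made on the soup before calling format_tag. The indent argument sets the layout, so this reproduces a regular indentation scheme rather than recovering the exact whitespace of an arbitrary input.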

Serge Ballesta answered Sep 19 '25 02:09
