Python: 3.11
saxonche: 12.4.2
My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:
import gc
from time import sleep
from saxonche import PySaxonProcessor
xml_str = """
<root>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""
while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)
    gc.collect()
    sleep(1)
This script consumes memory at a rate of about 0.5 MB per second. The memory usage does not plateau after a while. I have logs showing that memory usage continues to grow for hours until the server runs out of memory and crashes.
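For reference, this kind of growth can be tracked from inside the process with the standard library alone. A minimal sketch using resource.getrusage (Unix-only; note that ru_maxrss is reported in different units depending on the platform):

```python
import resource
import sys

def peak_rss_bytes():
    # ru_maxrss is kilobytes on Linux but bytes on macOS.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss if sys.platform == "darwin" else rss * 1024

print(f"peak RSS: {peak_rss_bytes():,} bytes")
```

Calling this before and after each loop iteration makes the per-iteration growth visible without any third-party dependency.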
Other things I tried that aren't shown above: deleting the result of parse_xml() using the del Python keyword. No change.

I have to use Saxon instead of lxml because I need XPath 3.0 support.
What am I doing wrong? How do I parse XML using Saxon in a way that doesn't leak?
A few folks have suggested that instantiating the PySaxonProcessor once before the loop will fix the leak. It doesn't. This still leaks:
with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)
        gc.collect()
        sleep(1)
There's clearly a failure to properly clean up once the context manager terminates - i.e., PySaxonProcessor.__exit__ isn't doing what it (probably) should do.
You need to contact the developer(s) as this isn't a Python issue per se. You are not doing anything wrong.
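In the meantime, one possible mitigation (my own suggestion, not an official workaround) is to confine each parse to a short-lived child process, so the OS reclaims any memory the native library leaks when the child exits. A sketch of the pattern, using xml.etree as a stand-in parser since the isolation mechanics are what matter here; with saxonche the worker body would do the PySaxonProcessor work and return plain Python data:

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET  # stand-in parser for the pattern

def parse_job(xml_text):
    # With saxonche this body would instead be roughly:
    #   with PySaxonProcessor(license=False) as proc:
    #       node = proc.parse_xml(xml_text=xml_text)
    #       ... extract what you need as plain Python data ...
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("stuff")]

def parse_isolated(xml_text):
    # maxtasksperchild=1 forces a fresh worker per task, so leaked
    # memory dies with the child. "fork" keeps this self-contained on
    # Linux/macOS; under "spawn" (Windows) the worker function must
    # live in an importable module.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        return pool.apply(parse_job, (xml_text,))

if __name__ == "__main__":
    print(parse_isolated("<root><stuff>a</stuff><stuff>b</stuff></root>"))
```

The per-parse process overhead is significant, so this only makes sense when parses are infrequent, as in the one-per-second loop above.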
The problem can be replicated as follows:
from saxonche import PySaxonProcessor
import psutil
count = 0
process = psutil.Process()
prev = process.memory_info().rss
for _ in range(100):
    with PySaxonProcessor(license=False):
        pass
    if (count := count + 1) % 10 == 0:
        m = process.memory_info().rss
        print(f"{m - prev:,}")
        prev = m
Platform:
macOS 14.4.1
Python 3.12.2
M2
Output:
2,228,224
2,244,608
2,260,992
2,244,608
2,228,224
2,244,608
2,244,608
2,228,224
2,228,224
It looks like a memory leak. I created a bug to track it: https://saxonica.plan.io/issues/6391
The issue is now fixed in the SaxonC 12.5 release.
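After upgrading, you can confirm which version is actually installed. A small stdlib sketch using importlib.metadata (it returns None rather than failing when saxonche is absent):

```python
from importlib import metadata

def saxonche_version():
    # Returns the installed saxonche version string, or None if
    # the package is not installed in this environment.
    try:
        return metadata.version("saxonche")
    except metadata.PackageNotFoundError:
        return None

print(saxonche_version() or "saxonche not installed")
```

Anything reporting 12.5 or later should include the fix.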