 

Best practice to release memory after url fetch on appengine (python)

My problem is how best to release the memory consumed by the responses of asynchronous URL fetches on App Engine. Here is what I basically do in Python:

from google.appengine.api import urlfetch

rpcs = []

for event in event_list:
    url = 'http://someurl.com'
    rpc = urlfetch.create_rpc()
    rpc.callback = create_callback(rpc)
    urlfetch.make_fetch_call(rpc, url)
    rpcs.append(rpc)

for rpc in rpcs:
    rpc.wait()

In my test scenario it does this for 1,500 requests, but I need an architecture that can handle many more within a short amount of time.
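One way to keep memory bounded while staying with in-process async fetches is to issue the RPCs in fixed-size batches and wait for each batch before starting the next, so that at most `batch_size` responses are alive at once. The sketch below is an assumption, not App Engine API: `fetch_one` stands in for the `create_rpc` / `make_fetch_call` / `wait` machinery so the batching logic is plain, runnable Python.

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def fetch_in_batches(urls, fetch_one, batch_size=50):
    """Fetch URLs batch by batch so that at most batch_size
    responses are held in memory at any moment. fetch_one is a
    stand-in (assumption) for the urlfetch RPC machinery."""
    handled = 0
    for batch in batched(urls, batch_size):
        responses = [fetch_one(url) for url in batch]
        for response in responses:
            # process / enqueue each response here
            handled += 1
        del responses  # drop references before the next batch starts
    return handled
```

With `batch_size=50`, 1,500 URLs become 30 sequential waves, trading some latency for a bounded memory footprint.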

Then there is a callback function, which adds a task to a queue to process the results:

import json
from google.appengine.api import taskqueue

def event_callback(rpc):
    result = rpc.get_result()
    data = json.loads(result.content)
    taskqueue.add(queue_name='name', url='url', params={'data': data})
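A small memory saving in the callback itself: `taskqueue.add()` also accepts a raw `payload`, so the JSON body can be forwarded untouched instead of being parsed with `json.loads()` and re-encoded as a query parameter. A minimal sketch, assuming the worker endpoint parses the body itself; the actual `taskqueue.add` call is shown as a comment because the module only exists inside App Engine:

```python
import json

def event_callback_payload(content):
    """Hedged sketch: instead of data = json.loads(result.content)
    followed by taskqueue.add(..., params={'data': data}), forward
    the raw JSON body. Inside App Engine this would be:
        taskqueue.add(queue_name='name', url='/worker', payload=content)
    Here we simply return what would be enqueued."""
    return content  # no json.loads in the callback

def worker(payload):
    """The task handler parses the payload exactly once."""
    return json.loads(payload)
```

This avoids holding both the parsed `data` object and its re-serialized form in the callback's memory.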

My problem is that I make so many concurrent RPC calls that my instance runs out of memory and crashes: "Exceeded soft private memory limit with 159.234 MB after servicing 975 requests total"

I already tried three things:

del result
del data

and

result = None
data = None

and I ran the garbage collector manually after the callback function:

gc.collect()

But nothing seems to release the memory directly after the callback function has added the task to the queue, and so the instance crashes. Is there any other way to do this?

asked Nov 21 '25 by Sebastian Küpers

1 Answer

That's the wrong approach. Instead, put these URLs into a push queue, increase its rate to the desired value (default: 5/sec), and let each task handle one URL fetch (or a group of them). Please note that there's a safety limit of 3,000 url-fetch API calls per minute (and one URL fetch might use more than one API call).
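The suggested architecture can be sketched as follows, assuming a hypothetical `fetch-queue` push queue and `/fetch_worker` handler: group the URLs into small batches and enqueue one task per group, so each task instance only ever holds a few responses in memory. The grouping logic below is plain, runnable Python; the `taskqueue.add` call is shown as a comment because it only runs inside App Engine.

```python
def group_urls(urls, per_task=10):
    """Split the URL list into groups; each group becomes one task."""
    return [urls[i:i + per_task] for i in range(0, len(urls), per_task)]

def enqueue_fetch_tasks(urls, per_task=10):
    """For each group, one push-queue task would be added, e.g.:
        taskqueue.add(queue_name='fetch-queue', url='/fetch_worker',
                      params={'urls': ','.join(group)})
    (queue name and handler URL are assumptions). Here we just
    return the number of tasks that would be enqueued."""
    return len(group_urls(urls, per_task))
```

At 10 URLs per task, 1,500 URLs become 150 tasks; the queue's rate setting then controls how fast they drain, keeping both memory use and the url-fetch API quota under control.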

answered Nov 23 '25 by T. Steinrücken


