
How to optimize the memory and time usage of the following algorithm in python

I am trying to accomplish the following logical operation in Python but am running into memory and time issues. Since I am very new to Python, guidance on how and where to optimize would be appreciated! (I do understand that the following question is somewhat abstract.)

import networkx as nx

dic_score = {}
G = nx.watts_strogatz_graph(10000, 10, .01)  # generate two graphs with 10,000 nodes each using NetworkX
H = nx.watts_strogatz_graph(10000, 10, .01)
for Gnodes in G.nodes():
    for Hnodes in H.nodes():  # i.e. for every pair of nodes across the two graphs
        score = some_operation(Gnodes, Hnodes)  # placeholder for the actual metric on the pair
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])  # store as key -> list of [Hnodes, score, -1] lists

Then I sort the lists in the generated dictionary according to the criterion mentioned here: sorting_criterion

My problems/questions are:

1) Is there a better way of approaching this than using nested for loops for the iteration?

2) What would be the most optimized (fastest) way to approach this problem? Should I consider using a data structure other than a dictionary, or possibly file operations?

3) Since I need to sort the lists inside this dictionary, which has 10,000 keys each corresponding to a list of 10,000 values, the memory requirements grow huge quite quickly and I run out of memory.

4) Is there a way to integrate the sorting process into the construction of the dictionary itself, i.e. avoid doing a separate loop for sorting?

Any input would be appreciated! Thanks!

asked Nov 29 '25 by R.Bahl


1 Answer

1) You can use one of the functions from the itertools module for that. Let me just mention it; you can read the manual or call:

from itertools import product
help(product)

Here's an example:

for item1, item2 in product(list1, list2):
    pass
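
Applied to the two graphs from the question, the nested loops collapse into a single loop. A minimal sketch, where some_operation stands in for whatever metric you compute on a pair:

from itertools import product
import networkx as nx

G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)

dic_score = {}
for Gnode, Hnode in product(G.nodes(), H.nodes()):
    score = some_operation(Gnode, Hnode)  # placeholder: your actual metric
    dic_score.setdefault(Gnode, []).append([Hnode, score, -1])

Note this doesn't reduce the 10,000 x 10,000 = 100 million pairs you have to visit; it just makes the iteration flatter and slightly faster than two explicit Python loops.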

2) If the results are too big to fit in memory, try saving them somewhere. You can write them out to a CSV file, for example:

import csv

with open('result.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, dialect='excel')
    for ...:
        writer.writerow(...)

This keeps the results on disk and frees your memory.
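
Putting 1) and 2) together: a sketch that streams every pair's score straight to disk instead of building the full dictionary in memory (some_operation is again a stand-in for your metric):

import csv
from itertools import product
import networkx as nx

G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)

with open('result.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, dialect='excel')
    writer.writerow(['g_node', 'h_node', 'score'])  # header row
    for Gnode, Hnode in product(G.nodes(), H.nodes()):
        writer.writerow([Gnode, Hnode, some_operation(Gnode, Hnode)])

You can then sort the file afterwards, e.g. with Unix sort, without ever holding all 100 million rows in Python at once.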

3) I think it's better to sort the result data afterwards (Python's built-in sort is quite fast) rather than complicate matters by sorting the data on the fly.
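
For example, since your linked sorting criterion isn't reproduced here, assume you sort each node's [Hnode, score, -1] list by score. If you only ever need the best k matches per node, heapq lets you keep just those, which also eases the memory pressure:

import heapq

# Sort each node's list of [Hnode, score, -1] entries by score, descending.
for node, pairs in dic_score.items():
    pairs.sort(key=lambda triple: triple[1], reverse=True)

# Alternative: keep only the top k entries per node instead of all 10,000.
k = 10
top_k = {node: heapq.nlargest(k, pairs, key=lambda triple: triple[1])
         for node, pairs in dic_score.items()}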

You could instead use NumPy array/matrix operations (sums, products, or even mapping a function over each matrix row). These are so fast that sometimes filtering the data costs more than computing everything.
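
For instance, if the metric can be expressed through per-node quantities, broadcasting computes all the scores at once. The degree-difference metric below is purely a stand-in, since the question doesn't say what SomeOperation actually is:

import numpy as np
import networkx as nx

G = nx.watts_strogatz_graph(10000, 10, .01)
H = nx.watts_strogatz_graph(10000, 10, .01)

deg_G = np.array([d for _, d in G.degree()], dtype=np.float64)
deg_H = np.array([d for _, d in H.degree()], dtype=np.float64)

# Broadcasting a (10000, 1) column against a (10000,) row gives the full
# 10000 x 10000 score matrix; as float64 that is roughly 800 MB, so switch
# to float32 or process row blocks if memory is tight.
scores = np.abs(deg_G[:, None] - deg_H[None, :])

# Per G-node, the H-node indices ordered by score (argsort along each row).
order = np.argsort(scores, axis=1)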

If your app is still very slow, try profiling it to see exactly what operation is slow or is done too many times:

from cProfile import Profile

p = Profile()
# Run my_function(my_data) under the profiler; the two dicts supply the
# globals and locals visible to the profiled statement.
p.runctx('my_function(args)', {'my_function': my_function, 'args': my_data}, {})
p.print_stats()

You'll see a table like this:

      2706 function calls (2004 primitive calls) in 4.504 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.006    0.003    0.953    0.477 pobject.py:75(save_objects)
  43/3    0.533    0.012    0.749    0.250 pobject.py:99(evaluate)
...
answered Dec 01 '25 by culebrón


