I have the following tree data structure stored in dictionaries:
1
2
3
4 -> ["a", "b", "c"]
5 -> ["x", "y", "z"]
3
5
7 -> ["e", "f", "j"]
Here is how I build sample of it in Python:
tree = dict()
for i in range(100):
tree[i] = dict()
for j in range(10):
tree[i][j] = dict()
for k in range(10):
tree[i][j][k] = dict()
for l in range(10):
tree[i][j][k][l] = dict()
for m in range(10):
tree[i][j][k][l][m] = dict()
for n in range(10):
tree[i][j][k][l][m][n] = ["a", "b", "c", "d", "e", "f", "g"]
I want to traverse it and do some calculation when reaching each leaf. While doing calculation I need to know the path to the leaf.
I.e. given callback
def callback(p1, p2, p3, p4, leaf):
...
I want it to be called like following using my tree example:
callback(1, 2, 3, 4, ["a", "b", "c"])
callback(1, 2, 3, 5, ["x", "y", "z"])
callback(1, 3, 5, 7, ["e", "f", "j"])
Question: How to implement traversal most efficiently? Note, that tree depth is not static.
Here is what I tried:
1. Inline code. This is the fastest one, but is not usable in practice since, again, tree depth is not static.
def callback(*args):
assert isinstance(args[-1], list)
start = time.time()
for k1, leafs1 in tree.items():
for k2, leafs2 in leafs1.items():
for k3, leafs3 in leafs2.items():
for k4, leafs4 in leafs3.items():
for k5, leafs5 in leafs4.items():
for k6, val in leafs5.items():
callback(k1, k2, k3, k4, k5, k6, val)
print("inline: %f" % (time.time() - start))
This runs 3.5 seconds average using Python 3.4.2 on my laptop.
2. Recursive approach
from functools import partial
def iterate_tree(tree, depth, callback):
if depth:
for k, subtree in tree.items():
cb = partial(callback, k)
yield from iterate_tree(subtree, depth-1, cb)
else:
for k, v in tree.items():
rv = callback(k, v)
yield rv
start = time.time()
for i in iterate_tree(tree, 5, callback):
pass
print("iterate_tree: %f" % (time.time() - start))
This is generic and all that nice, but 2 times slower!
3. Non-recursive approach I thought that may be recursion, yield from
and partial
are slowing me down. So I tried the to flaten it:
def iterate_tree2(tree, depth, callback):
iterators = [iter(tree.items())]
args = []
while iterators:
try:
k, v = next(iterators[-1])
except StopIteration:
depth += 1
iterators.pop()
if args:
args.pop()
continue
if depth:
args.append(k)
iterators.append(iter(v.items()))
depth -= 1
else:
yield callback(*(args + [k, v]))
start = time.time()
for i in iterate_tree2(tree, 5, callback):
pass
print("iterate_tree2: %f" % (time.time() - start))
This is generics and works, but performance improvement compared to recursion, i.e. still two times slower than inline version.
So how to implement my traversal in a generic way? And what makes inline version so much faster?
P.S. The code above is for Python 3.3+. I've adapted it to Python 2 and results are similar.
SOLUTION AND ANALYSIS
I've made comparative analysis of all of the solutions and optimizations. The code and results can be obtained from the gist.
TL;DR; The fastest solution is to use optimized loop-based version:
Loop-based iterations own everything when run on PyPy.
On non-pypy, the major slowdown is a result reporting from callback:
yield
ing results is the slowest - ~30% penalty compared to inline. See iterate_tree6
for loop version and iterate_tree3
for recursive versioniterate_tree3_noyield
iterate_tree6_nofeedback
For recursion-based versions, use tuples for argument accumulating and not list. The performance difference is rather significant.
Thanks to everyone who contributed to this topic.
Here's an optimized version of the iterative iterate_tree2
. It is 40 % faster on my system, mainly thanks to improved looping structure and elimination of the try except
. Andrew Magee's recursive code performs approximately the same.
def iterate_tree4(tree, depth, callback):
iterators = [iter(tree.items())]
args = []
while iterators:
while depth:
for k, v in iterators[-1]:
args.append(k)
iterators.append(iter(v.items()))
depth -= 1
break
else:
break
else:
for k, v in iterators[-1]:
yield callback(*(args + [k, v]))
depth += 1
del iterators[-1]
del args[-1:]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With