Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can this breadth-first search be made faster?

I have a data set which is a large unweighted cyclic graph The cycles occur in loops of about 5-6 paths. It consists of about 8000 nodes and each node has from 1-6 (usually about 4-5) connections. I'm doing single pair shortest path calculations and have implemented the following code to do a breadth-first search.

from Queue import Queue

q = Queue()
parent = {}
fromNode = 'E1123'
toNode = 'A3455'

# path finding
q.put(fromNode)
parent[fromNode] = 'Root'

while not q.empty():
  # get the next node and add its neighbours to queue
  current = q.get()
  for i in getNeighbours(current):
    # note parent and only continue if not already visited
    if i[0] not in parent:
      parent[i[0]] = current
      q.put(i[0])

  # check if destination
  if current == toNode:
    print 'arrived at', toNode
    break

The above code uses the Python 2.6 Queue module and getNeighbours() is simply a subroutine that makes a single MySQL call and returns the neighbours as a list of tuples e.g. (('foo',),('bar',)). The SQL call is quick.

The code works ok however testing to down to depths of about 7 layers takes about 20 seconds to run (2.5GHz Intel 4GB RAM OS X 10.6)

I'd welcome any comments about how to improve the run time of this code.

like image 406
timbo Avatar asked Nov 18 '09 02:11

timbo


2 Answers

Well, given the upvotes on the comment, I'll make it an answer now.

The SQL in the tight loop is definitely slowing you down. I don't care how fast the call is. Think about it -- you're asking for a query to be parsed, a lookup to be run -- as fast as that is, it's still in a tight loop. What does your data set look like? Can you just SELECT the entire data set into memory, or at least work with it outside of MySQL?

If you work with that data in memory, you will see a significant performance gain.

like image 63
Jed Smith Avatar answered Nov 08 '22 00:11

Jed Smith


Something like this:

#!/usr/bin/env python

from Queue import Queue

def traverse_path(fromNode, toNode, nodes):
    def getNeighbours(current, nodes):
        return nodes[current] if current in nodes else []

    def make_path(toNode, graph):
        result = []
        while 'Root' != toNode:
            result.append(toNode)
            toNode = graph[toNode]
        result.reverse()
        return result

    q = Queue()
    q.put(fromNode)
    graph = {fromNode: 'Root'}

    while not q.empty():
        # get the next node and add its neighbours to queue
        current = q.get()
        for neighbor in getNeighbours(current, nodes):
            # use neighbor only continue if not already visited
            if neighbor not in graph:
                graph[neighbor] = current
                q.put(neighbor)

        # check if destination
        if current == toNode:
            return make_path(toNode, graph)
    return []

if __name__ == '__main__':
    nodes = {
        'E1123': ['D111', 'D222', 'D333', 'D444'],
        'D111': ['C01', 'C02', 'C04'],
        'D222': ['C11', 'C03', 'C05'],
        'D333': ['C01'],
        'C02': ['B1'],
        'B1': ['A3455']
    }
    result = traverse_path('E1123', 'A3455', nodes)
    print result

['E1123', 'D111', 'C02', 'B1', 'A3455']

If you replace your SQL queries with a dictionary of lists (and that would be the tricky part), you will get this performance.

like image 32
hughdbrown Avatar answered Nov 07 '22 23:11

hughdbrown



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!