Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to index and query paths in a graph

I have a table representing a graph: Edges(from, to).

I'd like to query this table with "path queries", retrieving only the source and destination of the path.

For example, assume my table consists of the following rows:

+------+----+
| from | to |
+------+----+
| a    | b  |
| b    | c  |
| c    | d  |
| f    | g  |
| b    | f  |
| c    | a  |
+------+----+ 

Assume I execute the following (pseudo-)query:

SELECT ?v1, ?v2 WHERE ?v1 to ?t1, ?t1 to ?t2, ?t2 to ?v2;

This means I want all the pairs of source and destination that exist in all paths consisting of 4 nodes. Executing this query should return the following results:

+-----+-----+
| ?v1 | ?v2 |
+-----+-----+
| a   | a   |
| a   | g   |
| a   | d   |
+-----+-----+

Of course, paths consisting of a different number of nodes might be needed as well, the number 4 isn't hardcoded :-)

My questions are:

  1. What's the best way to build such an SQL query (note that I'm using SQLite, so recursive queries can't be used).
  2. I currently have one index for the from column and one for the to column. Is this optimal? Should I create an index for the "from, to pair as well? Instead?

Assumptions

  1. There are no self-edges (E.G "a - a").

  2. There are no two identical rows.

Thanks in advance!

like image 482
Joe Avatar asked Nov 17 '25 22:11

Joe


1 Answers

ad 1.) unless you know in advance that your paths will always be of a given length (or a small set of given lengths), you cannot express your query in pure sql. however, you may choose to incrementally maintain the transitive closure of your graph, in particular if

  • changes to your graph are infrequent
  • and/or are mostly edge insertions (instead of deletions)
  • or mostly occur as bulk changes at times that allow for some preprocessing.

the technique is outlined in a paper by dong et al., doi://10.1.1.103.3314; don't be daunted by the theory and the math, they also provide sql code ready-to-use and their basic ideas are straightforward.

ad 2.)

if maintaining a transitive closure table is an option for you it wpould lend to one index on the pair of columns representing start and end vertices of paths.

if it isn't you might be able exploit the structure of your graph: for average fan-outs that are high (small) in comparison to fan-ins you should be better of with an index on the 'to' ('from') column.

if you cannot make an assumption on fan-out/fan-in ratios you're probably best off with an index on each column.

hope it helps,

best regards, carsten

like image 199
collapsar Avatar answered Nov 20 '25 12:11

collapsar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!