I am writing software to characterize chemical structures based on their graph topology, and the chemical elements of each atom. They need to be uniquely sorted in order to assess whether two bond graphs are equal. I represent a graph by an array of values, and an array of pairs of indices into the first array, e.g.
vertices = {"C", "H", "H", "H", "H"};
edges = {{0, 1}, {0, 2}, {0, 3}, {0, 4}};
A graph is sorted first w.r.t. vertex values (position in the periodic table), and second w.r.t. the edges. Each edge is represented with the lower valued index first, and the edges are ordered lexicographically based on 1) the left index and 2) the right index. The ordered form of the methane molecule above would thus be:
vertices = {"H", "H", "H", "H", "C"};
edges = {{0, 4}, {1, 4}, {2, 4}, {3, 4}};
It is sorted because swapping two vertices, relabeling the edges referring to them and resorting the edges, would either leave the representation unchanged or make it less ordered. Following this convention it is easy enough to less-than-compare two orderings of a graph to assess which is better, but coming up with an efficient sorting algorithm is subtler than it seems because each vertex swaps leads to a relabeling of the edges, followed by possible swapping of the first and second element in each relabeled edge and a resorting of the edge list. I have hitherto used the brute force approach of wading through all permutations of equal-valued vertices, but since this scales with the factorial of the number of types of vertices it hits a computational brick wall already for modest number of the same type of vertex.
It seems to me that someone must have solved this problem, but all I can find when looking for a solution is topological sort of DAGs, which is a completely different problem, and very specialized organic chemistry-based solutions which are too heavy on domain-specific knowledge that seems unnecessary and unproductive for my purposes.
With some more research I have found that the problem is known as graph canonization and is closely related to the graph isomorphism problem, which seems to be either NP or NP complete, so unfortunately not as simple as I had hoped.
However, there exist a family of pragmatically successful algorithms, of which this recent paper is the best summary of current knowledge that I have found. The authors of that paper have also published an open source modern C++ library allowing solution of similar problems in a modular and extensible fashion: https://github.com/jakobandersen/graph_canon.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With