I am trying to write suffix tree class with a naive building algorithm which is O(m^2) and not Ukkonen.
My doubt is regarding how to represent the tree. So far, I have got this. But I do not think that is the write class structure for nodes and edges. Any suggestion regarding how the mapping/relationship between nodes and edges should be maintained. Important point is that in edge we just store start and end index to save space.
class suffixTree{
Node[] nodes;
Edge[] edge;
}
class Node{
int node_id;
Edge[] outgoingEdges;
}
class Edge{
int start_index;
int end_index;
int start_node_id;
int end_node_id;
}
I would do it this way:
class SuffixTree {
// A tree only needs to know about the root Node.
Node root;
}
// A Node contains a mapping from the first character to
// an Edge that corresponds to it.
// It doesn't need to know about anything else.
class Node {
Map<Character, Edge> edgeMap;
}
// An Edge contains only the start and the end index of the substring
// and the destination Node.
class Edge {
int startIndex;
int endIndex;
Node destination;
}
The most important changes are:
Getting rid of redundant information in all three classes.
Using references instead of arrays and indices.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With