Similarity of two sets of intervals

Question

What sort of algorithm/solution could be used to indicate the similarity (overlap/precision/recall/...) of two sets of ranges.

I can think of (or find online) hundreds of similar problems but never exact, but surely this "wheel" must have been invented already...

Lets say that the input data is something like:

Real      [ ## ###  #     ] or [(1,2),(4,6),(9,10)]  
Predicted [ ## #          ] or [(1,2),(4,4)]

Output should be ~50 %

Should I for example AND bitmaps, use interval trees or what? Is there a nice functional or simple to write algorithm? Any meaningful measure of similarity will do, and so will any reasonable input format.

Thank you.

(realistic length ~4000 with < 50 intervals in each set)

Alex Libov · Accepted Answer

You could break the segments to individual points, and tag each point as real/predicted and start/end.

Then sort the points, go over the sorted list and keep track of overlaps.

You don't even need to track whether the interval was originally from Real or Predicted - you just need to keep track whether there's one or two intervals at each point.

Example:

Real      [(1,2),(4,6),(9,10)]  
Predicted [(1,2),(4,4)]

Broken down into points and sorted (S for Start, E for End):

[(1,S),(1,S),(2,E),(2,E),(4,S),(4,S),(4,E),(6,E),(9,S),(10,E)]

Then going through the array - keep track of how many segments "are open" and count the length of total open and 2 segments open.

The result is 2 segments open/total open.

Similarity of two sets of intervals

Tags:

algorithm

precision

intervals

interval-intersection

1 Answers

Alex Libov

Recent Activity

Donate For Us

Similarity of two sets of intervals

Tags:

algorithm

precision

intervals

interval-intersection

1 Answers

Alex Libov

Related questions

Recent Activity

Donate For Us