Difference between overlap and intersect methods in Pyranges

The PyRanges class from the similarly named package has two methods with slightly different functionality: intersect and overlap. The intersect docstring is very similar to the overlap one: "Return overlapping subintervals." vs. "Return overlapping intervals." I can't quite see the difference between the two (yes, I noticed the "sub" prefix).

Is overlap intended to return the full intervals that overlap at least one position?

AlexShein asked Oct 16 '25 15:10



1 Answer

Setup:

>>> import pyranges as pr
>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
|   Chromosome |     Start |       End | ID         |
|   (category) |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
|         chr1 |         1 |         3 | a          |
|         chr1 |         4 |         9 | b          |
|         chr1 |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int32) |   (int32) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

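With overlap, you get back the intervals in self, with their original coordinates, that overlap at least one interval in other. Interval c is not returned because PyRanges coordinates are half-open: c = [10, 11) only touches [9, 10) in gr2 without sharing a position. An interval that overlaps several intervals in other is still returned only once (by default):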

>>> gr.overlap(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
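To make these semantics concrete, here is a minimal pure-Python model of what overlap returns with its default settings. It assumes intervals are plain (start, end, id) tuples on a single chromosome with half-open [start, end) coordinates; the names intervals_overlap and model_overlap are hypothetical, and this is only a sketch of the behavior shown above, not how PyRanges is implemented.

```python
# Hypothetical model of PyRanges' overlap semantics (not the library's code).
# Intervals are (start, end, id) tuples on one chromosome, half-open [start, end).

def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a position."""
    return a_start < b_end and b_start < a_end


def model_overlap(self_intervals, other_intervals):
    """Return each interval in self_intervals, unchanged, if it overlaps any
    interval in other_intervals; each one is reported at most once."""
    result = []
    for start, end, name in self_intervals:
        if any(intervals_overlap(start, end, o_start, o_end)
               for o_start, o_end in other_intervals):
            result.append((start, end, name))
    return result


gr = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2 = [(2, 3), (2, 9), (9, 10)]

print(model_overlap(gr, gr2))  # [(1, 3, 'a'), (4, 9, 'b')]
```

The any() call stops at the first overlapping partner, which is why a is reported once even though it overlaps two intervals in gr2.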

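With intersect, each returned interval is the intersection of an interval in self with an overlapping interval in other, i.e. it is trimmed to the region they share (a = [1, 3) becomes [2, 3)). By default one row is returned per overlapping pair, which is why a appears twice (it overlaps both [2, 3) and [2, 9) in gr2):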

>>> gr.intersect(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         2 |         3 | a          |
| chr1         |         2 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
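The same pure-Python model can be extended to intersect, again assuming single-chromosome (start, end, id) tuples with half-open coordinates. The function name model_intersect is hypothetical; it is a sketch that reproduces the default behavior shown above (one trimmed row per overlapping pair, keeping the id from self), not PyRanges' actual implementation.

```python
# Hypothetical model of PyRanges' intersect semantics (not the library's code).
# Intervals are (start, end, id) tuples on one chromosome, half-open [start, end).

def model_intersect(self_intervals, other_intervals):
    """For every overlapping (self, other) pair, return the shared sub-interval,
    keeping the id from self. One row is produced per overlapping pair."""
    result = []
    for start, end, name in self_intervals:
        for o_start, o_end in other_intervals:
            # The intersection of two half-open intervals is [max start, min end);
            # it is non-empty only when max start < min end.
            new_start = max(start, o_start)
            new_end = min(end, o_end)
            if new_start < new_end:
                result.append((new_start, new_end, name))
    return result


gr = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2 = [(2, 3), (2, 9), (9, 10)]

print(model_intersect(gr, gr2))  # [(2, 3, 'a'), (2, 3, 'a'), (4, 9, 'b')]
```

The nested loop is O(n·m) and only meant to illustrate the semantics; it is not an efficient way to find overlaps in large interval sets.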

See the docs for more info:

  • intersect
  • overlap
Chris F. answered Oct 18 '25 14:10



