
ggplot2 z clipping: remove unnecessary points in overlapping stacks

Tags: r, ggplot2

Consider the following example of plotting 100 overlapping points:

library(ggplot2)

ggplot(data.frame(x=rnorm(100), y=rnorm(100)), aes(x=x, y=y)) +
    geom_point(size=100) +
    xlim(-10, 10) +
    ylim(-10, 10)


I now want to save the image as vector graphics, e.g. in PDF. This is not a problem with the above example, but once I've got over a million points (e.g. from a volcano plot), the file size can exceed 100 MB for one page and it takes ages to display or edit.
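To make the file-size concern concrete, here is a rough sketch that saves the same kind of scatter plot as a PDF for two point counts and compares the resulting sizes. The helper name `pdf_bytes` and the point counts are made up for illustration; the actual numbers depend on the graphics device and its settings.

```r
library(ggplot2)

# Hypothetical helper: plot n random points, save as PDF, return the file size in bytes
pdf_bytes <- function(n) {
  df <- data.frame(x = rnorm(n), y = rnorm(n))
  p <- ggplot(df, aes(x = x, y = y)) + geom_point()
  path <- tempfile(fileext = ".pdf")
  ggsave(path, p, width = 5, height = 5)
  file.size(path)
}

set.seed(1)
small <- pdf_bytes(100)
large <- pdf_bytes(10000)
c(small = small, large = large)  # every point is stored in the file, so the size grows with n
```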

In the above example the same shape could still be represented by either

  • converting the points to a shape outline, or
  • keeping a couple of points and discarding the rest.

Is there any way (or, preferably, a tool that already does this) to remove points from a plot that will never be visible, ideally with support for transparency?

The best approach I have heard of so far is to round the positions of the dots to a grid, remove points from grid cells that contain more than N points, and then use the original positions of the remaining ones. Is there anything better?
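For reference, the grid-rounding idea could look roughly like the sketch below. It reads "> N points" as "keep at most N points per grid cell", which is an assumption; the function name `thin_by_grid`, the cell width and the cap are likewise made up for illustration. Note that this only approximates visibility: whether a dropped point was really hidden depends on the point size relative to the cell width.

```r
# Keep at most `max_per_cell` points in each grid cell (assumed interpretation)
thin_by_grid <- function(df, cell = 0.1, max_per_cell = 5) {
  # Round each coordinate to the grid to get one cell id per point
  cell_id <- paste(round(df$x / cell), round(df$y / cell), sep = "_")
  # Rank of each point within its cell, in original row order
  rank_in_cell <- ave(seq_along(cell_id), cell_id, FUN = seq_along)
  # Keep the first few points of every cell, at their original positions
  df[rank_in_cell <= max_per_cell, , drop = FALSE]
}

set.seed(1)
df <- data.frame(x = rnorm(1e5), y = rnorm(1e5))
thinned <- thin_by_grid(df, cell = 0.1, max_per_cell = 5)
nrow(thinned)  # fewer rows than nrow(df); sparse regions are left untouched
```

Cells with few points keep all of them, so outliers survive; only the dense core is thinned.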

Note that this should work with an arbitrary structure of points, and only remove those that are not visible.

Asked by Michael Schubert, Jan 31 '26 11:01


1 Answer

You could do something with the convex hull, like this, filling in the polygon that makes up the convex hull:

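library(ggplot2)
set.seed(123)

df <- data.frame(x = rnorm(100), y = rnorm(100))
# chull() returns the row indices of the points on the convex hull
idx <- chull(df)
ggplot(df, aes(x = x, y = y)) +
    geom_point(size = 100, color = "darkgrey") +      # original oversized points
    geom_polygon(data = df[idx, ], color = "blue") +  # filled hull with a blue outline
    geom_point(size = 1, color = "red") +             # point positions, for reference
    xlim(-10, 10) +
    ylim(-10, 10)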

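yielding a plot with the large grey points, the filled convex hull outlined in blue on top, and the original point positions marked in red.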

(Note that I pulled this chull-idea out of Hadley's "Extending ggplot2" guide https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html.)

In your case you would drop the geom_point calls and set transparency on the geom_polygon. I am also not sure how fast chull is for millions of points, though it should clearly be faster than plotting them all.
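A minimal sketch of that variant: only the hull polygon is drawn, with an alpha value (0.5 here, an arbitrary choice) for transparency. The second part times chull on a million random points as a quick check; the data frame `big` is invented for this illustration and the timing will vary by machine.

```r
library(ggplot2)
set.seed(123)

df <- data.frame(x = rnorm(100), y = rnorm(100))
idx <- chull(df$x, df$y)  # row indices of the hull vertices, in order

# Only the hull polygon is drawn; no individual points end up in the output
p_hull <- ggplot(df[idx, ], aes(x = x, y = y)) +
    geom_polygon(fill = "black", alpha = 0.5) +
    xlim(-10, 10) +
    ylim(-10, 10)

# Rough timing of chull() on a much larger point set
big <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
system.time(big_idx <- chull(big$x, big$y))
length(big_idx)  # number of hull vertices, far smaller than nrow(big)
```

The polygon needs only the hull vertices, so the saved file no longer scales with the number of points.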

I am not really sure what you are after, though. If you really want the 100 pixel radius, then you could probably just draw the points that lie on the convex hull and fill in the middle with geom_polygon.

So using this code:

ggplot(df[idx,], aes(x = x, y = y)) +
    geom_point(size = 100, color = "black") +
    geom_polygon(fill = "black") +
    xlim(-10, 10) +
    ylim(-10, 10)

This draws the full-size points only at the hull vertices and fills the interior with geom_polygon.

Answered by Mike Wise, Feb 03 '26 04:02