I'm wondering why plotting survival curves with ggplot2 is so much slower than with base plot (see the example below).
If the ggplot chart only has to be saved, the delay is smaller (a factor of 2 to 3 longer than plot).
If the ggplot chart has to be displayed on screen, the delay is roughly a factor of 3 to 4 compared to plot.
I know very well that this is an old issue (> 9 years), and I have read extensive discussions about it, but I am still unable to solve it or to localize the cause of the delay.
Where should I look? Any idea what causes the delay?
n = 100000;
time = runif(n, min = 1, max = 10)
event = runif(n, min = 0, max = 1)
df = data.frame(time = time, event = event)
df$event = ifelse(df$event < .5, 0, 1);
head(df)
library(survival)
fit <- survfit(Surv(time, event) ~ 1, data = df)
st <- Sys.time()
plot(fit)
Sys.time() - st
#.... getting a process duration of 0.3439 secs
library(ggplot2)
x = data.frame(surv = fit$surv, time = fit$time, lower = fit$lower, upper = fit$upper)
st <- Sys.time()
g = ggplot(x, aes(y = surv, x = time)) + geom_step(size = .5)
ggsave(plot = g, filename = '/tmp/test.png', dpi = 300, device = 'png')
Sys.time() - st
#.... getting a process duration of 0.9424 secs
st <- Sys.time()
ggplot(x, aes(y = surv, x = time)) + geom_step(size = .5) + scale_y_continuous(labels = function(x) sprintf('%.0f%%', x * 100))
Sys.time() - st
#.... getting a process duration of 1.268 secs
There are 4 steps that have to happen in a ggplot creation:
1. The ggplot object is created.
2. The ggplot object is converted into a ggplot_built object, which is like a full technical blueprint. This step applies all the necessary stats, transforms and mappings to convert data into graphical objects. Its speed depends on the amount of processing your data needs, and so on the size of your data.
3. The ggplot_built object is converted into a table of graphical objects (a gtable) in the low-level grid language, ready for drawing. This step depends on the number of objects in your plot, so it may be faster if your data is summarized (e.g. a histogram or heatmap) than when plotting all points (e.g. a dense dot plot or a raster image).
4. The gtable is drawn using the grid package. Its speed is somewhat dependent on the type of objects being drawn as well as how many of them there are.
For your example, we can use your timing method to see how long each step takes:
# 1) Create the ggplot object
st <- Sys.time()
p <- ggplot(x, aes(y = surv, x = time)) + geom_step(size = .5) +
scale_y_continuous(labels = function(x) sprintf('%.0f%%', x * 100))
Sys.time() - st
#> Time difference of 0.01622295 secs
# 2) Converting to a ggplot_built object
st <- Sys.time()
p <- ggplot_build(p)
Sys.time() - st
#> Time difference of 0.1523981 secs
# 3) Creating the gtable
st <- Sys.time()
p <- ggplot_gtable(p)
Sys.time() - st
#> Time difference of 1.841841 secs
# 4) Drawing the plot
st <- Sys.time()
grid::grid.draw(p)
Sys.time() - st
#> Time difference of 0.4114258 secs
We can see that most of the time here is taken up converting the ggplot_built object into a gtable. This will be true whether we are writing an image or drawing to screen. The final step of rendering the image is down to how efficient the graphics device is at converting grid objects to graphical output.
Created on 2022-08-24 with reprex v2.0.2
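If you want to repeat this breakdown for other plots, the four timings can be wrapped in a small helper. This is only a sketch and was not part of the reprex: time_ggplot_stages is a hypothetical name of my own, not a ggplot2 function, the data is a small synthetic stand-in for your survival curve, and drawing to a temporary PDF device is an assumption made so that it also runs without a screen.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): times each stage of plot creation.
# plot_fun is a function with no arguments that returns a ggplot object.
time_ggplot_stages <- function(plot_fun) {
  t_create <- system.time(p  <- plot_fun())[["elapsed"]]        # 1) ggplot object
  t_build  <- system.time(b  <- ggplot_build(p))[["elapsed"]]   # 2) ggplot_built object
  t_gtable <- system.time(gt <- ggplot_gtable(b))[["elapsed"]]  # 3) gtable

  # 4) Draw to a throwaway PDF device (assumption: any device would do here)
  tmp <- tempfile(fileext = ".pdf")
  grDevices::pdf(tmp)
  t_draw <- system.time(grid::grid.draw(gt))[["elapsed"]]
  grDevices::dev.off()
  unlink(tmp)

  c(create = t_create, build = t_build, gtable = t_gtable, draw = t_draw)
}

# Synthetic step data standing in for the survfit output
set.seed(1)
n <- 5000
d <- data.frame(time = sort(runif(n, min = 1, max = 10)),
                surv = seq(1, 0, length.out = n))

timings <- time_ggplot_stages(function() {
  ggplot(d, aes(x = time, y = surv)) + geom_step()
})
print(round(timings, 3))
```

The absolute numbers will differ by machine, ggplot2 version and graphics device, so compare the four values with each other rather than with the timings above.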