What is the quickest way of cropping (clipping) a complex SpatialPolygonsDataFrame with the @data preserved in R using another, possibly complex, SpatialPolygon? I know of two ways (shown under). The raster way is quicker for less complex SpatialPolygonsDataFrames and returns a SpatialPolygonsDataFrame as in the example. It gets slow with large SpatialPolygonsDataFrames, though. The rgeos way is quicker for large SpatialPolygonsDataFrame, but it sometimes fails with very complex geometries and does not return SpatialPolygonsDataFrames as default.
I have not paid attention to the GIS development in R lately and there might well be quicker ways of doing this by now.
The example polygons are small to honor people's computers and bandwidth. Consider the "real" polygons 50-1000 MB.
Setup:
library(rnaturalearth)
library(sp)
library(raster)
library(rgeos)
dt <- rnaturalearth::ne_countries()
clip_boundary <- sp::SpatialPolygons(list(sp::Polygons(
list(sp::Polygon(data.frame(lon = c(0, 180, 180, 0), lat = c(40, 40, 80, 80)))), ID = 1)))
The rgeos way:
system.time({
clipped.dt <- rgeos::gIntersection(dt, clip_boundary, byid = TRUE)
ids <- sapply(slot(clipped.dt, "polygons"), function(x) slot(x, "ID"))
ids <- sapply(strsplit(ids, " "), "[", 1)
tmp.df <- data.frame(dt@data[ids,])
names(tmp.df) <- names(dt@data)
out <- sp::SpatialPolygonsDataFrame(clipped.dt, tmp.df, match.ID = FALSE)
})
# user system elapsed
# 0.069 0.002 0.074
class(out)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"
The raster way:
system.time({
out <- raster::crop(dt, clip_boundary)
})
# user system elapsed
# 0.042 0.001 0.043
class(out)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"
A plot for a reference (not relevant for the question):
plot(out)

Much of R's geospatial work can now be done using the sf package.
https://r-spatial.github.io/sf/index.html
It looks like it might be a touch faster than raster and sp.
library(rnaturalearth)
library(sp)
#library(raster)
#library(rgeos)
library(sf)
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
library(ggplot2)
# Original data
dt <- rnaturalearth::ne_countries()
clip_boundary <- sp::SpatialPolygons(list(sp::Polygons(
list(sp::Polygon(data.frame(lon = c(0, 180, 180, 0), lat = c(40, 40, 80, 80)))), ID = 1)))
# Original data as sf objects
dt_sf <- st_as_sf(dt)
boundary_sf <- st_as_sf(clip_boundary)
# clip
clipped <- st_crop(dt_sf, boundary_sf)
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
#plot
ggplot(clipped) + geom_sf()

system.time({
out <- st_crop(dt_sf, boundary_sf)
})
#> although coordinates are longitude/latitude, st_intersection assumes that they are planar
#> Warning: attribute variables are assumed to be spatially constant throughout all
#> geometries
#> user system elapsed
#> 0.019 0.000 0.018
system.time({
out <- raster::crop(dt, clip_boundary)
})
#> Warning in raster::crop(dt, clip_boundary): non identical CRS
#> user system elapsed
#> 0.526 0.000 0.525
system.time({
clipped.dt <- rgeos::gIntersection(dt, clip_boundary, byid = TRUE)
ids <- sapply(slot(clipped.dt, "polygons"), function(x) slot(x, "ID"))
ids <- sapply(strsplit(ids, " "), "[", 1)
tmp.df <- data.frame(dt@data[ids,])
names(tmp.df) <- names(dt@data)
out <- sp::SpatialPolygonsDataFrame(clipped.dt, tmp.df, match.ID = FALSE)
})
#> Warning in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td,
#> unaryUnion_if_byid_false, : spgeom1 and spgeom2 have different proj4 strings
#> user system elapsed
#> 0.045 0.000 0.045
Created on 2020-04-13 by the reprex package (v0.3.0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With