I have a million points and a large shapefile (8GB) which is too big to load into memory in R on my system. The shapefile is single-layer, so a given x, y will hit at most one polygon, as long as it's not exactly on a boundary! Each polygon is labelled with a severity, e.g. 1, 2, 3. I'm using R on a 64-bit Ubuntu machine with 12GB RAM.
What's the simplest way to "tag" each point in the data frame with its polygon's severity, so that I get a data.frame with an extra column, i.e. x, y, severity?
Just because all you have is a hammer, doesn't mean every problem is a nail.
Load your data into PostGIS, build a spatial index for your polygons, and do a single SQL spatial overlay. Export results back to R.
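To show why the spatial index matters, here is a minimal pure-Python sketch of the idea behind an indexed overlay: a cheap bounding-box test rejects most polygons, and only the survivors get the exact point-in-polygon test. This is not PostGIS and not how it is implemented internally. The function names and the toy squares are made up for illustration, and a real index (an R-tree) would also avoid scanning every bounding box the way this loop does.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # polygon outline as a list of (x, y) vertices, no holes


def bbox(ring: Ring) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a polygon outline."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(x: float, y: float, ring: Ring) -> bool:
    """Exact test via ray casting: count edge crossings of a ray going right."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def tag_points(
    points: List[Point], polygons: List[Tuple[Ring, int]]
) -> List[Tuple[float, float, Optional[int]]]:
    """Attach the severity of the containing polygon (or None) to each point."""
    boxes = [bbox(ring) for ring, _ in polygons]  # computed once, like an index
    tagged = []
    for x, y in points:
        severity = None
        for (min_x, min_y, max_x, max_y), (ring, sev) in zip(boxes, polygons):
            # Cheap rejection first; the exact test runs only on candidates.
            if min_x <= x <= max_x and min_y <= y <= max_y and point_in_ring(x, y, ring):
                severity = sev
                break  # single layer: at most one polygon can match
        tagged.append((x, y, severity))
    return tagged


# Toy data (hypothetical): two non-overlapping squares with severities 1 and 3.
polygons = [
    ([(0, 0), (2, 0), (2, 2), (0, 2)], 1),
    ([(3, 0), (5, 0), (5, 2), (3, 2)], 3),
]
result = tag_points([(1, 1), (4, 1), (10, 10)], polygons)
```

The result has the x, y, severity shape asked for in the question, with `None` for points that fall outside every polygon. Points exactly on a boundary are not handled specially here.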
By the way, saying the shapefile is 8GB is not a very useful piece of information. Shapefiles are made from at least three files: the .shp, which is the geometry; the .dbf, which is the attribute database; and the .shx, which is an index of record positions in the .shp. If your .dbf is 8GB then you can easily read the shapes themselves in by replacing it with a different .dbf. Even if the .shp is 8GB it might only be three polygons, in which case it might be easy to simplify them. How many polygons have you got, and how big is the .shp part of the shapefile?
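Both of those questions can be answered without loading any geometry. The sketch below is one way to do it, relying on the published shapefile layout in which the .shx file is a 100-byte header followed by one 8-byte record per shape. The function names and the base-path convention are hypothetical.

```python
import os
from typing import Dict

SHX_HEADER_BYTES = 100  # fixed-size file header
SHX_RECORD_BYTES = 8    # one (offset, length) pair per shape


def count_shapes(shx_path: str) -> int:
    """Number of shapes, derived only from the size of the .shx index file."""
    size = os.path.getsize(shx_path)
    payload = size - SHX_HEADER_BYTES
    if payload < 0 or payload % SHX_RECORD_BYTES != 0:
        raise ValueError(f"{shx_path} does not look like a valid .shx file")
    return payload // SHX_RECORD_BYTES


def component_sizes(base_path: str) -> Dict[str, int]:
    """Size in bytes of each shapefile component that exists next to base_path."""
    sizes = {}
    for ext in (".shp", ".dbf", ".shx"):
        path = base_path + ext
        if os.path.exists(path):
            sizes[ext] = os.path.getsize(path)
    return sizes
```

For a shapefile stored as `zones.shp`, `zones.dbf` and `zones.shx` (a made-up name), `count_shapes("zones.shx")` gives the polygon count and `component_sizes("zones")` shows which of the three files accounts for the 8GB.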