Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check if a point is in a polygon effectively using R for large data set?

Tags:

r

polygon

ggplot2

I am new to R and for my currently project, I have to draw a heat map related to a specific event. There are around 2 million observations of such event and in each observation there is a long and lat coordinate. Also, I have converted the map data to a data frame and the data frame contains 71 district, each district is defined with a set of coordinates. I need to decide which observation of the event belongs to which district. I am using the following code:

for (row in 1:nrow(data2015)){
  point.x=data2015[row,"Latitude"]
  point.y=data2015[row,"Longitude"]
  for (name in names(polygonOfdis)){
    if (point.in.polygon(point.x, point.y, polygonOfdis[[name]]$lat,   polygonOfdis[[name]]$long, mode.checked=FALSE)){
    count[[name]]<-count[[name]]+1
    break
}
}
}

data2015 is the data set for the event, polygonOfdis is the data set for each district.

For small data set, this algorithm works okay but for my data set, it will definitely run more than ten hours or even more (For a data set only 1/400 of current size, this algorithm runs for 1 to 2 minutes). I am wondering if there is any better way to find out which observation belongs to which district? My problem is that the point.in.polygon function takes too much time and I am wondering if there is any other function can do this?

PS: The current data is actual only 1/10 of the real data I have to process, so I really really need a faster way to do this.

like image 839
Optimus Prime Avatar asked Oct 15 '25 14:10

Optimus Prime


2 Answers

So, awhile ago, I ported over a point in a polygon algorithm by W. Randolph Franklin that uses the notion of rays. I.e. If a point is in the polygon, it should pass through an odd number of times. Otherwise, when it has an even number, it should lie on the outside of the polygon.

The code is considerably fast because it is written using Rcpp. It is split into two parts: 1. The PIP Algorithm and 2. A wrapper function for classification.

PIP Algorithm

#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]

//' @param points A \code{rowvec} with x,y  coordinate structure.
//' @param bp     A \code{matrix} containing the boundary points of the polygon. 
//' @return A \code{bool} indicating whether the point is in the polygon (TRUE) or not (FALSE)
// [[Rcpp::export]]
bool pnpoly(const arma::rowvec& point, const arma::mat& bp) {
    // Implementation of the ray-casting algorithm is based on
    // 
    unsigned int i, j;
    
    double x = point(0), y = point(1);
    
    bool inside = false;
    for (i = 0, j = bp.n_rows - 1; i < bp.n_rows; j = i++) {
      double xi = bp(i,0), yi = bp(i,1);
      double xj = bp(j,0), yj = bp(j,1);
      
      // See if point is inside polygon
      inside ^= (((yi >= y) != (yj >= y)) && (x <= (xj - xi) * (y - yi) / (yj - yi) + xi));
    }
    
    // Is the cat alive or dead?
    return inside;
}

Classification Algorithm

//' PIP Classifier
//' @param points A \code{matrix} with x,y  coordinate structure.
//' @param names  A \code{vector} of type \code{string} that contains the location name.
//' @param bps    A \code{field} of type {matrix} that contains the polygon coordinates to test against.
//' @return A \code{vector} of type \code{string} with location information.
// [[Rcpp::export]]
std::vector<std::string> classify_points(const arma::mat& points, 
                                         std::vector<std::string> names,
                                         const arma::field<arma::mat>& bps){
  unsigned int i, j;
  
  unsigned int num_points = points.n_rows;
  
  std::vector<std::string> classified(num_points);
  
  for(i = 0; i < num_points; i++){
    
    arma::rowvec active_row = points.row(i);
    
    // One of the coordinate lacks a value
    if( !arma::is_finite(active_row(0)) || !arma::is_finite(active_row(1)) ){
      classified[i] = "Missing";
      continue; // skip trying to find a location
    }

    // Try to classify coordinate based on supplied boundary points for area j
    for(j = 0; j < names.size(); j++){
      if( pnpoly(active_row, bps(j)) ){
        classified[i] = names[j];
        break; // Break loop
      }
    }
    
  }
  
  return classified;
}
like image 189
coatless Avatar answered Oct 18 '25 07:10

coatless


This function from the SMDTools package worked well.

like image 39
Conner M. Avatar answered Oct 18 '25 07:10

Conner M.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!