I am trying to address the following issue. Let's assume a dataframe (loaded from a txt file) with the following structure (and thousands of rows):
foo.head()
X Y Z 0 125417.5112 536361.8752 -1750.0 1 127517.7647 533925.8644 -1750.0 2 128144.1000 533199.4000 -1750.0 3 128578.8385 532904.9288 -1750.0 4 125417.5112 536361.8752 -1750.0 ....
The data represents X Y and Z coordinates.
I also have a set of points that define a closed polygon. These are in a numpy array:
polypoints
array([[ 125417.5112, 536361.8752],
[ 127517.7647, 533925.8644],
[ 128144.1 , 533199.4 ],
....
[ 125417.5112, 536361.8752]])
How can i filter my dataframe to drop the rows that do NOT fall inside the closed polygon?
I have tried defining the polygon using shapely.geometry polygon. By doing:
poly = Polygon(polypoints)
This works fine. But I am at a loss as to how to continue with this.
Help is much appreciated
----EDIT---- Please see below for updated solution
The original solution suggested by @MrT works great. However, looking at geopandas as suggested by @Rutger Kassies, i have also found another solution. First one needs to install the geopandas package. Then the following code works for me:
import geopandas as gpd
from shapely.geometry import Point, Polygon, MultiPolygon
# load the data that should be cropped by the polygon
# this assumes that the csv file already includes
# a geometry column with point data as performed below
dat_gpd = gpd.GeoDataFrame.from_csv(r'{}\data_to_crop.csv'.format(savedir), sep='\t')
# load the data of the polygon as a dataframe
arr_df = pd.DataFrame(data, columns=['X','Y','Z'])
# make shapely points out of the X and Y coordinates
point_data = [Point(xy) for xy in zip(arr_df.X, arr_df.Y)]
# assign shapely points as geometry to a geodataframe
# Like this you can also inspect the individual points if needed
arr_gpd = gpd.GeoDataFrame(arr_df, geometry=point_data)
# define a shapely polygon from X and Y coordinates of the shapely points
polygo = Polygon([[p.x, p.y] for p in arr_gpd.geometry])
# assing defined polygon to a new dataframe
pol_gpd= gpd.GeoDataFrame()
pol_gpd['geometry'] = None
pol_gpd.loc[0,'geometry'] = polygo
# define a new dataframe from the spatial join of the dataframe with the data to be cropped
# and the dataframe with the polygon data, using the within function.
dat_fin = gpd.sjoin(dat_gpd, pol_gpd, op = 'within')
Hope this helps if someone faces a similar problem. Also, further info on the spatial join can be found on the geopandas website. Note that this functionality does not require an operation between polygons, but also works with points and polygons
--EDIT --
%timeit gpd.sjoin(dat_gpd, pol_gpd, op = 'within')
31.8 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit dat_gpd['inpoly'] = dat_gpd.apply(lambda row: polygo.intersects(Point(row["X"], row["Y"])), axis = 1)
1min 26s ± 389 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
It seems that the geo-pandas function is much faster. Though to be fair the non-geo pandas solution also has to convert the X and Y to shapely point elements and then perform the intersection evaluation
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With