
python xarray select by lat/long and extract point data to dataframe

I would like to select all grid cells within a lat/long range and, for each grid cell, export it as a DataFrame and then to a CSV file (i.e. df.to_csv). My dataset is below. I can use xr.where(...) to mask out grid cells outside my input range, but I am not sure how to loop through the remaining grid cells that were not masked out. Alternatively, I have tried using the xr.sel functions, but they do not seem to accept operators, as in ds.sel(gridlat_0>45). xr.sel_points(...) may also work, but I cannot figure out the correct syntax of indexers to use in my case. Thank you in advance for your help.

<xarray.Dataset>
Dimensions:    (time: 48, xgrid_0: 685, ygrid_0: 485)
Coordinates:
    gridlat_0  (ygrid_0, xgrid_0) float32 44.6896 44.6956 44.7015 44.7075 ...
  * ygrid_0    (ygrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * xgrid_0    (xgrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
  * time       (time) datetime64[ns] 2016-07-28T01:00:00 2016-07-28T02:00:00 ...
    gridlon_0  (ygrid_0, xgrid_0) float32 -129.906 -129.879 -129.851 ...
Data variables:
    u          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    gridrot_0  (time, ygrid_0, xgrid_0) float32 nan nan nan nan nan nan nan ...
    Qli        (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    Qsi        (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    p          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    rh         (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    press      (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    t          (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
    vw_dir     (time, ygrid_0, xgrid_0) float64 nan nan nan nan nan nan nan ...
nicway asked Oct 16 '25 05:10



1 Answer

The simplest way to do this is probably to loop through every grid point, with something like the following:

# (optionally) create a grid dataset so we don't need to pull out all
# the data from the main dataset before looking at each point
grid = ds[['gridlat_0', 'gridlon_0']]

for i in range(ds.coords['xgrid_0'].size):
    for j in range(ds.coords['ygrid_0'].size):
        sub_grid = grid.isel(xgrid_0=i, ygrid_0=j)
        # is_valid is a placeholder for your own lat/long test
        if is_valid(sub_grid.gridlat_0, sub_grid.gridlon_0):
            sub_ds = ds.isel(xgrid_0=i, ygrid_0=j)
            sub_ds.to_dataframe().to_csv(...)

Even with a 685x485 grid, this should only take a few seconds to loop through every point.
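The loop above calls is_valid without defining it. A minimal sketch of what it could look like is below, assuming a simple rectangular lat/long box. The bounds are made-up values for illustration, not something from the question.

```python
# Hypothetical bounds for illustration; replace with your region of interest.
LAT_MIN, LAT_MAX = 45.0, 50.0
LON_MIN, LON_MAX = -125.0, -115.0


def is_valid(lat, lon):
    """Return True if a single (lat, lon) point lies inside the bounding box."""
    # float() accepts plain numbers as well as 0-d arrays, such as the
    # scalar DataArrays produced by isel(xgrid_0=i, ygrid_0=j).
    lat, lon = float(lat), float(lon)
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
```

Any other test (distance from a station, a polygon check, etc.) would slot in the same way, as long as it returns a single True/False per point.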

Pre-filtering with ds = ds.where(..., drop=True) (available in the next xarray release, due out later this week) beforehand could make this significantly faster, but you may still be unable to represent the selected grid cells on orthogonal axes, so some masked cells can remain inside the trimmed grid.
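As a rough sketch of the pre-filtering step, assuming an xarray version where where() accepts drop=True: the small dataset below is a synthetic stand-in that only reuses the coordinate names from the question, and the thresholds are arbitrary.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset in the question (same names, made-up values).
ny, nx, nt = 4, 5, 3
lat2d, lon2d = np.meshgrid(
    np.linspace(44.0, 47.0, ny), np.linspace(-130.0, -126.0, nx), indexing="ij"
)
ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"),
           np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx))},
    coords={
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat2d),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon2d),
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "time": np.arange(nt),
    },
)

# Boolean mask built from the 2D lat/long coordinates (arbitrary thresholds).
mask = (ds.gridlat_0 > 45) & (ds.gridlon_0 < -127)

# Cells outside the mask become NaN; drop=True then trims rows and columns
# of the grid that contain no selected cell at all.
ds_small = ds.where(mask, drop=True)
```

Here the synthetic grid is rectilinear, so the trimmed result has no NaN cells left. On a rotated or curvilinear grid like the one in the question, the selected region is generally not a rectangle in (ygrid_0, xgrid_0), so the trimmed dataset can still contain NaN cells that you would need to skip when looping.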

A final option, probably the cleanest, is to use stack to collapse the two grid dimensions into a single 'space' dimension, which makes the dataset 2D (time by space). Then you can use standard selection and groupby operations along the new 'space' dimension:

ds_stacked = ds.stack(space=['xgrid_0', 'ygrid_0'])
ds_filtered = ds_stacked.sel(space=(ds_stacked.gridlat_0 > 45))
for _, ds_one_place in ds_filtered.groupby('space'):
    ds_one_place.to_dataframe().to_csv(...)
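For anyone who wants to try the stacking idea end to end, here is a self-contained sketch on a tiny synthetic dataset. It is a variation rather than the exact code above: it selects by integer position with isel and loops over positions instead of using sel and groupby, and it collects the per-point DataFrames in a dict (frames, a name chosen here) instead of writing files.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the dataset in the question (same names, made-up values).
ny, nx, nt = 3, 4, 2
lat2d, lon2d = np.meshgrid(
    np.linspace(44.0, 46.0, ny), np.linspace(-130.0, -127.0, nx), indexing="ij"
)
ds = xr.Dataset(
    {"t": (("time", "ygrid_0", "xgrid_0"),
           np.arange(nt * ny * nx, dtype=float).reshape(nt, ny, nx))},
    coords={
        "gridlat_0": (("ygrid_0", "xgrid_0"), lat2d),
        "gridlon_0": (("ygrid_0", "xgrid_0"), lon2d),
        "ygrid_0": np.arange(ny),
        "xgrid_0": np.arange(nx),
        "time": np.arange(nt),
    },
)

# Collapse the two grid dimensions into one 'space' dimension.
ds_stacked = ds.stack(space=("ygrid_0", "xgrid_0"))

# Integer positions along 'space' where the latitude test passes.
keep = np.flatnonzero((ds_stacked["gridlat_0"] > 45).values)
ds_filtered = ds_stacked.isel(space=keep)

# One DataFrame per remaining grid cell, keyed by its (ygrid_0, xgrid_0) index.
y_idx = ds_filtered["ygrid_0"].values
x_idx = ds_filtered["xgrid_0"].values
frames = {}
for k in range(ds_filtered.sizes["space"]):
    # Drop the scalar 'space' coordinate (if present) before converting.
    point = ds_filtered.isel(space=k).drop_vars("space", errors="ignore")
    frames[(int(y_idx[k]), int(x_idx[k]))] = point.to_dataframe()
```

Each value in frames is indexed by time, so calling to_csv on it with a file name built from the key would give one file per grid cell.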
shoyer answered Oct 17 '25 18:10



