
Xarray groupby according to a multi-index

xarray provides a groupby function that we can use to calculate anomalies of climate data. For example, the anomaly of monthly weather data can be calculated as shown in http://xarray.pydata.org/en/stable/examples/weather-data.html:

climatology = ds.groupby("time.month").mean("time")
anomalies = ds.groupby("time.month") - climatology

However, when we want to calculate the anomaly of daily data, we need to account for Feb 29th in leap years. Using the same syntax as above, an example is given below:

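import numpy as np
import pandas as pd
import xarray as xr

# Six years of daily data, including the leap years 2012 and 2016
date = pd.date_range('20110101', '20161231', freq='D')
data = np.random.rand(len(date))
da = xr.DataArray(data, dims=['date'], coords=dict(date=date))
# Group by the ordinal day of the year (1..366)
da_group = da.groupby('date.dayofyear')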

This method divides the DataArray according to the dayofyear of the date. But what should we do when we want to group by the 'month' and the 'day' of the date, for example Month=4 and Day=14 of every year? (It is worth mentioning that the dayofyear values of 2011-04-10 and 2012-04-10 are different.)
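To make the mismatch concrete, here is a minimal check using only pandas. The variable names doy_2011 and doy_2012 are illustrative:

```python
import pandas as pd

# Same calendar date (April 10) in a non-leap year and a leap year
doy_2011 = pd.Timestamp("2011-04-10").dayofyear
doy_2012 = pd.Timestamp("2012-04-10").dayofyear

# After Feb 29, every date in a leap year is shifted by one day,
# so grouping on dayofyear mixes neighbouring calendar dates.
print(doy_2011, doy_2012)
```

So grouping on dayofyear would put 2012-04-10 in the same group as April 11 of the non-leap years.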

I have tried da_group = da.groupby(['date.month','date.day']), but it fails with the error: `group` must be an xarray.DataArray or the name of an xarray variable or dimension. Received ['date.month', 'date.day'] instead.

So how do we group by both the month and the day of the date? Many thanks!

Yongwu Xiu asked Sep 06 '25 03:09


1 Answer

You can create a grouper array from a pandas MultiIndex:

In [9]: grouper = xr.DataArray(
   ...:     pd.MultiIndex.from_arrays(
   ...:         [da.date.dt.month.values, da.date.dt.day.values],
   ...:         names=['month', 'day'],
   ...:     ), dims=['date'], coords=[da.date],
   ...: )

In [10]: grouper
Out[10]:
<xarray.DataArray (date: 2192)>
array([(1, 1), (1, 2), (1, 3), ..., (12, 29), (12, 30), (12, 31)], dtype=object)
Coordinates:
  * date     (date) datetime64[ns] 2011-01-01 2011-01-02 ... 2016-12-31

You can then use this to group your data:

In [11]: da.groupby(grouper)

See this related (but slightly different) question on grouping on multiple coordinates along a single dimension

Note that once you aggregate, xarray does not track the names of the grouped MultiIndex levels, so you end up with generic level names (group_level_0 and group_level_1 in the output below):

In [12]: da.groupby(grouper).mean()
Out[12]:
<xarray.DataArray (group: 366)>
array([0.7243612 , 0.5613106 , 0.59413407, 0.57179211, 0.68318279,
       0.49471343, 0.58264707, 0.56764063, 0.77111539, 0.57064475,
...
       0.45514646, 0.37333521, 0.49833203, 0.53370068, 0.54690462,
       0.69037877])
Coordinates:
  * group          (group) MultiIndex
  - group_level_0  (group) int64 1 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12 12 2
  - group_level_1  (group) int64 1 2 3 4 5 6 7 8 9 ... 25 26 27 28 29 30 31 29

You'll then need to rename these coordinates. Note, however, that the leap day (2, 29) does appear in the results, listed as the last group.
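If the generic level names are inconvenient, one possible alternative (a sketch, not part of the MultiIndex approach) is to group on a single "MM-DD" string key built with dt.strftime. The name monthday is an assumption chosen for illustration, and the data is synthetic:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series covering 2011-2016 (two leap years)
date = pd.date_range("2011-01-01", "2016-12-31", freq="D")
rng = np.random.default_rng(0)
da = xr.DataArray(rng.random(len(date)), dims=["date"], coords={"date": date})

# Hypothetical grouping key: one "MM-DD" string per timestamp
monthday = da["date"].dt.strftime("%m-%d").rename("monthday")

# Mean for each calendar day, then the daily anomaly relative to it
climatology = da.groupby(monthday).mean()
anomalies = da.groupby(monthday) - climatology
```

Because the key is a single named array, the result has a monthday dimension with 366 entries. The "02-29" entry is averaged over the leap years only.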

Michael Delgado answered Sep 07 '25 19:09