I have a 2m temperature NetCDF file from ERA5 that goes from 2000 to 2019 for the months 04 to 10, giving a total of 13680 timesteps on a 61x161 lat-lon grid. I want to compute the monthly mean of all the timesteps for each year separately. For example, we would have the monthly mean of the data for April 2000, for May 2000, and so forth. I've tried the following code with xarray resample, but two problems occur.
Here's what I’m talking about:
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.open_dataset(netcdf)  # netcdf: path to the ERA5 file
monthly_data = ds.resample(time='1M').mean()
Printing the time coordinate shows monthly timestamps, but they also include months that are not in the data (November to March):
print(np.array(monthly_data.time))
array(['2000-04-30T00:00:00.000000000', '2000-05-31T00:00:00.000000000',
'2000-06-30T00:00:00.000000000', '2000-07-31T00:00:00.000000000',
'2000-08-31T00:00:00.000000000', '2000-09-30T00:00:00.000000000',
'2000-10-31T00:00:00.000000000', '2000-11-30T00:00:00.000000000',
'2000-12-31T00:00:00.000000000', '2001-01-31T00:00:00.000000000',
To verify the content of the temperature, I turned the data into a dataframe.
temp_ar = np.array(monthly_data.t2m)
print(pd.DataFrame(temp_ar[0,:,:]).head())
0 1 2 ... 158 159 160
0 270.940613 270.911652 270.926727 ... NaN NaN NaN
1 271.294952 271.256744 271.250946 ... 272.948608 272.974731 272.998535
2 271.416779 271.457214 271.483459 ... 273.123169 273.079285 273.058563
3 271.848755 271.791382 271.784058 ... NaN 273.264038 NaN
4 272.226837 272.144928 272.123016 ... NaN NaN NaN
print(pd.DataFrame(temp_ar[1,:,:]).head())
0 1 2 3 4 5 6 ... 154 155 156 157 158 159 160
0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN
The 2nd array (which corresponds to month 05 of 2000) shouldn't have NaNs, but it does, and it's like this for all the other timesteps (except for the last one, for some reason). Would anybody know why this is happening?
Here is the original dataset
print(ds)
<xarray.Dataset>
Dimensions:    (latitude: 61, longitude: 161, time: 13680)
Coordinates:
  * longitude  (longitude) float32 -80.0 -79.9 -79.8 -79.7 ... -64.2 -64.1 -64.0
  * latitude   (latitude) float32 50.0 49.9 49.8 49.7 ... 44.3 44.2 44.1 44.0
  * time       (time) datetime64[ns] 2000-04-01 ... 2018-10-30T23:00:00
Data variables:
    t2m        (time, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2020-12-07 03:50:31 GMT by grib_to_netcdf-2.16.0: /opt/ecmw...
Any help would be appreciated. Maybe I should try some other method? Cheers!
I think an easy way would be to use the groupby method.
Example:
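import numpy as np
import pandas as pd
import xarray as xr

# Daily series from 2000-01-01 to 2004-07-31 (1674 days), values 0..1673
da = xr.DataArray(
    np.linspace(0, 1673, num=1674),
    coords=[pd.date_range("2000-01-01", "2004-07-31", freq="D")],
    dims="time",
)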
da
output:
<xarray.DataArray (time: 1674)>
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.671e+03, 1.672e+03, 1.673e+03])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2004-07-31
For yearly mean you can do:
da.groupby('time.year').mean()
output:
<xarray.DataArray (year: 5)>
array([ 182.5, 548. , 913. , 1278. , 1567. ])
Coordinates:
* year (year) int64 2000 2001 2002 2003 2004
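Note that the same pattern with 'time.month' alone does not give one value per month of each year. As a small sketch (rebuilding the same synthetic daily series as above, so it runs on its own), grouping by month pools every January together, every February together, and so on:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same synthetic daily series as in the example above.
time = pd.date_range("2000-01-01", "2004-07-31", freq="D")
da = xr.DataArray(
    np.arange(time.size, dtype="float64"),
    coords={"time": time},
    dims="time",
)

# Grouping by month only: all years are pooled into 12 calendar-month groups.
by_month = da.groupby("time.month").mean()
print(by_month.sizes["month"])  # 12, not one entry per (year, month) pair
```

That is a climatology rather than a per-year monthly mean, so year and month have to be combined in the grouping key.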
For a mean per month of each year, you can create a multi-index:
year_month_idx = pd.MultiIndex.from_arrays([da['time.year'], da['time.month']])
da.coords['year_month'] = ('time', year_month_idx)
da.groupby('year_month').mean()
output:
<xarray.DataArray (year_month: 55)>
array([ 15. , 45. , 75. , 105.5, 136. , 166.5, 197. , 228. , 258.5,
289. , 319.5, 350. , 381. , 410.5, 440. , 470.5, 501. , 531.5,
562. , 593. , 623.5, 654. , 684.5, 715. , 746. , 775.5, 805. ,
835.5, 866. , 896.5, 927. , 958. , 988.5, 1019. , 1049.5, 1080. ,
1111. , 1140.5, 1170. , 1200.5, 1231. , 1261.5, 1292. , 1323. , 1353.5,
1384. , 1414.5, 1445. , 1476. , 1506. , 1536. , 1566.5, 1597. , 1627.5,
1658. ])
Coordinates:
* year_month (year_month) MultiIndex
* year_month_level_0 (year_month) int64 2000 2000 2000 ... 2002 2002 2002
* year_month_level_1 (year_month) int64 1 2 3 4 5 6 7 8 ... 11 12 1 2 3 4 5 6
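For data that only covers part of the year, as in the question, resample may also be an option. It creates a bin for every month between the first and last timestamp, so months with no data come out as NaN and can be dropped afterwards. This is a minimal sketch, assuming synthetic daily data for April to October in place of the ERA5 file (the variable names are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the ERA5 data: daily values, April to October only, 2000-2001.
time = pd.date_range("2000-04-01", "2001-10-31", freq="D")
time = time[(time.month >= 4) & (time.month <= 10)]
da = xr.DataArray(
    np.arange(time.size, dtype="float64"),
    coords={"time": time},
    dims="time",
    name="t2m",
)

# Month-start bins; resample also creates bins for the empty November-March gap.
monthly = da.resample(time="MS").mean()

# Drop the time steps that contain no data at all.
monthly_valid = monthly.dropna(dim="time", how="all")

print(monthly.sizes["time"])        # 19 bins, including the empty months
print(monthly_valid.sizes["time"])  # 14 bins, April to October of each year
```

With a (time, latitude, longitude) variable, how="all" keeps a month as long as at least one grid cell has a value, so only the fully empty months are removed.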