I have an xarray dataset containing one year of monthly CO2 emission data for Ireland. It looks like this:
<xarray.Dataset>
Dimensions: (lat: 733, lon: 720, time: 12)
Coordinates:
* lat (lat) float32 49.9 49.9083 49.9167 49.925 49.9333 49.9417 49.95 ...
* lon (lon) float32 -11.0 -10.9917 -10.9833 -10.975 -10.9667 -10.9583 ...
* time (time) int16 181 182 183 184 185 186 187 188 189 190 191 192
Data variables:
soc (lat, lon, time) float64 nan nan nan nan nan nan nan nan nan ...
ch4 (lat, lon, time) float64 nan nan nan nan nan nan nan nan nan ...
co2 (lat, lon, time) float64 nan nan nan nan nan nan nan nan nan ...
n2o (lat, lon, time) float64 nan nan nan nan nan nan nan nan nan ...
no3 (lat, lon, time) float64 nan nan nan nan nan nan nan nan nan ...
If I plot one month of data, it looks like this:
[map of one month of data]
I want to sum the monthly emissions for each lat/lon combination and produce a similar map of annual sums rather than monthly values. I can sum the data like so:
total = ds.co2.sum()  # avoid naming this 'sum', which shadows Python's built-in
which gives:
<xarray.DataArray 'co2' ()>
array(453300000)
This sums over all of the data at once and gives just one value. I want to produce a new dataset that contains the sum of the monthly data for each lat/lon combination, effectively giving 'annual' sums, which I can then map.
Any help would be greatly appreciated!
As you figured out in your answer, xarray aggregation operations can be applied along any one (or several) of the dimensions of a Dataset or DataArray by passing the dimension name(s) as dim:
sum2016 = ds.co2.sum(dim='time')
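A minimal sketch of reducing along one dimension, several dimensions, or all of them. It uses a small hypothetical DataArray that mimics the (lat, lon, time) layout above, filled with random values rather than the real emissions data:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds.co2: a 3 x 4 grid with 12 monthly steps.
rng = np.random.default_rng(0)
co2 = xr.DataArray(
    rng.random((3, 4, 12)),
    dims=("lat", "lon", "time"),
    coords={
        "lat": [49.9, 49.9083, 49.9167],
        "lon": [-11.0, -10.9917, -10.9833, -10.975],
        "time": np.arange(181, 193),
    },
    name="co2",
)

# Reduce over "time" only: one value per lat/lon cell (the annual map).
per_cell = co2.sum(dim="time")

# Reduce over several dimensions at once: one value per month.
per_month = co2.sum(dim=["lat", "lon"])

# No dim given: everything collapses to a single scalar, like ds.co2.sum().
total = co2.sum()
```

The reduced dimensions disappear from the result, so per_cell keeps only (lat, lon) and can be plotted directly as a map.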
By default, aggregation operations on float data skip NaN values, so a cell whose values are all NaN sums to 0. You can propagate the NaN values instead by passing the skipna=False argument to da.sum, e.g.:
sum2016 = ds.co2.sum(dim='time', skipna=False)
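To illustrate how the NaN-handling options differ, here is a sketch on a hypothetical 2-cell, 3-month array. It also shows min_count=1, an argument xarray's sum accepts that returns NaN only where a cell has no valid values at all. That keeps all-NaN cells (e.g. cells outside the land area) as NaN without discarding cells that are only missing a few months:

```python
import numpy as np
import xarray as xr

# Hypothetical data: cell 0 is entirely NaN, cell 1 has one missing month.
da = xr.DataArray(
    [[np.nan, np.nan, np.nan],
     [1.0, np.nan, 2.0]],
    dims=("cell", "time"),
)

skipped = da.sum(dim="time")                # default: NaNs skipped -> [0.0, 3.0]
strict = da.sum(dim="time", skipna=False)   # any NaN propagates     -> [nan, nan]
counted = da.sum(dim="time", min_count=1)   # NaN only if no data    -> [nan, 3.0]
```

Which option is appropriate depends on whether a partially missing year should still count as a valid annual total.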
If you have multiple years in your dataset, you could group by the year (however you calculate this) and then sum over the months in each year. For example, if time in your dataset is simply a positional index for months, the year can be found with integer division, ds.time // 12 (adjust the offset to match how your months are numbered; ds.time % 12 would give the month of the year instead). You could then find the annual totals with:
ds.co2.groupby((ds.time // 12).rename('year')).sum(dim='time', skipna=False)
to get the annual time series.
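A sketch of the grouped annual sum on hypothetical data: two years of monthly ones on a 2x2 grid, with time assumed to be a zero-based month counter. It contrasts grouping by time // 12 (one group per year) with time % 12 (one group per calendar month, which sums the same month across years rather than producing annual totals):

```python
import numpy as np
import xarray as xr

# Hypothetical data: 24 months of ones on a 2x2 grid, with "time" assumed to be
# a zero-based month counter (0..23) rather than real dates.
da = xr.DataArray(
    np.ones((2, 2, 24)),
    dims=("lat", "lon", "time"),
    coords={"time": np.arange(24)},
    name="co2",
)

# Integer division labels months 0-11 as year 0 and months 12-23 as year 1.
# Renaming the grouper to "year" keeps it distinct from the "time" dimension.
year = (da.time // 12).rename("year")
annual = da.groupby(year).sum(dim="time", skipna=False)  # one slice per year

# Modulo instead labels each calendar month, so this sums the same month
# across both years (a monthly climatology), not an annual total.
month = (da.time % 12).rename("month")
by_month = da.groupby(month).sum(dim="time", skipna=False)
```

Each slice of annual holds 12 (twelve monthly ones summed), while each slice of by_month holds only 2.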
OK, I think I did it: summing over the 'time' dimension seems to work!
sum2016 = ds.co2.sum(dim='time')
produces:
<xarray.DataArray 'co2' (lat: 733, lon: 720)>
array([[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.],
[ 0., 0., 0., ..., 0., 0., 0.]])
Coordinates:
* lat (lat) float32 49.9 49.9083 49.9167 49.925 49.9333 49.9417 49.95 ...
* lon (lon) float32 -11.0 -10.9917 -10.9833 -10.975 -10.9667 -10.9583 ...
And this map of annual sums:
[map of annual sums]
Sadly, all the NaN values have been turned into zeros; I will have to change them back to NaN somehow.
EDIT:
I converted the 0.0 values back to NaN using this code:
sum2016mask = sum2016.where(sum2016 != 0.0)
giving this nicer map:
[map of annual sums with the zero cells masked]
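One caveat with masking on != 0.0: it also hides any cell whose annual total is genuinely zero. A sketch of an alternative, on hypothetical data, that masks cells based on whether they had any non-NaN values rather than on the summed value:

```python
import numpy as np
import xarray as xr

# Hypothetical data: cell 0 is all NaN, cell 1 has genuine zero emissions,
# cell 2 has ordinary values.
co2 = xr.DataArray(
    [[np.nan, np.nan], [0.0, 0.0], [1.0, 2.0]],
    dims=("cell", "time"),
)

annual = co2.sum(dim="time")                    # [0.0, 0.0, 3.0]

# Masking on the value hides both the all-NaN cell and the true-zero cell.
masked_by_value = annual.where(annual != 0.0)   # [nan, nan, 3.0]

# Masking on data availability keeps the true zero.
has_data = co2.notnull().any(dim="time")
masked_by_data = annual.where(has_data)         # [nan, 0.0, 3.0]

# Passing min_count=1 to the sum gives the same result in a single step.
via_min_count = co2.sum(dim="time", min_count=1)
```

If the real data never contains true zeros, the original mask and these alternatives give the same map.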