I have an xarray dataset of shipment line items called shipDS:
<xarray.Dataset>
Dimensions:   (lineItem: 10)
Dimensions without coordinates: lineItem
Data variables:
    shipID    (lineItem) int64 1 1 2 3 4 4 5 5 5 5
    prodID    (lineItem) int64 90 92 92 90 90 91 92 93 94 95
    category  (lineItem) <U1 'A' 'B' 'B' 'A' 'A' 'A' 'B' 'C' 'D' 'D'
Each line item has a category. The question is: how many unique categories were included in each shipment (i.e. per shipID)?
Here is the code to create the xarray dataset:
import xarray as xr
## construct dataset of shipments
## each shipment might have multiple products
shipIDs = [1,1,2,3,4,4,5,5,5,5]
productIDs = [90,92,92,90,90,91,92,93,94,95]
category = ["A","B","B","A","A","A","B","C","D","D"]
shipDS = xr.Dataset(
    data_vars={
        "shipID": ("lineItem", shipIDs),
        "prodID": ("lineItem", productIDs),
        "category": ("lineItem", category),
    }
)
## How many unique product categories were in each shipment (i.e. per shipID)?
To answer this in pandas, I would do the following:
## pandas answer
(
    shipDS
    .to_pandas()
    .groupby(["shipID", "category"])
    .count()
    .reset_index()
    .groupby("shipID")
    .agg(numUniqCategories=("category", "count"))
)
and get the following result:
        numUniqCategories
shipID
1                       2
2                       1
3                       1
4                       1
5                       3
I am trying to become an xarray expert, but I cannot figure out the right way to do this solely in xarray.
At least for now, xarray has no built-in method for counting unique values. This discussion might help: xarray-forum. Your pandas solution is probably the most practical way to do this, although it may not scale well to very large xr.Datasets, because to_pandas() converts the data to an in-memory pandas object.
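If you want to stay closer to xarray, one possible workaround is to group the category variable by shipID and count the unique values in each group with np.unique. This is only a minimal sketch, not a built-in unique-count feature: count_unique is a hypothetical helper name introduced here, and the approach assumes each group's values fit in memory as a NumPy array.

```python
import numpy as np
import xarray as xr

# Same dataset as in the question.
shipDS = xr.Dataset(
    data_vars={
        "shipID": ("lineItem", [1, 1, 2, 3, 4, 4, 5, 5, 5, 5]),
        "prodID": ("lineItem", [90, 92, 92, 90, 90, 91, 92, 93, 94, 95]),
        "category": ("lineItem", ["A", "B", "B", "A", "A", "A", "B", "C", "D", "D"]),
    }
)


def count_unique(group):
    """Hypothetical helper: number of distinct values in one group, as a 0-d DataArray."""
    return xr.DataArray(np.unique(group.values).size)


# Group the category variable by shipID; because each group collapses to a
# scalar, map() stacks the results along a new "shipID" dimension.
num_uniq = (
    shipDS["category"]
    .groupby(shipDS["shipID"])
    .map(count_unique)
    .rename("numUniqCategories")
)

print(num_uniq)  # values [2, 1, 1, 1, 3] for shipID 1, 2, 3, 4, 5
```

The result is a DataArray indexed by shipID, so it can be used in further xarray operations without converting to pandas. Since map calls a Python function once per group, it may be slow when there are many distinct shipIDs.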