I'm trying to extract data from netCDF4 files. These contain "MaskedArrays", which are part of the NumPy library.
My data contains latitude, longitude, day, and values (spread over different files), plus a mask that shows which latitude/longitude points are not valid (no measurements, or other reasons).
My data looks like this (for masked data):
masked_array(
    data =
     [[[-- -- -- ..., -- -- --]
        ..., 
       [-- -- -- ..., -- -- --]]],
    mask =
     [[[ True  True  True ...,  True  True  True]
        ...,
       [ True  True  True ...,  True  True  True]]],
    fill_value = 32767)
I'm looking for a NumPy method (or similar) that extracts only the values that are not masked, ideally by simply cutting all non-valid entries out of the dataset.
I found .compressed(), but it returns a one-dimensional array. For three-dimensional data that is quite a loss of information, as I no longer know where these values are located.
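One way to keep the positions alongside the valid values is to take the indices of the unmasked entries from the inverted mask. The following is only a minimal sketch: the small `values` array and the helper `valid_with_indices` are hypothetical stand-ins, not part of the original data or of NumPy.

```python
import numpy as np

# Hypothetical (day, lat, lon) array standing in for one netCDF variable.
values = np.ma.masked_array(
    np.arange(12, dtype=float).reshape(2, 2, 3),
    mask=[[[True, False, True], [False, True, True]],
          [[True, True, False], [True, True, True]]],
)

def valid_with_indices(arr):
    """Return (indices, values) of all unmasked entries of a masked array."""
    # getmaskarray always returns a full boolean array, even if nothing is masked.
    mask = np.ma.getmaskarray(arr)
    idx = np.argwhere(~mask)            # shape (n_valid, ndim), row-major order
    vals = np.ma.getdata(arr)[~mask]    # valid values in the same order
    return idx, vals

idx, vals = valid_with_indices(values)
for (day, lat, lon), v in zip(idx, vals):
    print(day, lat, lon, v)
```

Each row of `idx` is the (day, lat, lon) position of the value at the same position in `vals`, so nothing is lost compared to the flat result of .compressed().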
Additionally, I tried nonzero = the_array['one of the values'][0].nonzero().
This gives me a tuple of two index arrays (lat/lon), but after that I still have to access the values through them, which is slow. Unfortunately, once I know how to access all this data, I need to do that on 30*6 files, each with ~1500×700×365 data points :D.
all_days = [(x, rhstmax['stuff'][x][24][1288]) for x in range(366)]
# all days for a single grid point (lat index 24, lon index 1288). First 20:
all_days[:20] =
    [(0, 15.799999),
     (1, 16.199999),
     (2, 17.4),
     (3, 13.2),
     (4, 10.8),
     (5, 11.3),
     (6, 15.299999),
     (7, 16.299999),
     (8, 14.099999),
     (9, 10.8),
     (10, 9.5),
     (11, 9.0999994),
     (12, 11.9),
     (13, 9.1999998),
     (14, 31.0),
     (15, 49.0),
     (16, 8.6999998),
     (17, 10.0),
     (18, 44.099998),
     (19, 30.699999)]
# ... takes forever :(
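The loop above indexes the variable once per day, which is likely the slow part. A single slice along the day axis should do the same work in one step. The sketch below assumes the variable has shape (day, lat, lon) and uses a small hypothetical masked array named `stuff` in place of rhstmax['stuff']; the lon index 12 is also just illustrative. netCDF4 variables accept the same slicing syntax, so the idea should carry over, though I have not timed it on the real files.

```python
import numpy as np

# Hypothetical stand-in for rhstmax['stuff'] with shape (day, lat, lon).
rng = np.random.default_rng(0)
stuff = np.ma.masked_array(rng.random((366, 30, 40)), mask=False)
stuff[5, 24, 12] = np.ma.masked  # pretend day 5 has no measurement

# One slice instead of 366 separate lookups: all days for one grid point.
series = stuff[:, 24, 12]

# Day indices that are not masked, paired with their values.
days = np.nonzero(~np.ma.getmaskarray(series))[0]
all_days = list(zip(days.tolist(), series.compressed().tolist()))

print(all_days[:3])
```

Masked days are simply absent from `all_days`, so the day index in each tuple still tells you where the value came from.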
To get the non-masked data in Python, you can use the .mask attribute of the masked array.
Suppose you have the following dataset:
data = [[0.0 1.0 -- --]
       [2.0 3.0 -- --]]
You can obtain the non-masked data by selecting all entries where data.mask is False:
data = data[~data.mask]
Note that this gives you a 1D array of all the valid values:
data -> [0.0 1.0 2.0 3.0]
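If the 2D shape matters, and the masked entries happen to fill whole rows or columns as in this example, you can drop those columns instead of flattening. This is a sketch under that assumption only; the fill value 9.0 and the names `flat`, `keep_cols` and `trimmed` are illustrative.

```python
import numpy as np

# The example dataset from above; 9.0 is an arbitrary placeholder under the mask.
data = np.ma.masked_array(
    [[0.0, 1.0, 9.0, 9.0],
     [2.0, 3.0, 9.0, 9.0]],
    mask=[[False, False, True, True],
          [False, False, True, True]],
)

# getmaskarray returns a full boolean array even when nothing is masked.
mask = np.ma.getmaskarray(data)

# Flat selection, equivalent to indexing with the inverted mask: 1D result.
flat = np.ma.getdata(data)[~mask]

# Shape-preserving alternative: keep only columns that are not fully masked.
keep_cols = ~mask.all(axis=0)
trimmed = data[:, keep_cols]

print(flat)
print(trimmed)
```

Here `flat` is one-dimensional, while `trimmed` keeps two rows and the two valid columns. For scattered masks this trimming does not apply, because a rectangular array cannot have holes.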