Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Default value for pandas lookup when lookup value doesn't exist or is null

I have a dataframe that looks like this:

       parent region
estid               
1         NaN      A
2         NaN      B
3         1.0      A
4         1.0      B
5         2.0      C
6         2.0      C
7         8.0      A

What I want is to create an extra column containing the region of the parent, defaulting to None if the parent is not found in the data, e.g.:

       parent region parent_region
estid                             
1         NaN      A          None
2         NaN      B          None
3         1.0      A             A
4         1.0      B             A
5         2.0      C             B
6         2.0      C             B
7         8.0      A          None

The following returns the correct result:

df["parent_region"] = df.apply(lambda x : df.loc[x["parent"]]["region"] if not math.isnan(x["parent"]) and x["parent"] in df.index else None, axis = 1)

But I'm very scared of inefficiencies, given that my dataframe has 168 million rows. Is there a better way to do it? I looked at lookup and get but I can't quite figure out how to work with IDs that can be NaN or not present in the dataframe.

For instance, I thought this could work: df.lookup(df["region"], df["parent"]), but it doesn't like very much null keys. df.get("region") does not return the parent's region, but the column itself, so it doesn't do what I want.

like image 524
Mikk Avatar asked Oct 20 '25 12:10

Mikk


1 Answers

You can use Series.map method which functions similar to a dictionary. The values from parent column and region column serve as the keys and values constituting it. The mapping takes place if they share a common index between them.

Additionally, na_action=ignore could be used to speedup this mapping process, as all NaNs present in these columns would be ignored altogether and simply propagated.

Lastly, the missing values must be replaced with None using Series.replace method.

df["parent_region"] = df.parent.map(df.region, na_action='ignore').replace({np.NaN:None})
Out[121]:
estid
1    None
2    None
3       A
4       A
5       B
6       B
7    None
Name: parent_region, dtype: object
like image 182
Nickil Maveli Avatar answered Oct 23 '25 02:10

Nickil Maveli



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!