Default value for pandas lookup when lookup value doesn't exist or is null

Question

I have a dataframe that looks like this:

       parent region
estid               
1         NaN      A
2         NaN      B
3         1.0      A
4         1.0      B
5         2.0      C
6         2.0      C
7         8.0      A

What I want is to create an extra column containing the region of the parent, defaulting to None if the parent is not found in the data, e.g.:

       parent region parent_region
estid                             
1         NaN      A          None
2         NaN      B          None
3         1.0      A             A
4         1.0      B             A
5         2.0      C             B
6         2.0      C             B
7         8.0      A          None

The following returns the correct result:

df["parent_region"] = df.apply(lambda x : df.loc[x["parent"]]["region"] if not math.isnan(x["parent"]) and x["parent"] in df.index else None, axis = 1)

But I'm very scared of inefficiencies, given that my dataframe has 168 million rows. Is there a better way to do it? I looked at lookup and get but I can't quite figure out how to work with IDs that can be NaN or not present in the dataframe.

For instance, I thought this could work: df.lookup(df["region"], df["parent"]), but it doesn't like very much null keys. df.get("region") does not return the parent's region, but the column itself, so it doesn't do what I want.

Nickil Maveli · Accepted Answer

You can use Series.map method which functions similar to a dictionary. The values from parent column and region column serve as the keys and values constituting it. The mapping takes place if they share a common index between them.

Additionally, na_action=ignore could be used to speedup this mapping process, as all NaNs present in these columns would be ignored altogether and simply propagated.

Lastly, the missing values must be replaced with None using Series.replace method.

df["parent_region"] = df.parent.map(df.region, na_action='ignore').replace({np.NaN:None})
Out[121]:
estid
1    None
2    None
3       A
4       A
5       B
6       B
7    None
Name: parent_region, dtype: object

Default value for pandas lookup when lookup value doesn't exist or is null

Tags:

python

lookup

pandas

Mikk

1 Answers

Nickil Maveli

Recent Activity

Donate For Us

Default value for pandas lookup when lookup value doesn't exist or is null

Tags:

python

lookup

pandas

Mikk

1 Answers

Nickil Maveli

Related questions

Recent Activity

Donate For Us