Dataframe.lookup and map combination resulting in column label error

Question

I have a large dataframe of around (1200, 10) of mostly string where I have to append a new column say 'Z' based an existing reference column say 'Y', whose values are 'A', 'B', 'C', or unknown (NaN or other), from this I need to select one of three corresponding columns in the df say 'D', 'E', 'F', or output NaN and append this value as column 'Z'. I currently have the following code:

df = pd.DataFrame({'T': {0: '.', 1: '.', 2: '.', 3: '.'}, 
                   'G': {0: '.', 1: '.', 2: '.', 3: '.'}, 
                   'D': {0: 4, 1: 1, 2: 5, 3: 3}, 
                   'E': {0: 6, 1: 2, 2: 7, 3: 2}, 
                   'F': {0: 8, 1: 3, 2: 9, 3: 1}, 
                   'K': {0: '.', 1: '.', 2: '.', 3:'.'}, 
                   'Y': {0: 'A', 1: 'B', 2: 'B', 3: np.nan}})

d = {'A': 'D', 'B': 'E', 'C': 'F'}
df['Z'] = df.lookup(df.index, df.Y.map(d))

The issue is that lookup breaks down where Y is an unknown value. and in the specific code, Y.unique() turns up something like (A, B, C, NaN, nan). So I was wondering if there would be a way to use a lookup-esque method that outputs Z to NaN where Y is NaN or unknown outside of the the given dict?


    T   G   D   E   F   K   Y   Z
0   .   .   4   6   8   .   A   4.0
1   .   .   1   2   3   .   B   2.0
2   .   .   5   7   9   .   B   7.0
3   .   .   3   2   1   .   NaN NaN

Scott Boston · Accepted Answer

You can use stack and reindex with zip for multiindexes:

df['Z'] = df.stack().reindex(zip(df.index, df.Y.map(d))).reset_index(level=1, drop=True)

Output:

   T  G  D  E  F  K    Y    Z
0  .  .  4  6  8  .    A    4
1  .  .  1  2  3  .    B    2
2  .  .  5  7  9  .    B    7
3  .  .  3  2  1  .  NaN  NaN

Details:

First, let's use stack, move the column header into the dataframe row index, creating a multiindex for the dataframe. Where, level 0 is the original row index, and level 1 are the column headers.

Now, we can use reindex to filter down to only those indexes we need. Multiindex are identified using tuples. (level0, level1) hence, we zip df.index and df.y.map(d) together creating the tuples used by reindex.

Lastly, we drop the level=1 of the multiindex creating a structure of the original index and assign a new column with those values.

Dataframe.lookup and map combination resulting in column label error

Tags:

python

pandas

dataframe

Roy Han

1 Answers

Scott Boston

Recent Activity

Donate For Us

Dataframe.lookup and map combination resulting in column label error

Tags:

python

pandas

dataframe

Roy Han

1 Answers

Scott Boston

Related questions

Recent Activity

Donate For Us